Many of you noticed that something bad happened two days ago. Some of you may even have heard screams of ultimate suffering as those responsible were beaten heartily with wet noodles.
Despite my name, I'm all for full disclosure when there's no legal red tape preventing me from doing so: we had a system failure. The restoration from the system failure resulted in data loss on the live forums.
The tech stuff:
We run our servers on virtual machines. This gives us two major advantages. First, should one system become a bottleneck, we can seamlessly give it more power or move it to a faster box. Second, should something go hideously wrong, such as a meteorite hitting our servers, we can restore entire system images from backup in about an hour.
In fact, in two weeks we are planning to simulate critical failures on various machines in our test network to see how our fault tolerance holds up in reality. Unfortunately, Murphy's Law kicked in and decided to break one of our servers before our tolerance tests were completed. Losing posts on the forums was the result.
Going forward:
With regard to the servers, we are getting things ready for public testing, hardening the environment and doing internal smoke testing. The SQL farm is being upgraded as we speak. We are bringing additional fail-over servers online on Monday.
The real improvement will come next week, when we try our darnedest to destroy our test network and then see what flaws remain that we can fix.
Summary:
We lost a few days of data on the forums due to a server failure. This happening again is possible but unlikely. In two weeks we are running a series of tests and simulations so we can change "unlikely" to "nigh impossible". For now your posts should be safe.



