keyspace server properly recovers after first "real world" server crash
Friday, January 20, 2012 at 11:27AM |
Jeff Lunt
From the beginning, keyspace was designed to recover automatically from server and application crashes, hangs, and other events that are often outside your control, and certainly outside the scope of the keyspace service itself. During development I tested a number of scenarios, from forced application kills to server/system crashes, along with other events that would cause the service to stop for an unknown outside reason. As a result of these tests, I was fairly confident that it would survive a real-world crash, but you never really know until it actually happens.
Enter an email I received this week from Amazon's AWS service, stating that the EC2 instance on which the public keyspace server runs had experienced a hardware failure, and that I needed to migrate my instance to another physical host. Unfortunately, I missed this email until after the physical hardware was already down and the service had stopped working.
The instructions from AWS were to stop, then start, the EC2 instance. When I entered the AWS console, my instance appeared to still be running, but I could tell that things weren't normal: I could neither retrieve key sets nor ssh into the box.
So I did as AWS instructed: I stopped, then started, the EC2 instance. Within a couple of minutes the keyspace service had come back to life. Not only that, I discovered that the recovery feature (which automatically skips a pre-defined number of key sets when a crash is detected) worked flawlessly. The next key set I got from keyspace.karmanebula.com was several thousand key sets later than the last one issued before the crash. That indicated that the server had skipped the key sets it could not be sure had or had not been issued, and had simply resumed at a safe point in the key set generator's life. This automatic recovery is handled in the loadOrResetKeyGeneratorState() method.
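For readers curious what skip-ahead recovery of this kind can look like, here is a minimal sketch in Python. It is not the actual keyspace code: the state file layout, the `clean` flag, the function names, and the skip size of 5000 are all assumptions made for illustration. The idea is that the generator's counter is checkpointed to disk along with a flag that is only set on an orderly shutdown; if the flag is missing at startup, the loader assumes a crash and jumps the counter forward.

```python
import json
import os
import tempfile

# Hypothetical skip size; the real pre-defined number is not given in the post.
SKIP_ON_CRASH = 5000


def save_state(path, state):
    """Write the state to a temp file, then rename it over the old file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def checkpoint(path, counter, clean=False):
    """Persist the current counter; clean=True only on orderly shutdown."""
    save_state(path, {"counter": counter, "clean": clean})


def load_or_reset_state(path, skip=SKIP_ON_CRASH):
    """Return the next safe counter, skipping ahead after an unclean stop."""
    if not os.path.exists(path):
        state = {"counter": 0, "clean": False}
    else:
        with open(path) as f:
            state = json.load(f)
        if not state.get("clean", False):
            # No clean-shutdown marker: assume a crash and skip ahead past
            # any key sets that may have been issued since the checkpoint.
            state["counter"] += skip
    # Mark the state as "running" until the next clean shutdown.
    state["clean"] = False
    save_state(path, state)
    return state["counter"]
```

In this sketch the skip is only safe if the service checkpoints at least once every `SKIP_ON_CRASH` issued key sets; otherwise the counter on disk could lag too far behind what was actually handed out.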
Nothing makes you sleep better at night than knowing that automated recovery systems, when put to the test, actually work!







