This is something I didn't expect. One of the greatest things about Amazon is that you can start and stop instances at will. That's great, but what happens when you can't stop or kill a server? You can't connect and you can't reboot, yet it is still running in the background and doing stuff. I can't unplug it, I can't press the off button, and I can't smash it with a sledgehammer.
This happened to me today. I thought TKL had ultimate low-level control over server instances, and that if push came to shove, I could pull the plug. The server was running a script designed to perform real-life connection tasks against services controlling phones, SMS, email, etc., so it truly felt like Joshua, the computer in the 1983 movie WarGames that asked Matthew Broderick, "SHALL WE PLAY A GAME?"
In the movie, Matthew Broderick got the idea to have Joshua play tic-tac-toe against itself until it gave up on the GLOBAL THERMONUCLEAR WAR game (which was interfaced with the US military's nuclear arsenal). But I couldn't even do that, because I couldn't get a shell prompt or Webmin. All I had was a runaway server that responded only to ping but was still interfacing with the world through automation.
It was a harrowing experience. The server finally did shut down, but it took about 30 minutes. Each of the 5 reboots I tried in the Hub interface (before I sent the STOP command) told me "the server is rebooting", but a continuous ping showed that was not actually happening. While it was "stopping", it continued to respond to pings, make phone calls, and send SMS messages. In that time I was able to build another server from an image, change the DNS entries, and get the new server up and running.
I'm glad this happened, because it made me realize that a server running a script that connects to real-world services like telephony and SMS needs another layer of control. Maybe a relay on another server hosted by another company, so that I have a way of stopping its activity if TKL's interface fails again.
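One way that extra layer of control might look is an external kill switch: before every outbound action, the script asks a server hosted elsewhere whether it is still allowed to run, and it stops if the answer is anything other than yes, including when that server can't be reached. This is only a rough sketch under assumptions. The URL, the "RUN" flag value, and the function names are all made up for illustration, and it is not something TKL or Amazon provides.

```python
import urllib.request

# Hypothetical flag file on a server at a different hosting company.
# It contains the text "RUN" while the script is allowed to act.
KILL_SWITCH_URL = "https://example.com/killswitch.txt"


def fetch_flag(url=KILL_SWITCH_URL, timeout=5):
    """Download the flag text from the external server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace").strip()


def may_proceed(fetch=fetch_flag):
    """Return True only if the external flag explicitly says "RUN".

    Fails closed: a network error, a timeout, or any other value
    means the script must not perform outbound actions.
    """
    try:
        return fetch() == "RUN"
    except Exception:
        return False


def run_tasks(tasks, fetch=fetch_flag):
    """Run tasks in order, re-checking the kill switch before each one.

    Returns the number of tasks that were actually executed.
    """
    done = 0
    for task in tasks:
        if not may_proceed(fetch):
            break
        task()
        done += 1
    return done
```

With something like this in place, changing the contents of the flag file (or simply taking the relay offline) would stop the phone calls and SMS messages after the current task, even if the instance itself refuses to reboot or stop. The check runs inside the script, so it only helps while the script is still executing normally. It does not replace being able to power off the instance.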
Has anyone had a similar experience? Does anyone know what could have made an instance freeze like that, to the point of being unresponsive at the lowest level available to my control?
P.S. UPDATE: It's about 90 minutes after my first reboot attempt and about an hour after I initiated the STOP. It is still 'stopping' in the TKL Hub. I'm fairly confident it has stopped, because I'm no longer getting ping responses from the IP, but I have no way to verify for sure that it is actually stopped.