Secure, flexible and scalable Amazon EC2 instance preseeding

I'd like to introduce Joe. He is a good-looking, experienced sysadmin and, like all good sysadmins, he has more stuff to do than time to do it.

Joe wants to get up and running on Amazon EC2 with a Wordpress installation, and chooses to do so with a pre-configured appliance. These are the steps Joe performs:

  • Joe logs into his favourite Amazon EC2 console, selects a Wordpress appliance and specifies the other configuration options.
  • Clicks launch.
  • Once the instance is running, he logs in using his SSH public key and changes the root password (it was set randomly on firstboot, right?).
  • He then proceeds to change the MySQL root password as well (also set randomly on firstboot, hopefully!). Joe knows how to do this because he's an experienced sysadmin. Do you?
  • Finally, Joe logs into Wordpress using the default admin password (he noted the default password in the release notes before launching), resets the password and specifies his own email for the account.

While performing the above, Joe was holding his breath and working as fast as he could, because he was once compromised by a botnet scanning for systems that still used default passwords. Luckily, this time he came out unscathed.

Does this sound familiar? Well, it should because that's how it's mostly being done.

You might be thinking to yourself "but I used the TurnKey Hub to set the root password for my instances, which also set the database password". True, that has been a feature of the Hub from day one, but with the release of TurnKey 11.0 and the end to default passwords, we've extended the Hub to support preseeding as well.

The idea behind this was not only to make cloud deployments more secure, but also to make them much easier. We wanted to simplify the process for Joe from the above to this:

  • Joe logs into the Hub, selects Wordpress and preseeds the configuration.
  • Clicks launch.

The above is not a mock-up of a future implementation; it's live on the Hub.
So how does it work? Read on...

Brainstorming a solution

The problem in preseeding an instance is sending the information (securely) to the instance.
So how do you do it?

Idea #1: pass it through Amazon EC2 user-data?

If you know a little about Amazon EC2, you'll know that when launching an instance you can specify user-data, which is accessible from the instance via Amazon's API.
But wait, do you really want to store authentication credentials in user-data?
You could, but user-data never expires, and any process on the instance that can open a network socket can read it. You would probably want to firewall off the Amazon API as soon as it is no longer required during instance initialization. But what if the user of the instance needs access to the Amazon API? Crippling the service by design isn't a good solution, in my honest opinion.
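
To make the exposure concrete, here is a minimal sketch of how little it takes to read user-data from inside an instance. The address is the standard EC2 instance metadata endpoint. The KEY=value layout and the parse_user_data helper are assumptions made for illustration, not the format the Hub actually uses, and newer instances may additionally require a session token before the request succeeds.

```python
import urllib.request

# Standard EC2 instance metadata address for user-data.
USER_DATA_URL = "http://169.254.169.254/latest/user-data"


def parse_user_data(raw):
    """Parse KEY=value lines into a dict, skipping blanks, comments and junk.

    The KEY=value layout is an assumption for this sketch.
    """
    conf = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def fetch_user_data(url=USER_DATA_URL, timeout=2):
    """Fetch raw user-data. Only works from inside an instance, and needs
    nothing more than the ability to open a socket."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling parse_user_data(fetch_user_data()) from any unprivileged process on the instance would return whatever was passed at launch, which is why a password is a poor fit for user-data and an opaque identifier is a better one.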

Idea #2: store it in the Hub's database, and let the server query the API

So, instead of sending authentication credentials via user-data, why not send a unique identifier (e.g., SERVER_ID), so the instance can use the Hub's API to pull the credentials?
Well, you could, but that would mean the Hub service needs to store the instance's configuration, passwords and all, in its database and delete it when it's no longer needed. Storing an item in a database for just one use is inelegant. But it's a natural solution if you only have a database; as I discussed in a previous blog post, "when all you have is a hammer, everything looks like a nail".
In my opinion, it ultimately comes down to separation of concerns. For this type of pattern, the most natural solution would be some sort of messaging service. The Hub publishes a message to a queue, which the instance consumes.

Idea #3: pass it as messages using the Advanced Message Queuing Protocol (AMQP)

So what's wrong with messaging? Nothing really, so long as you design the system with confidentiality and integrity in mind: we don't want others eavesdropping on messages, or sending spoofed messages.
Messages that fail a CRC or cannot be decrypted successfully should be discarded and removed from the queue so as not to block it.
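
The discard rule can be sketched as follows. This is only an illustration of the pattern: the SHA-256 counter keystream stands in for a real symmetric cipher and is not what tklamq uses, and the names seal, unseal and drain are hypothetical. A production system would use a vetted authenticated cipher rather than a hand-rolled one.

```python
import hashlib
import struct
import zlib


def _keystream(secret, nonce, length):
    """Derive `length` bytes from SHA-256 in counter mode (stand-in cipher)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(secret + nonce + struct.pack(">Q", counter)).digest()
        counter += 1
    return out[:length]


def _xor(data, key):
    return bytes(a ^ b for a, b in zip(data, key))


def seal(secret, nonce, payload):
    """Prefix the payload with its CRC32, encrypt both, and prepend the nonce."""
    body = struct.pack(">I", zlib.crc32(payload)) + payload
    return nonce + _xor(body, _keystream(secret, nonce, len(body)))


def unseal(secret, blob, nonce_len=16):
    """Return the payload, or None if the message should be discarded."""
    if len(blob) < nonce_len + 4:
        return None
    nonce, body = blob[:nonce_len], blob[nonce_len:]
    plain = _xor(body, _keystream(secret, nonce, len(body)))
    (crc,) = struct.unpack(">I", plain[:4])
    payload = plain[4:]
    return payload if zlib.crc32(payload) == crc else None


def drain(secret, queue):
    """Consume every message; bad ones are dropped so they cannot block the queue."""
    accepted = []
    while queue:
        payload = unseal(secret, queue.pop(0))
        if payload is not None:
            accepted.append(payload)
    return accepted
```

In real use the nonce would come from os.urandom(16) and be fresh for every message. The point of drain is that a message is removed from the queue whether or not it validates, so a spoofed or corrupted message costs one iteration and nothing more.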

Designing infrastructure that is secure, scalable and extendible

The solution we came up with is designed to be secure, scalable and extendible. Eventually it will support other cloud hosting providers, as well as provide bi-directional secure communication for future Hub-based services still under development.
The solution uses each of the brainstormed solutions above for what they were designed for, and no more.
Data Flow Diagram (DFD) explained:
  1. The user specifies preseeding data.
  2. The Hub tells Amazon EC2 to launch the instance with user-data which includes the SERVER_ID.
  3. The Hub creates a direct message exchange and queue for the server, which is configured to only receive messages sent from the Hub.
  4. The Hub publishes symmetrically encrypted messages (incl. a CRC) to the server queue with preseeding data that only the server can decrypt.
  5. The instance pulls user-data from the Amazon EC2 API (SERVER_ID).
  6. The instance registers itself with the Hub via an SSL-secured API using the SERVER_ID, and the Hub responds with the server subkey and messaging secret. Note that, for security, this can only be done once.
    • subkey: A one-way hash generated from the user's APIKEY. It is unique for each server registered in the Hub, and is used as part of the exchange and queue naming structure.
    • secret: A secure unique hash used for message encryption and decryption.
  7. The instance consumes messages from the queue. Messages are decrypted and passed to the callback for processing (preseeding messages append arg=value pairs to inithooks.conf).
  8. During inithooks execution, inithooks.conf is sourced so that the preseeded values are applied. Once inithooks.conf is no longer needed, it is deleted.
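
Steps 7 and 8 can be sketched roughly like this. It is a simplified illustration rather than the actual inithooks or tklamq code: the function names, the variable names and the 0600 file mode are assumptions. Values are shell-quoted because the file is later sourced by a shell.

```python
import os
import shlex


def append_preseed(conf_path, args):
    """Append NAME=value lines, shell-quoted so the file can be sourced safely."""
    # Validate every name first so a bad one cannot leave a half-written file.
    for name in args:
        if not name.isidentifier():
            raise ValueError("invalid variable name: %r" % name)
    # Mode 0o600 applies when the file is first created (owner read/write only).
    fd = os.open(conf_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    with os.fdopen(fd, "a") as fob:
        for name, value in args.items():
            fob.write("%s=%s\n" % (name, shlex.quote(value)))


def cleanup(conf_path):
    """Delete the preseed file once it is no longer needed."""
    if os.path.exists(conf_path):
        os.remove(conf_path)
```

A message callback would call append_preseed with the decrypted arguments, and the last step of initialization would call cleanup so the credentials do not linger on disk.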

In addition to authentication-related preseeding, TKLBAM is also preseeded with the HUB_APIKEY and initialized, so performing the initial backup is as easy as executing tklbam-backup or using the TKLBAM webmin module.

As always, the client-side code that implements the above is open source, and can be found in the following projects: hubclient, tklamq and inithooks, as well as in the above-mentioned blog posts.

Take the Hub for a spin and let us know what you think.