Amazon EC2 metadata - Python library and CLI
Each Amazon EC2 instance has associated metadata, as well as user data supplied when launching the instance. The metadata and user data are instance-specific, and are therefore accessible only from the instance itself.
The data is useful on several levels, such as configuring SSH public keys, programmatically configuring the instance according to certain criteria, or even executing user-supplied initialization scripts.
Retrieving the data
Retrieving the data is done by querying an Amazon web server with the base URI of http://169.254.169.254/API-VERSION. The available API versions can be queried by performing a GET request on http://169.254.169.254/. The latest version of the API is always available using the URI http://169.254.169.254/latest.
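As a rough illustration of the URI scheme described above, here is a minimal Python 3 sketch. The helper names metadata_url and list_api_versions are made up for this example. The network call only works from inside an EC2 instance, and it assumes the instance accepts plain unauthenticated GET requests.

```python
import urllib.request

METADATA_ADDR = "169.254.169.254"  # link-local address used throughout the article


def metadata_url(path="", version="latest", addr=METADATA_ADDR):
    """Build a metadata URI of the form http://ADDR/API-VERSION/PATH."""
    return "http://%s/%s/%s" % (addr, version, path.lstrip("/"))


def list_api_versions(addr=METADATA_ADDR, timeout=2):
    """GET http://ADDR/ and return the advertised API versions as a list.

    Assumption: run from inside an EC2 instance that accepts plain,
    unauthenticated GET requests on the metadata address.
    """
    with urllib.request.urlopen("http://%s/" % addr, timeout=timeout) as resp:
        return resp.read().decode("utf-8").split()
```

For example, metadata_url("meta-data/instance-id") yields the URI for the instance id under the latest API version, while passing version="2008-02-01" pins the request to that specific API version.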
There is quite a lot of information available through the API, some more useful than others. For example, ami-id, ami-launch-index, availability-zone, instance-id, public-ipv4, user-data, ... (see below for the full list).
Some notes on user data
One of the most useful pieces of data is user-data, which can be used to pass configuration information or even initialization scripts to the instance upon launch.
User data must be base64 encoded, and is limited to 16 KB (measured before encoding). The popular API tools usually handle the encoding transparently, so you shouldn't have to worry about it. The data is also decoded before it is presented to the instance, so again, you shouldn't need to worry.
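If you ever do need to handle the encoding yourself, it could look something like the following Python 3 sketch. The function names are hypothetical, and the limit is taken as 16 * 1024 bytes, which is my reading of "16k".

```python
import base64

# Assumption: "16k" means 16 * 1024 bytes, measured before encoding.
MAX_USER_DATA_BYTES = 16 * 1024


def encode_user_data(raw):
    """Return raw user data base64-encoded, enforcing the size limit."""
    if isinstance(raw, str):
        raw = raw.encode("utf-8")
    if len(raw) > MAX_USER_DATA_BYTES:
        raise ValueError("user data is %d bytes; limit is %d"
                         % (len(raw), MAX_USER_DATA_BYTES))
    return base64.b64encode(raw).decode("ascii")


def decode_user_data(encoded):
    """Reverse encode_user_data(), returning the original bytes."""
    return base64.b64decode(encoded)
```

Checking the size before encoding matters because base64 inflates the payload by roughly a third, so the encoded form of a valid 16 KB script is larger than 16 KB.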
What you do need to worry about though, or at least be aware of, is security.
The user-data (and all metadata for that matter) can be accessed by any user or process on the instance. So please, please, do not specify any secret information in user-data unless you are absolutely sure what you are doing. Even then, I'd think twice.
But, you say, I trust all my users and processes. OK, how about this (thanks go to Eric Hammond for this example): you run a website that allows users to upload files by specifying a URL. A user specifies http://169.254.169.254/latest/user-data and, lo and behold, your user-data and any secrets it includes have been divulged.
Do you still want to include secrets in user-data?
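If your application does fetch user-supplied URLs, one partial mitigation is to refuse anything that resolves to a link-local address before fetching it. The Python 3 sketch below is only an illustration: the function name is made up, and it does not cover redirects or a hostname that resolves differently on a later lookup.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def resolves_to_link_local(url):
    """Return True if the URL's host resolves to a link-local address.

    169.254.0.0/16, which contains the metadata address, is link-local.
    """
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False  # the name does not resolve, so nothing can be fetched
    for info in infos:
        # strip any IPv6 scope id such as "%eth0" before parsing
        addr = ipaddress.ip_address(info[4][0].split("%")[0])
        if addr.is_link_local:
            return True
    return False
```

An upload handler would call resolves_to_link_local(url) and reject the request when it returns True. Keeping secrets out of user-data in the first place remains the safer option.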
The simple way
The simplest way of retrieving metadata is by use of a command line network tool, such as curl, for example:
curl http://169.254.169.254/latest/meta-data/public-ipv4
The more programmatic way
Usually you need a more programmatic interface, and there are a couple of libraries available for different languages. I didn't find one that met my needs, so I wrote one in Python called ec2metadata.py.
I licensed the copyright over to Canonical so it could be included in Ubuntu's ec2-init package.
ec2metadata.py has a command-line interface, as well as a Python interface:
$ ec2metadata.py                  # all options will be displayed
$ ec2metadata.py --instance-id    # displays the instance id
import ec2metadata
instanceid = ec2metadata.get('instance-id')
print(instanceid)
It can be very useful when coupled with inithooks, for example to set the SSH public keys on first boot.
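A first-boot hook along those lines might look roughly like this Python 3 sketch. The function install_public_keys and the target path in the usage note are my own illustration, not part of inithooks or ec2metadata.

```python
import os


def install_public_keys(keys, authorized_keys_path):
    """Append any keys not already present; return how many were added."""
    content = ""
    if os.path.exists(authorized_keys_path):
        with open(authorized_keys_path) as f:
            content = f.read()
    existing = {line.strip() for line in content.splitlines() if line.strip()}

    new_keys = []
    for key in keys:
        key = key.strip()
        if key and key not in existing and key not in new_keys:
            new_keys.append(key)

    if new_keys:
        # create the file with 0600 permissions if it does not exist yet
        fd = os.open(authorized_keys_path,
                     os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
        with os.fdopen(fd, "a") as f:
            if content and not content.endswith("\n"):
                f.write("\n")  # avoid gluing onto an unterminated last line
            for key in new_keys:
                f.write(key + "\n")
    return len(new_keys)
```

On an instance, this could be called as install_public_keys(ec2metadata.get('public-keys'), '/root/.ssh/authorized_keys'), assuming root's authorized_keys file is the one you want to populate. Because existing keys are skipped, running the hook more than once is harmless.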
#!/usr/bin/python3
#
# Query and display EC2 metadata related to the AMI instance
# Copyright (c) 2009 Canonical Ltd. (Canonical Contributor Agreement 2.5)
#
# Author: Alon Swartz <alon@turnkeylinux.org>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
"""
Query and display EC2 metadata

If no options are provided, all options will be displayed

Options:
    -h --help               show this help

    --kernel-id             display the kernel id
    --ramdisk-id            display the ramdisk id
    --reservation-id        display the reservation id

    --ami-id                display the ami id
    --ami-launch-index      display the ami launch index
    --ami-manifest-path     display the ami manifest path
    --ancestor-ami-id       display the ami ancestor id
    --product-codes         display the ami associated product codes
    --availability-zone     display the ami placement zone

    --instance-id           display the instance id
    --instance-type         display the instance type

    --local-hostname        display the local hostname
    --public-hostname       display the public hostname

    --local-ipv4            display the local ipv4 ip address
    --public-ipv4           display the public ipv4 ip address

    --block-device-mapping  display the block device id
    --security-groups       display the security groups

    --public-keys           display the openssh public keys
    --user-data             display the user data (not actually metadata)
"""
import sys
import time
import getopt
import socket
import urllib.error
import urllib.request

METAOPTS = ['ami-id', 'ami-launch-index', 'ami-manifest-path',
            'ancestor-ami-id', 'availability-zone', 'block-device-mapping',
            'instance-id', 'instance-type', 'local-hostname', 'local-ipv4',
            'kernel-id', 'product-codes', 'public-hostname', 'public-ipv4',
            'public-keys', 'ramdisk-id', 'reservation-id', 'security-groups',
            'user-data']

class Error(Exception):
    pass

class EC2Metadata:
    """Class for querying metadata from EC2"""

    def __init__(self, addr='169.254.169.254', api='2008-02-01'):
        self.addr = addr
        self.api = api

        if not self._test_connectivity(self.addr, 80):
            raise Error("could not establish connection to: %s" % self.addr)

    @staticmethod
    def _test_connectivity(addr, port):
        # the service may not be reachable yet early in boot, so try up
        # to 6 times, sleeping one second after each failed attempt
        for i in range(6):
            s = socket.socket()
            try:
                s.connect((addr, port))
                s.close()
                return True
            except socket.error:
                s.close()
                time.sleep(1)

        return False

    def _get(self, uri):
        url = 'http://%s/%s/%s/' % (self.addr, self.api, uri)
        try:
            value = urllib.request.urlopen(url).read()
        except urllib.error.HTTPError as e:
            # a missing key is reported as HTTP 404
            if e.code == 404:
                return None
            raise

        value = value.decode('utf-8', 'replace')
        if "404 - Not Found" in value:
            return None

        return value

    def get(self, metaopt):
        """return value of metaopt"""
        if metaopt not in METAOPTS:
            raise Error('unknown metaopt', metaopt, METAOPTS)

        if metaopt == 'availability-zone':
            return self._get('meta-data/placement/availability-zone')

        if metaopt == 'public-keys':
            data = self._get('meta-data/public-keys')
            if not data:
                return None

            # each line looks like "<keyid>=<keyname>"
            keyids = [line.split('=')[0] for line in data.splitlines()]

            public_keys = []
            for keyid in keyids:
                uri = 'meta-data/public-keys/%d/openssh-key' % int(keyid)
                public_keys.append(self._get(uri).rstrip())

            return public_keys

        if metaopt == 'user-data':
            return self._get('user-data')

        return self._get('meta-data/' + metaopt)

def get(metaopt):
    """primitive: return value of metaopt"""
    m = EC2Metadata()
    return m.get(metaopt)

def display(metaopts, prefix=False):
    """primitive: display metaopts (list) values with optional prefix"""
    m = EC2Metadata()
    for metaopt in metaopts:
        value = m.get(metaopt)
        if not value:
            value = "unavailable"

        if prefix:
            print("%s: %s" % (metaopt, value))
        else:
            print(value)

def usage(s=None):
    """display usage and exit"""
    if s:
        print("Error:", s, file=sys.stderr)
    print("Syntax: %s [options]" % sys.argv[0], file=sys.stderr)
    print(__doc__, file=sys.stderr)
    sys.exit(1)

def main():
    """handle cli options"""
    try:
        getopt_metaopts = METAOPTS[:]
        getopt_metaopts.append('help')
        opts, args = getopt.gnu_getopt(sys.argv[1:], "h", getopt_metaopts)
    except getopt.GetoptError as e:
        usage(e)

    if len(opts) == 0:
        display(METAOPTS, prefix=True)
        return

    metaopts = []
    for opt, val in opts:
        if opt in ('-h', '--help'):
            usage()

        metaopts.append(opt.replace('--', ''))

    display(metaopts)

if __name__ == "__main__":
    main()
Comments
Hi - The boto library
Hi -
The boto library (http://boto.googlecode.com/) also provides a couple of methods to access instance metadata and userdata. They are boto.utils.get_instance_metadata and boto.utils.get_instance_userdata. I implement a retry mechanism because, as you note, sometimes the interface isn't available yet if you try to run this on startup.
Mitch
Boto is excellent
With regard to the above code, we needed a simple way for instances to access metadata, both from the CLI (mostly for testing and debugging), and from other Python scripts. I could have hooked into boto.utils, but I wanted to keep it simple, with a clear-cut interface for other projects (e.g., ec2-init).
Regarding retries, that's the point of the _test_connectivity function.
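For readers who want retries around an arbitrary lookup rather than just the initial connection test, a generic wrapper could look like the Python 3 sketch below. The retry helper and its parameters are hypothetical; it is neither boto's mechanism nor part of ec2metadata.py.

```python
import time


def retry(func, attempts=6, delay=1.0, exceptions=(OSError,), sleep=time.sleep):
    """Call func() until it succeeds or the attempts run out.

    Returns func()'s result; re-raises the last exception on failure.
    socket.error and urllib's URLError are both subclasses of OSError.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise
            sleep(delay)
```

With the script above, this could be used as retry(lambda: ec2metadata.get('instance-id')). The sleep parameter exists only so that the delay can be replaced in tests.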
user data security is not as bad as stated
User data security is not as bad as stated above. It can be made to be as secure as a root owned file with 400 permissions on it by routing the data service off once you've collected the information you need. This can be done with:
Once that is done, no user space process can get at the service. In order to do so, it would have to have compromised root and run:
The Ubuntu lucid images can do this for you if you use 'cloud-config' syntax. A user-data with the following will have cloud-init route the service off for you early in boot.
For more information, see [1], or try a Lucid image [2].
Doing this obviously breaks things that depend on the instance data being there, but that can be overcome by caching the data to root-owned files. I'm not saying that user-data is an excellent way to store important credentials, but it isn't as bad as it is often made out to be.
--
[1] http://bazaar.launchpad.net/%7Ecloud-init-dev/cloud-init/trunk/files/hea...
[2] http://uec-images.ubuntu.com/releases/lucid/beta-1/
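The caching step mentioned in this comment could be sketched in Python 3 as follows. The function name cache_private is made up for the example, and the file is only root-owned if the code actually runs as root, as an early boot script typically would.

```python
import os


def cache_private(path, data):
    """Write data to path so that only the owning user can read it (0400)."""
    if isinstance(data, str):
        data = data.encode("utf-8")
    tmp = path + ".tmp"
    # O_EXCL refuses to reuse an existing temp file; 0o400 = owner read only
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o400)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
    except Exception:
        os.unlink(tmp)
        raise
    # atomically move the finished file into place
    os.replace(tmp, path)
```

The data would need to be cached this way before the metadata service is routed off, since nothing on the instance can fetch it afterwards.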
Seems like a good idea
Thanks for the link to cloud-init (I see ec2-init has changed its name).
I thought about blocking the metadata IP, and it seems like a good idea, but I'm just not sure how Amazon has set up its security (is it an actual machine? running on the host?). I wonder how easy it would be to bypass using some sort of MITM attack and IP spoofing.
Chances are that blocking the 169.254 IP for incoming and outgoing traffic would be sufficient but, as you mentioned, user-data is not the ideal place to store secret information.
IAM Roles for EC2 Instances
I had been using this suggestion of blocking metadata as part of my user-data script, but it seems you now have to expose the metadata if you want to use IAM Roles for EC2 Instances, which AWS now highly recommends (instead of embedding long-term credentials on the instance).
One of the security recommendations AWS proposed during the re:Invent conference was to use a bastion host for all your EC2 instances, and log all activity.
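To show why IAM roles depend on the metadata service: the temporary role credentials are served as a JSON document under meta-data/iam/security-credentials/ followed by the role name. The Python 3 sketch below parses such a document. The function names and the returned dictionary layout are my own illustration, and any sample values you feed it are placeholders.

```python
import json
from datetime import datetime, timezone

# Path under which role credentials are served; append the role name.
CREDENTIALS_PATH = "meta-data/iam/security-credentials/"


def parse_role_credentials(document):
    """Parse the JSON credentials document served for an instance role."""
    doc = json.loads(document)
    expires = datetime.strptime(doc["Expiration"], "%Y-%m-%dT%H:%M:%SZ")
    return {
        "access_key": doc["AccessKeyId"],
        "secret_key": doc["SecretAccessKey"],
        "token": doc["Token"],
        "expires": expires.replace(tzinfo=timezone.utc),
    }


def is_expired(creds, now=None):
    """Return True once the temporary credentials have passed their expiry."""
    now = now or datetime.now(timezone.utc)
    return now >= creds["expires"]
```

Because these credentials expire and are rotated, anything that uses them has to be able to re-read the metadata service, which is why routing it off permanently conflicts with IAM roles.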
Finally on github and in the package archive
I finally got around to creating an ec2metadata package (uploaded to turnkey archive) and uploading it to github (only took me 3 years). While I was at it I did some refactoring and split the cli and lib.