Amazon EC2 metadata - Python library and CLI
Each Amazon EC2 instance has associated metadata, as well as user data supplied when launching the instance. The metadata and user data are instance-specific, and therefore accessible only from within the instance.
The data is useful on several levels, such as configuring SSH public keys, programmatically configuring the instance according to certain criteria, or even executing user supplied initialization scripts.
Retrieving the data
Retrieving the data is done by querying an Amazon web server with the base URI of http://169.254.169.254/API-VERSION. The available API versions can be queried by performing a GET request on http://169.254.169.254/. The latest version of the API is always available using the URI http://169.254.169.254/latest.
There is quite a lot of information available through the API, some of it more useful than the rest: for example, ami-id, ami-launch-index, availability-zone, instance-id, public-ipv4, user-data, ... (see below for the full list).
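Here's a rough sketch of the URL scheme described above, in Python 3. The helper names (metadata_url, parse_versions, fetch) are made up for illustration, and fetch assumes the instance answers plain GET requests as described above, so it only does anything useful when run on an instance.

```python
import urllib.request

BASE = "http://169.254.169.254"


def metadata_url(path="", version="latest"):
    """Build a URL such as http://169.254.169.254/latest/meta-data/ami-id."""
    parts = (BASE, version, path)
    return "/".join(part.strip("/") for part in parts if part)


def parse_versions(body):
    """Split the newline-separated listing returned by GET / into names."""
    return [line.strip() for line in body.splitlines() if line.strip()]


def fetch(path="", version="latest", timeout=2):
    """GET a metadata path and return the body as text (instance only)."""
    with urllib.request.urlopen(metadata_url(path, version), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, metadata_url("meta-data/public-ipv4") builds the same URL used with curl further down, and parse_versions can be fed the body of a GET on the bare address to see which API versions are on offer.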
Some notes on user data
One of the most useful pieces of data is user-data, which can be used to pass configuration information or even initialization scripts to the instance upon launch.
User data must be base64 encoded and is limited to 16 KB (before encoding). The popular API tools usually handle the encoding transparently, so you shouldn't have to worry about it. The data is also decoded before being presented to the instance, so again, you shouldn't need to worry.
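If you ever do need to encode user data by hand, a minimal sketch in Python 3 could look like this. The function name encode_user_data is hypothetical, and I'm assuming "16k" means 16 * 1024 bytes measured before encoding.

```python
import base64

# Assumption: the 16k limit is 16 * 1024 bytes, counted before encoding.
MAX_USER_DATA_BYTES = 16 * 1024


def encode_user_data(raw):
    """Return raw user data as base64 text, refusing oversized input."""
    if isinstance(raw, str):
        raw = raw.encode("utf-8")
    if len(raw) > MAX_USER_DATA_BYTES:
        raise ValueError(
            "user data is %d bytes, limit is %d" % (len(raw), MAX_USER_DATA_BYTES)
        )
    return base64.b64encode(raw).decode("ascii")
```

Checking the size before encoding matters because base64 grows the payload by roughly a third.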
What you do need to worry about though, or at least be aware of, is security.
The user-data (and all metadata for that matter) can be accessed by any user or process on the instance. So please, please, do not specify any secret information in user-data unless you are absolutely sure what you are doing. Even then, I'd think twice.
But, you say, I trust all my users and processes. OK, how about this (thanks go to Eric Hammond for this example): you run a website that allows users to upload files by specifying a URL. The user specifies http://169.254.169.254/latest/user-data, and lo and behold, your user-data and any secrets included have been divulged.
Do you still want to include secrets in user-data?
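If you do run such a fetch-by-URL service, one partial precaution is to refuse URLs that point at link-local addresses before fetching them. The sketch below (Python 3, hypothetical function name targets_link_local) only catches literal IP addresses in the URL; a hostname that resolves to 169.254.169.254, or a redirect to it, would slip straight past, so treat it as an illustration of the problem rather than a complete defence.

```python
import ipaddress
from urllib.parse import urlsplit


def targets_link_local(url):
    """True if the URL's host is a literal link-local IP (e.g. 169.254.x.x)."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        return ipaddress.ip_address(host).is_link_local
    except ValueError:
        # Not an IP literal. A real check would also have to resolve the
        # hostname and inspect every address it maps to.
        return False
```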
The simple way
The simplest way of retrieving metadata is by use of a command line network tool, such as curl, for example:
curl http://169.254.169.254/latest/meta-data/public-ipv4
The more programmatic way
Usually you need a more programmatic type interface, and there are a couple of libraries for different languages available. I didn't find one that met my needs, so I wrote one in Python called ec2metadata.py.
I licensed the copyright over to Canonical so it could be included in Ubuntu's ec2-init package.
ec2metadata.py has a CLI interface, as well as a Pythonic interface:
$ ec2metadata.py                # all options will be displayed
$ ec2metadata.py --instance-id  # displays the instance id
import ec2metadata
instanceid = ec2metadata.get('instance-id')
print instanceid
It can be very useful when coupled with inithooks, for example, to set the SSH public keys on first boot.
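To give an idea of what the SSH key case could look like, here is a minimal Python 3 sketch of the part that writes the keys out. The function name write_authorized_keys and the target path are assumptions of mine, and the actual inithooks wiring is not shown.

```python
import os


def write_authorized_keys(public_keys, path):
    """Write one public key per line to path, readable only by its owner."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, mode=0o700, exist_ok=True)
    # Create (or truncate) the file with mode 0600 before writing the keys.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        for key in public_keys:
            fh.write(key.strip() + "\n")
```

Since get('public-keys') returns a list of keys, a first-boot hook could pass that list straight in, with something like /root/.ssh/authorized_keys as the path (again, an assumption).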
#!/usr/bin/python
#
# Query and display EC2 metadata related to the AMI instance
# Copyright (c) 2009 Canonical Ltd. (Canonical Contributor Agreement 2.5)
#
# Author: Alon Swartz <alon@turnkeylinux.org>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
"""
Query and display EC2 metadata

If no options are provided, all options will be displayed

Options:
    -h --help               show this help

    --kernel-id             display the kernel id
    --ramdisk-id            display the ramdisk id
    --reservation-id        display the reservation id

    --ami-id                display the ami id
    --ami-launch-index      display the ami launch index
    --ami-manifest-path     display the ami manifest path
    --ancestor-ami-id       display the ami ancestor id
    --product-codes         display the ami associated product codes
    --availability-zone     display the ami placement zone

    --instance-id           display the instance id
    --instance-type         display the instance type

    --local-hostname        display the local hostname
    --public-hostname       display the public hostname

    --local-ipv4            display the local ipv4 ip address
    --public-ipv4           display the public ipv4 ip address

    --block-device-mapping  display the block device id
    --security-groups       display the security groups

    --public-keys           display the openssh public keys
    --user-data             display the user data (not actually metadata)
"""
import sys
import time
import getopt
import urllib
import socket
METAOPTS = ['ami-id', 'ami-launch-index', 'ami-manifest-path',
            'ancestor-ami-id', 'availability-zone', 'block-device-mapping',
            'instance-id', 'instance-type', 'local-hostname', 'local-ipv4',
            'kernel-id', 'product-codes', 'public-hostname', 'public-ipv4',
            'public-keys', 'ramdisk-id', 'reservation-id', 'security-groups',
            'user-data']
class Error(Exception):
    pass

class EC2Metadata:
    """Class for querying metadata from EC2"""

    def __init__(self, addr='169.254.169.254', api='2008-02-01'):
        self.addr = addr
        self.api = api

        if not self._test_connectivity(self.addr, 80):
            raise Error("could not establish connection to: %s" % self.addr)

    @staticmethod
    def _test_connectivity(addr, port):
        for i in range(6):
            s = socket.socket()
            try:
                s.connect((addr, port))
                s.close()
                return True
            except socket.error, e:
                time.sleep(1)

        return False

    def _get(self, uri):
        url = 'http://%s/%s/%s/' % (self.addr, self.api, uri)
        value = urllib.urlopen(url).read()
        if "404 - Not Found" in value:
            return None

        return value

    def get(self, metaopt):
        """return value of metaopt"""

        if metaopt not in METAOPTS:
            raise Error('unknown metaopt', metaopt, METAOPTS)

        if metaopt == 'availability-zone':
            return self._get('meta-data/placement/availability-zone')

        if metaopt == 'public-keys':
            data = self._get('meta-data/public-keys')
            keyids = [ line.split('=')[0] for line in data.splitlines() ]

            public_keys = []
            for keyid in keyids:
                uri = 'meta-data/public-keys/%d/openssh-key' % int(keyid)
                public_keys.append(self._get(uri).rstrip())

            return public_keys

        if metaopt == 'user-data':
            return self._get('user-data')

        return self._get('meta-data/' + metaopt)
def get(metaopt):
    """primitive: return value of metaopt"""

    m = EC2Metadata()
    return m.get(metaopt)

def display(metaopts, prefix=False):
    """primitive: display metaopts (list) values with optional prefix"""

    m = EC2Metadata()
    for metaopt in metaopts:
        value = m.get(metaopt)
        if not value:
            value = "unavailable"

        if prefix:
            print "%s: %s" % (metaopt, value)
        else:
            print value

def usage(s=None):
    """display usage and exit"""

    if s:
        print >> sys.stderr, "Error:", s
    print >> sys.stderr, "Syntax: %s [options]" % sys.argv[0]
    print >> sys.stderr, __doc__
    sys.exit(1)

def main():
    """handle cli options"""

    try:
        getopt_metaopts = METAOPTS[:]
        getopt_metaopts.append('help')
        opts, args = getopt.gnu_getopt(sys.argv[1:], "h", getopt_metaopts)
    except getopt.GetoptError, e:
        usage(e)

    if len(opts) == 0:
        display(METAOPTS, prefix=True)
        return

    metaopts = []
    for opt, val in opts:
        if opt in ('-h', '--help'):
            usage()

        metaopts.append(opt.replace('--', ''))

    display(metaopts)

if __name__ == "__main__":
    main()
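The public-keys handling in the script is the only part that needs more than one request: the listing comes back as lines of the form index=name, and each index is then used to build the URI of the actual key. A stand-alone sketch of just that parsing step, in Python 3, with hypothetical helper names (parse_key_ids, key_uri) and a made-up sample listing:

```python
def parse_key_ids(listing):
    """Return the integer index from each 'index=name' line of the listing."""
    if not listing:
        # Unlike the script above, tolerate a missing or empty listing.
        return []
    return [
        int(line.split("=", 1)[0])
        for line in listing.splitlines()
        if line.strip()
    ]


def key_uri(keyid):
    """Build the URI (relative to the API version) of one OpenSSH key."""
    return "meta-data/public-keys/%d/openssh-key" % keyid
```

With a listing such as "0=my-key", parse_key_ids yields the index 0, and key_uri(0) gives the path the script passes to its _get method.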
Comments
Boto is excellent
With regards to the above code, we needed a simple way for instances to access metadata, both from the CLI (mostly for testing and debugging), and from other Python scripts. I could have hooked into boto.utils, but I wanted to keep it simple, with a clear cut interface for other projects (e.g., ec2-init).
Regarding retries, that's the point of the _test_connectivity function.
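For anyone who wants the same retry idea outside the class, here's a rough Python 3 equivalent. The name wait_for_port is made up, and the attempts, delay and timeout parameters are additions for illustration (the original hard-codes six attempts one second apart and sets no timeout).

```python
import socket
import time


def wait_for_port(addr, port, attempts=6, delay=1.0, timeout=2.0):
    """Try to open a TCP connection up to `attempts` times, sleeping between tries."""
    for _ in range(attempts):
        try:
            # create_connection raises OSError on refusal or timeout.
            with socket.create_connection((addr, port), timeout=timeout):
                return True
        except OSError:
            time.sleep(delay)
    return False
```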
Seems like a good idea
I thought about blocking the metadata IP. It seems like a good idea, but I'm just not sure how Amazon has set up its security (is it an actual machine? Is it running on the host?). I wonder how easy it would be to bypass using some sort of MITM and IP spoofing.
Chances are that blocking the 169.254 IP for incoming and outgoing traffic would be sufficient, but, as you mentioned, user-data is not the ideal place to store secret information.
Finally on github and in the package archive
I finally got around to creating an ec2metadata package (uploaded to the TurnKey archive) and uploading it to GitHub (it only took me 3 years). While I was at it, I did some refactoring and split the CLI and the library.