
Amazon EC2 metadata - Python library and CLI

Each Amazon EC2 instance has associated metadata, as well as user data supplied when launching the instance. The metadata and user data are instance-specific, and therefore only accessible from the instance itself.

The data is useful on several levels, such as configuring SSH public keys, programmatically configuring the instance according to certain criteria, or even executing user supplied initialization scripts.
 

Retrieving the data

Retrieving the data is done by querying an Amazon web server with the base URI of http://169.254.169.254/API-VERSION. The available API versions can be queried by performing a GET request on http://169.254.169.254/. The latest version of the API is always available using the URI http://169.254.169.254/latest.
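
As a rough illustration of the version query, here is a small sketch in Python 3 (the script later in this post is Python 2). The names parse_api_versions and list_api_versions are made up for this example, and it assumes the service answers with one version string per line.

```python
import urllib.request

BASE_URL = "http://169.254.169.254"


def parse_api_versions(listing):
    """Split the plain-text listing into version strings, one per line."""
    return [line.strip() for line in listing.splitlines() if line.strip()]


def list_api_versions(base_url=BASE_URL, timeout=2):
    """GET the root URI and return the advertised API versions.

    This only works from inside an EC2 instance; elsewhere the request
    will time out or fail.
    """
    with urllib.request.urlopen(base_url + "/", timeout=timeout) as resp:
        return parse_api_versions(resp.read().decode("utf-8"))
```

Calling list_api_versions() on an instance should return the same names you can substitute for API-VERSION in the base URI.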

Quite a lot of information is available through the API, some of it more useful than the rest: for example, ami-id, ami-launch-index, availability-zone, instance-id, public-ipv4, user-data, ... (see below for the full list).

Some notes on user data

One of the most useful pieces of data is user-data, which can be used to pass configuration information or even initialization scripts to the instance upon launch.

User data must be base64 encoded, and is limited to 16k (pre-encoding). The popular API tools usually handle the encoding transparently, so you shouldn't have to worry about it. The data is also decoded before being presented to the instance, so again, you shouldn't need to worry.
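
If you ever do need to handle the encoding yourself, it might look something like the following Python 3 sketch. The helper names and the MAX_USER_DATA_BYTES constant are mine, not part of any API tool; the 16k figure is simply the limit mentioned above, checked before encoding.

```python
import base64

# Assumption for this sketch: the 16k limit, applied to the raw
# (pre-encoding) data.
MAX_USER_DATA_BYTES = 16 * 1024


def encode_user_data(raw):
    """Return raw user data (bytes) as a base64 ASCII string."""
    if len(raw) > MAX_USER_DATA_BYTES:
        raise ValueError("user data is %d bytes, limit is %d"
                         % (len(raw), MAX_USER_DATA_BYTES))
    return base64.b64encode(raw).decode("ascii")


def decode_user_data(encoded):
    """Reverse encode_user_data, returning the original bytes."""
    return base64.b64decode(encoded)
```

The size check runs before encoding because base64 inflates the data by roughly a third.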

What you do need to worry about though, or at least be aware of, is security.

The user-data (and all metadata for that matter) can be accessed by any user or process on the instance. So please, please, do not specify any secret information in user-data unless you are absolutely sure what you are doing. Even then, I'd think twice.

But, you say, I trust all my users and processes. OK, how about this (thanks go to Eric Hammond for this example): you run a website that allows users to upload files by specifying a URL. A user specifies http://169.254.169.254/latest/user-data, and lo and behold, your user-data and any secrets included have been divulged.

Do you still want to include secrets in user-data?
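
If you do run a fetch-by-URL feature like the one in that example, one partial mitigation is to refuse URLs whose host resolves to a link-local address (169.254.0.0/16) before fetching. The Python 3 sketch below is only that, a sketch: the function name is invented, and it does not cover redirects or a host that resolves differently between the check and the actual request.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def resolves_to_link_local(url, resolve=socket.gethostbyname):
    """Return True if the URL's host resolves to a link-local address.

    `resolve` maps a hostname to an IPv4 address string; it is a
    parameter so the check can be exercised without real DNS lookups.
    """
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(resolve(host))
    except (OSError, ValueError):
        # Resolution failed or returned something that is not an IP.
        return False
    return addr.is_link_local
```

An upload handler would call resolves_to_link_local(url) and reject the request when it returns True.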

The simple way

The simplest way of retrieving metadata is with a command-line network tool such as curl:

curl http://169.254.169.254/latest/meta-data/public-ipv4
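
For comparison, roughly the same request in Python 3 using only the standard library. The helper names metadata_url and fetch are hypothetical, and the two-second timeout is an arbitrary choice for the sketch.

```python
import urllib.request


def metadata_url(path, api="latest", addr="169.254.169.254"):
    """Build the full URL for a metadata or user-data path."""
    return "http://%s/%s/%s" % (addr, api, path.lstrip("/"))


def fetch(path, timeout=2):
    """Return the response body for `path` as text.

    Only meaningful when run on an EC2 instance.
    """
    with urllib.request.urlopen(metadata_url(path), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

On an instance, fetch("meta-data/public-ipv4") would be the equivalent of the curl command above.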

The more programmatic way

Usually you need a more programmatic interface, and there are a couple of libraries available for different languages. I didn't find one that met my needs, so I wrote one in Python called ec2metadata.py.

I licensed the copyright over to Canonical so it could be included in Ubuntu's ec2-init package.

ec2metadata.py has a CLI interface, as well as a Pythonic interface:

$ ec2metadata.py                # all options will be displayed
$ ec2metadata.py --instance-id  # displays the instance id

import ec2metadata
instanceid = ec2metadata.get('instance-id')
print instanceid


It can be very useful when coupled with inithooks, for example, to set the SSH public keys on first boot.
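
To give an idea of what such a first-boot step could look like, here is a Python 3 sketch that appends public keys to an authorized_keys file, skipping any that are already there. The function name and the file handling are my own illustration, not the actual hook; on an instance the key list would come from ec2metadata.get('public-keys').

```python
import os


def add_authorized_keys(public_keys, path):
    """Append keys missing from the authorized_keys file at `path`.

    Returns the list of keys that were actually added.
    """
    content = ""
    if os.path.exists(path):
        with open(path) as f:
            content = f.read()
    existing = {line.strip() for line in content.splitlines() if line.strip()}

    new_keys = []
    for key in public_keys:
        key = key.strip()
        if key and key not in existing:
            new_keys.append(key)
            existing.add(key)

    if new_keys:
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "a") as f:
            # Avoid gluing the first new key onto an unterminated last line.
            if content and not content.endswith("\n"):
                f.write("\n")
            for key in new_keys:
                f.write(key + "\n")
        os.chmod(path, 0o600)

    return new_keys
```

Because existing keys are skipped, the function is safe to run more than once.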

#!/usr/bin/python
#
#    Query and display EC2 metadata related to the AMI instance
#    Copyright (c) 2009 Canonical Ltd. (Canonical Contributor Agreement 2.5)
#
#    Author: Alon Swartz <alon@turnkeylinux.org>
#
#    This program is free software; you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation; either version 2 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program.  If not, see <http://www.gnu.org/licenses/>.

"""
Query and display EC2 metadata

If no options are provided, all options will be displayed

Options:
    -h --help               show this help

    --kernel-id             display the kernel id
    --ramdisk-id            display the ramdisk id
    --reservation-id        display the reservation id

    --ami-id                display the ami id
    --ami-launch-index      display the ami launch index
    --ami-manifest-path     display the ami manifest path
    --ancestor-ami-id       display the ami ancestor id
    --product-codes         display the ami associated product codes
    --availability-zone     display the ami placement zone

    --instance-id           display the instance id
    --instance-type         display the instance type

    --local-hostname        display the local hostname
    --public-hostname       display the public hostname

    --local-ipv4            display the local ipv4 ip address
    --public-ipv4           display the public ipv4 ip address

    --block-device-mapping  display the block device id
    --security-groups       display the security groups

    --public-keys           display the openssh public keys
    --user-data             display the user data (not actually metadata)

"""

import sys
import time
import getopt
import urllib
import socket

METAOPTS = ['ami-id', 'ami-launch-index', 'ami-manifest-path',
            'ancestor-ami-id', 'availability-zone', 'block-device-mapping',
            'instance-id', 'instance-type', 'local-hostname', 'local-ipv4',
            'kernel-id', 'product-codes', 'public-hostname', 'public-ipv4',
            'public-keys', 'ramdisk-id', 'reservation-id', 'security-groups',
            'user-data']

class Error(Exception):
    pass

class EC2Metadata:
    """Class for querying metadata from EC2"""

    def __init__(self, addr='169.254.169.254', api='2008-02-01'):
        self.addr = addr
        self.api = api

        if not self._test_connectivity(self.addr, 80):
            raise Error("could not establish connection to: %s" % self.addr)

    @staticmethod
    def _test_connectivity(addr, port):
        for i in range(6):
            s = socket.socket()
            try:
                s.connect((addr, port))
                s.close()
                return True
            except socket.error, e:
                time.sleep(1)

        return False

    def _get(self, uri):
        url = 'http://%s/%s/%s/' % (self.addr, self.api, uri)
        value = urllib.urlopen(url).read()
        if "404 - Not Found" in value:
            return None

        return value

    def get(self, metaopt):
        """return value of metaopt"""

        if metaopt not in METAOPTS:
            raise Error('unknown metaopt', metaopt, METAOPTS)

        if metaopt == 'availability-zone':
            return self._get('meta-data/placement/availability-zone')

        if metaopt == 'public-keys':
            data = self._get('meta-data/public-keys')
            if data is None:
                # no public keys are associated with this instance
                return []
            keyids = [ line.split('=')[0] for line in data.splitlines() ]

            public_keys = []
            for keyid in keyids:
                uri = 'meta-data/public-keys/%d/openssh-key' % int(keyid)
                public_keys.append(self._get(uri).rstrip())

            return public_keys

        if metaopt == 'user-data':
            return self._get('user-data')

        return self._get('meta-data/' + metaopt)

def get(metaopt):
    """primitive: return value of metaopt"""

    m = EC2Metadata()
    return m.get(metaopt)

def display(metaopts, prefix=False):
    """primitive: display metaopts (list) values with optional prefix"""

    m = EC2Metadata()
    for metaopt in metaopts:
        value = m.get(metaopt)
        if not value:
            value = "unavailable"

        if prefix:
            print "%s: %s" % (metaopt, value)
        else:
            print value

def usage(s=None):
    """display usage and exit"""

    if s:
        print >> sys.stderr, "Error:", s
    print >> sys.stderr, "Syntax: %s [options]" % sys.argv[0]
    print >> sys.stderr, __doc__
    sys.exit(1)

def main():
    """handle cli options"""

    try:
        getopt_metaopts = METAOPTS[:]
        getopt_metaopts.append('help')
        opts, args = getopt.gnu_getopt(sys.argv[1:], "h", getopt_metaopts)
    except getopt.GetoptError, e:
        usage(e)

    if len(opts) == 0:
        display(METAOPTS, prefix=True)
        return

    metaopts = []
    for opt, val in opts:
        if opt in ('-h', '--help'):
            usage()

        metaopts.append(opt.replace('--', ''))

    display(metaopts)


if __name__ == "__main__":
    main()
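
The public-keys branch of get() is the only one that makes more than one request: it first fetches an index, then one openssh-key per entry. Here is a standalone Python 3 sketch of the parsing step, assuming the index has one line per key of the form N=keyname (the sample values in the docstring are invented).

```python
def parse_key_ids(listing):
    """Return the integer key ids from a listing with lines like 0=my-key."""
    key_ids = []
    for line in listing.splitlines():
        if not line.strip():
            continue
        # Everything before the first '=' is the numeric key index.
        key_ids.append(int(line.split("=", 1)[0]))
    return key_ids


def openssh_key_uris(listing):
    """Build the per-key URIs that get() requests for each key id."""
    return ["meta-data/public-keys/%d/openssh-key" % key_id
            for key_id in parse_key_ids(listing)]
```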

Comments

Mitch Garnaat

Hi -

The boto library (http://boto.googlecode.com/) also provides a couple of methods to access instance metadata and user data: boto.utils.get_instance_metadata and boto.utils.get_instance_userdata. I implement a retry mechanism because, as you note, sometimes the interface isn't available yet if you try to run this on startup.

Mitch 

Alon Swartz

Thanks for the link, Mitch. I've been using boto since 2007, keep up the great work.

With regards to the above code, we needed a simple way for instances to access metadata, both from the CLI (mostly for testing and debugging), and from other Python scripts. I could have hooked into boto.utils, but I wanted to keep it simple, with a clear cut interface for other projects (e.g., ec2-init).

Regarding retries, that's the point of the _test_connectivity function.

Scott Moser

User data security is not as bad as stated above. It can be made as secure as a root-owned file with 400 permissions by routing the data service off once you've collected the information you need. This can be done with:

route add -host 169.254.169.254 reject 

Once that is done, no user space process can get at the service. In order to do so, it would have to have compromised root and run:

route del -host 169.254.169.254 reject 

The Ubuntu lucid images can do this for you if you use 'cloud-config' syntax. A user-data with the following will have cloud-init route the service off for you early in boot.

#cloud-config

disable_ec2_metadata: true

For more information, see [1], or try a Lucid image [2].

Doing this obviously breaks things that depend on the instance data being there, but that can be overcome by caching the data to root-owned files. I'm not saying that user-data is an excellent way to store important credentials, but it isn't as bad as it is often made out to be.

--

[1] http://bazaar.launchpad.net/%7Ecloud-init-dev/cloud-init/trunk/files/hea...

[2] http://uec-images.ubuntu.com/releases/lucid/beta-1/

Alon Swartz

Thanks for the link to cloud-init (I see ec2-init has changed its name).

I thought about blocking the metadata IP. It seems like a good idea, but I'm just not sure how Amazon has set up its security (is it an actual machine? running on the host?). I wonder how easy it would be to bypass using some sort of MITM and IP spoofing.

Chances are that blocking the 169.254 IP for incoming and outgoing traffic would be sufficient but, as you mentioned, user-data is not the ideal place to store secret information.

Rob Oliver

I had been using this suggestion of blocking the metadata service as part of my user-data script, but it seems you now have to expose the metadata if you want to use IAM Roles for EC2 Instances, which AWS highly recommends (instead of embedding long-term credentials on the instance).

One of the security recommendations AWS proposed during the re:Invent conference was to use a bastion host for all your EC2 instances, and log all activity.

Alon Swartz

I finally got around to creating an ec2metadata package (uploaded to the TurnKey archive) and uploading it to GitHub (it only took me 3 years). While I was at it, I did some refactoring and split the CLI and the library.
