TurnKey Linux Virtual Appliance Library

Python iterators considered harmful

I just tracked down a nasty bug in my code to a gotcha with Python iterators.

Consider the following code...


class Numbers(list):
    def even(self):
        for val in self:
            if val % 2 == 0:
                yield val
    even = property(even)

    def odd(self):
        for val in self:
            if val % 2 != 0:
                yield val
    odd = property(odd)

nums = Numbers(range(10))

# [0, 2, 4, 6, 8]
print `list(nums.even)`

# [0, 2, 4, 6, 8]
print `list(nums.even)`

even = nums.even

# [0, 2, 4, 6, 8]
print `list(even)`

# GOTCHA! The generator stored in even was exhausted by the first list() call.
# []
print `list(even)`

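This is evil. Watch out for it. Each access to nums.even calls the property and builds a fresh generator, which is why the first two calls both work. Once a generator is stored in a variable, though, the first pass exhausts it and every later pass silently yields nothing. In this example, if you don't actually need iterators, you could just as easily rewrite Numbers using list comprehensions: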


class Numbers(list):
    def even(self):
        return [ val for val in self if val % 2 == 0 ]
    even = property(even)

    def odd(self):
        return [ val for val in self if val % 2 != 0 ]
    odd = property(odd)

No subtle gotchas and the code is even shorter!
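
As a quick check, here is a minimal sketch showing that a value taken from the list-based property can be iterated more than once. It repeats the rewrite above using the @property decorator spelling; the names first_pass and second_pass are only illustrative.

```python
class Numbers(list):
    @property
    def even(self):
        # Builds a new list on every access, so the result is reusable.
        return [val for val in self if val % 2 == 0]

    @property
    def odd(self):
        return [val for val in self if val % 2 != 0]


nums = Numbers(range(10))
even = nums.even

first_pass = list(even)
second_pass = list(even)   # a list can be iterated again from the start

print(first_pass)    # [0, 2, 4, 6, 8]
print(second_pass)   # [0, 2, 4, 6, 8]
```

The trade-off is that every access builds the whole list in memory, which is fine for small collections like this one.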


Comments

How about...

... subclassing built-in types considered harmful?  Or generator-properties considered harmful?  Or self-iterables (iterables which are their own iterators) considered harmful?

Iterator != generator

Mixing up generators with iterators considered harmful.  Also, mixing property decorators with generator functions considered stupid.

try this

A property of type iterator does not make any sense to me.

try this:

class Even(object):
  def __init__(self, list):
    self.list = list

  def __iter__(self):
    for val in self.list:
      if val % 2 == 0: yield val

class Odd(object):
  def __init__(self, list):
    self.list = list

  def __iter__(self):
    for val in self.list:
      if val % 2 != 0: yield val

class Numbers(list):
  even = property(lambda self: Even(self))
  odd = property(lambda self: Odd(self))
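
A short usage sketch of this wrapper approach, trimmed to the Even case. The constructor argument is renamed to values here (an assumption for readability, to avoid shadowing the built-in list); the behaviour is otherwise meant to match the classes above.

```python
class Even(object):
    def __init__(self, values):
        self.values = values

    def __iter__(self):
        # Each call returns a fresh generator over the underlying values.
        for val in self.values:
            if val % 2 == 0:
                yield val


class Numbers(list):
    even = property(lambda self: Even(self))


nums = Numbers(range(10))
even = nums.even

first_pass = list(even)
second_pass = list(even)   # __iter__ runs again, so nothing is exhausted

print(first_pass)    # [0, 2, 4, 6, 8]
print(second_pass)   # [0, 2, 4, 6, 8]

# The wrapper keeps a reference to nums, so later changes show up too.
nums.append(10)
after_append = list(even)
print(after_append)  # [0, 2, 4, 6, 8, 10]
```

Because the wrapper only holds a reference, it stays lazy and memory-cheap while still being safe to iterate repeatedly.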

stateful properties considered harmful?

Seriously, instantiating a finite generator and then being surprised that it can be exhausted?

You should read about

You should read about descriptors.

The difference between iterables and iterators

Yes, it is essential to know the difference between iterables and iterators in Python. Iterators are stateful objects - they know how far through their sequence they are. Once they reach the end (if they have one), that's it. Iterables (or, more accurately, reiterables) instead are able to create iterators on demand. Since each iteration operation on a reiterable object implicitly creates a new iterator, they automatically start again from the beginning.
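
One way to see the distinction, as a small sketch: calling iter() on a reiterable returns a new object each time, while calling iter() on an iterator returns the iterator itself. The variable names are illustrative.

```python
nums = [0, 1, 2, 3]

# A list is a reiterable: each iter() call hands back a new iterator.
it1 = iter(nums)
it2 = iter(nums)
print(it1 is it2)          # False

# An iterator is its own iterator, so every loop over it shares one state.
print(iter(it1) is it1)    # True

# Generators are iterators and behave the same way.
gen = (n for n in nums if n % 2 == 0)
print(iter(gen) is gen)    # True

print(list(gen))           # [0, 2]
print(list(gen))           # []
```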

When all you have is an iterator, and you need to guarantee reiterability, then you need to either store the values in a list, or else use an appropriate tool (such as itertools.tee or a size limited deque) to retain access to the earlier values that you need.
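
The three options mentioned above could look roughly like this. The helper even_values is hypothetical and exists only to give the sketch an iterator to work with.

```python
import collections
import itertools


def even_values(values):
    # Hypothetical helper: a generator over the even members of values.
    for val in values:
        if val % 2 == 0:
            yield val


# Option 1: store everything in a list, which can be iterated repeatedly.
stored = list(even_values(range(10)))
print(stored)            # [0, 2, 4, 6, 8]

# Option 2: split one iterator into two independent iterators.
a, b = itertools.tee(even_values(range(10)))
first = list(a)
second = list(b)
print(first)             # [0, 2, 4, 6, 8]
print(second)            # [0, 2, 4, 6, 8]

# Option 3: keep only the most recent values in a size-limited deque.
recent = collections.deque(even_values(range(10)), maxlen=2)
print(list(recent))      # [6, 8]
```

Note that itertools.tee buffers whatever one copy has consumed and the other has not, so if one copy is run to the end first, it holds as much as a list would. After calling tee, the original iterator should not be used directly.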

Your "gotcha" is just a natural consequence of the stateful nature of iterators. You can get exactly the same effect by doing the following:

with open("a_file") as f:
  for line in f:
    print "A line from the file: %s" % line
  for line in f:
    print "This will never be printed"

The reason being, of course, that the first loop moved the read pointer to the end of the file, so you need to do an f.seek() to get back to the start (or somewhere else in the file) before iterating again will produce any output.
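
A minimal sketch of the rewind, assuming an in-memory io.StringIO object as a stand-in for a real file so that the example needs nothing on disk. A file opened with open() behaves the same way for iteration and seek().

```python
import io

# io.StringIO stands in for a real file here (an assumption of this sketch).
f = io.StringIO("first\nsecond\n")

first_pass = [line.rstrip("\n") for line in f]
exhausted = [line.rstrip("\n") for line in f]    # read position is at the end

f.seek(0)                                        # rewind to the start
second_pass = [line.rstrip("\n") for line in f]

print(first_pass)    # ['first', 'second']
print(exhausted)     # []
print(second_pass)   # ['first', 'second']
```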

However, the huge advantage of iterators (and generators in particular) is precisely this point - because each value is kept around for only as long as it is needed, they can save vast amounts of memory compared to actual container objects. And, of course, some iterators produce infinite sequences that simply *cannot* be created as a finite list:

def evens(start=2):
  if start <= 0 or start % 2 != 0:
    raise ValueError("Start value %d is not positive and even" % start)
  i = start
  while 1:
    yield i
    i += 2


def odds(start=1):
  if start <= 0 or start % 2 != 1:
    raise ValueError("Start value %d is not positive and odd!" % start)
  i = start
  while 1:
    yield i
    i += 2
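
To consume an infinite generator like these safely, one option is to take a bounded slice with itertools.islice or to pull values one at a time with next(). The sketch below repeats evens so that it runs on its own; the variable names are illustrative.

```python
import itertools


def evens(start=2):
    if start <= 0 or start % 2 != 0:
        raise ValueError("Start value %d is not positive and even" % start)
    i = start
    while True:
        yield i
        i += 2


# Take a finite prefix; list(evens()) on its own would never finish.
first_five = list(itertools.islice(evens(), 5))
print(first_five)        # [2, 4, 6, 8, 10]

# Or step through the sequence by hand.
from_ten = evens(10)
print(next(from_ten))    # 10
print(next(from_ten))    # 12
```

One detail worth knowing: because the body of a generator function does not run until the first value is requested, the ValueError for a bad start value is raised on the first next() call, not when evens(3) is called.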

Thanks

Nick Coghlan

Thanks a lot
 
