Python iterators considered harmful
Liraz Siri - Mon, 2011/07/11 - 14:01 - 7 comments
I just tracked down a nasty bug in my code to a gotcha with Python iterators.
Consider the following code...
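class Numbers(list):
    def even(self):
        for val in self:
            if val % 2 == 0:
                yield val
    even = property(even)

    def odd(self):
        for val in self:
            if val % 2 != 0:
                yield val
    odd = property(odd)

nums = Numbers(range(10))

# each access to nums.even calls even() and returns a fresh generator
# [0, 2, 4, 6, 8]
print(list(nums.even))

# [0, 2, 4, 6, 8]
print(list(nums.even))

# now a single generator object is saved under the name "even"
even = nums.even

# [0, 2, 4, 6, 8]
print(list(even))

# GOTCHA! the saved generator was exhausted by the previous list() call
# []
print(list(even))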
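This is evil. Watch out for it. Each access to nums.even builds a fresh generator, but once you save that generator in a variable you can only consume it once. In this example, if you don't actually need iterators, you could just as easily rewrite Numbers using list comprehensions: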
class Numbers(list):
    def even(self):
        return [ val for val in self if val % 2 == 0 ]
    even = property(even)

    def odd(self):
        return [ val for val in self if val % 2 != 0 ]
    odd = property(odd)
No subtle gotchas and the code is even shorter!
Comments
How about...
... subclassing built-in types considered harmful? Or generator-properties considered harmful? Or self-iterables (iterables which are their own iterators) considered harmful?
Iterator != generator
Mixing up generators with iterators considered harmful. Also, mixing property decorators with generator functions considered stupid.
try this
property of type iterator does not make any sense to me
try this:
stateful properties considered harmful?
seriously, instantiating a finite generator and then wondering that it can be exhausted?
You should read about
You should read about descriptors.
The difference between iterables and iterators
Yes, it is essential to know the difference between iterables and iterators in Python. Iterators are stateful objects - they know how far through their sequence they are. Once they reach the end (if they have one), that's it. Iterables (or, more accurately, reiterables) instead are able to create iterators on demand. Since each iteration operation on a reiterable object implicitly creates a new iterator, they automatically start again from the beginning.
When all you have is an iterator, and you need to guarantee reiterability, then you need to either store the values in a list, or else use an appropriate tool (such as itertools.tee or a size limited deque) to retain access to the earlier values that you need.
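A minimal sketch of the distinction described above, written for Python 3. The variable names are made up for illustration; only iter(), list() and itertools.tee are real library calls.

```python
import itertools

nums = [1, 2, 3]            # a list is reiterable: every iter() call makes a new iterator

it = iter(nums)             # an iterator is stateful
first_pass = list(it)       # consumes the iterator
second_pass = list(it)      # nothing left, so this is empty

again = list(nums)          # the list itself starts over from the beginning

# An iterator is its own iterator, which is why it cannot restart.
is_self_iterable = iter(it) is it

# Option 1: store the values in a list to guarantee reiterability.
squares = list(n * n for n in nums)

# Option 2: itertools.tee splits one iterator into independent iterators.
a, b = itertools.tee(n * n for n in nums)
from_a = list(a)
from_b = list(b)
```

Here first_pass is [1, 2, 3] while second_pass is [], and both from_a and from_b hold [1, 4, 9]. Note that tee buffers values internally, so it only saves memory when the two iterators are consumed roughly in step.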
Your "gotcha" is just a natural consequence of the stateful nature of iterators. You can get exactly the same effect by doing the following:
The reason being, of course, that the first loop moved the read pointer to the end of the file, so you need to do an f.seek() to get back to the start (or somewhere else in the file) before iterating again will produce any output.
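The file behaviour described above can be sketched roughly as follows (Python 3). The temporary file and its two lines of content are invented for the example; any readable text file would behave the same way.

```python
import os
import tempfile

# Create a small throwaway file to iterate over (hypothetical content).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\n")
    path = tmp.name

with open(path) as f:
    first_pass = [line.rstrip("\n") for line in f]   # reads to the end of the file
    second_pass = [line.rstrip("\n") for line in f]  # read pointer is at the end: no lines
    f.seek(0)                                        # move back to the start
    third_pass = [line.rstrip("\n") for line in f]   # lines are available again

os.remove(path)
```

The second loop produces nothing because a file object is its own iterator, just like the saved generator in the post.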
However, the huge advantage of iterators (and generators in particular) is precisely this point - because each value is kept around for only as long as it is needed, they can save vast amounts of memory compared to actual container objects. And, of course, some iterators produce infinite sequences that simply *cannot* be created as a finite list:
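As one possible illustration (not the commenter's original snippet), here is a hypothetical generator named evens that never finishes. itertools.islice takes a finite slice of it without ever building the whole sequence.

```python
import itertools

def evens():
    """Yield 0, 2, 4, ... forever (hypothetical example generator)."""
    n = 0
    while True:
        yield n
        n += 2

# Take only the first five values; list(evens()) would never return.
first_five = list(itertools.islice(evens(), 5))

# The standard library offers the same sequence via itertools.count(start, step).
also_five = list(itertools.islice(itertools.count(0, 2), 5))
```

Both first_five and also_five come out as [0, 2, 4, 6, 8], and at no point does more than one value of the sequence need to be held by the generator itself.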
Thanks
Nick Coghlan
Thanks a lot