Generators in Python
Python generators are functions that allows lazy evaluation of data, .i.e. data is evaluated on-need basis. Unlike functions, which return the value and then stop executing, generators simply pause their execution and during the next call continue the execution where they left, return another value and pause again.
The simplest way of creating generators is using generator comprehensions, which follow the format of list comprehensions but are surrounded with round brackets:
# list comprehension
lc = [x for x in range(2)]
# [1, 2]
# generator comprehension
gc = (x for x in range(2))
# <generator object <genexpr> at 0x23f8123>
A generator object is an iterable, i.e. it provides a method next
which calls
the next object of the iteration, or raises StopIteration
exception if there
isn’t any value left. Taking the example of gc
object that we created above:
gc.__next__()
# prints 0
gc.__next__()
# prints 1
gc.__next__()
# raises StopIteration
Since it’s an iterable, it can be used directly in the loop constructs. To show
this example, we’ll have to create a new generator object. This is because once
the generator object raises StopIteration
, it can’t be used again, we have to
create a new object.
gc1 = (x for x in range(2))
for value in gc1:
print(value)
Generator comprehensions are one way of creating generators, the other one is to
create the generator functions. The generator function creates (and sends back)
a sequence of results instead of a single value. This is done by using the
yield
statement instead of return
.
from math import sin
def iterate_sine():
current_value = 0
while current_value < 90:
yield sin(current_value)
current_value += 1
for sine_value in iterate_sine():
print(sine_value)
The confusing point in the code snippet above can be how it returns the value more than once?
Well, the idea behind generators is that once it hits the yield
statement, it
returns that value, and pauses the execution of code. Next time the generator is
called, it starts executing the code once again.
Notice the while loop above, since the code is only paused and not stopped, it
restarts in the next iteration, increments the value of current_value
, starts
next iteration of the loop yields value and pauses.
If we had used return
instead of yield
, the loop would have stopped either
after returning only one value, or we’d have to collect all the values in an
object and return that object.
def try_to_iterate_sine_with_return():
current_value = 0
while current_value < 90:
return sin(current_value)
# Keeping this for cosmetic purposes, it won't be hit anyway
current_value += 1
def iterate_sine_with_list():
current_value = 0
value_list = []
while current_value < 90:
value_list.append(sin(current_value))
current_value += 1
return value_list
While it won’t matter if the object is small and the values can be easily calculated but the use of generators is clearly advantageous when we have to defer the calculation of values, or the object returned will turn out to be huge.
For instance, reading a file which has millions of lines of text into a list is going to have large space requirements, but reading the same file into a generator, we can simply return one line of file at a time, taking down the memory required by a large amount.
file_obj = open('some_file.log')
for line in file_obj:
...