4.3. Common Comprehension Patterns for a Single Sequence
List comprehensions can replace for loops in many (most) situations. In the following sections, we will highlight some useful techniques for describing a new list using comprehensions.
4.3.1. Combine information in tuples
A useful pattern for list comprehensions involves combining information into a tuple. For example, let's compute the cube root of each integer from 1 through 9 and use a tuple to store both the original number and its cube root.
In [1]: cube_root = lambda n: n**(1/3)
In [2]: L = [(val, cube_root(val)) for val in range(1,10)]
In [3]: L
Out[3]:
[(1, 1.0),
(2, 1.2599210498948732),
(3, 1.4422495703074083),
(4, 1.5874010519681994),
(5, 1.7099759466766968),
(6, 1.8171205928321397),
(7, 1.912931182772389),
(8, 2.0),
(9, 2.080083823051904)]
When applying a list comprehension to a list of tuples, we can save each of the values to a separate variable by providing a comma-separated sequence of variables between for and in, as illustrated below.
In [4]: cube_root_less_than_2 = [val for val, cube_root in L if cube_root < 2]
In [5]: cube_root_less_than_2
Out[5]: [1, 2, 3, 4, 5, 6, 7]
This approach will work for any list of tuples regardless of the length of the tuples, provided that the number of variables matches the length of the tuples.
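As a minimal sketch of the same idea with longer tuples, the example below unpacks 3-tuples into three variables. The list records and the names name, low, and high are made up for illustration and do not come from the text.

```python
# Hypothetical data: (name, low, high) triples, used only for illustration.
records = [("a", 1, 4), ("b", 2, 2), ("c", 0, 9)]

# Three variables between `for` and `in` match the three entries of each tuple.
ranges = [(name, high - low) for name, low, high in records]
print(ranges)
```

If the number of variables did not match the tuple length (say, two variables for these 3-tuples), Python would raise a ValueError when unpacking.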
4.3.2. Use built-in helper functions
The functions enumerate and zip both exhibit the pattern from the last section. The enumerate function can be used to return both the index and value of each element of a sequence.
In [6]: L = [1,2,3,4,5,6]
In [7]: pairs = [(ind, val) for ind, val in enumerate(L)]
In [8]: pairs
Out[8]: [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
This can be useful if we are describing a transformation that involves both the value and the position of the value in the sequence. For example, suppose we want to add 3 to the first 3 values of L.
In [9]: new_list = [val + 3 if ind < 3 else val for ind, val in enumerate(L)]
In [10]: new_list
Out[10]: [4, 5, 6, 4, 5, 6]
Without enumerate, we would have had to focus on the index and use the indexing operator to access the value.
In [11]: new_list = [L[ind] + 3 if ind < 3 else L[ind] for ind in range(len(L))]
In [12]: new_list
Out[12]: [4, 5, 6, 4, 5, 6]
Clearly, using enumerate led to a simpler, easier-to-read construction. Be sure to use this function whenever you need both the index and value of each element of a sequence.
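Two further uses of enumerate, offered as a small sketch rather than as part of the original examples: a transformation that depends on whether the index is even, and the optional start argument, which changes the first index from 0 to another value. The names doubled_even and one_based are illustrative.

```python
L = [1, 2, 3, 4, 5, 6]

# Double the values that sit at even indices (0, 2, 4); leave the rest alone.
doubled_even = [val * 2 if ind % 2 == 0 else val for ind, val in enumerate(L)]
print(doubled_even)

# enumerate takes an optional start value for the index (the default is 0).
one_based = [(pos, val) for pos, val in enumerate(L, start=1)]
print(one_based[:3])
```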
Check Your Understanding
rec-5-43: What will be printed by executing the following code block?
L = list(enumerate(range(3)))
print(L)
- (A) [(1, 0), (2, 1), (3, 2)]
- Remember that Python starts counting at 0! So does enumerate.
- (B) [(0, 1), (1, 2), (2, 3)]
- Remember that Python starts counting at 0! So does range.
- (C) [(0, 0), (1, 1), (2, 2)]
- (D) [(0, 1, 2), (0, 1, 2)]
- Enumerate pairs the index with the original value.
rec-5-44: What will be printed by executing the following code block?
L = list(enumerate(range(1,4)))
print(L)
- (A) [(1, 0), (2, 1), (3, 2)]
- Enumerate returns pairs with the index in the first entry and the value in the second.
- (B) [(0, 1), (1, 2), (2, 3)]
- (C) [(0, 0), (1, 1), (2, 2)]
- This range function starts at 1 and goes up to (but not including) 4.
- (D) [(0, 1, 2), (1, 2, 3)]
- Enumerate pairs the index with the original value.
The zip function combines two sequences into one sequence of pairs.
In [13]: L = [1, 2, 3, 4]
In [14]: M = ["a", "b", "c"]
In [15]: new_list = [(Lval, Mval) for Lval, Mval in zip(L,M)]
In [16]: new_list
Out[16]: [(1, 'a'), (2, 'b'), (3, 'c')]
Notice that the length of the new list is the same as the length of the shorter list. One application of zip comes from probability. Suppose that \(X\) and \(Y\) are random variables that are the results of rolling fair 6-sided and 20-sided dice, respectively. We wish to simulate the distribution of the sum of these two values. We can accomplish this by generating a number of trials for each of the dice separately, then computing the sums using a list comprehension and zip.
In [17]: from random import randint
In [18]: N_trials = 10
In [19]: six_sided = [randint(1,6) for i in range(N_trials)]
In [20]: six_sided
Out[20]: [1, 5, 2, 1, 1, 2, 5, 2, 2, 2]
In [21]: twenty_sided = [randint(1,20) for i in range(N_trials)]
In [22]: twenty_sided
Out[22]: [5, 17, 10, 3, 11, 6, 18, 7, 2, 3]
In [23]: sums = [r6 + r20 for r6, r20 in zip(six_sided, twenty_sided)]
In [24]: sums
Out[24]: [6, 22, 12, 4, 12, 8, 23, 9, 4, 5]
To generalize this process, we use lambda expressions to create general functions for generating each sequence, so that the number of trials N can be adjusted.
In [25]: six_sided = lambda N: [randint(1,6) for i in range(N)]
In [26]: twenty_sided = lambda N: [randint(1,20) for i in range(N)]
In [27]: sums = lambda N: [r6 + r20 for r6, r20 in zip(six_sided(N), twenty_sided(N))]
In [28]: mean = lambda L: sum(L)/len(L)
In [29]: mean(sums(1000000))
Out[29]: 14.004244
Above, we refactor the original code and use the newly constructed functions to simulate the mean of the sums over 1 million trials. As both six_sided and twenty_sided are now functions, the value of N needs to be passed along in the definition of sums.
This is another example of how lambda expressions can be used to transform a specific example into a more general solution. This is done by identifying the variable(s) we would like to change and adding them as formal parameters to a lambda expression. This is as simple as adding lambda N: to the front of our expressions and changing some variable references to function calls.
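The same generalization can be sketched with def statements in place of lambda expressions. This is an alternative spelling, not the approach used in the text. The helper roll, the generator object rng, and the name n_trials are assumptions introduced here; the seed is fixed only so that repeated runs produce the same rolls.

```python
import random


def roll(sides, n, rng):
    """Return a list of n rolls of a fair die with the given number of sides."""
    return [rng.randint(1, sides) for _ in range(n)]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


rng = random.Random(0)  # assumption: seeded only for repeatable output
n_trials = 100_000

# zip pairs the i-th six-sided roll with the i-th twenty-sided roll.
sums = [r6 + r20 for r6, r20 in zip(roll(6, n_trials, rng), roll(20, n_trials, rng))]

# The exact expected value of the sum is 3.5 + 10.5 = 14.
print(mean(sums))
```

Adding the number of sides as a parameter takes the generalization one step further: a single function now covers both dice.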
Check Your Understanding
rec-5-45: What will be printed by executing the following code block?
L = list(zip(range(3), range(1,4)))
print(L)
- (A) [(1, 0), (2, 1), (3, 2)]
- Zip preserves the order of the original arguments. In this case the values from range(3) will precede the values from range(1,4).
- (B) [(0, 1), (1, 2), (2, 3)]
- (C) [(0, 0), (1, 1), (2, 2)]
- The second range function starts at 1 and goes up to (but not including) 4.
- (D) [(0, 1, 2), (1, 2, 3)]
- Zip combines the sequences in pairs based on index (1st with 1st, 2nd with 2nd, ...)
rec-5-46: What will be printed by executing the following code block?
L = list(zip(range(3), range(4)))
print(L)
- (A) [(1, 0), (2, 1), (3, 2)]
- The first range starts at 0 and counts up to 2.
- (B) [(0, 1), (1, 2), (2, 3)]
- The second range starts at 0 and counts up to 3.
- (C) [(0, 0), (1, 1), (2, 2)]
- If given sequences of different length, zip will stop at the end of the shorter sequence.
- (D) Error, you can't zip lists of different length.
- If given sequences of different length, zip will stop at the end of the shorter sequence.
Finally, we highlight the reversed function, which allows us to iterate through a sequence from back to front.
In [30]: L = [1,2,3,4,5,6]
In [31]: new_list = [i for i in reversed(L)]
In [32]: new_list
Out[32]: [6, 5, 4, 3, 2, 1]
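A short sketch of two details worth knowing about reversed: it leaves the original list unchanged, and it needs a sequence rather than an arbitrary iterator, so the result of enumerate must be turned into a list before it can be reversed. The name back_to_front is illustrative.

```python
L = [1, 2, 3, 4, 5, 6]

# reversed(enumerate(L)) would raise a TypeError, because enumerate returns
# an iterator; building a list first gives reversed a sequence to walk.
back_to_front = [(ind, val) for ind, val in reversed(list(enumerate(L)))]
print(back_to_front[:2])

# The original list is not modified by reversed.
print(L)
```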
4.3.3. Use built-in functions to reduce a list to a value
There are a number of built-in Python functions that help us reduce a list to a value, including sum, len, max, and min. Remember to use these functions along with a list comprehension to describe a computation on a sequence of values.
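As a quick sketch of all four functions applied to comprehensions, the block below summarizes the lengths of a few words. The list words and the result names are made up for illustration.

```python
words = ["list", "comprehension", "zip", "enumerate"]

# Describe the values first with a comprehension, then reduce them.
lengths = [len(w) for w in words]

longest = max(lengths)    # largest word length
shortest = min(lengths)   # smallest word length
total = sum(lengths)      # total number of characters
n_long = len([w for w in words if len(w) > 4])  # count of words longer than 4

print(longest, shortest, total, n_long)
```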
For example, suppose that we want to compute the sum of squares for a small set of numbers. Let's give this a try using the regular definition, shown below.
\[ SS = \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
For each value in the list, we must subtract the mean and then square this difference. Finally, we add up all of these values. We will create a function for computing the mean and then another for computing the sum of squares.
In [33]: mean = lambda L: sum(L)/len(L)
In [34]: ss = lambda L: sum([(i - mean(L))**2 for i in L])
In [35]: my_list = [1,2,3,4,5]
In [36]: mean(my_list)
Out[36]: 3.0
In [37]: ss(my_list)