4.4. Common Comprehension Patterns for Two Sequences¶
List comprehensions are also useful for combining two sequences into a single list. In this section, we will look at a few common patterns that involve two sequence expressions in one comprehension.
4.4.1. Use for
twice to get all combinations¶
Suppose that we have two list L
and M
and we want to perform some
computation on all combinations of elements from these two lists. This can be
accomplished using two (independent) for
twice, once for each list. The code
shown below creates a tuple of all combinations.
In [1]: L = [1,2,3]
In [2]: M = ["a", "b"]
In [3]: new_list = [(i,j) for i in L for j in M]
In [4]: new_list
Out[4]: [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'b')]
Note the order of the tuples in new_list
. When using a two sequence
expressions in a list comprehension, the right most comprehension will iterate
through its sequence once for each element in the left most sequence. Readers
familiar with a double for loop should note that the right most for
expression corresponds to the inner loop.
Check you understanding
- (A) [(1, "a"), (1,"b"), (2, "a"), (2, "b")]
- The comprehension cycles through each element in the second sequence before moving the the next item in the first.
- (B) [(1, "a"), (2,"a"), (1, "b"), (2, "b")]
- The comprehension cycles through each element in the second sequence before moving the the next item in the first.
- (C) [(1, 2), ("a","b")]
- A comprehension with two sequences with cycle through all combination of items, not each sequence separately.
rec-5-53: Which list is generated by the following list comprehension?
[(n, ch) for n in (1,2) for ch in ("a", "b")]
4.4.2. The second for
can depend on the first¶
The last example illustrated a comprehension pattern for constructing all combinations of elements from two list. Consider the following program, which applies this technique to one list versus itself.
In [5]: L = [1,2,3]
In [6]: new_list = [(i,j) for i in L for j in L]
In [7]: new_list
Out[7]: [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]
Notice that some combinations are repeated, e.g. (1,2)
and (2,1)
.
Suppose that instead we want all unique pairs of values. In math, this
collection is often described as all \(i,j\) such that \(i \le j\). We
can use this approach with comprehensions as well. Two methods are shown below.
Method 1 - Filter with ``if``
In [8]: L = [1,2,3]
In [9]: new_list = [(i,j) for i in L for j in L if i <= j]
In [10]: new_list
Out[10]: [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]
The first approach matches the mathematical definition exactly and ensures that
no value is matched with another value more than once by keeping the pairs
sorted. One of the reasons that comprehensions are useful is they mirror the
mathematical language of describing a collection. The only drawback to this
approach is that it requires cycling through all combinations. Because that are
\(n\) values to pair with each of \(n\) values, this operation is
\(O(n*n)=O(n^2)\). Suppose that we know that there are no repeated values
in L
. In this case, we can filter by indices instead of values. The
resulting method, while still \(O(n^2)\), will be close to twice as fast.
Method 2 - Use indices and ``range`` to filter
In [11]: L = [1,2,3]
In [12]: n = len(L)
In [13]: new_list = [(L[i],L[j]) for i in range(n) for j in range(i,n)]
In [14]: new_list
Out[14]: [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]
This approach ensures that no index is matched with another index more than
once. The key point here is that we made the second range
depend on the
first, shortening the length of the sequence. Provided that all the elements of
L
are unique, this second method iterates over \(\frac{n(n-1)}{2}\)
elements instead of \(n^2\). In the long run, this approach will be a
little less than twice as fast but approach exactly twice as fast
asymptotically. Unfortunately, \(O\left(\frac{n(n-1)}{2}\right)=O\left(\frac{n^2}{2} -
\frac{n}{2}\right)=O(n^2)\), so this will still be expensive for large list.
Check your understanding
- (A) [(0, 0), (0, 1), (1, 0), (1, 1)]
- The second range will start at the current value of the first range.
- (B) [(0, 0), (0, 1), (1, 1)]
- The second range will start at the current value of the first range.
- (C) [(0, 1), (1,)]
- A comprehension with two sequences with cycle through all combination of items, not each sequence separately.
rec-5-54: Which list is generated by the following list comprehension?
[(i, j) for i in range(2) for j in range(i,2)]
4.4.3. Sometimes it is easier to describe what you don’t want¶
Suppose that we want to use list comprehensions to describe all the prime
numbers up to n=120
. A well-known algorithm for describing primes is the
Sieve of Eratosthenes,
For each number i
between 2 and \(\sqrt{n}\), this algorithm crosses out
all the multiples of i
. The remaining numbers will be prime. Notice that
this algorithm describes numbers that are prime by decribing all numbers that
are not. The following two list comprehensions perform the task of finding all
primes up to 120.
In [15]: from math import sqrt
In [16]: n = 120
In [17]: not_prime = [j for i in range(2, int(sqrt(n) + 1)) for j in range(2*i,n + 1, i)]
In [18]: prime = [i for i in range(2,n+1) if i not in not_prime]
In [19]: prime
Out[19]:
[2,
3,
5,
7,
11,
13,
17,
19,
23,
29,
31,
37,
41,
43,
47,
53,
59,
61,
67,
71,
73,
79,
83,
89,
97,
101,
103,
107,
109,
113]
This is another application of the last two patterns. We want all multiples of
i
and thus the second sequence expression depends on the first.
Furthermore, we want all multiples of i
for all i
up to the square root
of our cut-off, which requires two sequence expressions.
Illustration of the Sieve of Eratosthenes
n
by
crossing out all multiples of numbers from 2 to \(\sqrt{n}\). This image was
copied from Wikipedia
and is covered by the GNU Free Documentation License.As always, we can clean this code up with a little refactoring. Let’s use two
lambda expressions to give meaning two applications of range in not_prime
.
In [20]: from math import sqrt
In [21]: n = 120
In [22]: possible_factors = lambda n: range(2, int(sqrt(n) + 1))
In [23]: multiples_of = lambda i, n: range(2*i,n + 1, i)
In [24]: not_prime = [j for i in possible_factors(n) for j in multiples_of(i, n)]
In [25]: prime = [i for i in range(2,n+1) if i not in not_prime]
In [26]: prime
Out[26]:
[2,
3,
5,
7,
11,
13,
17,
19,
23,
29,
31,
37,
41,
43,
47,
53,
59,
61,
67,
71,
73,
79,
83,
89,
97,
101,
103,
107,
109,
113]
question
How might you make the last example more general?