4.4. Common Comprehension Patterns for Two Sequences

List comprehensions are also useful for combining two sequences into a single list. In this section, we will look at a few common patterns that involve two sequence expressions in one comprehension.

4.4.1. Use for twice to get all combinations

Suppose that we have two list L and M and we want to perform some computation on all combinations of elements from these two lists. This can be accomplished using two (independent) for twice, once for each list. The code shown below creates a tuple of all combinations.

In [1]: L = [1,2,3]

In [2]: M = ["a", "b"]

In [3]: new_list = [(i,j) for i in L for j in M]

In [4]: new_list
Out[4]: [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'b')]

Note the order of the tuples in new_list. When using a two sequence expressions in a list comprehension, the right most comprehension will iterate through its sequence once for each element in the left most sequence. Readers familiar with a double for loop should note that the right most for expression corresponds to the inner loop.

Check you understanding

    rec-5-53: Which list is generated by the following list comprehension?

    [(n, ch) for n in (1,2) for ch in ("a", "b")]
    
  • (A) [(1, "a"), (1,"b"), (2, "a"), (2, "b")]
  • The comprehension cycles through each element in the second sequence before moving the the next item in the first.
  • (B) [(1, "a"), (2,"a"), (1, "b"), (2, "b")]
  • The comprehension cycles through each element in the second sequence before moving the the next item in the first.
  • (C) [(1, 2), ("a","b")]
  • A comprehension with two sequences with cycle through all combination of items, not each sequence separately.

4.4.2. The second for can depend on the first

The last example illustrated a comprehension pattern for constructing all combinations of elements from two list. Consider the following program, which applies this technique to one list versus itself.

In [5]: L = [1,2,3]

In [6]: new_list = [(i,j) for i in L for j in L]

In [7]: new_list
Out[7]: [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]

Notice that some combinations are repeated, e.g. (1,2) and (2,1). Suppose that instead we want all unique pairs of values. In math, this collection is often described as all \(i,j\) such that \(i \le j\). We can use this approach with comprehensions as well. Two methods are shown below.

Method 1 - Filter with ``if``

In [8]: L = [1,2,3]

In [9]: new_list = [(i,j) for i in L for j in L if i <= j]

In [10]: new_list
Out[10]: [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]

The first approach matches the mathematical definition exactly and ensures that no value is matched with another value more than once by keeping the pairs sorted. One of the reasons that comprehensions are useful is they mirror the mathematical language of describing a collection. The only drawback to this approach is that it requires cycling through all combinations. Because that are \(n\) values to pair with each of \(n\) values, this operation is \(O(n*n)=O(n^2)\). Suppose that we know that there are no repeated values in L. In this case, we can filter by indices instead of values. The resulting method, while still \(O(n^2)\), will be close to twice as fast.

Method 2 - Use indices and ``range`` to filter

In [11]: L = [1,2,3]

In [12]: n = len(L)

In [13]: new_list = [(L[i],L[j]) for i in range(n) for j in range(i,n)]

In [14]: new_list
Out[14]: [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]

This approach ensures that no index is matched with another index more than once. The key point here is that we made the second range depend on the first, shortening the length of the sequence. Provided that all the elements of L are unique, this second method iterates over \(\frac{n(n-1)}{2}\) elements instead of \(n^2\). In the long run, this approach will be a little less than twice as fast but approach exactly twice as fast asymptotically. Unfortunately, \(O\left(\frac{n(n-1)}{2}\right)=O\left(\frac{n^2}{2} - \frac{n}{2}\right)=O(n^2)\), so this will still be expensive for large list.

Check your understanding

    rec-5-54: Which list is generated by the following list comprehension?

    [(i, j) for i in range(2) for j in range(i,2)]
    
  • (A) [(0, 0), (0, 1), (1, 0), (1, 1)]
  • The second range will start at the current value of the first range.
  • (B) [(0, 0), (0, 1), (1, 1)]
  • The second range will start at the current value of the first range.
  • (C) [(0, 1), (1,)]
  • A comprehension with two sequences with cycle through all combination of items, not each sequence separately.

4.4.3. Sometimes it is easier to describe what you don’t want

Suppose that we want to use list comprehensions to describe all the prime numbers up to n=120. A well-known algorithm for describing primes is the Sieve of Eratosthenes, For each number i between 2 and \(\sqrt{n}\), this algorithm crosses out all the multiples of i. The remaining numbers will be prime. Notice that this algorithm describes numbers that are prime by decribing all numbers that are not. The following two list comprehensions perform the task of finding all primes up to 120.

In [15]: from math import sqrt

In [16]: n = 120

In [17]: not_prime = [j for i in range(2, int(sqrt(n) + 1)) for j in range(2*i,n + 1, i)]

In [18]: prime = [i for i in range(2,n+1) if i not in not_prime]

In [19]: prime
Out[19]: 
[2,
 3,
 5,
 7,
 11,
 13,
 17,
 19,
 23,
 29,
 31,
 37,
 41,
 43,
 47,
 53,
 59,
 61,
 67,
 71,
 73,
 79,
 83,
 89,
 97,
 101,
 103,
 107,
 109,
 113]

This is another application of the last two patterns. We want all multiples of i and thus the second sequence expression depends on the first. Furthermore, we want all multiples of i for all i up to the square root of our cut-off, which requires two sequence expressions.

Illustration of the Sieve of Eratosthenes

Illustration of the Sieve of Eratosthenes
The Sieve of Eratosthenes finds all the primes up to some number n by crossing out all multiples of numbers from 2 to \(\sqrt{n}\). This image was copied from Wikipedia and is covered by the GNU Free Documentation License.

As always, we can clean this code up with a little refactoring. Let’s use two lambda expressions to give meaning two applications of range in not_prime.

In [20]: from math import sqrt

In [21]: n = 120

In [22]: possible_factors = lambda n: range(2, int(sqrt(n) + 1))

In [23]: multiples_of = lambda i, n: range(2*i,n + 1, i)

In [24]: not_prime = [j for i in possible_factors(n) for j in multiples_of(i, n)]

In [25]: prime = [i for i in range(2,n+1) if i not in not_prime]

In [26]: prime
Out[26]: 
[2,
 3,
 5,
 7,
 11,
 13,
 17,
 19,
 23,
 29,
 31,
 37,
 41,
 43,
 47,
 53,
 59,
 61,
 67,
 71,
 73,
 79,
 83,
 89,
 97,
 101,
 103,
 107,
 109,
 113]

question

How might you make the last example more general?

Next Section - 4.5. Common Comprehension Patterns for Tables