5.1. Dictionaries

All of the compound data types we have studied in detail so far — strings, lists, and tuples — are sequential collections. This means that the items in the collection are ordered from left to right and they use integers as indices to access the values they contain.

Dictionaries and Sets are Python’s two built-in associative data types. The dictionary associates one value with another, whereas the set associates a value with membership in some collection or group. In this chapter, we will introduce these two data structures and discuss their application in data wrangling and analysis.

5.1.1. Mapping One Value to Another with Dictionaries

The dictionary is a mapping type. A map is an unordered, associative collection. The association, or mapping, is from a key, which can be any immutable type, to a value, which can be any Python data object. As an example, we will create a dictionary to translate English words into Spanish. For this dictionary, the keys are strings and the values will also be strings.

We can create a dictionary by providing a list of key-value pairs separated by : between the keys and values and , between each pair.

In [1]: eng2sp = {'three': 'tres', 'one': 'uno', 'two': 'dos'}

In [2]: eng2sp
Out[2]: {'one': 'uno', 'three': 'tres', 'two': 'dos'}

It doesn’t matter what order we write the pairs. The values in a dictionary are accessed with keys, not with indices, so there is no need to care about ordering.

Here is how we use a key to look up the corresponding value.

In [3]: eng2sp = {'three': 'tres', 'one': 'uno', 'two': 'dos'}

In [4]: eng2sp['two']
Out[4]: 'dos'

The key 'two' yields the value 'dos'.

You may recall that we introduced get function from the toolz module in the chapter on sequences. A functional alternative to using the get operator (name[key]) is applying the same get function to a dictionary.

In [5]: from toolz import get

In [6]: get('two', eng2sp)
Out[6]: 'dos'

Using the get function has a couple of advantages. First, we can use a common API for getting values from both sequences and lists.

In [7]: L = [1,2,3]

In [8]: get(1, L)
Out[8]: 2

In [9]: D = {1:"a", 2:"b"}

In [10]: get(1, D)
Out[10]: 'a'

Second, get allows us to get the values for a number of keys simultaneously.

In [11]: get([2,1], L)
Out[11]: (3, 2)

In [12]: get(['one', 'two'], eng2sp)
Out[12]: ('uno', 'dos')

Finally, we can assign an optional default value for keys that are not in the dictionary.

In [13]: get(['one', 'two', 'four'], eng2sp, default=None)
Out[13]: ('uno', 'dos', None)

This flexibility makes get our go-to method for accessing data from a dictionary.

Note

This workspace is provided for your convenience. You can use this activecode window to try out anything you like.

Check your understanding

    Q-5: A dictionary is an unordered collection of key-value pairs.
  • (A) False
  • Dictionaries associate keys with values but there is no assumed order for the entries.
  • (B) True
  • Yes, dictionaries are associative collections meaning that they store key-value pairs.

    Q-6: what is printed by the following statements?

    from toolz import get
    mydict = {"cat":12, "dog":6, "elephant":23}
    print(get("dog",mydict))
    
  • (A) 12
  • 12 is associated with the key cat.
  • (B) 6
  • yes, 6 is associated with the key dog.
  • (C) 23
  • 23 is associated with the key elephant.
  • (D) error, you cannot use the index operator with a dictionary.
  • the [ ] operator, when used with a dictionary, will look up a value based on its key.

5.1.2. Dictionary Keys Must Be Immutable

The dictionary uses the built-in function hash to quickly look-up the value for a given key (discussed in detail in _dict_complexity). A requirement of this process is that all keys be unique.

In [14]: hash('a')
Out[14]: -2755768832532532667

In [15]: hash(1)
Out[15]: 1

In [16]: hash(2.5)
Out[16]: 1152921504606846978

In [17]: hash((1,2))
Out[17]: 3713081631934410656

If we allowed mutable object to be keys, how would we handle changes that confuse the hash function? For example, if [1] was a key and this value was mutated by appending another value, say 2, the new list [1, 2] would have a different hash value. This could lead to the value being lost or two values in the dictionary having the same hash. Consequently, mutable items are not hashable, meaning that hash will throw an exception when given a mutable object.

In [18]: hash([1,2])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-18-ead01a573561> in <module>()
----> 1 hash([1,2])

TypeError: unhashable type: 'list'

In [19]: hash({1:'z'})
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-19-11a6980f1fd6> in <module>()
----> 1 hash({1:'z'})

TypeError: unhashable type: 'dict'

For this reason, all keys in a dictionary much be immutable object. Examples of eligible values are strings, number or tuples.

In [20]: d = {1:'int', 2.5:'float', 's':'string', (1,2):'tuple'}

In [21]: d
Out[21]: {(1, 2): 'tuple', 1: 'int', 2.5: 'float', 's': 'string'}

Mutable objects, like lists and dictionaries, cannot be dictionary keys.

In [22]: from toolz import assoc

In [23]: assoc(d, [1, 2], 'list')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-4cf2bde721f7> in <module>()
----> 1 assoc(d, [1, 2], 'list')

/Users/bn8210wy/.pyenv/versions/3.5.2/envs/runestone/lib/python3.5/site-packages/toolz/dicttoolz.py in assoc(d, key, value, factory)
    193     """
    194     d2 = factory()
--> 195     d2[key] = value
    196     return merge(d, d2, factory=factory)
    197 

TypeError: unhashable type: 'list'

In [24]: assoc(d, {'a':5}, 'list')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-24-c797888df932> in <module>()
----> 1 assoc(d, {'a':5}, 'list')

/Users/bn8210wy/.pyenv/versions/3.5.2/envs/runestone/lib/python3.5/site-packages/toolz/dicttoolz.py in assoc(d, key, value, factory)
    193     """
    194     d2 = factory()
--> 195     d2[key] = value
    196     return merge(d, d2, factory=factory)
    197 

TypeError: unhashable type: 'dict'

Note that the error message is pointing out that a list and dictionary are not hashable (can’t be converted to a number using hash). This is due to the fact that they are both mutable.

Check your understanding

    Q-7: Which of the following calls to assoc WILL NOT cause an exception?
  • (A) assoc({}, (1, 2), 0)
  • Tuples are immutable and work as dictionary keys.
  • (B) assoc({}, [1, 2], 0)
  • Lists are mutable and cannot be used as a dictionary key. This call will cause an exception.
  • (C) assoc({}, {1:"one"}, 0)
  • Dictionaries are mutable and cannot be used as a dictionary key. This line will cause an exception.

5.1.3. assoc and disoc

Dictionaries are mutable. As mentioned earlier, the functional style of programming involves returning new copies of a data structure instead of mutating a data structure in place. The easiest way to accomplish this is using two functions from the toolz.dicttoolz module, namely assoc and dissoc.

The function assoc is used to associate a value with a key, which can be a new key or updating the value of an existing key. In the next example, we will update the value associated with 'pears' and 'peaches' to 0. We verify that the dictionary that is returned is not the same as the old dictionary.

In [25]: from toolz.dicttoolz import assoc, dissoc

In [26]: inventory = {'apples': 430, 'bananas': 312, 'oranges': 525, 'pears': 217}

In [27]: new_inventory = assoc(inventory, 'pears', 0)

In [28]: new_inventory = assoc(new_inventory, 'peaches', 0)

In [29]: new_inventory
Out[29]: {'apples': 430, 'bananas': 312, 'oranges': 525, 'peaches': 0, 'pears': 0}

In [30]: inventory is new_inventory
Out[30]: False

Similarly, a new shipment of 200 bananas arriving could be handled like this.

In [31]: inventory = {'apples': 430, 'bananas': 312, 'oranges': 525, 'pears': 217}

In [32]: updated_bananas = get('bananas',inventory) + 200

In [33]: inventory = assoc(inventory, 'bananas', updated_bananas)

In [34]: inventory
Out[34]: {'apples': 430, 'bananas': 512, 'oranges': 525, 'pears': 217}

In [35]: len(inventory)
Out[35]: 4

Notice that there are now 512 bananas—the dictionary has been modified. Note also that the len function also works on dictionaries. It returns the number of key-value pairs:

The function dissoc is used to remove a key (and its value) from a dictionary. As with assoc, a new dictionary is returned.

In [36]: inventory = {'apples': 430, 'bananas': 312, 'oranges': 525, 'pears': 217}

In [37]: new_dict = dissoc(inventory, 'pears')

In [38]: new_dict
Out[38]: {'apples': 430, 'bananas': 312, 'oranges': 525}

In [39]: inventory is new_dict
Out[39]: False

Check your understanding

    Q-8: What is printed by the following statements?

    mydict = {"cat":12, "dog":6, "elephant":23}
    mouse_val = sum(get(["cat", "dog"], mydict))
    mydict = assoc(mydict, "mouse", mouse_val)
    print(get("mouse", mydict)
    
  • (A) 12
  • 12 is associated with the key cat.
  • (B) 0
  • The key mouse will be associated with the sum of the two values.
  • (C) 18
  • Yes, add the value for cat and the value for dog (12 + 6) and create a new entry for mouse.
  • (D) Error, there is no entry with mouse as the key.
  • Since the new key is introduced on the left hand side of the assignment statement, a new key-value pair is added to the dictionary.

    Q-9: What is printed by the following statements?

    mydict = {"cat":12, "dog":6, "elephant":23, "bear":20}
    answer = get( "cat", mydict) // get("dog",mydict)
    print(answer)
    
  • (A) 2
  • get returns the value associated with a given key so this divides 12 by 6.
  • (B) 0.5
  • 12 is divided by 6, not the other way around.
  • (C) bear
  • Take another look at the example for get above. get returns the value associated with a given key.
  • (D) Error, divide is not a valid operation on dictionaries.
  • The integer division operator is being used on the values returned from the get method, not on the dictionary.

5.1.4. Dictionary Methods

Dictionaries have a number of useful built-in methods. The following table provides a summary and more details can be found in the Python Documentation.

Method Parameters Description
keys none Returns a view of the keys in the dictionary
values none Returns a view of the values in the dictionary
items none Returns a view of the key-value pairs in the dictionary

The keys method returns what Python 3 calls a view over its underlying keys. A view is an example of an iterable.

In [40]: inventory = {'apples': 430, 'bananas': 312, 'oranges': 525, 'pears': 217}

In [41]: ks = inventory.keys()

In [42]: type(ks)
Out[42]: dict_keys

In [43]: ks
Out[43]: dict_keys(['bananas', 'apples', 'oranges', 'pears'])

We can turn the iterable item into an iterator using the iter function. An iterator is a lazy construction that allows through the keys, one after the other using the next function.

In [44]: iter_ks = iter(ks)

In [45]: type(iter_ks)
Out[45]: dict_keyiterator

In [46]: next(iter_ks)
Out[46]: 'bananas'

In [47]: next(iter_ks)
Out[47]: 'apples'

In [48]: next(iter_ks)
Out[48]: 'oranges'

In [49]: next(iter_ks)
Out[49]: 'pears'

In [50]: next(iter_ks)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-50-a86c12757b9e> in <module>()
----> 1 next(iter_ks)

StopIteration: 

An exception will be thrown once we reach the end of the iterator. After the iterator has stepped through the sequence once, it is empty and can’t be used again.

While understanding this gives a deeper insight into how Python works, Python takes care of all of the details for us. When we process an iterable value in a list comprehension, Python hides the detail of making the iterator, using next to step through the sequence and stopping when the exception is thrown.

Note

It is important to note that there is no guarentee the order of the keys will be returned! Dictionaries are unordered sets. If order is important either sort or use a sequence data structure.

We can use the iterator to iterate through the keys using a list comprehension or by converting the view into a list.

In [51]: inventory = {'apples': 430, 'bananas': 312, 'oranges': 525, 'pears': 217}

In [52]: keys = [akey for akey in inventory.keys()]

In [53]: keys
Out[53]: ['bananas', 'apples', 'oranges', 'pears']

In [54]: ks = list(inventory.keys())

In [55]: ks
Out[55]: ['bananas', 'apples', 'oranges', 'pears']

In [56]: [akey for akey in inventory]
Out[56]: ['bananas', 'apples', 'oranges', 'pears']

As we saw earlier with strings and lists, dictionary methods use dot notation, which specifies the name of the method to the right of the dot and the name of the object on which to apply the method immediately to the left of the dot. The empty parentheses in the case of keys indicate that this method takes no parameters. Finally, we note that iterating over the dictionary name iterates over the keys.

The values and items methods are similar to keys. They return view objects which can be turned into lists or iterated over directly. Note that the items are shown as tuples containing the key and the associated value.

In [57]: inventory = {'apples': 430, 'bananas': 312, 'oranges': 525, 'pears': 217}

In [58]: list(inventory.values())
Out[58]: [312, 430, 525, 217]

In [59]: pairs = [(k,v) for k,v in inventory.items()]

Note that tuples are often useful for getting both the key and the value at the same time while you are looping.

The in and not in operators can test if a key is in the dictionary:

In [60]: inventory = {'apples': 430, 'bananas': 312, 'oranges': 525, 'pears': 217}

In [61]: 'apples' in inventory
Out[61]: True

In [62]: 'cherries' in inventory
Out[62]: False

In [63]: 'bananas'.upper()+3*'!' if 'bananas' in inventory else "We have no bananas"
Out[63]: 'BANANAS!!!'

This operator can be very useful since looking up a non-existent key in a dictionary causes a runtime error.

Check your understanding

    Q-10: What is printed by the following statements?

    from toolz import get
    mydict = {"cat":12, "dog":6, "elephant":23, "bear":20}
    keylist = list(mydict.keys())
    keylist.sort()
    print(get(3, keylist))
    
  • (A) cat
  • keylist is a list of all the keys which is then sorted. cat would be at index 1.
  • (B) dog
  • keylist is a list of all the keys which is then sorted. dog would be at index 2.
  • (C) elephant
  • Yes, the list of keys is sorted and the item at index 3 is printed.
  • (D) bear
  • keylist is a list of all the keys which is then sorted. bear would be at index 0.

    Q-11: What is printed by the following statements?

    mydict = {"cat":12, "dog":6, "elephant":23, "bear":20}
    print("dog" in mydict)
    
  • (A) True
  • Yes, dog is a key in the dictionary.
  • (B) False
  • The in operator returns True if a key is in the dictionary, False otherwise.

    Q-12: What is printed by the following statements?

    mydict = {"cat":12, "dog":6, "elephant":23, "bear":20}
    print(23 in mydict)
    
  • (A) True
  • 23 is a value in the dictionary, not a key.
  • (B) False
  • Yes, the in operator returns True if a key is in the dictionary, False otherwise.

    Q-13: What is printed by the following statements?

    from toolz import get
    mydict = {"cat":12, "dog":6, "elephant":23, "bear":20}
    total = sum([get(akey, mydict) for akey in mydict if len(akey) > 3])
    print(total)
    
  • (A) 18
  • Add the values that have keys greater than 3, not equal to 3.
  • (B) 43
  • Yes, the for statement iterates over the keys. It adds the values of the keys that have length greater than 3.
  • (C) 0
  • This is the accumulator pattern. total starts at 0 but then changes as the iteration proceeds.
  • (D) 61
  • Not all the values are added together. The if statement only chooses some of them.

Aliasing and Copying

Because dictionaries are mutable, you need to be aware of aliasing when mutating values (as we saw with lists). Whenever two variables refer to the same dictionary object, changes to one affect the other. For example, opposites is a dictionary that contains pairs of opposites.

In [64]: opposites = {'up': 'down', 'right': 'wrong', 'true': 'false'}

In [65]: alias = opposites

In [66]: alias is opposites
Out[66]: True

In [67]: alias['right'] = 'left'

In [68]: opposites['right']
Out[68]: 'left'

As you can see from the is operator, alias and opposites refer to the same object. This illustrates another reason that we prefer to use assoc and dissoc to update a dictionary: returning fresh copies of a dictionary eliminates any problems cause by aliasing and mutation.

In [69]: opposites = {'up': 'down', 'right': 'wrong', 'true': 'false'}

In [70]: alias = opposites

In [71]: alias is opposites
Out[71]: True

In [72]: not_alias = assoc(alias, 'right', 'left')

In [73]: not_alias is opposites
Out[73]: False

In [74]: get('right', opposites)
Out[74]: 'wrong'

In [75]: get('right', alias)
Out[75]: 'wrong'

In [76]: get('right', not_alias)
Out[76]: 'left'

5.1.5. Immutable and Persistent Dictionaries

In the last section, the functions assoc and dissoc where used to create new copies of dictionaries with updated values. We would expect that copying a dictionary would add some complexity to the process. We will take a closer look at the relative efficiency of a number of approaches to building a dictionary in the next chapter, but in the mean time we note that the pyrsistent module contains a persistent dictionary/map that should reduce the penalty for copying the contents of a dictionary over and over.

In [77]: from pyrsistent import pmap

In [78]: opposites = pmap({'up': 'down', 'right': 'wrong', 'true': 'false'})

In [79]: type(opposites)
Out[79]: pyrsistent._pmap.PMap

In [80]: not_alias = assoc(alias, 'right', 'left')

In [81]: not_alias
Out[81]: {'right': 'left', 'true': 'false', 'up': 'down'}

In [82]: not_alias is opposites
Out[82]: False

In [83]: get('up', not_alias)
Out[83]: 'down'

In [84]: also_not_alias = dissoc(not_alias, 'up')

In [85]: also_not_alias
Out[85]: {'right': 'left', 'true': 'false'}

In [86]: also_not_alias is not_alias
Out[86]: False

In [87]: also_not_alias is alias
Out[87]: False

Notice that the same semantics that we use to work with dictionaries, i.e. using assoc, dissoc and get also work with a pmap.

It should also be noted that, because all of the data types in pyrsistent are immutable, these data structures can be used as keys in a dictionary!

In [88]: from pyrsistent import pmap, pvector

In [89]: d = {pmap({'a':1}):'pmap', pvector([1,2]):'pvector'}

In [90]: d
Out[90]: {pvector([1, 2]): 'pvector', pmap({'a': 1}): 'pmap'}
Next Section - 5.2. Sets