5.5. Working with Nested Data Structures

Storing information in a nested data structure has become a common practice. Nesting allows related information to be collected in one structure while still using the most appropriate data structure for each type of information internally.

One popular standard format for storing such information is JSON, which stands for JavaScript Object Notation. An example of data stored in the JSON format is presented below.

{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": {
    "home":   {
                "type": "home",
                "number": "212 555-1234"
              },
    "office":{
               "type": "office",
               "number": "646 555-4567"
             },
    "mobile":{
               "type": "mobile",
               "number": "123 456-7890"
             }
  },
  "children": ["Alice", "Ben"],
  "spouse": null
}

Source

This example was adapted from the Wikipedia JSON page, which is shared under the Creative Commons Attribution-ShareAlike License.

Notice that the JSON data structure consists of literal values (strings, numbers, JavaScript booleans, null, etc.) inside data structures that look remarkably similar to Python lists and dictionaries.

Note

The json module also has functions for reading and writing JSON to a file (load and dump, respectively).
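
As a minimal sketch of the file-based functions mentioned in this note, the block below writes a small dictionary to a temporary file with dump and reads it back with load. The file name person.json and the sample record are illustrative assumptions and are not part of the original example.

```python
import json
import os
import tempfile

# A small, made-up record used only for illustration.
record = {"firstName": "John", "children": ["Alice", "Ben"], "spouse": None}

# Work inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "person.json")  # hypothetical file name

    # dump serializes a Python object to an open text file.
    with open(path, "w") as f:
        json.dump(record, f)

    # load parses JSON from an open text file back into Python objects.
    with open(path) as f:
        restored = json.load(f)

print(restored == record)
```

The round trip gives back an equal dictionary because every value in this record has a direct JSON counterpart.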

5.5.1. Reading JSON data into Python

Python can read a JSON file with only a few changes to the data. This is typically accomplished using the json module from the standard library. In this case, we will use the loads function from json to load the JSON data from a (multiline) string.

In [1]: from json import loads

In [2]: s = '''
   ...: {
   ...:   "firstName": "John",
   ...:   "lastName": "Smith",
   ...:   "isAlive": true,
   ...:   "age": 25,
   ...:   "address": {
   ...:     "streetAddress": "21 2nd Street",
   ...:     "city": "New York",
   ...:     "state": "NY",
   ...:     "postalCode": "10021-3100"
   ...:   },
   ...:   "phoneNumbers": {
   ...:     "home":   {
   ...:                 "type": "home",
   ...:                 "number": "212 555-1234"
   ...:               },
   ...:     "office":{
   ...:                "type": "office",
   ...:                "number": "646 555-4567"
   ...:              },
   ...:     "mobile":{
   ...:                "type": "mobile",
   ...:                "number": "123 456-7890"
   ...:              }
   ...:   },
   ...:   "children": ["Alice", "Ben"],
   ...:   "spouse": null
   ...: }'''
   ...: 

In [3]: data = loads(s)

In [4]: type(data)
Out[4]: dict

In [5]: data
Out[5]: 
{'address': {'city': 'New York',
  'postalCode': '10021-3100',
  'state': 'NY',
  'streetAddress': '21 2nd Street'},
 'age': 25,
 'children': ['Alice', 'Ben'],
 'firstName': 'John',
 'isAlive': True,
 'lastName': 'Smith',
 'phoneNumbers': {'home': {'number': '212 555-1234', 'type': 'home'},
  'mobile': {'number': '123 456-7890', 'type': 'mobile'},
  'office': {'number': '646 555-4567', 'type': 'office'}},
 'spouse': None}

Note that the only changes that needed to be made were

  1. changing the JavaScript true to True, and
  2. changing the JavaScript null value to None

Furthermore, there was no change to the literal data structure syntax, as Python’s lists and dictionaries are similar enough to JavaScript’s arrays and objects that direct substitution works in this case.
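
The conversions described above can be checked directly. The short sketch below parses a made-up JSON string (an assumption chosen to cover each kind of value) and prints the name of the Python type that loads produced for each entry.

```python
from json import loads

# A short JSON document covering each kind of JSON value.
text = '''
{
  "flag": true,
  "nothing": null,
  "count": 25,
  "ratio": 0.5,
  "items": ["a", "b"],
  "inner": {"k": "v"}
}'''

parsed = loads(text)

# Print the Python type that each JSON value was converted to.
for key, value in parsed.items():
    print(key, type(value).__name__)
```

JSON objects become dictionaries, arrays become lists, true becomes True, and null becomes None, which matches the changes listed above.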

5.5.2. Getting values from a nested data structure

The standard method for getting data from a nested data structure is to use nested applications of Python’s get syntax. For example, if we want to get the state from John’s address, we would use data['address']['state']. This pulls the address dictionary (data['address']) from data and then pulls the state (data['address']['state']) from that dictionary.

In [6]: data['address']['state']
Out[6]: 'NY'

Just as we have been using get as a common API for getting values from a single list or dictionary, the toolz package provides get_in for getting values from nested data.

In [7]: from toolz import get_in

In [8]: get_in(['address', 'state'], data)
Out[8]: 'NY'

The code block below shows two more examples of using get_in to get data from this nested structure.

# Getting data three levels deep
In [9]: get_in(['phoneNumbers', 'mobile', 'number'], data)
Out[9]: '123 456-7890'

# Getting data from an embedded list
In [10]: get_in(['children',0], data)
Out[10]: 'Alice'

It is important to note that the default behavior of get_in is to return None for any missing data. You should either get in the habit of checking if a value is None or change the default to force an exception on missing data.

# Missing data returns None
In [11]: get_in(['phoneNumbers', 'car', 'number'], data) is None
Out[11]: True

# Changing the default behavior to throw an exception
In [12]: get_in(['phoneNumbers', 'car', 'number'], data, no_default=True)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-12-56af9e5c377d> in <module>()
----> 1 get_in(['phoneNumbers', 'car', 'number'], data, no_default=True)

/Users/bn8210wy/.pyenv/versions/3.5.2/envs/runestone/lib/python3.5/site-packages/toolz/dicttoolz.py in get_in(keys, coll, default, no_default)
    309     """
    310     try:
--> 311         return reduce(operator.getitem, keys, coll)
    312     except (KeyError, IndexError, TypeError):
    313         if no_default:

KeyError: 'car'
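
The traceback above hints at how this kind of lookup works: each key is applied in turn with operator.getitem. The block below is a minimal standard-library sketch of that idea. The function name get_path and the sample dictionary are hypothetical, and everything beyond the lines visible in the traceback is an assumption rather than the toolz source.

```python
from functools import reduce
import operator


def get_path(keys, coll, default=None, no_default=False):
    """Hypothetical stand-in for get_in, built only from the standard library."""
    try:
        # Apply coll[key] once per key, feeding each result into the next step.
        return reduce(operator.getitem, keys, coll)
    except (KeyError, IndexError, TypeError):
        if no_default:
            # Re-raise the original error instead of hiding the missing data.
            raise
        return default


# Made-up sample data for illustration.
person = {
    "phoneNumbers": {"mobile": {"number": "123 456-7890"}},
    "children": ["Alice", "Ben"],
}

print(get_path(["phoneNumbers", "mobile", "number"], person))
print(get_path(["children", 0], person))
print(get_path(["phoneNumbers", "car", "number"], person))
print(get_path(["phoneNumbers", "car", "number"], person, default="unknown"))
```

Passing an explicit default is a third option alongside checking for None and forcing an exception, and it is useful when a sensible fallback value exists.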

5.5.3. Updating values in a nested data structure

If you want to update the data in a nested structure without mutating the data, use assoc_in from toolz, which will return a new data structure with the associated changes.

We would use the following expression to add a car phone number for John.

In [13]: from toolz import assoc_in

In [14]: assoc_in(data, ['phoneNumbers', 'car', 'number'], '507 867 5309')
Out[14]: 
{'address': {'city': 'New York',
  'postalCode': '10021-3100',
  'state': 'NY',
  'streetAddress': '21 2nd Street'},
 'age': 25,
 'children': ['Alice', 'Ben'],
 'firstName': 'John',
 'isAlive': True,
 'lastName': 'Smith',
 'phoneNumbers': {'car': {'number': '507 867 5309'},
  'home': {'number': '212 555-1234', 'type': 'home'},
  'mobile': {'number': '123 456-7890', 'type': 'mobile'},
  'office': {'number': '646 555-4567', 'type': 'office'}},
 'spouse': None}

Since we are not changing the original data, we need to make sure to save a reference to the new data structure, or we will miss the changes.

# The last change was not saved, so data still has no car number
In [15]: get_in(['phoneNumbers', 'car', 'number'], data) is None
Out[15]: True

# Save a reference to have access to the new data
In [16]: data1 = assoc_in(data, ['phoneNumbers', 'car', 'number'], '507 867 5309')

In [17]: data1 is data
Out[17]: False

In [18]: get_in(['phoneNumbers', 'car', 'number'], data1)
Out[18]: '507 867 5309'
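
To make the idea of a non-mutating update concrete, here is a minimal sketch using only the standard library. The helper assoc_path is hypothetical, handles only dictionaries, and assumes a non-empty list of keys. It illustrates the copy-then-set approach and is not the toolz implementation.

```python
def assoc_path(d, keys, value):
    """Hypothetical helper: return a copy of d with value set at the nested keys."""
    head, *rest = keys
    new = dict(d)  # shallow copy of this level only
    if rest:
        # Rebuild the next level down, starting from an empty dict if it is missing.
        new[head] = assoc_path(d.get(head, {}), rest, value)
    else:
        new[head] = value
    return new


# Made-up sample data for illustration.
original = {
    "address": {"state": "NY"},
    "phoneNumbers": {"mobile": {"number": "123 456-7890"}},
}

updated = assoc_path(original, ["phoneNumbers", "car", "number"], "507 867 5309")

print("car" in original["phoneNumbers"])          # the original is untouched
print(updated["phoneNumbers"]["car"]["number"])   # the copy holds the new value
print(updated["address"] is original["address"])  # untouched branches are shared
```

In this sketch only the dictionaries along the updated path are copied, so the cost of an update grows with the depth of the path rather than the size of the whole structure.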

Note

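toolz also provides update_in, which is similar to assoc_in but applies a function to the current value at the given path instead of replacing it with a fixed value. Like assoc_in, it returns a new data structure and leaves the original unchanged. This function can also be used to create the initial shell of a nested structure. See the toolz documentation.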
