Working With JSON

In this notebook we review some basic Python functionality for working with JSON data.

Python's standard library includes the json module, which provides basic JSON functionality:

In [1]:
import json

Define a Python Object

In the following we create a Python list of dictionaries and then save that list as JSON:

In [2]:
data = [
  {
    "Prof": "Gonzalez",
    "Classes": [
      "CS186", 
      { "Name": "Data100", "Year": [2017,2018] }
    ],
    "Tenured": False
  },
  {
    "Prof": "Nolan",
    "Classes": [
      "Stat133", "Stat153", "Stat198", "Data100"
    ],
    "Tenured": True
  }
]
data
Out[2]:
[{'Classes': ['CS186', {'Name': 'Data100', 'Year': [2017, 2018]}],
  'Prof': 'Gonzalez',
  'Tenured': False},
 {'Classes': ['Stat133', 'Stat153', 'Stat198', 'Data100'],
  'Prof': 'Nolan',
  'Tenured': True}]

Saving a Python Object as a JSON String

In [3]:
json_str = json.dumps(data, indent=2)
print(json_str)
[
  {
    "Prof": "Gonzalez",
    "Classes": [
      "CS186",
      {
        "Name": "Data100",
        "Year": [
          2017,
          2018
        ]
      }
    ],
    "Tenured": false
  },
  {
    "Prof": "Nolan",
    "Classes": [
      "Stat133",
      "Stat153",
      "Stat198",
      "Data100"
    ],
    "Tenured": true
  }
]
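Notice that Python's False and True were written as JSON's false and true. As a small illustrative sketch (the `sample` dictionary is made up for this example and is not part of the notebook's data), the following shows how a few other Python values are translated by json.dumps:

```python
import json

# Hypothetical sample values, chosen only to illustrate the type mapping.
sample = {"flag": True, "missing": None, "pair": (1, 2), "ratio": 0.5}

# sort_keys=True makes the key order in the output deterministic.
sample_str = json.dumps(sample, sort_keys=True)
print(sample_str)
# prints: {"flag": true, "missing": null, "pair": [1, 2], "ratio": 0.5}
```

None becomes null and the tuple becomes a JSON array, since JSON has no separate tuple type.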

Saving a Python Object as a JSON File

In [4]:
with open("bla.json", "w") as f:
    json.dump(data, f, indent=2)
In [5]:
from utils import head
head("bla.json", lines=100)
Out[5]:
['[\n',
 '  {\n',
 '    "Prof": "Gonzalez",\n',
 '    "Classes": [\n',
 '      "CS186",\n',
 '      {\n',
 '        "Name": "Data100",\n',
 '        "Year": [\n',
 '          2017,\n',
 '          2018\n',
 '        ]\n',
 '      }\n',
 '    ],\n',
 '    "Tenured": false\n',
 '  },\n',
 '  {\n',
 '    "Prof": "Nolan",\n',
 '    "Classes": [\n',
 '      "Stat133",\n',
 '      "Stat153",\n',
 '      "Stat198",\n',
 '      "Data100"\n',
 '    ],\n',
 '    "Tenured": true\n',
 '  }\n',
 ']']

Loading a JSON Object from a String

In [6]:
obj = json.loads(json_str)
obj
Out[6]:
[{'Classes': ['CS186', {'Name': 'Data100', 'Year': [2017, 2018]}],
  'Prof': 'Gonzalez',
  'Tenured': False},
 {'Classes': ['Stat133', 'Stat153', 'Stat198', 'Data100'],
  'Prof': 'Nolan',
  'Tenured': True}]
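The loaded object above matches the original data because it contains only lists, dictionaries with string keys, strings, numbers, and booleans. A round trip is not always lossless, though. The sketch below uses made-up values (`plain` and `lossy` are hypothetical names, not from the notebook) to show one case that survives and one that does not:

```python
import json

# Only JSON-native types: this survives a dumps/loads round trip unchanged.
plain = {"Prof": "Nolan", "Tenured": True, "Year": [2017, 2018]}
plain_back = json.loads(json.dumps(plain))
print(plain_back == plain)   # prints: True

# A tuple value and an integer key: both are changed by the round trip.
lossy = {"years": (2017, 2018), 1: "one"}
lossy_back = json.loads(json.dumps(lossy))
print(lossy_back)            # prints: {'years': [2017, 2018], '1': 'one'}
print(lossy_back == lossy)   # prints: False
```

Tuples come back as lists and non-string keys come back as strings, because JSON arrays and JSON object keys have no other representation.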

Loading a JSON Object from a File

In [7]:
with open("bla.json", "r") as f:
    obj = json.load(f)
    
obj
Out[7]:
[{'Classes': ['CS186', {'Name': 'Data100', 'Year': [2017, 2018]}],
  'Prof': 'Gonzalez',
  'Tenured': False},
 {'Classes': ['Stat133', 'Stat153', 'Stat198', 'Data100'],
  'Prof': 'Nolan',
  'Tenured': True}]
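Both json.load and json.loads raise json.JSONDecodeError when the input is not valid JSON. A minimal sketch of handling that case, using a deliberately malformed string (`bad_str` is a made-up example, not data from this notebook):

```python
import json

# Python-style True is not valid JSON (JSON expects lowercase true).
bad_str = '{"Prof": "Nolan", "Tenured": True}'

try:
    parsed = json.loads(bad_str)
except json.JSONDecodeError as err:
    # The exception records where parsing failed.
    parsed = None
    print(f"Invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}")
```

The same try/except pattern applies when reading from a file with json.load.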

Traversing a Python/JSON Object

In [8]:
type(obj)
Out[8]:
list
In [9]:
len(obj)
Out[9]:
2
In [10]:
first_obj = obj[0]
In [11]:
first_obj.keys()
Out[11]:
dict_keys(['Prof', 'Classes', 'Tenured'])
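Inspecting one level at a time works, but nested structures can also be explored recursively. The helper below is a sketch under the assumption that the object contains only dictionaries, lists, and scalar values; `walk` and `record` are hypothetical names introduced for illustration, and `record` is a copy of the first entry of the data above so that the block runs on its own:

```python
def walk(node, path):
    """Yield (path, value) pairs for every leaf of a nested list/dict structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, f"{path}[{key!r}]")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from walk(value, f"{path}[{i}]")
    else:
        # Scalars (strings, numbers, booleans, None) are the leaves.
        yield path, node

# Stand-in for obj[0] from the cells above.
record = {
    "Prof": "Gonzalez",
    "Classes": ["CS186", {"Name": "Data100", "Year": [2017, 2018]}],
    "Tenured": False,
}

leaves = list(walk(record, "first_obj"))
for leaf_path, leaf_value in leaves:
    print(leaf_path, "=", leaf_value)
```

Each printed path is the indexing expression that reaches that value, for example first_obj['Classes'][1]['Name'] for 'Data100'.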

Building a DataFrame

We could build the DataFrame by constructing one column at a time:

In [12]:
import pandas as pd
df = pd.DataFrame()
df['Names'] = [p['Prof'] for p in obj]
df['Tenured'] = [p['Tenured'] for p in obj]
df
Out[12]:
      Names  Tenured
0  Gonzalez    False
1     Nolan     True

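Notice that things get tricky with irregular nesting: each entry of the Classes column is a list, and those lists can mix plain strings with dictionaries.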

In [13]:
import pandas as pd
df = pd.DataFrame()
df['Names'] = [p['Prof'] for p in obj]
df['Tenured'] = [p['Tenured'] for p in obj]
df['Classes'] = [p['Classes'] for p in obj]
df
Out[13]:
      Names  Tenured                                            Classes
0  Gonzalez    False  [CS186, {'Name': 'Data100', 'Year': [2017, 201...
1     Nolan     True               [Stat133, Stat153, Stat198, Data100]
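One possible way to tame the irregular Classes field is to flatten it into one row per professor and class before building a table. This is only a sketch of one approach, not the notebook's method: `class_name`, `records`, and `rows` are hypothetical names, and `records` repeats the data from above so the block is self-contained.

```python
def class_name(entry):
    """Return the class name whether the entry is a plain string or a dict with a 'Name' key."""
    return entry["Name"] if isinstance(entry, dict) else entry

# Copy of the data defined earlier in the notebook.
records = [
    {
        "Prof": "Gonzalez",
        "Classes": ["CS186", {"Name": "Data100", "Year": [2017, 2018]}],
        "Tenured": False,
    },
    {
        "Prof": "Nolan",
        "Classes": ["Stat133", "Stat153", "Stat198", "Data100"],
        "Tenured": True,
    },
]

# One flat dictionary per (professor, class) pair.
rows = [
    {"Prof": p["Prof"], "Class": class_name(c)}
    for p in records
    for c in p["Classes"]
]

for row in rows:
    print(row)
```

Because every row now has the same two scalar fields, passing `rows` to pd.DataFrame would give a regular two-column table. Note that this flattening discards the Year information attached to Data100.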

Pandas from Object List

In [14]:
pd.DataFrame(obj)
Out[14]:
                                             Classes      Prof  Tenured
0  [CS186, {'Name': 'Data100', 'Year': [2017, 201...  Gonzalez    False
1               [Stat133, Stat153, Stat198, Data100]     Nolan     True