SQL continued¶

In [1]:
import pandas as pd
import plotly.express as px

Introducing sqlalchemy¶

Last time, we saw how to use the magic %%sql command to run SQL code in a Jupyter notebook.

In [2]:
%load_ext sql
In [3]:
%%sql
sqlite:///data/basic_examples.db
In [4]:
%%sql
SELECT * FROM Dish;
 * sqlite:///data/basic_examples.db
Done.
Out[4]:
name type cost
ravioli entree 10
ramen entree 7
taco entree 7
edamame appetizer 4
fries appetizer 4
potsticker appetizer 4
ice cream dessert 5
In [5]:
%%sql
SELECT type, MAX(cost)
FROM Dish
GROUP BY type;
 * sqlite:///data/basic_examples.db
Done.
Out[5]:
type MAX(cost)
appetizer 4
dessert 5
entree 10

Let's see how we can use the sqlalchemy library.

Unlike the %%sql magic command, sqlalchemy creates a connection objection that can be used as the second argument to the pandas.read_sql| command. An example is shown below.

In [6]:
import sqlalchemy
import pandas as pd

# create a SQL Alchemy connection to the database
engine = sqlalchemy.create_engine("sqlite:///data/basic_examples.db")
connection = engine.connect()

# query is a multi-line Python string.
query = """
SELECT type, MAX(cost)
FROM Dish
GROUP BY type;"""

pd.read_sql(query, connection)
Out[6]:
type MAX(cost)
0 appetizer 4
1 dessert 5
2 entree 10

query is a multi-line Python string:

In [7]:
query
Out[7]:
'\nSELECT type, MAX(cost)\nFROM Dish\nGROUP BY type;'

This SQL to pandas workflow is handy since it lets you work with data as you're used to in DS100 so far.

In [8]:
import plotly.express as px

query = """
SELECT type, MAX(cost)
FROM Dish
GROUP BY type;"""

# save query result to a DataFrame df
df = pd.read_sql(query, connection) 

# then, visualize the data
px.bar(df, x = "type", y = "MAX(cost)")