import pandas as pd
import plotly.express as px
Last time, we saw how to use the magic %%sql
command to run SQL code in a Jupyter notebook.
%load_ext sql
%%sql
sqlite:///data/basic_examples.db
%%sql
SELECT * FROM Dish;
* sqlite:///data/basic_examples.db Done.
name | type | cost |
---|---|---|
ravioli | entree | 10 |
ramen | entree | 7 |
taco | entree | 7 |
edamame | appetizer | 4 |
fries | appetizer | 4 |
potsticker | appetizer | 4 |
ice cream | dessert | 5 |
%%sql
SELECT type, MAX(cost)
FROM Dish
GROUP BY type;
* sqlite:///data/basic_examples.db Done.
type | MAX(cost) |
---|---|
appetizer | 4 |
dessert | 5 |
entree | 10 |
Let's see how we can use the sqlalchemy
library.
Unlike the %%sql
magic command, sqlalchemy creates a connection
objection that can be used as the second argument to the pandas.read_sql
| command. An example is shown below.
import sqlalchemy
import pandas as pd
# create a SQL Alchemy connection to the database
engine = sqlalchemy.create_engine("sqlite:///data/basic_examples.db")
connection = engine.connect()
# query is a multi-line Python string.
query = """
SELECT type, MAX(cost)
FROM Dish
GROUP BY type;"""
pd.read_sql(query, connection)
type | MAX(cost) | |
---|---|---|
0 | appetizer | 4 |
1 | dessert | 5 |
2 | entree | 10 |
query
is a multi-line Python string:
query
'\nSELECT type, MAX(cost)\nFROM Dish\nGROUP BY type;'
This SQL to pandas workflow is handy since it lets you work with data as you're used to in DS100 so far.
import plotly.express as px
query = """
SELECT type, MAX(cost)
FROM Dish
GROUP BY type;"""
# save query result to a DataFrame df
df = pd.read_sql(query, connection)
# then, visualize the data
px.bar(df, x = "type", y = "MAX(cost)")