Starting Up SQL¶
Before we look at SQL syntax in detail, let's first get ourselves set up to run SQL queries in Jupyter.
Approach #1: SQL Magic¶
1. Load the sql
Module.
Load %%sql
cell magic.
%load_ext sql
2. Connect to a database.
Here, we connect to the SQLite database basic_examples.db
.
%sql duckdb:///data/example_duck.db --alias duck
If you were connecting to an "enterprise data platform"
from sqlalchemy import create_engine
snow_engine = create_engine(
f"snowflake://{user}:{password}@{account_identifier}")
%sql snow_engine --alias snow
db_engine = create_engine(
url = f"databricks://token:{access_token}@{server_hostname}?" +
f"http_path={http_path}&catalog={catalog}&schema={schema}"
)
%sql db_engine --alias db
3. Run a simple SQL query.
Note the %%sql
lets Jupyter parse the rest of the lines as a SQL command.
%%sql
SELECT * FROM Dragon;
name | year | cute |
---|---|---|
hiccup | 2010 | 10 |
drogon | 2011 | -100 |
dragon 2 | 2019 | 0 |
puff | 2010 | 100 |
smaug | 2011 | None |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
Simple query, this time on two different lines.
%%sql
SELECT *
FROM Dragon;
name | year | cute |
---|---|---|
hiccup | 2010 | 10 |
drogon | 2011 | -100 |
dragon 2 | 2019 | 0 |
puff | 2010 | 100 |
smaug | 2011 | None |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
Approach #3: pd.read_sql
¶
It turns out that pandas
has a special-purpose function to parse SQL queries. We can pass in a SQL query as a string to return a pandas
DataFrame. To achieve the same result as we did using cell magic above, we can do the following.
1. Connect to a database
import sqlalchemy
import pandas as pd
engine = sqlalchemy.create_engine("duckdb:///data/example_duck.db")
2. Run a simple SQL query
query = """
SELECT *
FROM Dragon;
"""
df = pd.read_sql(query, engine)
df
name | year | cute | |
---|---|---|---|
0 | hiccup | 2010 | 10.0 |
1 | drogon | 2011 | -100.0 |
2 | dragon 2 | 2019 | 0.0 |
3 | puff | 2010 | 100.0 |
4 | smaug | 2011 | NaN |
Approach "3" -- Duck DB Special¶
Now that we are using DuckDB we can do something extra crazy:
import seaborn as sns
mpg = sns.load_dataset("mpg")
%%sql
SELECT * FROM mpg
mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|
18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 | usa | chevrolet chevelle malibu |
15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 | usa | buick skylark 320 |
18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 70 | usa | plymouth satellite |
16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 70 | usa | amc rebel sst |
17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 70 | usa | ford torino |
15.0 | 8 | 429.0 | 198.0 | 4341 | 10.0 | 70 | usa | ford galaxie 500 |
14.0 | 8 | 454.0 | 220.0 | 4354 | 9.0 | 70 | usa | chevrolet impala |
14.0 | 8 | 440.0 | 215.0 | 4312 | 8.5 | 70 | usa | plymouth fury iii |
14.0 | 8 | 455.0 | 225.0 | 4425 | 10.0 | 70 | usa | pontiac catalina |
15.0 | 8 | 390.0 | 190.0 | 3850 | 8.5 | 70 | usa | amc ambassador dpl |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
Truncated to displaylimit of 10
If you want to see more, please visit displaylimit configuration
That is right!! DuckDB can also see my dataframes in the python environment allowing me to do dataframe manipulation in SQL!
Return to Lecture
Tables and Schema¶
A database contains a collection of SQL tables. Let's connect to our "toy" database basic_examples.db
and explore the tables it stores.
%%sql
SELECT * FROM information_schema.tables
table_catalog | table_schema | table_name | table_type | self_referencing_column_name | reference_generation | user_defined_type_catalog | user_defined_type_schema | user_defined_type_name | is_insertable_into | is_typed | commit_action | TABLE_COMMENT |
---|---|---|---|---|---|---|---|---|---|---|---|---|
example_duck | main | assignment | BASE TABLE | None | None | None | None | None | YES | NO | None | None |
example_duck | main | dish | BASE TABLE | None | None | None | None | None | YES | NO | None | None |
example_duck | main | dragon | BASE TABLE | None | None | None | None | None | YES | NO | None | None |
example_duck | main | grade | BASE TABLE | None | None | None | None | None | YES | NO | None | None |
example_duck | main | scene | BASE TABLE | None | None | None | None | None | YES | NO | None | None |
example_duck | main | student | BASE TABLE | None | None | None | None | None | YES | NO | None | None |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
%%sql
SELECT * FROM information_schema.columns
table_catalog | table_schema | table_name | column_name | ordinal_position | column_default | is_nullable | data_type | character_maximum_length | character_octet_length | numeric_precision | numeric_precision_radix | numeric_scale | datetime_precision | interval_type | interval_precision | character_set_catalog | character_set_schema | character_set_name | collation_catalog | collation_schema | collation_name | domain_catalog | domain_schema | domain_name | udt_catalog | udt_schema | udt_name | scope_catalog | scope_schema | scope_name | maximum_cardinality | dtd_identifier | is_self_referencing | is_identity | identity_generation | identity_start | identity_increment | identity_maximum | identity_minimum | identity_cycle | is_generated | generation_expression | is_updatable | COLUMN_COMMENT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
example_duck | main | assignment | assignment_id | 1 | None | NO | INTEGER | None | None | 32 | 2 | 0 | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None |
example_duck | main | assignment | description | 2 | None | YES | VARCHAR | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None |
example_duck | main | dish | name | 1 | None | NO | VARCHAR | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None |
example_duck | main | dish | type | 2 | None | YES | VARCHAR | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None |
example_duck | main | dish | cost | 3 | None | YES | INTEGER | None | None | 32 | 2 | 0 | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None |
example_duck | main | dragon | name | 1 | None | NO | VARCHAR | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None |
example_duck | main | dragon | year | 2 | None | YES | INTEGER | None | None | 32 | 2 | 0 | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None |
example_duck | main | dragon | cute | 3 | None | YES | INTEGER | None | None | 32 | 2 | 0 | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None |
example_duck | main | grade | student_id | 1 | None | YES | INTEGER | None | None | 32 | 2 | 0 | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None |
example_duck | main | grade | assignment_id | 2 | None | YES | INTEGER | None | None | 32 | 2 | 0 | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
Truncated to displaylimit of 10
If you want to see more, please visit displaylimit configuration
Getting Schema information with SQLAlchemy¶
How you list the tables varies across database platforms. For example, the statement:
SELECT * FROM information_schema.columns
only works on Postgres compatible databases.
For example, if we wanted to get the schema for tables in sqlite we would need the following:
pd.read_sql("SELECT * FROM sqlite_schema", "sqlite:///data/basic_examples.db")
type | name | tbl_name | rootpage | sql | |
---|---|---|---|---|---|
0 | table | sqlite_sequence | sqlite_sequence | 7 | CREATE TABLE sqlite_sequence(name,seq) |
1 | table | Dragon | Dragon | 2 | CREATE TABLE Dragon (\n name TEXT PRIMARY K... |
2 | index | sqlite_autoindex_Dragon_1 | Dragon | 3 | None |
3 | table | Dish | Dish | 4 | CREATE TABLE Dish (\n name TEXT PRIMARY KEY... |
4 | index | sqlite_autoindex_Dish_1 | Dish | 5 | None |
5 | table | Scene | Scene | 6 | CREATE TABLE Scene (\n id INTEGER PRIMARY K... |
Fortunately, SQLAlchemy has some generic tools that will be helpful regardless of what database platform you use.
from sqlalchemy import inspect
inspector = inspect(engine)
inspector.get_table_names()
['assignment', 'dish', 'dragon', 'grade', 'scene', 'student']
inspector.get_columns('scene')
[{'name': 'id', 'type': INTEGER(), 'nullable': False, 'default': None, 'autoincrement': False, 'comment': None}, {'name': 'biome', 'type': VARCHAR(), 'nullable': False, 'default': None, 'autoincrement': False, 'comment': None}, {'name': 'city', 'type': VARCHAR(), 'nullable': False, 'default': None, 'autoincrement': False, 'comment': None}, {'name': 'visitors', 'type': INTEGER(), 'nullable': True, 'default': None, 'autoincrement': False, 'comment': None}, {'name': 'created_at', 'type': TIMESTAMP(), 'nullable': True, 'default': 'current_date()', 'autoincrement': False, 'comment': None}]
Same with SQLite
sqlite_engine = sqlalchemy.create_engine("sqlite:///data/basic_examples.db")
inspect(sqlite_engine).get_columns("scene")
[{'name': 'id', 'type': INTEGER(), 'nullable': True, 'default': None, 'primary_key': 1}, {'name': 'biome', 'type': TEXT(), 'nullable': False, 'default': None, 'primary_key': 0}, {'name': 'city', 'type': TEXT(), 'nullable': False, 'default': None, 'primary_key': 0}, {'name': 'visitors', 'type': INTEGER(), 'nullable': True, 'default': None, 'primary_key': 0}, {'name': 'created_at', 'type': DATETIME(), 'nullable': True, 'default': "DATETIME('now')", 'primary_key': 0}]
Example of table creation with interesting constraints
More advanced example of creating tables with primary and foreign key constraints:
%%sql
DROP TABLE IF EXISTS grade;
DROP TABLE IF EXISTS assignment;
DROP TABLE IF EXISTS student;
CREATE TABLE student (
student_id INTEGER PRIMARY KEY,
name VARCHAR,
email VARCHAR
);
CREATE TABLE assignment (
assignment_id INTEGER PRIMARY KEY,
description VARCHAR
);
CREATE TABLE grade (
student_id INTEGER,
assignment_id INTEGER,
score REAL CHECK (score > 0 AND score <= 100),
FOREIGN KEY (student_id) REFERENCES student(student_id),
FOREIGN KEY (assignment_id) REFERENCES assignment(assignment_id)
);
INSERT INTO student VALUES
(123, 'JoeyG', 'jegonzal@berkeley.edu'),
(456, 'NargesN', 'norouzi@berkeley.edu');
INSERT INTO assignment VALUES
(1, 'easy assignment'),
(2, 'hard assignment');
%%sql
INSERT INTO grade VALUES
(123, 1, 80),
(123, 2, 42),
(456, 2, 100);
%sql SELECT * FROM grade;
student_id | assignment_id | score |
---|---|---|
123 | 1 | 80.0 |
123 | 2 | 42.0 |
456 | 2 | 100.0 |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
Basic Queries¶
Every SQL query must contain a SELECT
and FROM
clause.
SELECT
: specify the column(s) to return in the outputFROM
: specify the database table from which to extract data
%%sql
SELECT * FROM Dragon;
name | year | cute |
---|---|---|
hiccup | 2010 | 10 |
drogon | 2011 | -100 |
dragon 2 | 2019 | 0 |
puff | 2010 | 100 |
smaug | 2011 | None |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
%%sql
SELECT cute, year FROM Dragon;
cute | year |
---|---|
10 | 2010 |
-100 | 2011 |
0 | 2019 |
100 | 2010 |
None | 2011 |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
Aliasing with AS
%%sql
SELECT cute AS cuteness,
year AS "birth year"
FROM Dragon;
cuteness | birth year |
---|---|
10 | 2010 |
-100 | 2011 |
0 | 2019 |
100 | 2010 |
None | 2011 |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
Uniqueness with DISTINCT
%%sql
SELECT DISTINCT year
FROM Dragon;
year |
---|
2010 |
2019 |
2011 |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
Filtering with WHERE
%%sql
SELECT name, year
FROM Dragon
WHERE cute > 0;
name | year |
---|---|
hiccup | 2010 |
puff | 2010 |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
%%sql
SELECT name, cute, year
FROM Dragon
WHERE cute > 0 OR year > 2013;
name | cute | year |
---|---|---|
hiccup | 10 | 2010 |
puff | 100 | 2010 |
dragon 2 | 0 | 2019 |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
%%sql
SELECT name, year
FROM Dragon
WHERE name IN ('puff', 'hiccup');
name | year |
---|---|
puff | 2010 |
hiccup | 2010 |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
%%sql
SELECT name, cute
FROM Dragon
WHERE cute IS NOT NULL;
name | cute |
---|---|
hiccup | 10 |
drogon | -100 |
dragon 2 | 0 |
puff | 100 |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
Ordering data using ORDER BY
%%sql
SELECT *
FROM Dragon
ORDER BY cute DESC;
name | year | cute |
---|---|---|
puff | 2010 | 100 |
hiccup | 2010 | 10 |
dragon 2 | 2019 | 0 |
drogon | 2011 | -100 |
smaug | 2011 | None |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
Restricting output with LIMIT
and OFFSET
%%sql
SELECT *
FROM Dragon
LIMIT 2;
name | year | cute |
---|---|---|
hiccup | 2010 | 10 |
drogon | 2011 | -100 |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
%%sql
SELECT *
FROM Dragon
LIMIT 2
OFFSET 1;
name | year | cute |
---|---|---|
drogon | 2011 | -100 |
dragon 2 | 2019 | 0 |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
What if we wanted a random sample:
%%sql
SELECT *
FROM Dragon
ORDER BY RANDOM()
LIMIT 2
name | year | cute | |
---|---|---|---|
0 | drogon | 2011 | -100 |
1 | dragon 2 | 2019 | 0 |
%%sql
SELECT *
FROM Dragon USING SAMPLE reservoir(2 ROWS) REPEATABLE (100);
name | year | cute | |
---|---|---|---|
0 | puff | 2010 | 100 |
1 | drogon | 2011 | -100 |
Grouping Data with GROUP BY
¶
%%sql
SELECT *
FROM Dish;
name | type | cost |
---|---|---|
ravioli | entree | 10 |
ramen | entree | 13 |
taco | entree | 7 |
edamame | appetizer | 4 |
fries | appetizer | 4 |
potsticker | appetizer | 4 |
ice cream | dessert | 5 |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
A small note: the fact that type
is highlighted in green below is a consequence of Jupyter assuming that we are writing Python code (where type
is a built-in keyword). type
does not have a special meaning in SQL, so the color below does not indicate any special functionality. When we run the cell, Jupyter realizes it should recognize the code as SQL.
%%sql
SELECT type
FROM Dish;
type |
---|
entree |
entree |
entree |
appetizer |
appetizer |
appetizer |
dessert |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
%%sql
SELECT type
FROM Dish
GROUP BY type;
type |
---|
entree |
dessert |
appetizer |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
%%sql
SELECT type, SUM(cost)
FROM Dish
GROUP BY type;
type | sum("cost") |
---|---|
entree | 30 |
dessert | 5 |
appetizer | 12 |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
%%sql
SELECT type,
SUM(cost),
MIN(cost),
MAX(name)
FROM Dish
GROUP BY type;
type | sum("cost") | min("cost") | max("name") |
---|---|---|---|
entree | 30 | 7 | taco |
dessert | 5 | 5 | ice cream |
appetizer | 12 | 4 | potsticker |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
%%sql
SELECT year, COUNT(cute)
FROM Dragon
GROUP BY year;
year | count(cute) |
---|---|
2010 | 2 |
2011 | 1 |
2019 | 1 |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
%%sql
SELECT year, COUNT(*)
FROM Dragon
GROUP BY year;
year | count_star() |
---|---|
2010 | 2 |
2011 | 2 |
2019 | 1 |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
Working with the sql
results as Python variables¶
By default, executing a query with a magic command produces output but doesn't save it into any Python variable:
%sql SELECT * FROM Dragon
name | year | cute |
---|---|---|
hiccup | 2010 | 10 |
drogon | 2011 | -100 |
dragon 2 | 2019 | 0 |
puff | 2010 | 100 |
smaug | 2011 | None |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
This can be inconvenient if you later want to do further processing of these data in Python.
Storing one-line %sql
queries¶
For simple one-line queries, you can use IPython's ability to store the result of a magic command like %sql
as if it were any other Python statement, and save the output to a variable:
dragon_table = %sql SELECT * FROM Dragon
dragon_table
name | year | cute |
---|---|---|
hiccup | 2010 | 10 |
drogon | 2011 | -100 |
dragon 2 | 2019 | 0 |
puff | 2010 | 100 |
smaug | 2011 | None |
ResultSet
: to convert to pandas, call .DataFrame()
or to polars, call .PolarsDataFrame()
As noted above, the result of the query is a Python variable of type ResultSet
, more specifically:
type(dragon_table)
sql.run.ResultSet
You need to manually convert it to a Pandas DataFrame if you want to do pandas-things with its content:
dragon_df = dragon_table.DataFrame()
dragon_df
name | year | cute | |
---|---|---|---|
0 | hiccup | 2010 | 10.0 |
1 | drogon | 2011 | -100.0 |
2 | dragon 2 | 2019 | 0.0 |
3 | puff | 2010 | 100.0 |
4 | smaug | 2011 | NaN |
You can configure jupysql
to automatically convert all outputs to Pandas DataFrames. This can be handy if you intend all your Python-side work to be done with Pandas, as it saves you from manually having to call .DataFrame()
first on all outputs. On the other hand, you don't get access to the original SQL ResultSet
object, which have a number of interesting properties and capabilities. You can learn more about those in the jupysql documentation.
For now, let's turn this on so you can see how this simplified, "pandas all the way" worfklow looks like:
%config SqlMagic.autopandas = True
dragon_df = %sql SELECT * FROM Dragon
dragon_df
name | year | cute | |
---|---|---|---|
0 | hiccup | 2010 | 10.0 |
1 | drogon | 2011 | -100.0 |
2 | dragon 2 | 2019 | 0.0 |
3 | puff | 2010 | 100.0 |
4 | smaug | 2011 | NaN |
type(dragon_df)
pandas.core.frame.DataFrame
Storing multi-line %%sql
queries¶
For a more complex query that won't fit in one line, such as for example:
%%sql
SELECT year, COUNT(*)
FROM Dragon
GROUP BY year;
year | count_star() | |
---|---|---|
0 | 2010 | 2 |
1 | 2011 | 2 |
2 | 2019 | 1 |
You can use the variable <<
syntax in jupysql to store its output (this will honor your autopandas
state and store either a sql.run.ResultState
or a Pandas DataFrame
):
%%sql dragon_years <<
SELECT year, COUNT(*)
FROM Dragon
GROUP BY year;
dragon_years
year | count_star() | |
---|---|---|
0 | 2010 | 2 |
1 | 2011 | 2 |
2 | 2019 | 1 |
More Adavnced SQL with DuckDB¶
You could do a lot what we do in this class just using Duck DB and the many duck db functions:
from ds100_utils import fetch_and_cache
url = "https://gist.github.com/domoritz/fd517a3a3a210c24a488e61870e2cf2c/raw/b1d53719e8e0eb9f6a95de82fdaccf0b001c0dea/flights-1m.parquet"
fetch_and_cache(url,"flights.parquet")
Using cached version that was downloaded (UTC): Tue Apr 2 03:41:25 2024
PosixPath('data/flights.parquet')
%%sql
SELECT * FROM 'data/flights.parquet' LIMIT 10;
FL_DATE | DEP_DELAY | ARR_DELAY | AIR_TIME | DISTANCE | DEP_TIME | ARR_TIME | |
---|---|---|---|---|---|---|---|
0 | 2006-01-01 | 5 | 19 | 350 | 2475 | 9.083333 | 12.483334 |
1 | 2006-01-02 | 167 | 216 | 343 | 2475 | 11.783334 | 15.766666 |
2 | 2006-01-03 | -7 | -2 | 344 | 2475 | 8.883333 | 12.133333 |
3 | 2006-01-04 | -5 | -13 | 331 | 2475 | 8.916667 | 11.950000 |
4 | 2006-01-05 | -3 | -17 | 321 | 2475 | 8.950000 | 11.883333 |
5 | 2006-01-06 | -4 | -32 | 320 | 2475 | 8.933333 | 11.633333 |
6 | 2006-01-08 | -3 | -2 | 346 | 2475 | 8.950000 | 12.133333 |
7 | 2006-01-09 | 3 | 0 | 334 | 2475 | 9.050000 | 12.166667 |
8 | 2006-01-10 | -7 | -21 | 334 | 2475 | 8.883333 | 11.816667 |
9 | 2006-01-11 | 8 | -10 | 321 | 2475 | 9.133333 | 12.000000 |
%%sql avg_delays <<
SELECT
dayname(fl_date) AS "Day of the Week",
mean(dep_delay) AS "Mean Departure Delay",
FROM 'data/flights.parquet'
GROUP BY "Day of the Week"
import plotly.express as px
px.bar(avg_delays, x="Day of the Week", y = "Mean Departure Delay",
category_orders={"Day of the Week": ["Sunday", "Monday", "Tuesday",
"Wednesday", "Thursday", "Friday", "Saturday"]})