Syllabus

This syllabus is still under development and is subject to change.

Week	Lecture	Date	Topic	Lab	Discussion	Homework
1	1	8/23/18	Course Overview, Data Design and Sources of Bias [slides] Demo Notebook (also on Datahub) Textbook: Data Science Life Cycle Textbook: Data Design Screencast		Disc0 Solutions	HW0 Released HW0 Solutions
2	2	8/28/18	Data Manipulation with Pandas I [slides] Textbook: Tabular Data Demo Notebook (also on Datahub) Pandas Basics (HTML) Case Study (HTML) Screencast	Lab1 Solutions	Disc1 Solutions
2	3	8/30/18	Data Manipulation with Pandas II [slides] Demo Notebook on Datahub Case Study (HTML) Enrollment Exercise (HTML) Groupby & Pivot (HTML) Screencast
3	4	9/4/18	Data Cleaning & EDA [slides] Textbook: Data Cleaning Demo Notebook (ZIP) Groupby Pivot and Merge (HTML) Screencast	Lab2 Solutions	Lab2	HW0 Due, HW1 Released HW1 Solutions
3	5	9/6/18	EDA and Visualization [slides] Textbook: EDA EDA_and_Cleaning notebook (HTML) code and data (includes notebooks and scripts as needed) Screencast
4	6	9/11/18	Visualization and Data Transformations [slides] Textbook: Data Visualization Screencast	Lab3 Solutions	Disc3 Solutions TA Slides
4	7	9/13/18	Working with Text [slides] Textbook: Working With Text Screencast code and data (includes notebooks and scripts as needed)			HW1 Due, HW2 Released HW2 Solutions
5	8	9/18/18	Modeling and Estimation [slides] Textbook: Modeling and Estimation Estimation notebook (HTML Version) convex-functions notebook (HTML Version) Screencast code and data (includes notebooks and scripts as needed)	Lab4 Solutions	Disc4 Solutions TA Slides
5	9	9/20/18	Modeling and Estimation II [slides] Textbook: Gradient Descent Demo Notebook HTML Version Screencast
6	10	9/25/18	Generalization and Empirical Risk Minimization [slides] Textbook: Probability and Generalization Screencast	Lab5 Solutions	Disc5 Solutions	HW2 Due, HW3 Released HW3 Solutions
6	11	9/27/18	Linear Regression and Feature Engineering [slides] Textbook: Linear Regression Textbook: Feature Engineering Notebook (HTML) Notebook (zip) Screencast
7	12	10/2/18	Bias-Variance Tradeoff and Regularization [slides] Textbook: Bias-Variance Tradeoff Notebook (HTML) Notebook (zip) Screencast	Lab6 Solutions	Disc6 Solutions
7	13	10/4/18	Cross-Validation and Regularization [slides] Textbook: Regularization Textbook: Cross-Validation Bias-Variance and Regularization Notebook (HTML Version) Feature Engineering Part 1 Notebook (HTML Version) Feature_Engineering Part 2 Notebook (HTML Version) Make Toy Data Notebook (HTML Version) Screencast
8	14	10/9/18	Ethics [slides] Screencast	Lab7 Solutions	Disc7 Solutions	HW3 Due, Proj1 Released Proj1 Solutions
8	15	10/11/18	Midterm Review Part 1 [slides] Screencast
9	16	10/16/18	Midterm Review Part 2 [slides] Screencast	Midterm Review (Lab8)	Midterm OH
9	17	10/18/18	Classification and Logistic Regression I [slides] Textbook: Classification Extra Plots notebook (HTML Version) Logistic Regression Part 1 notebook (HTML Version) Logistic Regression Part 2 notebook (HTML Version) Notebook (zip) Screencast
10	18	10/23/18	Classification and Logistic Regression II [slides] Screencast	Project 1 OH	Disc8 Solutions
10	19	10/25/18	Probability theory, Monte Carlo, Bootstrapping [slides] Central Limit Theorem notebook PRNG notebook Restaurant Estimation notebook Screencast			Proj1 Due, HW4 Released HW4 Solutions
11	20	10/30/18	Hypothesis Testing I [slides] Textbook: Statistical Inference Notebook (ipynb) Notebook (html) Screencast	Lab9 Solutions	Disc9 Solutions
11	21	11/1/18	Numerical issues, condition numbers, higher dimensions Screencast Notebooks (HTML): KL Divergence, Numerical Chaos, Monte Carlo ND, Condition Number, Volumes in ND
12	22	11/6/18	SQL [slides] Textbook: SQL Notebook (ipynb) Notebook (html) Screencast	Lab10 Solutions	Disc10 Solutions
12	23	11/8/18	Advanced SQL [slides] Notebook (ipynb) Notebook (html) Screencast			HW4 Due, HW5 Released HW5 Solutions
13	24	11/13/18	Big Data [slides] Slides in PPT Format Spark Demo (HTML) Spark Demo (ipynb) Screencast	Lab11 Solutions	Lab11	Proj2A Released, Grad Project Released Proj2A Solutions
13	25	11/15/18	Distributed Computing [slides] Screencast Ray Documentation Notebook
14	26	11/20/18	A/B Testing [slides] Screencast Demo Notebook (HTML)	Project 2 OH	Break	HW6 Released, Proj2B Released HW6 Solutions Proj2B Solutions
14	27	11/22/18	Thanksgiving Break
15	28	11/27/18	Data Commons [slides] Screencast	Lab12	Lab12	HW5 Due
15	29	11/29/18	Conclusion [slides] Screencast			Proj2A Due
16	30	12/4/18	RRR week [slides] Screencast			Proj2B Due
16	31	12/6/18	RRR week Screencast			HW6 Due, Grad Project Due
17	32	12/11/18
17	33	12/13/18	Final Exam (11:30am-2:30pm)
