Principles and Techniques of Data Science

UC Berkeley, Summer 2022

Anirudhan Badrinath

abadrinath@berkeley.edu

Dominic Liu

he/him

hangxingliu@berkeley.edu

  • Frequently Asked Questions: Before posting on the class Ed, please read the class FAQ page.
  • The Syllabus contains a detailed explanation of how each course component will work this summer, given that the course is being taught entirely online.
  • Textbook readings are optional and actively in development. See the Resources for more details.
  • Note: The schedule of lectures and assignments is subject to change.

Schedule

Week 1

Jun 21

Lecture 1 Course Overview, Sampling and Probability

Ch. 1, 2, 3.1

Lab 1 Prerequisite Coding (due Jun 27)

Lab 2 Pandas (due Jun 27)

Jun 22

Lecture 2 Pandas I

Ch. 6.1, 6.5

Textbook: Pandas Reference Table

Reference: Pandas API Documentation

Jun 23

Lecture 3 Pandas II

Ch. 6.2-6.4

Discussion 0 (Optional) Fundamentals

Solution, Recording

Discussion 1 Sampling and Probability, Pandas, code

Solution, Recording

Homework 1 Intro + Prerequisites (due Jun 27)

Jun 24

Exam Prep 1 Sampling and Probability, Pandas

Solution, Recording

Week 2

Jun 27

Lecture 4 Data Cleaning, EDA

Ch. 8-9

Weekly Check 2

Lab 3 Data Cleaning and EDA (due Jul 2)

Lab 4 Transformations and KDE (due Jul 2)

Homework 2 Food Safety (due Jun 30)

Jun 28

Lecture 5 Regex

Ch. 13

Discussion 2 Pandas, Data Cleaning

Solution, Recording

Jun 29

Lecture 6 Visualization I

Ch. 11.1-11.3

Textbook: Seaborn Reference Table

Textbook: Matplotlib Reference Table

Jun 30

Lecture 7 Visualization II

Ch. 11.4-11.6

Discussion 3 Regex, Visualization

Solution, Recording

Homework 3 Tweets (due Jul 5)

Jul 1

Exam Prep 2 Pandas, Visualization, Regex

Solution, Recording

Catch-up section 1

Recording

Week 3

Jul 4

Independence Day

Weekly Check 3

Lab 5 Modeling, Loss Functions, and Summary Statistics (due Jul 9)

Lab 6 Linear Regression (due Jul 9)

Homework 4 Bike Sharing (due Jul 7)

Jul 5

Lecture 8 Intro to Modeling, Simple Linear Regression

Ch. 15.1-15.2

Discussion 4 Modeling and Visualization

Solution, Recording

Jul 6

Lecture 9 Constant Model, Loss, and Transformations

Ch. 4

Jul 7

Lecture 10 Ordinary Least Squares (Multiple Linear Regression)

Ch. 15.3-15.4

Discussion 5 Linear Model and Loss Function

Solution, Recording

Homework 5 Regression (On paper) (due Jul 11)

Jul 8

Exam Prep 3 SLR & OLS

Solution, Recording

Catch-up section 2

Recording

Week 5

Jul 18

Midterm Exam

Reference Sheet

Weekly Check 5

Lab 9 Probability and Modeling (due Jul 23)

Jul 19

Break (No Lecture)

Jul 20

Lecture 15 Probability I: Random Variables

Ch. 3.2-3.5, 16.3

Jul 21

Lecture 16 Probability II: Estimators, Bias, and Variance

Ch. 16.1, Ch. 16.4, 19.2

Discussion 8 Probability and Bias-Variance Trade-off

Solution, Recording

Jul 22

Exam Prep 5 CV, Probability, BVT

Solution, no recording

Catch-up section 4 TBD

Recording

Week 6

Jul 25

Lecture 17 SQL I

Ch. 7.1-7.2, 7.5

Weekly Check 6

Lab 10 SQL (due Jul 30)

Lab 11 PCA (due Jul 30)

Homework 6 Probability and Estimators coding, written pdf, written latex (due Jul 28)

Jul 26

Lecture 18 SQL II and PCA I

Ch. 7.3-7.4

Discussion 9 BVT & SQL

Solution, Recording

Jul 27

Lecture 19 PCA II

Ch. 22

Jul 28

Break (No Lecture)

Discussion 10 SQL & PCA

Solution, Recording

Homework 7 SQL and PCA (due Aug 1)

Jul 29

Exam Prep 6 SQL & PCA

Solution, Recording

Catch-up section 5

Recording

Week 7

Aug 1

Lecture 20 Classification and Logistic Regression I

Ch. 19.1-19.3

Weekly Check 7

Lab 12 Logistic Regression (due Aug 6)

Lab 13 Decision Trees & Random Forests (due Aug 6)

Proj 2A Spam I (due Aug 4)

Aug 2

Lecture 21 Logistic Regression II

Ch. 19.4-19.8

Discussion 11 Logistic Regression

Solution, Recording

Aug 3

Lecture 22 Decision Trees

Ch. 23

Aug 4

Lecture 23 Clustering

Ch. 24

Discussion 12 Decision Trees, Clustering

Solution, Recording

Proj 2B Spam II (due Aug 8)

Aug 5

Exam Prep 7 Classifier and Clustering

Solution, Recording

Catch-up section 6

Recording

Week 8

Aug 8

Weekly Check 8

Topical Review Session

Aug 9

Topical Review Session

Aug 10

Topical Review Session

Aug 11

Optional Lecture Neural Networks

Review

Aug 12

Final Exam

Reference Sheet