Principles and Techniques of Data Science

UC Berkeley, Spring 2022

Lecture Zoom | Discussion Sign-Up/Zoom | Office Hour/Lab Help

  • Lecture is hybrid: in-person in Li Ka Shing 245 and online via Zoom (see link above). Recordings will be posted within 12 hours of live lecture.
  • Frequently Asked Questions: Before posting on the class Ed, please read the class FAQ page.
  • Join Ed: here.
  • Textbook readings are optional and actively in development. See the Resources page for more details.
  • Note: The schedule of lectures and assignments is subject to change.



Schedule

Week 1

Jan 18

Lecture 1 Course Overview

Ch. 1

Weekly Check 1 (due Jan 24)

Jan 20

Lecture 2 Sampling and Probability

Ch. 2, 3.1

Jan 21

Discussion 1 Intro (solutions) (recording)

Lab 1 Prerequisite Coding (due Jan 25)

Homework 1 Intro + Prerequisites (due Jan 27)

Week 2

Jan 24

Weekly Check 2 (due Jan 31)

Jan 25

Lecture 3 Pandas I

Ch. 6.1, 6.5

Textbook: Pandas Reference Table

Reference: Pandas API Documentation

Jan 27

Lecture 4 Pandas II

Ch. 6.2-6.4

Jan 28

Discussion 2 Sampling and Probability, Pandas (code) (solutions) (recording)

Lab 2 Pandas (due Feb 1)

Homework 2 Food Safety (due Feb 3)

Week 3

Jan 31

Weekly Check 3 (due Feb 7)

Feb 1

Lecture 5 Data Wrangling, EDA

Ch. 8-9

Feb 3

Lecture 6 Regex

Ch. 12

Feb 4

Discussion 3 Pandas, Data Cleaning (code) (solutions) (recording)

Lab 3 Data Cleaning (due Feb 8)

Homework 3 Tweets (due Feb 10)

Week 4

Feb 7

Weekly Check 4 (due Feb 14)

Feb 8

Lecture 7 Visualization I

Ch. 10.1-10.3

Textbook: Seaborn Reference Table

Textbook: Matplotlib Reference Table

Feb 10

Lecture 8 Visualization II

Feb 11

Discussion 4 Regex, Visualization (solutions) (recording)

Lab 4 Transformations and KDE (due Feb 15)

Homework 4 Bike Sharing (due Feb 17)

Week 6

Feb 21

Weekly Check 6 (due Feb 28)

Feb 22

Lecture 11 Ordinary Least Squares (Multiple Linear Regression)

Ch. 15.4, 19.1

Feb 24

Midterm 1 (8-10 pm) (No Lecture)

Feb 25

Discussion 6 Ordinary Least Squares (solutions) (recording)

Lab 6 Ordinary Least Squares (due Mar 1)

Week 7

Feb 28

Weekly Check 7 (due Mar 7)

Mar 1

Lecture 12 Gradient Descent, sklearn

Ch. 17

Mar 3

Lecture 13 Feature Engineering

Ch. 20

Mar 4

Discussion 7 Human Contexts in Engineering and Feature Engineering (solutions) (recording)

Lab 7 Gradient Descent and sklearn (due Mar 8)

Proj 1A Housing I (due Mar 10)

Week 8

Mar 7

Weekly Check 8 (due Mar 14)

Mar 8

Lecture 14 Case Study (HCE): Fairness in Housing Appraisal

Mar 10

Lecture 15 Cross-Validation and Regularization

Ch. 22, 21.3

Mar 11

Discussion 8 HCE Part 2, Regularization (Budget Fact Sheet) (solutions) (recording)

Lab 8 Model Selection, Regularization, and Cross-Validation (due Mar 15)

Proj 1B Housing II (due Mar 18, extended from Mar 17; see Ed post)

Week 9

Mar 14

Weekly Check 9 (due Mar 28)

Mar 15

Lecture 16 Probability I: Random Variables

Ch. 3.2-3.5, 16.1

Mar 17

Lecture 17 Probability II: Estimators, Bias, and Variance

Ch. 16.2-16.3, 19.2

Mar 18

Discussion 9 Cross-Validation + Probability I (solutions) (recording)

Lab 9 Probability and Modeling (due Mar 29)

Homework 6 Probability and Estimators (due Mar 31)

Spring Break

Mar 22

Spring Break

Mar 24

Spring Break

Week 10

Mar 28

Weekly Check 10 (due Apr 4)

Mar 29

Lecture 18 SQL I

Ch. 7.1-7.2, 7.5

Mar 31

Lecture 19 SQL II and PCA

Ch. 7.3-7.4

Apr 1

Discussion 10 Probability II + SQL I (solutions) (recording)

Lab 10 SQL (due Apr 6, extended from Apr 5)

Homework 7 SQL and PCA (due Apr 14)

Apr 2

Grad Project Released

Week 11

Apr 4

Weekly Check 11 (due Apr 11)

Apr 5

Lecture 20 PCA II

Ch. 26

Apr 7

Midterm 2 (7-8:30 pm) (No Lecture)

Apr 8

Discussion 11 SQL II + PCA (solutions) (Cancelled, No Live Discussion)

Lab 11 PCA (due Apr 12)

Week 12

Apr 11

Weekly Check 12 (due Apr 18)

Apr 12

Lecture 21 Classification and Logistic Regression I

Ch. 24.1-24.3

Apr 14

Lecture 22 Logistic Regression II

Ch. 24.4-24.8

Apr 15

Discussion 12 Logistic Regression I (solutions) (recording)

Lab 12 Logistic Regression (due Apr 19)

Proj 2A Spam & Ham I (due Apr 22, extended from Apr 21)

Week 13

Apr 18

Weekly Check 13 (due Apr 25)

Apr 19

Lecture 23 Decision Trees

Ch. 27

Apr 21

Lecture 24 Clustering

Ch. 28

Apr 22

Discussion 13 Decision Trees and Random Forests (solutions) (recording)

Lab 13 Decision Trees & Random Forests (due Apr 26)

Proj 2B Spam & Ham II (due Apr 28)

Week 14

Apr 26

Lecture 25 Guest Speaker: Amol Deshpande - Data Regulations

Apr 27

Grad Project First Draft for Grad Project Due

Apr 28

Lecture 26 Guest Speaker: Matei Zaharia - Parallel Data Analytics; Conclusion

Weekly Check 14 (due May 5, extended from May 2)

Apr 29

Discussion 14 Clustering and Final Review (solutions)

Lab 14 Clustering (Optional, no due date)

Homework 8 Taxis (Optional, no due date)

Week 15

May 3

RRR Week

May 5

RRR Week

Week 16

May 9

Grad Project Final Draft of Grad Project Due

May 13

Final Exam (7-10 pm)