# Principles and Techniques of Data Science

UC Berkeley, Spring 2022

Quick links: Lecture Zoom | Discussion Sign-Up/Zoom | Office Hour/Lab Help


- Lecture is hybrid: in person in Li Ka Shing 245 and online via Zoom (see the Lecture Zoom link above). Recordings will be posted within 12 hours of each live lecture.
- **Frequently Asked Questions:** Before posting on the class Ed, please read the class FAQ page.
- Join the class Ed using the link on the course website.
- Textbook readings are optional and actively in development. See the Resources page for more details.

**Note:** The schedule of lectures and assignments is subject to change.

## Schedule

### Week 1

- Jan 18
  - **Lecture 1:** Course Overview
  - **Weekly Check 1** (due Jan 24)
- Jan 20
  - **Lecture 2:** Sampling and Probability
- Jan 21
  - **Lab 1:** Prerequisite Coding (due Jan 25)
  - **Homework 1:** Intro + Prerequisites (due Jan 27)

### Week 2

- Jan 24
  - **Weekly Check 2** (due Jan 31)
- Jan 25
  - **Lecture 3:** Pandas I
  - Textbook: Pandas Reference Table
  - Reference: Pandas API Documentation
- Jan 27
  - **Lecture 4:** Pandas II
- Jan 28
  - **Discussion 2:** Sampling and Probability, Pandas (code) (solutions) (recording)
  - **Lab 2:** Pandas (due Feb 1)
  - **Homework 2:** Food Safety (due Feb 3)

### Week 3

- Jan 31
  - **Weekly Check 3** (due Feb 7)
- Feb 1
  - **Lecture 5:** Data Wrangling, EDA
- Feb 3
  - **Lecture 6:** Regex
- Feb 4
  - **Discussion 3:** Pandas, Data Cleaning (code) (solutions) (recording)
  - **Lab 3:** Data Cleaning (due Feb 8)
  - **Homework 3:** Tweets (due Feb 10)

### Week 4

- Feb 7
  - **Weekly Check 4** (due Feb 14)
- Feb 8
  - **Lecture 7:** Visualization I
  - Textbook: Seaborn Reference Table
  - Textbook: Matplotlib Reference Table
- Feb 10
  - **Lecture 8:** Visualization II
- Feb 11
  - **Discussion 4:** Regex, Visualization (solutions) (recording)
  - **Lab 4:** Transformations and KDE (due Feb 15)
  - **Homework 4:** Bike Sharing (due Feb 17)

### Week 5

- Feb 14
  - **Weekly Check 5** (due Feb 21)
- Feb 15
- Feb 17
  - **Lecture 10:** Constant Model, Loss, and Transformations
- Feb 18
  - **Discussion 5:** Modeling and Simple Linear Regression (solutions) (recording)
  - **Lab 5:** Modeling, Summary Statistics, Loss Functions (due Feb 22)
  - **Homework 5:** Regression (on paper) (LaTeX Template) (due Mar 3)

### Week 6

- Feb 21
  - **Weekly Check 6** (due Feb 28)
- Feb 22
  - **Lecture 11:** Ordinary Least Squares (Multiple Linear Regression)
- Feb 24
  - **Midterm:** Midterm 1, 8-10 pm (no lecture)
- Feb 25
  - **Discussion 6:** Ordinary Least Squares (solutions) (recording)
  - **Lab 6:** Ordinary Least Squares (due Mar 1)

### Week 7

- Feb 28
  - **Weekly Check 7** (due Mar 7)
- Mar 1
  - **Lecture 12:** Gradient Descent, sklearn
- Mar 3
  - **Lecture 13:** Feature Engineering
- Mar 4
  - **Discussion 7:** Human Contexts in Engineering and Feature Engineering (solutions) (recording)
  - **Lab 7:** Gradient Descent and sklearn (due Mar 8)
  - **Proj 1A:** Housing I (due Mar 10)

### Week 8

- Mar 7
  - **Weekly Check 8** (due Mar 14)
- Mar 8
- Mar 10
  - **Lecture 15:** Cross-Validation and Regularization
- Mar 11
  - **Discussion 8:** HCE Part 2, Regularization (Budget Fact Sheet) (solutions) (recording)
  - **Lab 8:** Model Selection, Regularization, and Cross-Validation (due Mar 15)
  - **Proj 1B:** Housing II (due ~~Mar 17~~ Mar 18; see Ed post)

### Week 9

- Mar 14
  - **Weekly Check 9** (due Mar 28)
- Mar 15
  - **Lecture 16:** Probability I: Random Variables
- Mar 17
- Mar 18
  - **Discussion 9:** Cross-Validation + Probability I (solutions) (recording)
  - **Lab 9:** Probability and Modeling (due Mar 29)
  - **Homework 6:** Probability and Estimators (due Mar 31)

### Spring Break

- Mar 22
Spring Break

- Mar 24
Spring Break

### Week 10

- Mar 28
  - **Weekly Check 10** (due Apr 4)
- Mar 29
  - **Lecture 18:** SQL I
- Mar 31
  - **Lecture 19:** SQL II and PCA
- Apr 1
  - **Discussion 10:** Probability II + SQL I (solutions) (recording)
  - **Lab 10:** SQL (due ~~Apr 5~~ Apr 6)
  - **Homework 7:** SQL and PCA (due Apr 14)
- Apr 2
  - **Grad Project:** Released

### Week 11

- Apr 4
  - **Weekly Check 11** (due Apr 11)
- Apr 5
  - **Lecture 20:** PCA II
- Apr 7
  - **Midterm:** Midterm 2, 7-8:30 pm (no lecture)
- Apr 8
  - **Discussion 11:** SQL II + PCA (solutions) (cancelled; no live discussion)
  - **Lab 11:** PCA (due Apr 12)

### Week 12

- Apr 11
  - **Weekly Check 12** (due Apr 18)
- Apr 12
  - **Lecture 21:** Classification and Logistic Regression I
- Apr 14
  - **Lecture 22:** Logistic Regression II
- Apr 15
  - **Discussion 12:** Logistic Regression I (solutions) (recording)
  - **Lab 12:** Logistic Regression (due Apr 19)
  - **Proj 2A:** Spam & Ham I (due ~~Apr 21~~ Apr 22)

### Week 13

- Apr 18
  - **Weekly Check 13** (due Apr 25)
- Apr 19
  - **Lecture 23:** Decision Trees
- Apr 21
  - **Lecture 24:** Clustering
- Apr 22
  - **Discussion 13:** Decision Trees and Random Forests (solutions) (recording)
  - **Lab 13:** Decision Trees & Random Forests (due Apr 26)
  - **Proj 2B:** Spam & Ham II (due Apr 28)

### Week 14

- Apr 26
- Apr 27
  - **Grad Project:** First draft due
- Apr 28
  - **Lecture 26:** Guest Speaker: Matei Zaharia, "Parallel Data Analytics"; Conclusion
  - **Weekly Check 14** (due ~~May 2~~ May 5)
- Apr 29
  - **Discussion 14:** Clustering and Final Review (solutions)
  - **Lab 14:** Clustering (optional, no due date)
  - **Homework 8:** Taxis (optional, no due date)

### Week 15

- May 3
RRR Week

- May 5
RRR Week

### Week 16

- May 9
  - **Grad Project:** Final draft due
- May 13
  - **Final:** Final Exam, 7-10 pm