Principles and Techniques of Data Science
UC Berkeley, Spring 2022
Links: Lecture Zoom, Discussion Sign-Up/Zoom, Office Hour/Lab Help
- Lecture is hybrid: in-person in Li Ka Shing 245 and online via Zoom (see link above). Recordings will be posted within 12 hours of live lecture.
- Frequently Asked Questions: Before posting on the class Ed, please read the class FAQ page.
- Join Ed: here.
- Textbook readings are optional, and the textbook is actively in development. See the Resources page for more details.
- Note: The schedule of lectures and assignments is subject to change.
Schedule
Week 1
- Jan 18
Lecture 1 Course Overview
Weekly Check 1 (due Jan 24)
- Jan 20
Lecture 2 Sampling and Probability
- Jan 21
Lab 1 Prerequisite Coding (due Jan 25)
Homework 1 Intro + Prerequisites (due Jan 27)
Week 2
- Jan 24
Weekly Check 2 (due Jan 31)
- Jan 25
Lecture 3 Pandas I
Textbook: Pandas Reference Table
Reference: Pandas API Documentation
- Jan 27
Lecture 4 Pandas II
- Jan 28
Discussion 2 Sampling and Probability, Pandas (code) (solutions) (recording)
Lab 2 Pandas (due Feb 1)
Homework 2 Food Safety (due Feb 3)
Week 3
- Jan 31
Weekly Check 3 (due Feb 7)
- Feb 1
Lecture 5 Data Wrangling, EDA
- Feb 3
Lecture 6 Regex
- Feb 4
Discussion 3 Pandas, Data Cleaning (code) (solutions) (recording)
Lab 3 Data Cleaning (due Feb 8)
Homework 3 Tweets (due Feb 10)
Week 4
- Feb 7
Weekly Check 4 (due Feb 14)
- Feb 8
Lecture 7 Visualization I
Textbook: Seaborn Reference Table
Textbook: Matplotlib Reference Table
- Feb 10
Lecture 8 Visualization II
- Feb 11
Discussion 4 Regex, Visualization (solutions) (recording)
Lab 4 Transformations and KDE (due Feb 15)
Homework 4 Bike Sharing (due Feb 17)
Week 5
- Feb 14
Weekly Check 5 (due Feb 21)
- Feb 15
- Feb 17
Lecture 10 Constant Model, Loss, and Transformations
- Feb 18
Discussion 5 Modeling and Simple Linear Regression (solutions) (recording)
Lab 5 Modeling, Summary Statistics, Loss Functions (due Feb 22)
Homework 5 Regression (on paper) (LaTeX Template) (due Mar 3)
Week 6
- Feb 21
Weekly Check 6 (due Feb 28)
- Feb 22
Lecture 11 Ordinary Least Squares (Multiple Linear Regression)
- Feb 24
Midterm 1 (8-10 pm) (No Lecture)
- Feb 25
Discussion 6 Ordinary Least Squares (solutions) (recording)
Lab 6 Ordinary Least Squares (due Mar 1)
Week 7
- Feb 28
Weekly Check 7 (due Mar 7)
- Mar 1
Lecture 12 Gradient Descent, sklearn
- Mar 3
Lecture 13 Feature Engineering
- Mar 4
Discussion 7 Human Contexts in Engineering and Feature Engineering (solutions) (recording)
Lab 7 Gradient Descent and sklearn (due Mar 8)
Proj 1A Housing I (due Mar 10)
Week 8
- Mar 7
Weekly Check 8 (due Mar 14)
- Mar 8
- Mar 10
Lecture 15 Cross-Validation and Regularization
- Mar 11
Discussion 8 HCE Part 2, Regularization (Budget Fact Sheet) (solutions) (recording)
Lab 8 Model Selection, Regularization, and Cross-Validation (due Mar 15)
Proj 1B Housing II (due Mar 18, extended from Mar 17; see Ed post)
Week 9
- Mar 14
Weekly Check 9 (due Mar 28)
- Mar 15
Lecture 16 Probability I: Random Variables
- Mar 17
- Mar 18
Discussion 9 Cross-Validation + Probability I (solutions) (recording)
Lab 9 Probability and Modeling (due Mar 29)
Homework 6 Probability and Estimators (due Mar 31)
Spring Break
- Mar 22
Spring Break
- Mar 24
Spring Break
Week 10
- Mar 28
Weekly Check 10 (due Apr 4)
- Mar 29
Lecture 18 SQL I
- Mar 31
Lecture 19 SQL II and PCA
- Apr 1
Discussion 10 Probability II + SQL I (solutions) (recording)
Lab 10 SQL (due Apr 6, extended from Apr 5)
Homework 7 SQL and PCA (due Apr 14)
- Apr 2
Grad Project Released
Week 11
- Apr 4
Weekly Check 11 (due Apr 11)
- Apr 5
Lecture 20 PCA II
- Apr 7
Midterm 2 (7-8:30 pm) (No Lecture)
- Apr 8
Discussion 11 SQL II + PCA (solutions) (Cancelled, No Live Discussion)
Lab 11 PCA (due Apr 12)
Week 12
- Apr 11
Weekly Check 12 (due Apr 18)
- Apr 12
Lecture 21 Classification and Logistic Regression I
- Apr 14
Lecture 22 Logistic Regression II
- Apr 15
Discussion 12 Logistic Regression I (solutions) (recording)
Lab 12 Logistic Regression (due Apr 19)
Proj 2A Spam & Ham I (due Apr 22, extended from Apr 21)
Week 13
- Apr 18
Weekly Check 13 (due Apr 25)
- Apr 19
Lecture 23 Decision Trees
- Apr 21
Lecture 24 Clustering
- Apr 22
Discussion 13 Decision Trees and Random Forests (solutions) (recording)
Lab 13 Decision Trees & Random Forests (due Apr 26)
Proj 2B Spam & Ham II (due Apr 28)
Week 14
- Apr 26
- Apr 27
Grad Project First Draft Due
- Apr 28
Lecture 26 Guest Speaker: Matei Zaharia - Parallel Data Analytics; Conclusion
Weekly Check 14 (due May 5, extended from May 2)
- Apr 29
Discussion 14 Clustering and Final Review (solutions)
Lab 14 Clustering (Optional, no due date)
Homework 8 Taxis (Optional, no due date)
Week 15
- May 3
RRR Week
- May 5
RRR Week
Week 16
- May 9
Grad Project Final Draft Due
- May 13
Final Exam (7-10 pm)