Principles and Techniques of Data Science
UC Berkeley, Summer 2022

Anirudhan Badrinath
- Frequently Asked Questions: Before posting on the class Ed, please read the class FAQ page.
- The Syllabus contains a detailed explanation of how each course component will work this summer, given that the course is being taught entirely online.
- Textbook readings are optional and actively in development. See the Resources for more details.
- Note: The schedule of lectures and assignments is subject to change.
Schedule
Week 1
- Jun 21
Lab 1 Prerequisite Coding (due Jun 27)
Lab 2 Pandas (due Jun 27)
- Jun 22
Lecture 2 Pandas I
Textbook: Pandas Reference Table
Reference: Pandas API Documentation
- Jun 23
Lecture 3 Pandas II
Discussion 0 (Optional) Fundamentals
Discussion 1 Sampling and Probability, Pandas, code
Homework 1 Intro + Prerequisites (due Jun 27)
- Jun 24
Exam Prep 1 Sampling and Probability, Pandas
Week 2
- Jun 27
Lecture 4 Data Cleaning, EDA
Weekly Check 2
Lab 3 Data Cleaning and EDA (due Jul 2)
Lab 4 Transformations and KDE (due Jul 2)
Homework 2 Food Safety (due Jun 30)
- Jun 28
Lecture 5 Regex
Discussion 2 Pandas, Data Cleaning
- Jun 29
Lecture 6 Visualization I
Textbook: Seaborn Reference Table
Textbook: Matplotlib Reference Table
- Jun 30
Lecture 7 Visualization II
Discussion 3 Regex, Visualization
Homework 3 Tweets (due Jul 5)
- Jul 1
Exam Prep 2 Pandas, Visualization, Regex
Catch-up section 1
Week 3
- Jul 4
Independence Day
Weekly Check 3
Lab 5 Modeling, Loss Functions, and Summary Statistics (due Jul 9)
Lab 6 Linear Regression (due Jul 9)
Homework 4 Bike Sharing (due Jul 7)
- Jul 5
Lecture 8 Intro to Modeling, Simple Linear Regression
Discussion 4 Modeling and Simple Linear Regression
Solution, Recording
- Jul 6
Lecture 9 Constant Model, Loss, and Transformations
- Jul 7
Lecture 10 Ordinary Least Squares (Multiple Linear Regression)
Discussion 5 Ordinary Least Squares
Solution, Recording
Homework 5 Regression (On paper) (due Jul 11)
- Jul 8
Exam Prep 3 TBD
Solution, Recording
Catch-up section 2 TBD
Recording
Week 4
- Jul 11
Lecture 11 Gradient Descent, sklearn
Lab 7 Gradient Descent and sklearn (due Jul 16)
Lab 8 Cross-Validation and Regularization (due Jul 16)
Proj 1A Housing I (due Jul 14)
- Jul 12
Lecture 12 Feature Engineering
Discussion 6 HCE, Gradient Descent
- Jul 13
Lecture 13 Cross-Validation and Regularization
- Jul 14
Lecture 14 Case Study (HCE): Fairness in Housing Appraisal
Discussion 7 HCE, Regularization, and Cross-Validation
Proj 1B Housing II (due Jul 25)
Week 5
- Jul 18
Midterm Exam
Lab 9 Probability and Modeling (due Jul 23)
- Jul 19
Break (No Lecture)
- Jul 20
Lecture 15 Probability I: Random Variables
- Jul 21
Lecture 16 Probability II: Estimators, Bias, and Variance
Discussion 8 Probability
Week 6
- Jul 25
Lecture 17 SQL I
Lab 10 SQL (due Jul 30)
Lab 11 PCA (due Jul 30)
Homework 6 Probability and Estimators (due Jul 28)
- Jul 26
Lecture 18 SQL II
Discussion 9 SQL
- Jul 27
Lecture 19 PCA
- Jul 28
Break (No Lecture)
Discussion 10 PCA
Homework 7 SQL and PCA (due Aug 1)
Week 7
- Aug 1
Lecture 20 Classification and Logistic Regression I
Lab 12 Logistic Regression (due Aug 6)
Lab 13 Decision Trees & Random Forests (due Aug 6)
Proj 2A Spam I (due Aug 4)
- Aug 2
Lecture 21 Logistic Regression II
Discussion 11 Logistic Regression
- Aug 3
Lecture 22 Decision Trees
- Aug 4
Lecture 23 Clustering
Discussion 12 Decision Trees, Clustering
Proj 2B Spam II (due Aug 8)
Week 8
- Aug 8
Review
- Aug 9
Review
- Aug 10
Review
- Aug 11
Review
- Aug 12
Final Exam