Lecture 17 – Cross-Validation and Regularization

Presented by Isaac Schmidt, Paul Shao

Content by Isaac Schmidt, Joseph Gonzalez, Suraj Rampure, Paul Shao

Video 17.8 is a supplementary video created by Paul Shao. It gives a great high-level overview of both the bias-variance tradeoff and regularization, and the instructors highly recommend it.

Note: The demos in this lecture were adapted from demos that Prof. Joey Gonzalez recorded in Spring 2020. We redid them because the originals rely heavily on sklearn's Pipeline object, which is not in scope this semester. The original notebooks are still available in case you wish to use that style of code in your own projects: they are in the lecture folder on DataHub, named lec17-alt-1.ipynb and lec17-alt-2.ipynb, and you can view the HTML for Part 1 and Part 2.

| Video | Quick Check |
| --- | --- |
| 17.0 – Introduction. | |
| 17.1 – Training error vs. testing error. Why we need to split our data into train and test sets. How cross-validation works, and why it is useful. | 17.1 |
| 17.2 – Using scikit-learn to construct a train-test split, and building a linear model and determining its training and test error (see the first sketch below). | 17.2 |
| 17.3 – Implementing cross-validation, and using it to help select a model (see the cross-validation sketch below). | 17.3 |
| 17.4 – An introduction to regularization. | 17.4 |
| 17.5 – Reformulating the regularization optimization problem. The relationship between the hyperparameter and error. Standardizing features (see the objectives written out below). | 17.5 |
| 17.6 – Ridge regression and LASSO regression. The distinction between parameters and hyperparameters. | 17.6 |
| 17.7 – Using cross-validation with ridge regression and LASSO regression in scikit-learn (see the RidgeCV/LassoCV sketch below). | 17.7 |
| 17.8 – **Supplemental.** An overview of the bias-variance tradeoff, and how it interfaces with regularization. | N/A |
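
To make the workflow of videos 17.1 and 17.2 concrete, here is a minimal sketch of a train-test split with scikit-learn. The data `X` and `y` are synthetic stand-ins rather than the dataset used in lecture, and the split proportion and random seeds are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data as a stand-in for the lecture's dataset.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Hold out 25% of the rows; the model never sees the test set while fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

# Training error measures fit to the data we trained on;
# test error estimates how well the model generalizes.
train_rmse = np.sqrt(mean_squared_error(y_train, model.predict(X_train)))
test_rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print(f"train RMSE: {train_rmse:.3f}, test RMSE: {test_rmse:.3f}")
```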
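For video 17.3, a hand-rolled k-fold cross-validation helper might look like the sketch below. This only illustrates the idea; the lecture's own implementation may differ, and the helper name `cross_validate_rmse` and its defaults are chosen here for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

def cross_validate_rmse(model, X, y, k=5):
    """Average validation RMSE of `model` across k folds."""
    kf = KFold(n_splits=k, shuffle=True, random_state=42)
    rmses = []
    for train_idx, val_idx in kf.split(X):
        # Fit a fresh copy of the model on everything but the held-out fold.
        fold_model = clone(model).fit(X[train_idx], y[train_idx])
        pred = fold_model.predict(X[val_idx])
        rmses.append(np.sqrt(mean_squared_error(y[val_idx], pred)))
    return np.mean(rmses)

# Example: score a candidate model by its average validation error.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
print(cross_validate_rmse(LinearRegression(), X, y))
```

Because each fold's validation error comes from data the fold's model never trained on, comparing these averages across candidate models is a principled way to select one without touching the test set.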
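For videos 17.4 through 17.6, the objectives being discussed can be written out. In the conventional notation below (the lecture's exact notation may differ slightly), $\lambda \ge 0$ is the regularization hyperparameter; ridge regression penalizes the squared L2 norm of the parameters, while LASSO penalizes the L1 norm:

$$
\hat{\theta}_{\text{ridge}} = \arg\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \big(y_i - x_i^{\top}\theta\big)^2 + \lambda \sum_{j=1}^{d} \theta_j^2
$$

$$
\hat{\theta}_{\text{LASSO}} = \arg\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \big(y_i - x_i^{\top}\theta\big)^2 + \lambda \sum_{j=1}^{d} \lvert \theta_j \rvert
$$

Larger values of $\lambda$ shrink the parameters more aggressively, and the L1 penalty can set coefficients exactly to zero. Since the penalty treats every coefficient uniformly, features are typically standardized first so that no feature is penalized more just because of its scale.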
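Finally, for video 17.7, scikit-learn's RidgeCV and LassoCV select the regularization strength by cross-validation. Here is a minimal sketch, again on synthetic stand-in data; the alpha grid and fold count are arbitrary choices, and note that scikit-learn calls the hyperparameter `alpha` rather than $\lambda$.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Standardize features so the penalty treats all coefficients comparably.
X_std = StandardScaler().fit_transform(X)

# Candidate regularization strengths; each is scored by cross-validation.
alphas = np.logspace(-4, 2, 20)

ridge = RidgeCV(alphas=alphas, cv=5).fit(X_std, y)
lasso = LassoCV(alphas=alphas, cv=5).fit(X_std, y)

# The chosen hyperparameters and the fitted LASSO coefficients.
print("ridge alpha:", ridge.alpha_, "| lasso alpha:", lasso.alpha_)
print("lasso coefficients (some may be exactly zero):", lasso.coef_)
```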