Introduction to Overfitting

One of the key challenges with feature engineering is that you can "over-engineer" your features and produce a model that fits the training data well but performs poorly when making predictions on new data. This is typically referred to as overfitting to your data and is the focus of the next set of lectures.

In this notebook, we will provide a very simple illustration of overfitting, but as you will see (and soon experience), it is very easy to overfit to your data, and this will become the key challenge in the design of good models.





Toy Data and Model Setup

For this problem we will use a very simple toy dataset to help illustrate where things will fail.

Notice that there are only 8 datapoints in this dataset. Datasets this small are especially prone to overfitting.
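As a concrete sketch, the toy data could be generated along these lines (the specific values and variable names are illustrative assumptions, not the lecture's exact numbers):

```python
import numpy as np
import pandas as pd

# A tiny synthetic dataset: 8 points from a noisy linear trend.
# (Illustrative values only; the actual notebook data may differ.)
np.random.seed(42)
n = 8
X = np.sort(np.random.uniform(-3, 3, size=n))
Y = 2.0 * X + 1.0 + np.random.normal(scale=2.0, size=n)

data = pd.DataFrame({"X": X, "Y": Y})
data
```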





Fit a Basic Linear Model

We can start by fitting a basic linear model to the data:
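A minimal sketch of such a fit using scikit-learn (variable names like `data` continue from the toy-data sketch above and are assumptions):

```python
from sklearn.linear_model import LinearRegression

# Fit Y ≈ slope * X + intercept on the 8 training points.
model = LinearRegression()
model.fit(data[["X"]], data["Y"])

print("slope:", model.coef_[0], "intercept:", model.intercept_)
```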

As before, we define a helper routine to track our progress in model design.
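One plausible form for such a helper, here simply recording each model's training mean squared error in a dictionary (a hypothetical sketch; the lecture's routine may also produce a plot):

```python
from sklearn.metrics import mean_squared_error

# Running record of each model's error so we can compare designs
# as we add features.
model_errors = {}

def evaluate_model(name, fitted_model, features, targets):
    predictions = fitted_model.predict(features)
    model_errors[name] = mean_squared_error(targets, predictions)
    return model_errors

evaluate_model("linear", model, data[["X"]], data["Y"])
```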

Over-Engineering

How could we improve the model fit?
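One tempting answer is to add features until the model can pass through every training point, for example a Gaussian radial basis function (RBF) feature centered at each of the 8 observations. A hedged sketch of that construction (the width parameter and naming are assumptions):

```python
def rbf_features(x, centers, width=1.0):
    """Gaussian RBF feature matrix with one column per center."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    centers = np.asarray(centers, dtype=float).reshape(1, -1)
    return np.exp(-((x - centers) ** 2) / (2 * width ** 2))

# One basis function per training point is enough to interpolate all
# 8 observations: essentially zero training error, but a very wiggly fit.
Phi = rbf_features(data["X"], centers=data["X"])
rbf_model = LinearRegression().fit(Phi, data["Y"])
evaluate_model("rbf", rbf_model, Phi, data["Y"])
```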

Success!?

What happens if we get more data from the world?
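For illustration, the new data can be simulated from the same underlying process as the original 8 points (a hypothetical stand-in for whatever data the lecture actually collects):

```python
# Additional points from the same noisy linear process.
new_n = 20
new_X = np.sort(np.random.uniform(-3, 3, size=new_n))
new_Y = 2.0 * new_X + 1.0 + np.random.normal(scale=2.0, size=new_n)
new_data = pd.DataFrame({"X": new_X, "Y": new_Y})
```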

Plotting this new data (in red) on top of the old data, we see that while the more complex RBF model fit the original data perfectly, it does not fit the new data well.

What happens if we plot this data on top of our previous picture?
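A sketch of that comparison plot, reusing the `rbf_features`, `rbf_model`, and `new_data` names from the sketches above (assumed names, not the lecture's exact code):

```python
import matplotlib.pyplot as plt

# Original points, new points (in red), and the RBF model's predictions
# on a fine grid of x values.
xs = np.linspace(-3, 3, 200)
plt.plot(xs, rbf_model.predict(rbf_features(xs, centers=data["X"])), label="RBF model")
plt.scatter(data["X"], data["Y"], label="original data")
plt.scatter(new_data["X"], new_data["Y"], color="red", label="new data")
plt.legend()
plt.show()
```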

The following plots the training and test error. Try zooming in to see what happens to training error and testing error as we increase the number of features in our model.
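One way such a plot could be produced is to sweep the number of RBF centers, refit the model each time, and record the error on the original (training) points and on the new (test) points; this sketch continues the hypothetical code above:

```python
# Sweep the number of RBF features and track training vs. testing error.
train_err, test_err = [], []
ks = range(1, len(data) + 1)
for k in ks:
    centers = data["X"].iloc[:k]
    m = LinearRegression().fit(rbf_features(data["X"], centers), data["Y"])
    train_err.append(mean_squared_error(data["Y"], m.predict(rbf_features(data["X"], centers))))
    test_err.append(mean_squared_error(new_data["Y"], m.predict(rbf_features(new_data["X"], centers))))

plt.plot(ks, train_err, label="training error")
plt.plot(ks, test_err, label="testing error")
plt.xlabel("number of RBF features")
plt.ylabel("mean squared error")
plt.legend()
plt.show()
```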

In the rest of this lecture we will dig into the ideas that drive this behavior.

What's happening: Overfitting

As we increase the expressiveness of our model, we begin to overfit to the variability in our training data. That is, we are learning patterns that do not generalize beyond our training dataset.

Overfitting is a key challenge in machine learning and statistical inference. At its core is a fundamental trade-off between bias and variance: the desire to explain the training data and yet be robust to variation in the training data.
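For reference, the standard decomposition behind this trade-off: for a point $x$ with noisy label $Y = f(x) + \epsilon$ (noise variance $\sigma^2$), the expected squared error of a learned predictor $\hat{f}$ splits into

$$
\mathbb{E}\big[(Y - \hat{f}(x))^2\big]
= \underbrace{\big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}},
$$

where the expectation is taken over the randomness in the training data and the noise. More expressive models typically lower the bias term but raise the variance term.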

We will study the bias-variance trade-off in today's lecture.