Introduction to Overfitting

One of the key challenges with feature engineering is that you can "over-engineer" your features and produce a model that fits the training data well but performs poorly when making predictions on new data. This is typically referred to as overfitting to your data, and it is the focus of the next set of lectures.
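To make this concrete, here is a minimal sketch of "over-engineering" features: polynomial features of increasing degree, fit by least squares with NumPy. The data is hypothetical (noisy samples of a sine curve generated for illustration, not this notebook's dataset), and the helper names `poly_features` and `fit_and_score` are our own. Because each degree's feature set contains the previous one, training error can only stay the same or go down as features are added, so training error alone cannot reveal overfitting. The error on held-out data is what exposes it.

```python
import numpy as np

def poly_features(x, degree):
    # Hypothetical helper: engineered feature columns 1, x, x^2, ..., x^degree
    return np.vander(x, degree + 1, increasing=True)

def fit_and_score(x_train, y_train, x_test, y_test, degree):
    # Fit a least-squares linear model on the polynomial features,
    # then report mean squared error on training and held-out data.
    X_train = poly_features(x_train, degree)
    theta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    train_mse = np.mean((X_train @ theta - y_train) ** 2)
    test_mse = np.mean((poly_features(x_test, degree) @ theta - y_test) ** 2)
    return train_mse, test_mse

rng = np.random.default_rng(0)
# Hypothetical data: noisy samples of sin(x), NOT the notebook's toy dataset
x_train = np.sort(rng.uniform(-3, 3, size=8))
y_train = np.sin(x_train) + rng.normal(scale=0.3, size=8)
x_test = rng.uniform(-3, 3, size=200)
y_test = np.sin(x_test) + rng.normal(scale=0.3, size=200)

# Degrees 0 through 6: each feature set contains the previous one
results = {d: fit_and_score(x_train, y_train, x_test, y_test, d) for d in range(7)}
for d, (train_mse, test_mse) in results.items():
    print(f"degree {d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

Comparing the two columns as the degree grows is the basic diagnostic; the exact numbers depend on the random seed.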

In this notebook, we provide a very simple illustration of overfitting. As you will soon see and experience, it is very easy to overfit to your data, and avoiding overfitting will become the key challenge in designing good models.

Toy Data and Model Setup

For this problem we will use a very simple toy dataset to help illustrate where things go wrong.

Notice that there are only 8 data points in this dataset. Small datasets are especially prone to overfitting.
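One way to see why so few points are risky: with 8 points at distinct x values, a degree-7 polynomial has 8 coefficients, one per point, so it can pass through every point exactly, even when the targets are pure noise. The sketch below assumes evenly spaced inputs and random targets chosen purely for illustration; it is not this notebook's dataset.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 8
x = np.linspace(-1, 1, n)   # hypothetical, evenly spaced inputs
y = rng.normal(size=n)      # pure noise: there is no signal to learn

# A degree n-1 polynomial has n coefficients, one per data point,
# so the Vandermonde feature matrix is square (8 x 8).
X = np.vander(x, n, increasing=True)

# With distinct x values the matrix is invertible, so the fit is exact.
theta = np.linalg.solve(X, y)

train_residual = np.max(np.abs(X @ theta - y))
print(f"max training residual: {train_residual:.2e}")
```

The residual is at the level of floating-point round-off, yet the fitted coefficients have only memorized noise, so they carry no information about new points.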