Fitting Linear Models

In this notebook we briefly review the normal equations, introduce the scikit-learn framework and its evaluation metrics, and describe methods to visualize model fit.

Toy Data Set

To enable easy visualization of the model fitting process we will use a simple synthetic data set.
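The data-generating cell is not included here, so the sketch below constructs a stand-in: a DataFrame with two features `X0` and `X1` and a noisy linear response `Y`. The column names, coefficients, and noise level are assumptions, not the notebook's actual values.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data set: two features and a noisy linear response.
np.random.seed(42)
n = 100
df = pd.DataFrame({
    "X0": np.random.uniform(-5, 5, n),
    "X1": np.random.uniform(-5, 5, n),
})
# True (assumed) coefficients: 2 and -1, with Gaussian noise.
df["Y"] = 2.0 * df["X0"] - 1.0 * df["X1"] + np.random.normal(0, 1, n)
```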

We can visualize the data in three dimensions:
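One way to draw this 3D scatter is with matplotlib's `mplot3d` toolkit; the sketch below uses hypothetical stand-in data since the notebook's cell is not shown.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted use
import matplotlib.pyplot as plt

# Hypothetical stand-in for the notebook's synthetic data.
rng = np.random.default_rng(0)
df = pd.DataFrame({"X0": rng.uniform(-5, 5, 100),
                   "X1": rng.uniform(-5, 5, 100)})
df["Y"] = 2 * df["X0"] - df["X1"] + rng.normal(0, 1, 100)

# 3D scatter of the two features against the response.
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(df["X0"], df["X1"], df["Y"])
ax.set_xlabel("X0")
ax.set_ylabel("X1")
ax.set_zlabel("Y")
```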

Fitting an Ordinary Least Squares Model

Given a model of the form:

$$ \hat{\mathbb{Y}} = f_\theta(\mathbb{X}) = \mathbb{X} \theta $$

and taking the average squared loss over our data:

$$ R(\theta) = \frac{1}{n}\sum_{i=1}^n \left(\mathbb{Y}_i - (\mathbb{X}\theta)_i\right)^2 $$

In lecture, we showed that the $\hat{\theta}$ that minimizes this loss:

$$ \hat{\theta} = \arg\min_\theta R(\theta) $$

is given by the solution to the normal equations:

$$ \hat{\theta} = \left( \mathbb{X}^T \mathbb{X} \right)^{-1} \mathbb{X}^T \mathbb{Y} $$

We can directly implement this expression using standard linear algebra routines:
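A direct translation of the normal equations into NumPy might look like the sketch below; the design matrix and response are hypothetical stand-ins for the notebook's data.

```python
import numpy as np

# Hypothetical design matrix X (n x p) and response Y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
theta_true = np.array([2.0, -1.0])
Y = X @ theta_true + rng.normal(0, 0.1, 100)

# Direct implementation of theta_hat = (X^T X)^{-1} X^T Y.
theta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y
```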

A more efficient way to solve the normal equations is to use the solve function on the linear system:

$$ \mathbb{X}^T \mathbb{X} \theta = \mathbb{X}^T \mathbb{Y} $$

can be simplified to:

$$ A \theta = b $$

where $A=\mathbb{X}^T \mathbb{X}$ and $b=\mathbb{X}^T \mathbb{Y}$:
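Using `np.linalg.solve` on this system avoids forming an explicit inverse, which is both faster and numerically more stable. A sketch with stand-in data:

```python
import numpy as np

# Hypothetical stand-in data, as before.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
Y = X @ np.array([2.0, -1.0]) + rng.normal(0, 0.1, 100)

# Solve A theta = b with A = X^T X and b = X^T Y.
A = X.T @ X
b = X.T @ Y
theta_hat = np.linalg.solve(A, b)
```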

Notice that this second implementation produces a NumPy array rather than a pandas object. The inversion-based version goes through pandas operations, which preserve the DataFrame types, whereas the solve routine is implemented entirely in NumPy and so returns a plain array.

Making A Prediction

We can use our $\hat{\theta}$ to make predictions:

$$ \hat{\mathbb{Y}} = \mathbb{X} \hat{\theta} $$
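The prediction is a single matrix-vector product; a sketch with stand-in data:

```python
import numpy as np

# Hypothetical stand-in data and a fitted theta from the normal equations.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
Y = X @ np.array([2.0, -1.0]) + rng.normal(0, 0.1, 100)
theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Predictions: Y_hat = X theta_hat.
Y_hat = X @ theta_hat
```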

How good are our predictions? We can plot $Y$ versus $\hat{Y}$.

We can also plot the residual distribution.
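A sketch of both diagnostics, the $Y$ versus $\hat{Y}$ scatter and the residual histogram, using matplotlib and hypothetical stand-in data:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# Hypothetical stand-in data and fitted predictions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
Y = X @ np.array([2.0, -1.0]) + rng.normal(0, 0.1, 100)
Y_hat = X @ np.linalg.solve(X.T @ X, X.T @ Y)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
# Predicted vs. observed: a good fit hugs the 45-degree line.
ax1.scatter(Y_hat, Y)
ax1.plot([Y.min(), Y.max()], [Y.min(), Y.max()], "k--")
ax1.set_xlabel(r"$\hat{Y}$")
ax1.set_ylabel("Y")
# Residual distribution: should be roughly centered at zero.
ax2.hist(Y - Y_hat, bins=20)
ax2.set_xlabel("residual")
```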

Visualizing the Model

For the synthetic data set we can visualize the model in three dimensions. To do this we will use the following plotting function, which evaluates the model at evenly spaced grid points for X0 and X1.
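The notebook's plotting helper is not shown; the sketch below illustrates the grid-evaluation idea with a hypothetical `plane_grid` function (the name, signature, and grid range are assumptions).

```python
import numpy as np

# Hypothetical helper: evaluate a no-intercept linear model on an evenly
# spaced (X0, X1) grid so the plane can be drawn as a surface.
def plane_grid(theta, lo=-5, hi=5, num=20):
    u = np.linspace(lo, hi, num)
    X0, X1 = np.meshgrid(u, u)
    # Model without an intercept: Y = theta[0]*X0 + theta[1]*X1
    Y = theta[0] * X0 + theta[1] * X1
    return X0, X1, Y

X0, X1, Y = plane_grid(np.array([2.0, -1.0]))
```

The returned arrays can be passed directly to a surface-plotting routine such as matplotlib's `plot_surface`.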

Plotting the data and the plane

Notice that the plane is constrained to pass through the origin. To fix this, we will need to add a constant term to the model. However, to simplify the process let's switch to using the scikit-learn python library for our modeling.

Introducing Scikit-learn

In this class, we introduce the normal equations as well as several other algorithms to provide some insight behind how these techniques work and perhaps more importantly how they fail. However, in practice you will seldom need to implement the core algorithms and will instead use various machine learning software packages. In this class, we will focus on the widely used scikit-learn package.

Scikit-learn, or as the cool kids call it sklearn (pronounced s-k-learn), is a large package of useful machine learning algorithms. For this lecture, we will use the LinearRegression model in the linear_model module. The fact that there is an entire linear_model module containing many different models might suggest that we have a lot to cover still (we do!).

What you should know about sklearn models:

  1. Models are created by first building an instance of the model:
    model = SuperCoolModelType(args)
    
  2. You then fit the model by calling the fit function passing in data:
    model.fit(df[['X0', 'X1']], df[['Y']])
    
  3. You then can make predictions by calling predict:
    model.predict(df2[['X0', 'X1']])
    

The neat part about sklearn is that most models behave like this. So if you want to try a cool new model, you just change the class of model you are using.
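Putting the three steps together with LinearRegression, in a sketch where the DataFrame and its column names are hypothetical stand-ins:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in data with an intercept of 3 in the true model.
rng = np.random.default_rng(0)
df = pd.DataFrame({"X0": rng.uniform(-5, 5, 100),
                   "X1": rng.uniform(-5, 5, 100)})
df["Y"] = 2 * df["X0"] - df["X1"] + 3.0 + rng.normal(0, 0.1, 100)

# 1. build an instance, 2. fit, 3. predict.
model = LinearRegression(fit_intercept=True)
model.fit(df[["X0", "X1"]], df["Y"])
predictions = model.predict(df[["X0", "X1"]])
```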

Using Scikit-learn

We import the LinearRegression model

Create an instance of the model. This time we will add an intercept term to the model directly.

Fit the model by passing it the $X$ and $Y$ data:

Make some predictions and even save them back to the original DataFrame

Analyzing the fit again:

We can also plot the residual distribution.

Computing Error Metrics

As we tune the features in our model it will be important to define some useful error metrics.
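For example, root mean squared error (RMSE) and mean absolute error (MAE) can be computed directly with NumPy; the helper names below are our own, not the notebook's.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_hat))

y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.0, 2.0, 5.0])
```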

As we play with the model we might want a standard visualization of the fit.

Examining our latest model:

Our first model without the intercept term:

Examining the above data we see that there is some periodic structure as well as some curvature. Can we fit this data with a linear model?