Lecture 13 – Data 100, Summer 2021

by Suraj Rampure

Recap from Lecture 12

In the last lecture, we used this magical package sklearn to determine the optimal value of $\hat{\theta}$ for a linear model that uses AST and 3PA to predict PTS. (The usage of sklearn will be covered in the next lecture.)

Note: We didn't actually cover this part in the video, but it was in the notebook.

We then looked at the values of the coefficients:
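(A minimal sketch of the sklearn code, assuming the data is already loaded into a DataFrame named `nba` with columns `AST`, `3PA`, and `PTS`; the loading code itself isn't reproduced here.)

```python
from sklearn.linear_model import LinearRegression

# Assumption: `nba` is a pandas DataFrame with columns 'AST', '3PA', and 'PTS'.
model = LinearRegression(fit_intercept=True)  # fit_intercept=True is the default
model.fit(nba[['AST', '3PA']], nba['PTS'])

model.intercept_, model.coef_   # roughly (2.1563, array([1.6407, 1.2576]))
```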

This meant our model was of the form

$$\text{predicted PTS} = 2.1563 + 1.6407 \cdot \text{AST} + 1.2576 \cdot \text{3PA}$$

Using np.linalg.inv to solve for optimal $\hat{\theta}$

We will now use what we know about the solution to the normal equations to determine $\hat{\theta}$ on our own, by hand. We know that for the ordinary least squares model,

$$\hat{\theta} = (\mathbb{X}^T\mathbb{X})^{-1} \mathbb{X}^T\mathbb{Y}$$

Right now, our design matrix $\mathbb{X}$ only has two columns (AST and 3PA). But in order to incorporate the intercept term (i.e. $\theta_0$), we need to include a column that contains all 1s. This is referred to as the "intercept" or "bias" column. (sklearn handled this for us, because we set fit_intercept = True. This is actually the default behavior in sklearn.)
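Here is a sketch of that computation, again assuming the same `nba` DataFrame from above:

```python
import numpy as np

# Assumption: `nba` is a pandas DataFrame with columns 'AST', '3PA', and 'PTS'.
X = nba[['AST', '3PA']].copy()
X.insert(0, 'bias', 1)             # prepend the all-ones intercept/bias column
X = X.to_numpy()
Y = nba['PTS'].to_numpy()

# theta_hat = (X^T X)^{-1} X^T Y
theta = np.linalg.inv(X.T @ X) @ X.T @ Y
theta
```

(In practice, `np.linalg.solve(X.T @ X, X.T @ Y)` is usually preferred over forming the inverse explicitly, but the version above mirrors the formula.)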

Here, theta[0] = 2.1563. Since the first column of X is the intercept column, this means that $\theta_0 = 2.1563$. Similarly, we have $\theta_1 = 1.6407$ and $\theta_2 = 1.2576$. These are the exact same coefficients that sklearn found for us!

Residual plots

For the simple linear case, let's revisit Anscombe's quartet.

Dataset 1 appears to have a linear trend between $x$ and $y$:

While dataset 2 does not:

Let's fit simple linear regression models to both datasets, and look at plots of the residuals vs. $x$.
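(A sketch of how such residual plots could be produced, assuming Anscombe's quartet is loaded from seaborn's built-in copy; the lecture's own plotting code may differ.)

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Assumption: we use seaborn's built-in copy of Anscombe's quartet,
# where datasets 1 and 2 are labeled 'I' and 'II'.
anscombe = sns.load_dataset('anscombe')

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, name in zip(axes, ['I', 'II']):
    data = anscombe[anscombe['dataset'] == name]
    x, y = data['x'].to_numpy(), data['y'].to_numpy()

    # Fit a simple linear regression y ≈ intercept + slope * x by least squares.
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (intercept + slope * x)

    ax.scatter(x, residuals)
    ax.axhline(0, color='black', linewidth=1)
    ax.set_title(f'Dataset {name}: residual vs. x')
    ax.set_xlabel('x')
    ax.set_ylabel('residual')

plt.show()
```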

For dataset 1, it appears that the residuals are generally equally spread out, and that there is no trend. This indicates that our linear model was a good fit here. (It is also true that the "positive" and "negative" residuals cancel out, but this is true even when our fit isn't good.)

For dataset 2, the fit does not appear to be as good. While the positive and negative residuals still cancel out, there is a clear trend: the underlying data has a quadratic relationship, and the residuals reflect that. We'd likely need to increase the complexity of our model here.

As mentioned above, the residuals in both datasets sum to zero, but this is true of any linear model with an intercept term, as discussed in lecture.

For the multiple linear regression case, let's go back to the nba data. Let's once again use AST and 3PA to predict PTS.
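(A sketch, assuming the same `nba` DataFrame as before. With more than one feature there is no single x-axis to plot against, so one common choice is to plot the residuals against the fitted (predicted) values.)

```python
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Assumption: `nba` is a pandas DataFrame with columns 'AST', '3PA', and 'PTS'.
model = LinearRegression()
model.fit(nba[['AST', '3PA']], nba['PTS'])

predicted = model.predict(nba[['AST', '3PA']])
residuals = nba['PTS'] - predicted

plt.scatter(predicted, residuals, s=5)
plt.axhline(0, color='black', linewidth=1)
plt.xlabel('predicted PTS')
plt.ylabel('residual')
plt.show()
```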