Lecture 11: Introduction to Modeling

Data 100, Summer 2021

Suraj Rampure

Adapted from Fernando Perez

Losses

Toy Data

Let's plot the $L_2$ loss for a single observation, namely the first one. Since $y_1 = 20$, we'll be plotting

$$L_2(20, \theta) = (20 - \theta)^2$$

We can see that the loss for a single observation is minimized by that observation itself (i.e. when $\theta = 20$, the above loss is minimized).
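As a rough numerical check of this claim, here is a minimal sketch that evaluates the single-observation $L_2$ loss on a grid of candidate $\theta$ values. The function name `l2_loss` and the grid from 10 to 30 are assumptions made for illustration; they are not taken from the lecture code.

```python
def l2_loss(y, theta):
    """Squared (L2) loss for one observation y and a constant prediction theta."""
    return (y - theta) ** 2

# Candidate values of theta from 10 to 30 in steps of 0.5 (an arbitrary grid).
thetas = [10 + 0.5 * i for i in range(41)]
losses = [l2_loss(20, t) for t in thetas]

# Pick the grid point with the smallest loss for the observation y_1 = 20.
best_theta = min(thetas, key=lambda t: l2_loss(20, t))
print(best_theta, min(losses))
```

On this grid the smallest loss occurs at the observation itself, where the loss is zero.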

Let's now compute the average loss over all of our toy data.

The explicit expression for the MSE here is

$$R(\theta) = \frac{1}{5} \big((20 - \theta)^2 + (21 - \theta)^2 + (22 - \theta)^2 + (29 - \theta)^2 + (33 - \theta)^2\big)$$

Note: the shape looks similar, but the minimizing value of $\theta$ has shifted. It appears to be at about 25, which you may recognize as the mean of [20, 21, 22, 29, 33].
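To check where the MSE is minimized, one option is to evaluate $R(\theta)$ on a grid and compare the minimizer with the sample mean. This is only a sketch: the names `data` and `mse` and the grid from 15 to 35 are illustrative assumptions, while the five values come from the expression above.

```python
from statistics import mean

# The five toy observations that appear in the MSE expression above.
data = [20, 21, 22, 29, 33]

def mse(theta, data):
    """Average squared loss of the constant prediction theta over the data."""
    return sum((y - theta) ** 2 for y in data) / len(data)

# Candidate values of theta from 15 to 35 in steps of 0.25 (an arbitrary grid).
thetas = [15 + 0.25 * i for i in range(81)]
best_theta = min(thetas, key=lambda t: mse(t, data))

print(best_theta, mean(data), mse(best_theta, data))
```

The grid minimizer agrees with the sample mean, which is consistent with what the plot suggests.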

Let's now do the same, but for $L_1$ loss. For our first observation, the $L_1$ loss is

$$L_1(20, \theta) = |20 - \theta|$$

Again, this is centered on the observation itself, 20.

Averaging across all of our data:

The explicit expression for the MAE here is

$$R(\theta) = \frac{1}{5} \big(|20 - \theta| + |21 - \theta| + |22 - \theta| + |29 - \theta| + |33 - \theta|\big)$$

Note that this curve is pointy, not smooth like the MSE. It also doesn't look like a single absolute value curve, because it is the average of several absolute value functions, each with its kink at a different observation.

As we show in the lecture, the minimizing value of the MAE here is $\theta = 22$, as that is the median of our observations.
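The same kind of grid check can be applied to the MAE. In this sketch, `mae` and the grid are assumed names and choices for illustration, and `statistics.median` from the Python standard library supplies the median for comparison.

```python
from statistics import median

# The five toy observations that appear in the MAE expression above.
data = [20, 21, 22, 29, 33]

def mae(theta, data):
    """Average absolute loss of the constant prediction theta over the data."""
    return sum(abs(y - theta) for y in data) / len(data)

# Candidate values of theta from 15 to 35 in steps of 0.25 (an arbitrary grid).
thetas = [15 + 0.25 * i for i in range(81)]
best_theta = min(thetas, key=lambda t: mae(t, data))

print(best_theta, median(data), mae(best_theta, data))
```

With an odd number of observations the grid minimizer is unique and coincides with the median.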

What if we instead had an even number of points? Then there wouldn't be a unique median! Let's see what happens:
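One way to explore this numerically is sketched below. The sixth observation, 35, is a hypothetical value chosen only for illustration; it is not from the lecture data, and `data_even` and `mae` are assumed names.

```python
from statistics import median

# Five original toy observations plus one HYPOTHETICAL sixth value (35),
# added only to make the number of points even.
data_even = [20, 21, 22, 29, 33, 35]

def mae(theta, data):
    """Average absolute loss of the constant prediction theta over the data."""
    return sum(abs(y - theta) for y in data) / len(data)

# Values of theta between the two middle observations (22 and 29), inclusive.
inside = [mae(t, data_even) for t in (22, 24, 25.5, 29)]
# Values of theta just outside that interval.
outside = [mae(t, data_even) for t in (21, 30)]

print(inside)
print(outside)
print(median(data_even))
```

Under this assumed data, every $\theta$ between the two middle observations gives the same MAE, so the curve has a flat bottom rather than a single lowest point. `statistics.median` follows the usual convention of reporting the midpoint of the two middle values, which is only one of the many minimizers.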