# Decomposing Model Risk

## Simulating from the truth

True mean function:

$$
`\begin{aligned}
g(x) &= \theta_0 + \theta_1 x + \theta_2 x^2 \\
&= 23 + 4 x - 3.2 x^2
\end{aligned}`
$$

True data generating function:

$$
Y = g(x) + \epsilon; \quad \epsilon \sim N(0, 11)
$$
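
A minimal sketch of drawing data from this model, for readers who want to reproduce the small and large samples. The slides do not say where the `x` values come from or whether the 11 in `N(0, 11)` is a variance or a standard deviation, so the uniform design on `[0, 4]`, the variance reading, the seed, and the names `g`, `simulate`, and `NOISE_VAR` are all assumptions made here for illustration.

```python
import numpy as np

THETA = (23.0, 4.0, -3.2)  # theta_0, theta_1, theta_2 from the true mean function
NOISE_VAR = 11.0           # assumption: the 11 in N(0, 11) is a variance


def g(x):
    """True mean function g(x) = 23 + 4x - 3.2x^2."""
    x = np.asarray(x, dtype=float)
    return THETA[0] + THETA[1] * x + THETA[2] * x**2


def simulate(n, rng, x_low=0.0, x_high=4.0):
    """Draw n (x, y) pairs; the uniform design on [x_low, x_high] is an assumption."""
    x = rng.uniform(x_low, x_high, size=n)
    y = g(x) + rng.normal(0.0, np.sqrt(NOISE_VAR), size=n)
    return x, y


rng = np.random.default_rng(0)           # fixed seed so the draws are repeatable
x_small, y_small = simulate(20, rng)     # the n = 20 case
x_large, y_large = simulate(20000, rng)  # the n = 20000 case
```

With `n = 20000` the scatter of `y_large` around `g(x_large)` should trace the parabola closely, while the `n = 20` sample leaves the shape much less certain.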

---

$$
g(x) = \theta_0 + \theta_1 x + \theta_2 x^2
$$

---

$$
y = \theta_0 + \theta_1 x + \theta_2 x^2 + \epsilon; \, n = 20
$$

---

$$
y = \theta_0 + \theta_1 x + \theta_2 x^2 + \epsilon; \, n = 20000
$$

---

## Visualizing Bias and Variance

### Procedure

1. Assume a true generative model

2. Generate a data set of size `$n$`

3. Estimate `$\hat{y}(x)$`

4. Repeat steps 2 and 3 many times to get a sense of the variation in `$\hat{y}(x)$`
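
The four steps above can be sketched as a small Monte Carlo loop. This is one possible implementation, not necessarily the one behind the slides: the function name `run_procedure`, the uniform design on `[0, 4]`, the variance reading of `N(0, 11)`, and the choice of 500 repetitions are assumptions.

```python
import numpy as np


def g(x):
    """Step 1: the assumed true mean function."""
    return 23.0 + 4.0 * x - 3.2 * x**2


def run_procedure(degree, n, n_reps, x_grid, rng, noise_var=11.0):
    """Return an (n_reps, len(x_grid)) array holding one fitted curve per row."""
    fits = np.empty((n_reps, len(x_grid)))
    for r in range(n_reps):
        # Step 2: generate a data set of size n (uniform design is an assumption)
        x = rng.uniform(x_grid.min(), x_grid.max(), size=n)
        y = g(x) + rng.normal(0.0, np.sqrt(noise_var), size=n)
        # Step 3: estimate y_hat(x) with a polynomial least-squares fit
        coefs = np.polyfit(x, y, deg=degree)
        fits[r] = np.polyval(coefs, x_grid)
    return fits  # Step 4: the rows together show the variation in y_hat(x)


rng = np.random.default_rng(1)
x_grid = np.linspace(0.0, 4.0, 50)
fits = run_procedure(degree=1, n=20, n_reps=500, x_grid=x_grid, rng=rng)
spread = fits.std(axis=0)  # pointwise spread of y_hat(x) across repetitions
```

Plotting each row of `fits` as a faint line over the true curve gives the kind of picture this procedure is meant to produce.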

### Estimating `$\hat{y}(x)$`

Let's naively assume a *linear form*, work with data sets of size 20, and fit 
`$\hat{y}(x)$` by least squares.

$$
\hat{y}(x) = \hat{\theta}_0 + \hat{\theta}_1 x
$$
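
To make the cost of the linear form concrete, here is a hedged sketch that fits a straight line by least squares to many data sets of size 20 and looks at the prediction at a single point. The evaluation point `x0 = 2`, the uniform design on `[0, 4]`, the variance reading of `N(0, 11)`, and the 2000 repetitions are assumptions, not values from the slides.

```python
import numpy as np


def g(x):
    """True (quadratic) mean function."""
    return 23.0 + 4.0 * x - 3.2 * x**2


rng = np.random.default_rng(2)
n, n_reps = 20, 2000
x0 = 2.0  # hypothetical evaluation point
preds = np.empty(n_reps)

for r in range(n_reps):
    x = rng.uniform(0.0, 4.0, size=n)  # assumed design
    y = g(x) + rng.normal(0.0, np.sqrt(11.0), size=n)
    X = np.column_stack([np.ones(n), x])  # design matrix with columns [1, x]
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    preds[r] = theta_hat[0] + theta_hat[1] * x0

bias = preds.mean() - g(x0)  # systematic error of the linear fit at x0
variance = preds.var()       # spread of the fit at x0 across data sets
```

Under these assumptions the averaged line sits well below the true curve at `x0`, and collecting more repetitions does not remove that gap: it comes from the misspecified form, not from noise.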

---

---

## Estimating `$\hat{y}(x)$`, take two

Next, let's presciently assume a quadratic form...
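
When the assumed form matches the truth, least squares should recover the true coefficients on average. The sketch below checks this by averaging quadratic fits over repeated data sets; the uniform design on `[0, 4]`, the variance reading of `N(0, 11)`, and the names `TRUE_THETA` and `theta_hats` are assumptions.

```python
import numpy as np

TRUE_THETA = np.array([23.0, 4.0, -3.2])  # theta_0, theta_1, theta_2

rng = np.random.default_rng(3)
n, n_reps = 20, 2000
theta_hats = np.empty((n_reps, 3))

for r in range(n_reps):
    x = rng.uniform(0.0, 4.0, size=n)  # assumed design
    mean = TRUE_THETA[0] + TRUE_THETA[1] * x + TRUE_THETA[2] * x**2
    y = mean + rng.normal(0.0, np.sqrt(11.0), size=n)
    # np.polyfit returns the highest-degree coefficient first, so reverse it
    theta_hats[r] = np.polyfit(x, y, deg=2)[::-1]

mean_theta = theta_hats.mean(axis=0)  # should land near TRUE_THETA
sd_theta = theta_hats.std(axis=0)     # fit-to-fit variability with n = 20
```

Individual fits still vary noticeably at `n = 20`, but their average lands close to the true coefficients, which is what negligible bias looks like.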

---

---

## Estimating `$\hat{y}(x)$`, take three (or seven?)

Finally, let's get ridiculous and assume a septic (degree-7 polynomial) form...
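
A rough way to put the three fits side by side is to estimate squared bias and variance for degrees 1, 2, and 7 with the same simulation. This is a sketch under assumptions: the uniform design on `[0, 4]`, the interior evaluation grid, the variance reading of `N(0, 11)`, and the helper name `bias2_and_variance` are not from the slides. It uses `numpy.polynomial.Polynomial.fit`, which rescales `x` internally and tends to be better conditioned for a degree-7 fit.

```python
import numpy as np
from numpy.polynomial import Polynomial


def g(x):
    """True mean function from the slides."""
    return 23.0 + 4.0 * x - 3.2 * x**2


def bias2_and_variance(degree, rng, n=20, n_reps=1000):
    """Monte Carlo estimates of squared bias and variance, averaged over a grid."""
    x_grid = np.linspace(0.5, 3.5, 25)  # assumed evaluation grid
    fits = np.empty((n_reps, x_grid.size))
    for r in range(n_reps):
        x = rng.uniform(0.0, 4.0, size=n)  # assumed design
        y = g(x) + rng.normal(0.0, np.sqrt(11.0), size=n)
        fits[r] = Polynomial.fit(x, y, degree)(x_grid)
    bias2 = np.mean((fits.mean(axis=0) - g(x_grid)) ** 2)
    variance = np.mean(fits.var(axis=0))
    return bias2, variance


rng = np.random.default_rng(4)
results = {d: bias2_and_variance(d, rng) for d in (1, 2, 7)}
for d, (b2, v) in results.items():
    print(f"degree {d}: bias^2 = {b2:.2f}, variance = {v:.2f}")
```

Under these assumptions the linear fit should be dominated by squared bias, while the degree-7 fit should show clearly more variance than the quadratic, because it has five extra coefficients to estimate from only 20 points.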

---