import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
%matplotlib inline
from IPython.display import YouTubeVideo
Note that we're defining expectation only for discrete random variables. For continuous random variables you have to replace the sum by an appropriate integral, which we won't need in this class. But all the properties of expectation listed in the slides hold for both kinds of random variables.
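Here is the definition in code, as a weighted average, for a small hypothetical distribution (the values and probabilities below are invented just for illustration):
# E(X) = sum over all x of x * P(X = x)
vals = np.array([1, 2, 3])
probs = np.array([0.2, 0.5, 0.3])   # must be non-negative and sum to 1
vals @ probs                        # 1(0.2) + 2(0.5) + 3(0.3) = 2.1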
YouTubeVideo("wBBWFYz9248")
Now go over Slides 20-26.
Please work out every line in Slides 23 and 26, and note how the "balance point" interpretation reduces the need for calculation.
Linear transformation rules are used frequently because linear functions of random variables come up all the time, for example when you change units of measurement: $\mathbb{E}(aX + b) = a\mathbb{E}(X) + b$.
Non-linear transformations don't play well with expectation: in general, $\mathbb{E}(g(X)) \neq g(\mathbb{E}(X))$. For example, $\mathbb{E}(X^2)$ is usually not equal to $(\mathbb{E}(X))^2$.
When in doubt, try to use additivity: $\mathbb{E}(X + Y) = \mathbb{E}(X) + \mathbb{E}(Y)$, whether or not $X$ and $Y$ are independent. A quick numerical check of these points follows below.
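A sketch of that check, reusing the small hypothetical distribution from above (none of this is from the slides):
# Linearity is exact; non-linear functions generally don't commute with E
vals = np.array([1, 2, 3])
probs = np.array([0.2, 0.5, 0.3])
E_X = vals @ probs                        # 2.1

print((3*vals + 4) @ probs, 3*E_X + 4)    # E(3X + 4) = 3 E(X) + 4: both 10.3, up to float rounding
print(vals**2 @ probs, E_X**2)            # E(X^2) = 4.9, but (E(X))^2 = 4.41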
Question 1: Which of the following is $\mathbb{E}(\mathbb{E}(X))$?
Question 2: Suppose you know that $\mathbb{E}(X) = 2$ and $\mathbb{E}(X^2) = 13$. If possible, find $\mathbb{E}(X - 5)$.
Question 3: Suppose you know that $\mathbb{E}(X) = 2$ and $\mathbb{E}(X^2) = 13$. If possible, find $\mathbb{E}(\vert X - 5 \vert)$.
Question 4: Suppose you know that $\mathbb{E}(X) = 2$ and $\mathbb{E}(X^2) = 13$. If possible, find $\mathbb{E}((X - 5)^2)$.
Notice how squared loss is easier to work with mathematically than absolute loss.
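Once you have tried Questions 2-4, the contrast can be made concrete. Below is a sketch with two invented distributions, both arranged to have $\mathbb{E}(X) = 2$ and $\mathbb{E}(X^2) = 13$: those two moments pin down $\mathbb{E}((X-5)^2)$ completely, but they do not determine $\mathbb{E}(\vert X - 5 \vert)$.
# Two different distributions with the same first two moments
vals_A, probs_A = np.array([-1, 5]), np.array([1/2, 1/2])
vals_B, probs_B = np.array([-2, 2, 6]), np.array([9/32, 14/32, 9/32])

for vals, p in [(vals_A, probs_A), (vals_B, probs_B)]:
    print("E(X) =", vals @ p,                     # 2.0 for both
          "E(X^2) =", vals**2 @ p,                # 13.0 for both
          "E((X-5)^2) =", (vals - 5)**2 @ p,      # 18.0 for both
          "E(|X-5|) =", np.abs(vals - 5) @ p)     # 3.0 vs 3.5625: not determined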
To quantify how far a random variable can be from its mean, it's natural to start with the deviation from mean:
$$ D ~ = ~ X - \mathbb{E}(X) $$

You should show that $\mathbb{E}(D) = 0$.
The positive deviations exactly cancel out the negative deviations. So to get a sense of how big $D$ typically is, we have to ignore its sign somehow. That is why we look instead at the mean squared deviation $\mathbb{E}(D^2)$, which is called the variance of $X$.
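You can see both facts numerically, reusing the small hypothetical distribution from earlier:
# Deviations from the mean balance out; their mean square is the variance
vals = np.array([1, 2, 3])
probs = np.array([0.2, 0.5, 0.3])

D = vals - vals @ probs     # deviations from E(X) = 2.1
print(D @ probs)            # E(D) = 0, up to float rounding
print(D**2 @ probs)         # variance E(D^2) = 0.49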
It's worth writing out this obvious fact: variance is non-negative. Details: $D^2 \geq 0$ no matter what value $D$ takes, so its expectation $\mathbb{E}(D^2)$, being a weighted average of non-negative numbers, is non-negative as well.
Question 5: $X$ has the uniform distribution on the values $-1$ and $1$. $Y$ has the uniform distribution on the values $-1$, $0$, and $1$. Which is bigger: $\mathbb{SD}(X)$ or $\mathbb{SD}(Y)$? Answer without calculation. [Try drawing (by hand) overlaid histograms of the two distributions.]
The result below is sometimes called the computational formula for $\mathbb{V}ar(X)$, but it can actually be pretty bad in terms of numerical accuracy if $X$ has large positive or negative values. The term is a holdover from the days when computation meant cranking out arithmetic by hand. It can also cut down on algebra.
Before looking at the derivation, let's use the result, which is

$$ \mathbb{V}ar(X) ~ = ~ \mathbb{E}(X^2) - (\mathbb{E}(X))^2 $$

Consequences
Note that result! It's going to get used.
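As an aside on the numerical-accuracy warning above, here is a small made-up illustration; float32 arithmetic is used so the rounding is easy to see:
# The computational formula subtracts two nearly equal huge numbers,
# so rounding can wipe out the answer. The true variance here is 2/3.
x = np.array([999999, 1000000, 1000001], dtype=np.float32)
mean = x.mean()                   # exactly 1e6

print(np.mean(x**2) - mean**2)    # computational formula: 0.0 -- all precision lost
print(np.mean((x - mean)**2))     # definition E(D^2): 0.6666667, correct
The subtraction cancels two nearly equal numbers around $10^{12}$, so the formula is fine for algebra but risky as an actual computing recipe.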
Question 6: Suppose $\mathbb{E}(X) = 10$ and $\mathbb{SD}(X) = 2$. Match the numbers below with $\mathbb{E}(X^2)$, $(\mathbb{E}(X))^2$, and $\mathbb{V}ar(X)$. You can use numbers more than once, and some will be left over.
Question 7: Suppose you know that $\mathbb{E}(X^2) = 13$ and $\mathbb{V}ar(X) = 9$. Which of the following could $\mathbb{E}(X)$ be? Pick all the options that you think will work.
YouTubeVideo("poYb0w7LhY8")
Yes, those again. This sequence of figures should explain why $\mathbb{SD}(aX+b) = \vert a \vert \mathbb{SD}(X)$.
Important consequence: the coefficient gets squared in $\mathbb{V}ar(aX) = a^2\mathbb{V}ar(X)$.
# Probability distribution of X
vals_X = np.arange(1, 4)             # possible values: 1, 2, 3
probs = np.array([0.2, 0.5, 0.3])    # P(X = 1), P(X = 2), P(X = 3)

# Distribution of X
bins_X = np.arange(-9.5, 9.6)        # unit-width bins centered on the integers

def plot_dist_X():
    plt.hist(vals_X, bins=bins_X, weights=probs, ec='w')
    plt.xticks(range(-10, 11, 2))
    plt.xlim(-10, 10)

plot_dist_X()
# Distribution of X+4
# SD doesn't change
plot_dist_X()
shift_b = 4
plt.hist(vals_X + shift_b, bins=bins_X, weights=probs, ec='w');
# Distribution of 3X
# SD gets multiplied by 3
plot_dist_X()
scale_a = 3
plt.hist(scale_a*vals_X, bins=bins_X, weights=probs, ec='w');
# Distribution of -3X
# SD gets multiplied by 3
plot_dist_X()
scale_a = -3
plt.hist(scale_a*vals_X, bins=bins_X, weights=probs, ec='w');
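The three pictures can be backed up numerically. The helper `sd` below is just for this notebook; it computes $\sqrt{\mathbb{E}(D^2)}$ straight from the definition:
# SD computed directly as the square root of the mean squared deviation
def sd(vals, p):
    mean = vals @ p                       # E(X) as a weighted average
    return np.sqrt((vals - mean)**2 @ p)

print(sd(vals_X, probs))          # SD(X) = 0.7
print(sd(vals_X + 4, probs))      # shifting: SD unchanged, 0.7
print(sd(3*vals_X, probs))        # scaling by 3: SD = 2.1
print(sd(-3*vals_X, probs))       # scaling by -3: SD also 2.1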
Question 8: Suppose $\mathbb{V}ar(X) = \sigma^2$. Match the following with $\mathbb{SD}(X)$, $\mathbb{SD}(-X)$, and $\mathbb{V}ar(-X)$. You can use values more than once, and some will be left over.
$-\sigma^2$, $-\sigma$, $\sigma$, $\sigma^2$
Question 9: On a 10-question test where each question is graded as either Right or Wrong, a student guesses randomly according to some wild and weird scheme. Let $R$ be the number of questions the student gets Right and $W$ the number Wrong. Suppose $\mathbb{E}(R) = 3$ and $\mathbb{SD}(R) = 2$. Find $\mathbb{E}(W)$ and $\mathbb{SD}(W)$.
Work out every line of Slide 33, please. The results put together a bunch of previous results about variance and standard deviation, and are used frequently.