In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
Which would you pick?
- $\large Y_A = 10 X_1 + 10 X_2 $
- $\large Y_B = \sum\limits_{i=1}^{20} X_i$
- $\large Y_C = 20 X_1$
First let's construct the probability distribution for a single coin. This will let us flip 20 IID coins later.
In [2]:
# First construct probability distribution for a single fair coin
p = 0.5
coin_df = pd.DataFrame({"x": [1, 0],  # [Heads, Tails]
                        "P(X = x)": [p, 1 - p]})
coin_df
Out[2]:
| | x | P(X = x) |
|---|---|---|
| 0 | 1 | 0.5 |
| 1 | 0 | 0.5 |
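As a quick check on the table above, the mean and variance of a single coin can be read straight off the distribution. This is a minimal sketch that rebuilds `coin_df` so it runs on its own; the names `coin_mean` and `coin_var` are introduced here for illustration and are not part of the original notebook.

```python
import pandas as pd

p = 0.5  # fair coin, as in the cell above
coin_df = pd.DataFrame({"x": [1, 0],  # [Heads, Tails]
                        "P(X = x)": [p, 1 - p]})

# E[X] = sum over x of x * P(X = x)
coin_mean = (coin_df["x"] * coin_df["P(X = x)"]).sum()

# Var(X) = sum over x of (x - E[X])^2 * P(X = x)
coin_var = (((coin_df["x"] - coin_mean) ** 2) * coin_df["P(X = x)"]).sum()

print(coin_mean, coin_var)
```

For a fair coin this gives a mean of 0.5 and a variance of 0.25, which are the building blocks for the three choices.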
Choice A:
$\large Y_A = 10 X_1 + 10 X_2 $
A couple of ways to sample:
In [3]:
coin_df.sample(10, weights="P(X = x)", replace=True)["x"]
Out[3]:
0    1
0    1
1    0
0    1
0    1
0    1
0    1
0    1
0    1
0    1
Name: x, dtype: int64
In [4]:
N = 10000
np.random.rand(N,2) < p
Out[4]:
array([[False, False],
[ True, True],
[ True, True],
...,
[False, True],
[ True, True],
[ True, True]])
In [5]:
sim_flips = pd.DataFrame(
    {"Choice A": np.sum((np.random.rand(N,2) < p) * 10, axis=1)})
sim_flips
Out[5]:
| | Choice A |
|---|---|
| 0 | 20 |
| 1 | 20 |
| 2 | 20 |
| 3 | 10 |
| 4 | 20 |
| ... | ... |
| 9995 | 10 |
| 9996 | 20 |
| 9997 | 20 |
| 9998 | 0 |
| 9999 | 20 |

10000 rows × 1 columns
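Because Choice A depends on only two coins, its exact distribution can be found by enumerating the four possible outcomes. The sketch below does that; `dist_a` and the column label `"P(Y_A = y)"` are names made up for this illustration.

```python
from itertools import product

import pandas as pd

p = 0.5  # fair coin, as above

# Enumerate every (X_1, X_2) outcome with its probability
rows = []
for x1, x2 in product([0, 1], repeat=2):
    prob = (p if x1 else 1 - p) * (p if x2 else 1 - p)
    rows.append({"y": 10 * x1 + 10 * x2, "P(Y_A = y)": prob})

# Outcomes that give the same payout are merged by summing their probabilities
dist_a = pd.DataFrame(rows).groupby("y")["P(Y_A = y)"].sum()
print(dist_a)
```

With a fair coin, the payouts 0, 10, and 20 have probabilities 0.25, 0.5, and 0.25. The simulated column can be compared against this with something like `sim_flips["Choice A"].value_counts(normalize=True)`.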
Choice B:
$\large Y_B = \sum\limits_{i=1}^{20} X_i$
In [6]:
sim_flips["Choice B"] = np.sum((np.random.rand(N,20) < p), axis=1)
sim_flips
Out[6]:
| | Choice A | Choice B |
|---|---|---|
| 0 | 20 | 10 |
| 1 | 20 | 16 |
| 2 | 20 | 9 |
| 3 | 10 | 11 |
| 4 | 20 | 12 |
| ... | ... | ... |
| 9995 | 10 | 10 |
| 9996 | 20 | 9 |
| 9997 | 20 | 7 |
| 9998 | 0 | 11 |
| 9999 | 20 | 14 |

10000 rows × 2 columns
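A sum of 20 IID coin flips follows a Binomial(20, p) distribution, so Choice B could also be drawn directly rather than by summing individual flips. The sketch below shows both routes side by side using NumPy's `Generator` API instead of `np.random.rand`. The seed and the names `choice_b_direct` and `choice_b_summed` are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen arbitrarily for reproducibility
N = 10000
p = 0.5

# Route 1: draw Y_B directly from Binomial(20, p)
choice_b_direct = rng.binomial(n=20, p=p, size=N)

# Route 2: flip 20 coins per row and count the heads, as in the cell above
choice_b_summed = (rng.random((N, 20)) < p).sum(axis=1)

print(choice_b_direct.mean(), choice_b_summed.mean())
```

Both sample means should land close to 20p = 10. The two arrays are separate draws, so their individual values will not match.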
Choice C:
$\large Y_C = 20 X_1$
In [7]:
sim_flips["Choice C"] = 20 * (np.random.rand(N,1) < p)
sim_flips
Out[7]:
| | Choice A | Choice B | Choice C |
|---|---|---|---|
| 0 | 20 | 10 | 20 |
| 1 | 20 | 16 | 0 |
| 2 | 20 | 9 | 0 |
| 3 | 10 | 11 | 0 |
| 4 | 20 | 12 | 0 |
| ... | ... | ... | ... |
| 9995 | 10 | 10 | 0 |
| 9996 | 20 | 9 | 20 |
| 9997 | 20 | 7 | 0 |
| 9998 | 0 | 11 | 0 |
| 9999 | 20 | 14 | 20 |

10000 rows × 3 columns
If you're curious as to what these distributions look like, I've simulated some populations:
In [8]:
px.histogram(sim_flips.melt(), x="value", facet_row="variable",
             barmode="overlay", histnorm="probability",
             title="Empirical Distributions",
             width=600, height=600)
In [9]:
pd.DataFrame([
    sim_flips.mean().rename("Simulated Mean"),
    sim_flips.var().rename("Simulated Var"),
    np.sqrt(sim_flips.var()).rename("Simulated SD")
])
Out[9]:
| | Choice A | Choice B | Choice C |
|---|---|---|---|
| Simulated Mean | 10.214000 | 9.979400 | 10.092000 |
| Simulated Var | 49.879192 | 4.956271 | 100.001536 |
| Simulated SD | 7.062520 | 2.226268 | 10.000077 |
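The simulated values can be checked against theory. For a single coin, the mean is p and the variance is p(1 - p). Scaling a variable by a constant c multiplies its variance by c², and the variance of a sum of independent variables is the sum of their variances. The sketch below applies those rules to each choice; the `theory` table and its row labels are names introduced for this example.

```python
import numpy as np
import pandas as pd

p = 0.5
coin_mean = p            # E[X_i] for a single coin
coin_var = p * (1 - p)   # Var(X_i) for a single coin

theory = pd.DataFrame({
    # Y_A = 10 X_1 + 10 X_2: two independent terms, each scaled by 10
    "Choice A": {"Mean": 2 * 10 * coin_mean, "Var": 2 * 10**2 * coin_var},
    # Y_B = X_1 + ... + X_20: twenty independent unscaled terms
    "Choice B": {"Mean": 20 * coin_mean, "Var": 20 * coin_var},
    # Y_C = 20 X_1: a single term scaled by 20
    "Choice C": {"Mean": 20 * coin_mean, "Var": 20**2 * coin_var},
})
theory.loc["SD"] = np.sqrt(theory.loc["Var"])
print(theory)
```

All three choices have the same expected value of 10, but their variances are 50, 5, and 100, which agrees closely with the simulated table. So the choice comes down to how much spread you are willing to accept.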