Scientist creates randomness by taking a random sample, assigning subjects at random to treatment or control, etc.
Scientist invents (assumes) a probability model for data the world gives.
(1) allows sound inferences.
(2) is only as good as the assumptions.
In the South Seas there is a cargo cult of people. During the war they saw airplanes land with lots of good materials, and they want the same thing to happen now. So they've arranged to imitate things like runways, to put fires along the sides of the runways, to make a wooden hut for a man to sit in, with two wooden pieces on his head like headphones and bars of bamboo sticking out like antennas—he's the controller—and they wait for the airplanes to land. They're doing everything right. The form is perfect. It looks exactly the way it looked before. But it doesn't work. No airplanes land. So I call these things cargo cult science, because they follow all the apparent precepts and forms of scientific investigation, but they're missing something essential, because the planes don't land.
Now it behooves me, of course, to tell you what they’re missing. But it would he just about as difficult to explain to the South Sea Islanders how they have to arrange things so that they get some wealth in their system. It is not something simple like telling them how to improve the shapes of the earphones. But there is one feature I notice that is generally missing in Cargo Cult Science. That is the idea that we all hope you have learned in studying science in school—we never explicitly say what this is, but just hope that you catch on by all the examples of scientific investigation. It is interesting, therefore, to bring it out now and speak of it explicitly. It's a kind of scientific integrity, a principle of scientific thought that corresponds to a kind of utter honesty—a kind of leaning over backwards. For example, if you're doing an experiment, you should report everything that you think might make it invalid—not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you’ve eliminated by some other experiment, and how they worked—to make sure the other fellow can tell they have been eliminated.
Details that could throw doubt on your interpretation must be given, if you know them. You must do the best you can—if you know anything at all wrong, or possibly wrong—to explain it. If you make a theory, for example, and advertise it, or put it out, then you must also put down all the facts that disagree with it, as well as those that agree with it. There is also a more subtle problem. When you have put a lot of ideas together to make an elaborate theory, you want to make sure, when explaining what it fits, that those things it fits are not just the things that gave you the idea for the theory; but that the finished theory makes something else come out right, in addition.
In summary, the idea is to try to give all of the information to help others to judge the value of your contribution; not just the information that leads to judgment in one particular direction or another.
[] We've learned from experience that the truth will come out. Other experimenters will repeat your experiment and find out whether you were wrong or right. Nature's phenomena will agree or they'll disagree with your theory. And, although you may gain some temporary fame and excitement, you will not gain a good reputation as a scientist if you haven't tried to be very careful in this kind of work. And it's this type of integrity, this kind of care not to fool yourself, that is missing to a large extent in much of the research in cargo cult science.
The first principle is that you must not fool yourself—and you are the easiest person to fool. So you have to be very careful about that. After you’ve not fooled yourself, it’s easy not to fool other scientists. You just have to be honest in a conventional way after that.
—Richard Feynman, 1974. http://calteches.library.caltech.edu/51/2/CargoCult.htm
Nested (monotone) hypothesis tests:
$\{A_\alpha : \alpha \in (0, 1] \}$
$\mathbb{P}_0 \{ X \notin A_\alpha \} \le \alpha$ (or more generally, $\mathbb{P} \{ X \notin A_\alpha \} \le \alpha, \; \forall \mathbb{P} \in \mathcal{P}_0$)
$A_\alpha \subset A_\beta$ if $\beta < \alpha$ (Can always re-define $A_\alpha \leftarrow \cup_{\beta \ge \alpha } A_\beta$)
P-values measure the strength of the evidence against the null: smaller values, stronger evidence.
If the $P$-value equals $p$, either:
Alternative hypothesis matters for power, but not for level.
Rejecting the null is not evidence for the alternative: it's evidence against the null.
If the null is unreasonable, no surprise if we reject it. Null needs to make sense.
Unreasonable null is not support for the alternative.
Probablility doesn't come out of a calculation unless probability went into the calculation.
Can't conclude that a process is random without making assumptions that amount to assuming that the process is random. (Something has to put the randomness rabbit into the hat.)
Testing whether the process appears to be random using the assumption that it is random cannot prove that it is random. (You can't borrow a rabbit from an empty hat.)
Posterior distributions don't exist without prior distributions.
Anytime you see a $P$-value, you should ask what the null hypothesis is.
E.g., $\mu = 0$ is not the whole null hypothesis:
Anytime you see a posterior probability, you should ask what the prior was.
Anytime you see a confidence interval or standard error, you should ask what was random.
Many P-values and other "probabilities" and most cost-benefit analyses are quantifauxcation.
Usually involves some combination of data, pure invention, ad hoc models, inappropriate statistics, and logical lacunae.
Randomization model: two lists. Are they "different"?
$t$-test. Assumptions?
Permutation distribution
11 pairs of rats, each pair from the same litter.
Randomly—by coin tosses—put one of each pair into "enriched" environment; other sib gets "normal" environment.
After 65 days, measure cortical mass (mg).
enriched | 689 | 656 | 668 | 660 | 679 | 663 | 664 | 647 | 694 | 633 | 653 |
impoverished | 657 | 623 | 652 | 654 | 658 | 646 | 600 | 640 | 605 | 635 | 642 |
difference | 32 | 33 | 16 | 6 | 21 | 17 | 64 | 7 | 89 | -2 | 11 |
How should we analyze the data?
Cartoon of Rosenzweig, M.R., E.L. Bennet, and M.C. Diamond, 1972. Brain changes in response to experience, Scientific American, 226, 22–29 report an experiment in which 11 triples of male rats, each triple from the same litter, were assigned at random to three different environments, "enriched" (E), standard, and "impoverished." See also Bennett et al., 1969.
Null hypothesis: treatment has "no effect."
Alternative hypothesis: treatment increases cortical mass.
Suggests 1-sided test for an increase.
$$ \frac{\mbox{mean(treatment) - mean(control)}} {\mbox{pooled estimate of SD of difference of means}} $$
$$
\frac{\mbox{mean(differences)}}{\mbox{SD(differences)}/\sqrt{11}}
$$
Better, since littermates are presumably more homogeneous.
Assumptions of the permutation test are true by design: That's how treatment was assigned.
If we reject the null for the 1-sample $t$-test, what have we learned?
That the data are not (statistically) consistent with the assumption that they are an IID random sample from a normal distribution with mean 0.
So what? We never thought they were.
This is a straw man null hypothesis.
Reflexive way to try to represent uncertainty (post-WWII phenomenon)
Not all uncertainty can be represented by a probability
"Aleatory" versus "Epistemic"
Aleatory
Epistemic: stuff we don't know
"Pistimetry": measuring beliefs
Bayesian way of combining aleatory variability epistemic uncertainty puts beliefs on a par with an unbiased physical measurement w/ known uncertainty.
what if I don't trust your internal scale, or your assessment of its accuracy?
Toss a fair coin $k$ times independently; $X$ is the number of heads; $\theta$ is the chance of heads.
$$ \mathbb{P}(X=k || \theta) = {n \choose k} \theta^k (1-\theta)^{n-k}. $$
Suppose prior is of the form
$$ \pi(\theta) = \frac{\theta^\alpha (1-\theta)^\beta}{\int t^\alpha (1-t)^\beta dt}. $$
After tossing the coin, the posterior distribution will be of the same form.
Suppose it turns out to be
$$ p(\theta) = C \theta^{100}(1-\theta)^{100}. $$
According to Bayesian inference, that is everything there is to know about $\theta$ based on prior beliefs and the experiment.
But doesn't it matter whether this is simply a prior, the posterior after 5 tosses, or the posterior after 200 tosses?
Bayesian formalism does not distinguish between these cases.
In a series of trials, if each trial has the same probability $p$ of success, and if the trials are independent, then the rate of successes converges (in probability) to $p$. Law of Large Numbers
If a finite series of trials has an empirical rate $p$ of success, that says nothing about whether the trials are random.
If the trials are random and have the same chance of success, the empirical rate is an estimate of $p$.
If the trials are random and have the same chance of success and the dependence of the trials is known (e.g., the trials are independent), can quantify the uncertainty of the estimate.
Why does the first invite an answer, and the second not?
Have a collection of numbers, e.g., MME climate model predictions of warming
Take mean and standard deviation.
Report mean as the estimate; construct a confidence interval or "probability" statement from the results, generally using Gaussian critical values
IPCC does this, as do many others.
No random sample; no stochastic errors.
Even if there were a random sample, what justifies using normal theory?
Even if random and normal, misinterprets confidence as probability. Garbled; something like Fisher's fiducial inference
Ignores known errors in physical approximations
Ultimately, quantifauxcation.
Notions like probability, p-value, confidence intervals, etc., apply only if the sample is random (or for some kinds of measurement errors)
Avian / wind-turbine interactions
Earthquake probabilities
Wind turbines kill birds, notably raptors.
how many, and of what species?
how concerned should we be?
what design and siting features matter?
how do you build/site less lethal turbines?
Periodic on-the-ground surveys, subject to:
censoring
shrinkage/scavenging
background mortality
is this pieces of two birds, or two pieces of one bird?
how far from the point of injury does a bird land? attribution...
Is it possible to ...
make an unbiased estimate of mortality?
reliably relate the mortality to individual turbines in wind farms?
Common: Mixture of a point mass at zero and some distribution on the positive axis. E.g., "Zero-inflated Poisson"
Countless alternatives, e.g.:
observe $\max\{0, \mbox{Poisson}(\lambda_j)-b_j\}$, $b_j > 0$
observe $b_j\times \mbox{Poisson}(\lambda_j)$, $b_j \in (0, 1)$.
observe true count in area $j$ with error $\epsilon_j$, where $\{\epsilon_j\}$ are dependent, not identically distributed, nonzero mean
Why random?
Why Poisson?
Why independent from site to site? From period to period? From bird to bird? From encounter to encounter?
Why doesn't chance of detection depend on size, coloration, groundcover, …?
Why do different observers miss carcasses at the same rate?
What about background mortality?
#encounters
$\times P(\mbox{heads})$ constant?Probabilistic seismic hazard analysis (PSHA): basis for building codes in many countries & for siting nuclear power plants
Models locations & magnitudes of earthquakes as random; mags iid
Models ground motion as random, given event. Distribution depends on the location and magnitude of the event.
Claim to estimate "exceedance probabilities": chance acceleration exceeds some threshold in some number of years
In U.S.A., codes generally require design to withstand accelerations w probability ≥2% in 50y.
PSHA arose from probabilistic risk assessment (PRA) in aerospace and nuclear power.
Those are engineered systems whose inner workings are known but for some system parameters and inputs.
Inner workings of earthquakes are almost entirely unknown: PSHA is based on metaphors and heuristics, not physics.
Some assumptions are at best weakly supported by evidence; some are contradicted.
Model earthquake occurrence as a marked stochastic process with known parameters.
Model ground motion in a given place as a stochastic process, given the quake location and magnitude.
Then,
probability of a given level of ground movement in a given place is the integral (over space and magnitude) of the conditional probability of that level of movement given that there's an event of a particular magnitude in a particular place, times the probability that there's an event of a particular magnitude in that place
That earthquakes occur at random is an assumption not based in theory or observation.
involves taking rates as probabilities
Models amount to saying there's an "earthquake deck"
Turn over one card per period. If the card has a number, that's the size quake you get.
Journals and journals full of arguments about how many "8"s in the deck, whether the deck is fully shuffled, whether cards are replaced and re-shuffled after dealing, etc.
Why not say earthquakes are like terrorist bombings?
What advantage is there to the casino metaphor?
The physics of earthquakes might be stochastic. But it isn't.
A stochastic model might provide a compact, accurate description of earthquake phenomenology. But it doesn't.
A stochastic model might be useful for predicting future seismicity. But it isn't (Poisson, Gamma renewal, ETAS)
3 of the most destructive recent earthquakes were in regions seismic hazard maps showed to be relatively safe (2008 Wenchuan M7.9, 2010 Haiti M7.1, & 2011 Tohoku M9) Stein, Geller, & Liu, 2012
See also Mulargia, Geller, & Stark, 2017
Freedman, D.A., 1995, Some issues in the foundations of Statistics, Foundations of Science, 1, 19–39.
LeCam, L., 1977. A note on metastatistics or 'an essay towards stating a problem in the doctrine of chances', Synthese, 36, 133–160.
Mulargia, F., R.J. Geller, and P.B. Stark, 2017. Why is Probabilistic Seismic Hazard Analysis (PSHA) still used?, Physics of the Earth and Planetary Interiors, 264, 63–75.
Stark, P.B. and D.A. Freedman, 2003. What is the Chance of an Earthquake? in Earthquake Science and Seismic Risk Reduction, F. Mulargia and R.J. Geller, eds., NATO Science Series IV: Earth and Environmental Sciences, v. 32, Kluwer, Dordrecht, The Netherlands, 201–213. Preprint: https://www.stat.berkeley.edu/~stark/Preprints/611.pdf
Stark, P.B. and L. Tenorio, 2010. A Primer of Frequentist and Bayesian Inference in Inverse Problems. In Large Scale Inverse Problems and Quantification of Uncertainty, Biegler, L., G. Biros, O. Ghattas, M. Heinkenschloss, D. Keyes, B. Mallick, L. Tenorio, B. van Bloemen Waanders and K. Willcox, eds. John Wiley and Sons, NY. Preprint: https://www.stat.berkeley.edu/~stark/Preprints/freqBayes09.pdf
Stark, P.B., 2015. Constraints versus priors. SIAM/ASA Journal on Uncertainty Quantification, 3(1), 586–598. doi:10.1137/130920721, Reprint: http://epubs.siam.org/doi/10.1137/130920721, Preprint: https://www.stat.berkeley.edu/~stark/Preprints/constraintsPriors15.pdf
Stark, P.B., 2016. Pay no attention to the model behind the curtain. https://www.stat.berkeley.edu/~stark/Preprints/eucCurtain15.pdf