
I. The Null Hypothesis Ritual.
The American Statistical Association (ASA) was founded on November 27, 1839; it is the second-oldest continuously operating professional society in the United States, behind only the Massachusetts Medical Society (founded in 1781). The ASA publishes a number of journals, on its own and in partnership with other organizations, including the Journal of the American Statistical Association (JASA) and The American Statistician (TAS). It is among the most important learned societies in the world on matters of statistics, including technique and interpretation.
By 2015, the problems (that I’ve been railing about) with p-values in scientific publishing had gotten so bad that the Board of the ASA decided to convene a 20-member panel to write an Editorial, later published in The American Statistician in March 2016.
Underpinning many published scientific conclusions is the concept of “statistical significance,” typically assessed with an index called the p-value. While the p-value can be a useful statistical measure, it is commonly misused and misinterpreted. This has led to some scientific journals discouraging the use of p-values, and some scientists and statisticians recommending their abandonment, with some arguments essentially unchanged since p-values were first introduced.
Most important for my purposes is that the editorial (finally) gets around to addressing the most common logical fallacy around p-values: the notion that a p-value tells us anything at all about the strength of the hypothesis under consideration. It does not - despite frequentists’ most ardent secret desire, and their public denial of same.
P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
Researchers often wish to turn a p-value into a statement about the truth of a null hypothesis, or about the probability that random chance produced the observed data. The p-value is neither. It is a statement about data in relation to a specified hypothetical explanation, and is not a statement about the explanation itself.
Wasserstein, R. L., & Lazar, N. A. (2016). “The ASA Statement on p-Values: Context, Process, and Purpose.” The American Statistician, 70(2), 129–133. https://doi.org/10.1080/00031305.2016.1154108
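The statement quoted above - that a p-value speaks about data under an assumed model, never about the hypothesis itself - can be demonstrated with a toy simulation. The sketch below is mine, not the ASA’s (the z-test, the sample size of 30, and the 10,000-trial count are arbitrary choices): when the null is true by construction, p-values are uniformly distributed, so the ritual stamps roughly 5% of pure-noise experiments “significant” at p < 0.05.

```python
import math
import random

random.seed(42)

def p_value(sample, sigma=1.0):
    """Two-sided p-value for H0: true mean = 0, known sigma (a z-test)."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n) / sigma
    # P(|Z| >= |z|) computed under the null -- note: under the null, nothing else
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# 10,000 experiments in which the null is TRUE by construction:
trials = 10_000
pvals = [p_value([random.gauss(0, 1) for _ in range(30)]) for _ in range(trials)]

significant = sum(p < 0.05 for p in pvals) / trials
print(f"fraction declared 'significant' with a true null: {significant:.3f}")  # ~0.05
```

No matter how the threshold is tuned, this calculation never touches P(hypothesis | data); it only describes how surprising the data would be if the null held.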
This March 2016 Editorial by Ronald Wasserstein, Executive Director of the ASA since 2007, and Nicole Lazar, Editor-in-Chief of the JASA and head of Penn State’s Statistics Department, and the rest of the Board of the ASA, changed exactly nothing at all about “research funding, journal practices, career advancement, scientific education, public policy, journalism,” or “law.” Having failed to do anything about the “reproducibility crisis,” three years later Wasserstein, Lazar, and Allen Schirm published another (even stronger) Editorial on the same subject, this time in their individual capacities:
The ASA Statement on P-Values and Statistical Significance stopped just short of recommending that declarations of “statistical significance” be abandoned. We take that step here. We conclude, based on our review of the articles in this special issue and the broader literature, that it is time to stop using the term “statistically significant” entirely. Nor should variants such as “significantly different,” “p < 0.05,” and “nonsignificant” survive, whether expressed in words, by asterisks in a table, or in some other way.
To move forward to a world beyond “p < 0.05,” we must recognize afresh that statistical inference is not—and never has been—equivalent to scientific inference (Hubbard, Haig, and Parsa 2019; Ziliak 2019).
The American Statistician, 2019, VOL. 73, NO. S1, 1–19: Editorial, https://doi.org/10.1080/00031305.2019.1583913
This Editorial, it should be noted, had exactly the same impact on the problems in “research funding, journal practices, career advancement,” et cet., as the previous one - which is to say, none whatsoever. If any of these apparently goode men and gentle lady had followed the career and writings of distinguished psychologist and professor Gerd Gigerenzer of the Max Planck Institute, they would have known that this outcome of their publication was well-nigh inevitable. The so-called scientific publishing industry should really be called the scientistic publishing industry: it looks a lot like science, but really isn’t.
Professor Gigerenzer has for decades been standing like the mythical Colossus of Rhodes, holding his hand up against frequentist statistics/significance testing in science, especially Null-Hypothesis Significance Testing (NHST), which is the statistical and philosophical infrastructure upon which virtually all “peer reviewed” journals are built. His valiant attempt to explain just what the hell happened in science has gone completely unheeded, as far as I can determine, since at least 1987. Given his gentlemanly disposition he doesn’t say it that way, but he bears witness to - and documents - a stark change in his own field (psychology) during his professional career, coinciding directly with changes in statistical interpretation and experimentation. And as we shall see below, it didn’t stop in psychology.
II. The “Inference” Revolution
THE INFERENCE REVOLUTION
What happened between the time of Piaget, Kohler, Pavlov, Skinner, and Bartlett and the time I was trained? In Kendall’s (1942) words, statisticians “have already overrun every branch of science with a rapidity of conquest rivalled only by Attila, Mohammed, and the Colorado beetle” (p. 69).
What has been termed the probabilistic revolution in science (Gigerenzer et al., 1989; Krüger, Daston, & Heidelberger, 1987; Krüger, Gigerenzer, & Morgan, 1987) reveals how profoundly our understanding of nature changed when concepts such as chance and probability were introduced as fundamental theoretical concepts. The work of Mendel in genetics, that of Maxwell and Boltzmann on statistical mechanics, and the quantum mechanics of Schrödinger and Heisenberg that built indeterminism into its very model of nature are key examples of that revolution in thought.
…
But the real, enduring transformation came with statistical inference, which became institutionalized and used in a dogmatic and mechanized way. This use of statistical theory contrasts sharply with physics, where statistics and probability are indispensable in theories about nature, whereas mechanized statistical inference such as null hypothesis testing is almost unknown.
So what happened with psychology? David Murray and I described the striking change in research practice and named it the inference revolution in psychology (Gigerenzer & Murray, 1987). It happened between approximately 1940 and 1955 in the United States, and led to the institutionalization of one brand of inferential statistics as the method of scientific inference in university curricula, textbooks, and the editorials of major journals.
The figures are telling. Before 1940, null hypothesis testing using analysis of variance or t test was practically nonexistent: Rucci and Tweney (1980) found only 17 articles in all from 1934 through 1940. By 1955, more than 80% of the empirical articles in four leading journals used null hypothesis testing (Sterling, 1959). Today, the figure is close to 100%. By the early 1950s, half of the psychology departments in leading U.S. universities had made inferential statistics a graduate program requirement (Rucci & Tweney, 1980). Editors and experimenters began to measure the quality of research by the level of significance obtained.
Gigerenzer, G., “The SuperEgo, the Ego, and the Id in Statistical Reasoning” in A Handbook for Data Analysis in the Behavioral Sciences: Methodological Issues, ed. by Keren and Lewis (1993), pp. 311-339.
Prof. Gigerenzer’s publication list reads like a giant sign with “PLEASE! NO! STOP!” wherein he has been trying - unsuccessfully - to explain how and where things went wrong in statistical inference and to get people to stop using “the null ritual.”
See, e.g., Gigerenzer, G., & Edwards, A.G.K., “Simple tools for understanding risks: From innumeracy to insight,” British Medical Journal, 327, 741–744 (2003). See also Gigerenzer G., Krauss S., Vitouch O., “The null ritual: What you always wanted to know about null hypothesis testing but were afraid to ask,” in Kaplan D. (Ed.), SAGE Handbook on quantitative methods in the social sciences (pp. 391–408)(2004). Gigerenzer, G., “Statistical Rituals: The Replication Delusion and How We Got There,” Advances in Methods and Practices in Psychological Science, Volume 1, Issue 2, June 2018, pp. 198-218. https://journals.sagepub.com/doi/epub/10.1177/2515245918771329
Here’s a video of the good professor at the Broken Science Initiative - at a lecture of his I attended recently - along with a transcript of his talk, where he discusses some of this work, including the problems with the creation of, and obsession with, the null ritual.
So what do textbook writers do when there are two different ideas about statistical inference? One solution would have been: you present both. And maybe also Bayes or Tukey, and others, and teach researchers to use their judgment, to develop a sense of the situations where one isn’t working and where it’s better to do the other. No, that was not what textbook writers were going for. They created a hybrid theory of statistical inference that didn’t exist and doesn’t exist in statistics proper, taking some parts from Fisher, some parts from Neyman, and adding their own parts, mostly the idea that scientific inference must be without any judgment.
That’s what I mean by mindless… automatic.
And the essence of this hybrid theory is the null ritual.
Let me say this again for emphasis: null-hypothesis significance testing (NHST) is a complete fiction. Gigerenzer has been (repeatedly) lecturing the scientific community about its origins - also to zero effect. NHST is the bastard creation of textbook publishers. It is not part of any mathematics curriculum. And just so people know what I’m talking about, here is how the null ritual is done (with my italicized comments):
Assume that the “null hypothesis” is true - pause. What the hell is a “null hypothesis?” I am told that this is “assuming no effect” from the tested intervention. But why should we assume this “anti-hypothesis” at all? What does it have to do with science?
Setting aside those problems for the moment, the next step is to set the significance level at p = 0.05. “WTF - Why!? Why .05, as opposed to .0025, or .00125, etc.? Who decided that number magically creates ‘significance’ in the data?”1
If the p-value computed from our experimental data is less than 0.05, then voila! “Significance.” What does this tell us about our hypothesis? NOTHING at all. And yet… the world over - in scientistic publishing - authors, publishers, reporters, and lay people on the ‘Net all assert confidently - and the lower the p-value, the more confidently - that this data constitutes “proof” that the hypothesis is most decidedly NOT the byproduct of this force they call “random chance.”2
Profit!3
How accurately does our hypothesis or experimental setup describe reality? “Doesn’t matter! Just complete null ritual, Citizen. Significance follow.”
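Since the steps above are purely mechanical, they fit in a few lines of code - which is my point, not an endorsement. A minimal sketch (mine; I assume a known-variance z-test against a null mean of zero, standing in for whatever test the ritualist prefers):

```python
import math

def null_ritual(sample, alpha=0.05, sigma=1.0):
    """The ritual, step by step: no step ever touches the hypothesis."""
    n = len(sample)
    # Step 1: assume the null (true mean = 0) is true.
    z = (sum(sample) / n) * math.sqrt(n) / sigma
    # Step 2: the magic threshold, alpha = 0.05, set by convention alone.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    # Step 3: declare "significance" -- a statement about the data,
    # not about any hypothesis.
    return ("significant" if p < alpha else "nonsignificant"), p

verdict, p = null_ritual([0.8, 1.1, 0.4, 0.9, 1.3, 0.7, 1.0, 0.6])
print(verdict, round(p, 4))
```

Notice what never appears: a prior, an alternative model, or any quantity of the form P(hypothesis | data). The function’s verdict is a statement about the data alone.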
III. Does God Play Dice?
In physics, just like on the internet, we learn that “correlation does NOT equal causation…”
Well, yes, except that sometimes it does - in point of fact, a lot of times it does. Screaming “correlation does NOT equal causation” as a way of appearing internet smaaht is actually stating an exception to the general rule, because in the physical world correlation quite often is causation.
Did you see that lightning flash right before you heard that sizzle-crack, followed by that big boom overhead? Are you telling me those highly-correlated events aren’t ACK-CHOO-ULL-LEE (also) causally related? Has the cat got your “correlation-does-not-equal-causation” tongue when you see a tornado spiral out of a giant midwest thunderstorm?
The 1927 Solvay Conference on physics became famous for multiple reasons, chief among them its adoption - by vote - of what we currently know as the predominant paradigm for physics: quantum mechanics. This is the Niels Bohr and Werner Heisenberg “Copenhagen Interpretation,” which asserts (among other things) that electrons and other subatomic particles don’t actually exist until they are measured. Prior to the act of measurement, the particles exist in “superposition” in a “wave function” that “collapses” into existence upon measurement. Another significant part of the Copenhagen Interpretation is the belief that probability is ontological: i.e., that the Universe itself is probabilistic. Einstein dissented from the Solvay Conference’s adoption of the “Copenhagen Interpretation” by famously saying (among many other things) that “God does not play dice with the Universe.”
Over an eight-year running debate, Einstein and Bohr argued over the Copenhagen Interpretation. It won out, at least in the academy, if not in the minds of everyone in the affected community.
This point also needs to be stressed, because most people who have not studied quantum theory on the full technical level are incredulous when told that it does not concern itself with causes; and, indeed, it does not even recognize the notion of ‘physical reality’. The currently taught interpretation of the mathematics is due to Niels Bohr, who directed the Institute for Theoretical Physics in Copenhagen; therefore it came to be called ‘The Copenhagen interpretation’.
As Bohr stressed repeatedly in his writings and lectures, present quantum theory can only answer questions of the form: ‘If this experiment is performed, what are the possible results and their probabilities?’ It cannot, as a matter of principle, answer any question of the form: ‘What is really happening when…?’ Again, the mathematical formalism of present quantum theory, like Orwellian newspeak, does not even provide the vocabulary in which one could ask such a question…
Jaynes, E.T., “Probability Theory: the Logic of Science,” p. 328.
At this point, if you’re thinking, “Who TF is this Jaynes guy to be criticizing Niels Bohr and some of the greatest scientists and physicists ever?” then let me take a moment to make a long overdue introduction. Edwin T. Jaynes is not just an internet folk hero or Bayesian crank. He was, in his youth, one of four graduate students who worked with Oppenheimer at UC Berkeley and then followed him to the east coast to the Institute for Advanced Study (IAS) at Princeton, New Jersey.
I believe it is best to hear it from the horse’s mouth, and fortunately G. Larry Bretthorst has done a wonderful job, not only in publishing “Probability Theory: The Logic of Science,” but also in putting together E.T. Jaynes’s biography page, which quotes Jaynes himself on this subject.
Between his junior and senior year, he worked at the Warner Institute with Drs. Gustav Martin and Marvin R. Thompson. After receiving his B.A. he intended to return to the Warner Institute for another summer, but his plans changed because of the war. From 1942 to 1944 he worked for the Sperry Gyroscope Company on Long Island helping to develop Doppler radar.
At the end of 1944, he became Ensign Jaynes, and worked at the Anacostia Naval Research Lab in Washington, D.C. developing microwave systems. During his stay in the Navy he spent some time on Guam. There is one picture of him on Guam holding a machine gun. When he was discharged in 1946, he was a lieutenant (j.g.). There are two documents written by Ensign Jaynes: the first is a series of 9 lectures on solving circuit problems using Laplace and Fourier transforms [1]; the second is titled “Theory of Microwave Coupling Systems” [2]. These two documents constitute the earliest known professional writings of Ed Jaynes.
Jaynes left the Navy in 1946 and headed for California. In the summer of 1946 he worked in the W. W. Hansen Laboratories of Physics at Stanford on the design of the first linear electron accelerator. At the end of the summer, he enrolled at the University of California at Berkeley. In “Disturbing The Memory” [7], Jaynes notes:
“I first met Julian Schwinger, Robert Dicke, and Donald Hamilton during the War when we were all engaged in developing microwave theory, measurement techniques, and applications to pulsed and Doppler radar; Schwinger and Dicke at the MIT Radiation Laboratory, Hamilton and I at the Sperry Gyroscope Laboratories on Long Island. Bill Hansen (for whom the W. W. Hansen Laboratories at Stanford are now named) was running back and forth weekly, giving lectures at MIT and bringing us back the notes on the Schwinger lectures as they appeared, and I accompanied him on a few of those trips.
I first met Edward Teller when he visited Stanford in the Summer of 1946 and Hansen, Teller, and I discussed the design of the first Stanford LINAC, then underway. After some months of correspondence I first met J. R. Oppenheimer in September 1946, when I arrived at Berkeley as a beginning graduate student, to learn quantum theory from him -- the result of Bill Hansen having recommended us strongly to each other. When in the Summer of 1947 Oppy moved to Princeton to take over the Institute for Advanced Study, I was one of four students that he took along. The plan was that we would enroll as graduate students at Princeton University, finish our theses under Oppy although he was not officially a Princeton University faculty member; and turn them in to Princeton (which had agreed to this somewhat unusual arrangement in view of the somewhat unusual circumstances). My thesis was to be on Quantum Electrodynamics.
But, as this writer learned from attending a year of Oppy’s lectures (1946-47) at Berkeley, and eagerly studying his printed and spoken words for several years thereafter, Oppy would never countenance any retreat from the Copenhagen position... He derived some great emotional satisfaction from just those elements of mysticism that Schrödinger and Einstein had deplored, and always wanted to make the world still more mystical, and less rational.
This desire was expressed strongly in his 1955 BBC Reith lectures (of which I still have some cherished tape recordings which recall his style of delivery at its best). Some have seen this as a fine humanist trait. I saw it increasingly as an anomaly - a basically anti-scientific attitude in a person posing as a scientist - that explains so much of the contradictions in his character.
As a more practical matter, it presented me with a problem in carrying out my plan to write a thesis under Oppy’s supervision, quite aside from the fact that his travel and other activities made it so hard to see him. Mathematically, the Feynman electromagnetic propagator made no use of those superfluous degrees of freedom; it was equally well a Green’s function for an unquantized EM field. So I wanted to reformulate electrodynamics from the ground up without using field quantization. The physical picture would be very different; but since the successful Feynman rules used so little of that physical picture anyway, I did not think that the physical predictions would be appreciably different; at least, if the idea was wrong, I wanted to understand in detail why it was wrong.
If this meant standing in contradiction with the Copenhagen interpretation, so be it; I would be delighted to see it gone anyway, for the same reason that Einstein and Schrödinger would. But I sensed that Oppy would never tolerate a grain of this; he would crush me like an eggshell if I dared to express a word of such subversive ideas. I could do a thesis with Oppy only if it was his thesis, not mine.
Eugene Wigner became Ed’s thesis advisor in 1948.
In some ways we’ve now arrived at the reveal: Jaynes wasn’t - and isn’t - just “some guy.” His criticism of the Solvay Conference, and the Copenhagen Interpretation, is very much a part of the schism between frequentist and Bayesian understandings of what the difference is between statistics and probability in science.
We suggest, then, that those who try to justify the concept of ‘physical probability’ by pointing to quantum theory are entrapped in circular reasoning, not basically different from that noted above with coins and bridge hands. Probabilities in present quantum theory express the incompleteness of human knowledge just as truly as did those in classical statistical mechanics; only its origin is different.
In classical statistical mechanics, probability distributions represented our ignorance of the true microscopic coordinates - ignorance that was avoidable in principle but unavoidable in practice, but which did not prevent us from predicting reproducible phenomena, just because those phenomena are independent of the microscopic details.
In current quantum theory, probabilities express our own ignorance due to our failure to search for the real causes of the physical phenomena; and worse, our failure even to think seriously about the problem. This ignorance may be unavoidable in practice, but in our present state of knowledge we do not know whether it is unavoidable in principle; the ‘central dogma’ simply asserts this, and draws conclusions that belief in causes, and searching for them, is philosophically naïve. If everybody accepted this and abided by it, no further advances in understanding of physical law would ever be made; indeed, no such advance has been made since the 1927 Solvay Congress in which this mentality became solidified into physics. But it seems to us that this attitude places a premium on stupidity; to lack the ingenuity to think of a rational physical explanation is to support the supernatural view.4
These seem like rather bold declarations, but Jaynes is certainly entitled; he was in many ways one of the heirs to the original Einsteinian, deterministic view of the universe. Over time, Jaynes came to stand in opposition to the frequentist “statistical” view that has come to dominate in all of the sciences. Sabine Hossenfelder, a theoretical physicist of more recent vintage, wrote “Lost in Math: How Beauty Leads Physics Astray” in 2018. The upshot of Hossenfelder’s book is that physics, the hardest of the hard sciences, has largely abandoned empiricism in favor of mathematical elegance. Throughout the book she interviews well-regarded physicists and gets them to admit that what passes for current “state of the art” in physics is theoretical only. In fact, over the six years since the book was first published, you can find Hossenfelder gradually drifting closer and closer toward Jaynes’s view in various YouTube videos and lectures.5
In sum, the essence of the problem is this: statistics deals with data - ONLY. You can draw no inferences at all about the credibility of a hypothesis based upon the “significance” - or “unlikelihood” - of data, particularly based upon a comparison to some fictitious set of infinite trials that have never actually been done. The problem with using frequentist statistics to try to draw conclusions is - as we demonstrated with the Prosecutor’s Fallacy in Sally Clark’s case - that data, particularly when it’s in the form of frequency distributions for sample populations, can be sliced in ways that will seductively suggest to us things that we would already like to believe. And the more unusual the data, the more we convince ourselves that this is a “good enough” proxy for claiming that the anti-hypothesis we’ve assumed true (our “null”) is so unlikely that… well, we must be right. Fisher and Popper used “special rules” and/or “consensus” to simply hop the logical gap to “I’m correct” when the data achieves significance. Frequentism relies upon improbable data to assert that “close enough” means the “opposite” of the null must be true - itself an incorrect logical leap.
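The gap between P(data | null) and P(hypothesis | data) - the same gap the Prosecutor’s Fallacy exploits - can be put in numbers with one line of Bayes’ theorem. The base rate, power, and threshold below are illustrative assumptions of mine, not measurements from any real literature:

```python
# Illustrative assumptions (mine, not from any real study):
prior_true = 0.001   # 1 in 1,000 tested hypotheses is actually true
power      = 0.80    # P("significant" | hypothesis true)
alpha      = 0.05    # P("significant" | null true) -- the ritual's threshold

# Bayes' theorem -- the quantity the null ritual never computes:
p_sig = power * prior_true + alpha * (1 - prior_true)
p_true_given_sig = power * prior_true / p_sig

print(f"P(hypothesis true | 'significant') = {p_true_given_sig:.3f}")  # → 0.016
```

Under these (pessimistic, but not absurd) assumptions, roughly 98% of “significant” findings are false positives - and nothing inside the ritual would ever reveal it.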
This is why Frequentism, as practiced in the null ritual, has produced irreproducible “science” - because, ultimately, it is lazy science using statistics as a mask for what amounts to “I’m correct because six sigma, bruh.” It’s science according to the characters in a Bret Easton Ellis novel. Predictive power - the hallmark of real science - not only distinguishes conjecture, hypothesis, theory, and law from one another; it is the demarcation between science and non-science. In frequentist “science,” predictive strength is replaced as the method of validation by (a) publication in a journal, of (b) data that is statistically unusual, (c) computed with a fabricated hybrid statistic invented by non-mathematicians (the null ritual), in support of (d) a conjecture that my peers agree with.
So, why did I do all of this by way of background? Well, first, because I wanted to build you up to seeing why I don’t believe the Copenhagen Interpretation of quantum theory - I do not believe God plays dice either - and to declare my fealty to the Laplace-Jeffreys-Einstein-Pólya-Jaynes line of reasoning, which does not believe probability inheres in objects, either. But second, because the frequentist view is - of course - the one that our Supreme Court adopted and reified first in the 1920s (of course) and then updated in the 1960s and 1980s. We’ll take a look at the Court’s view of science from Frye to Daubert and Kumho Tire and see what it means for litigants and lawyers.
In David Stove’s “Scientific Irrationalism: the Birth of a Postmodern Cult,” he provides chapter and verse of cases where Sir Karl Popper, when faced with this problem, simply declared the need for “special rules” that would come from “consensus” among professionals. How utterly convenient for academic bullies like Fisher and Popper that science would somehow come down to votes by faculty - consensus - to determine whether someone was “right” or “wrong.”
Our good friend William Briggs explains - and vilifies - the null ritual in many ways on his blog, but here’s just one explanation of how wrong Popper, Fisher, and their intellectual descendants are via David Stove’s incisive writing.
https://knowyourmeme.com/memes/profit - except in scientistic publishing, it works. “Science” publications profit massively from a cartel-like monopoly on the work of authors who are beholden to this entire system.
Jaynes, “Probability Theory: The Logic of Science,” pp. 328-329.
See, for example, “Why Einstein was a Determinist.”
Let go your wee P! At least, don't flash it in public.