Probability theory, understood meta-rationally
Ask: how and why to use it; and whether some other approach to uncertainty would be better.

This post illustrates the meta-rationalist understanding of rationality using a relatively simple example: ways we use probability theory.
It is a follow-on to “The parable of the pebbles,” an even simpler illustration of how rationality works in actuality.
Most of the second half of Part One of the book, starting with “Probabilism,” explains how the rationalist explanation of probability is wrong. This section sketches the meta-rationalist alternative. It’s also a warm-up to the discussion of meta-rational statistical practice in Part Five. That is much more complicated!
This is a section of the Meta-rationality book. It’s cross-posted from there.
Probability theory is formal, not rational
Rationality makes use of formal methods, but it is not just that. Part of what “formal” means is “not about anything in particular.” The number 37 is formal: it doesn’t specify 37 what. It is inherently meaningless, which is what gives it such power. “37 sheep,” on the other hand, can be rational and meaningful; not merely formal. More generally, mathematics is formal but not rational (in the reasonably standard sense I’m using that word).
“Probability theory” is formal, but not inherently rational (in this sense). It is a simple[1] set of formal rules for manipulating numbers between 0.0 and 1.0. Those are referred to as “probabilities” in probability theory, but in the math itself they’re just numbers. They aren’t special or distinguished from other numbers in any way. So “probability theory” itself isn’t about anything. You can call 0.2792348 “a probability,” but unless you say what it is the probability of, it’s just a number.
This isn’t a defect in probability theory! It’s a beautiful mathematical system. It can also provide great practical value, when applied appropriately, in some situations, for some purposes. As we saw in “What probability can’t do,” inappropriate application can cause disasters.
How the rationalist misunderstanding is attractive
The promise of rationalism is certainty; and through that, control. For millennia, the promise was explicit. Rationalisms claimed that, with gradual intellectual progress, eventually everything could be known. Then utopia would be possible through complete control. That fantasy collapsed in the early twentieth century, with catastrophic social and cultural consequences.
Probabilistic rationalism offers a second-best fantasy: that there is an optimal way of thinking and acting in the face of uncertainty. Although we can never know anything for sure, we can use rules of evidence to increase justified confidence. Specifically, probability theory is the Correct set of rules of knowledge. Using it, we can gradually approach Truth more and more closely, and gain increasingly reliable control.
This “probabilism” systematically confuses uncertainty with probability. It makes the unthought assumption that if there’s uncertainty, there must be a Correct number to put on it, and you can use some definite method to determine what that is. This assumption is pure metaphysics. It is a matter of faith, whose basis is no more than emotional comfort.
A main cause of uncertainty is ontological nebulosity: the inherent indefiniteness of the actual world. It’s not just that it is difficult, or even impossible, to gain certain knowledge. It is that there are no absolute Truths about clouds, eggplants, dance moves, or bank runs. For rationalism, that is emotionally intolerable. So the rationalist reflex, when encountering ontological nebulosity, is to misunderstand it as epistemic uncertainty. Then one can say “no problem, at least not in principle: we know how to deal with that, probability theory ftw!”
Then probabilism assigns a special, unique role to probability theory as the thing which connects other formalisms, such as physical theories, with the actual world. This is wrong for two reasons.
First, probability theory can’t do that. It’s a formal system of rules for manipulating numbers. It has no method for relating those to anything in the actual world.
We mostly can’t and don’t use math to assign probabilities. Instead, that requires circumrationality: concrete activities in the actual world, not a formal system. We get probabilities by counting things, by watching and seeing things, and by poking at things to find out what happens. We get probabilities by asking people things and arguing with them about things. We get probabilities by understanding things, for example with causal or mechanistic models.
Second, we relate numerous other formal systems to the actual world without using probability theory.
“The parable of the pebbles” uses the natural numbers {0, 1, 2, 3, …} as its example. The shepherd relates those to sheep by putting pebbles in a bucket.
In fact, in the parable, he uses the natural numbers themselves as a way of coping with uncertainty! Namely, his not knowing whether all the sheep are in the fold until he counts them.
Interpretations of probability theory
“Probability is the most important concept in modern science, especially as nobody has the slightest notion what it means.” —Bertrand Russell
Circumrational methods are nebulous: indefinite, various, complicated, unreliable, improvised, and unenumerable. That makes rationalism’s fantasy of an optimality guarantee impossible.
When confronted with nebulosity, rationalism retreats from the actual world into the imaginary metaphysical world. Everything there is an Ideal Platonic Form: perfectly definite, so that there are really truly true Truths about it. Then rationalism pretends the actual world is the metaphysical world.
Probability theory, in rationalist fantasies, is not just a random bit of math that is sometimes useful. It is a foundation of the ultimate structure of Reality. Probabilities, rationalisms insist, are not just numbers, as I asserted earlier. They are special metaphysical entities with a special role.
Rationalisms, however, disagree about what that role is. There are half a dozen dueling metaphysical theories, called “interpretations of probability theory.”[2] These are impossible nonsense, if taken as metaphysics.
However, we will see that they can be understood instead as categories of circumrational work. As such, they may be useful guides for action.
According to metaphysical interpretations, probabilities are inherently of things. Interpretations differ as to what sort of thing they are of: individual events, types of events in sequences of similar ones, or beliefs, for example. They also differ as to where the probabilities live: in individual objects, general physical laws, specific mechanisms, or individual people’s minds.
Probability theory began as an analysis of gambling with six-sided dice. The probability that a rolled die will come up showing three pips is 1/6 ≈ 0.16667. What does that mean, and why is it true?
The common sense interpretation is that this is a fact about dice. An individual die has this “propensity” as an intrinsic property.
This is often the most practical way of understanding probability. What is the probability that my toothbrush will stop working soon and I’ll need to buy a new one? That’s a fact about that specific toothbrush, whose switch is getting flaky.
But, like, how is this supposed to work? Where inside the die is the propensity hiding, and what sort of thing is it? This interpretation is often the most useful in practice, but it’s not compatible with physics.
Analysis using Newtonian physics might yield the 1/6 probability in virtue of the physical, cubic symmetry of dice. Here the probability is the consequence of general physical laws. Each side is identical to the others, and there are six, so each has a 1/6 probability. This is elegant!
However, it holds true only if the rolling of the die is truly random. What does that mean? It is difficult to define “random” other than in terms of probability itself, making the analysis circular, and so probably unhelpful. Nevertheless, this way of understanding is also often useful, so long as you have some other basis for taking a process as random.
Also, very few situations in which we use probability involve physical symmetry, so this kind of analysis is not often useful in practice. Even individual dice are not perfectly symmetric. In actuality, the probability is not exactly 1/6; just very close to that.
You could find out what the true probability is by rolling the die a million times, counting how many times it showed each number of pips, and dividing by a million. More generally, if you repeat a process identically many times, you can get probabilities by counting outcomes.
On the “frequentist interpretation,” this is not just how to discover the probability. It’s what probability means; what it really is. This pragmatic interpretation is frequently useful in technical practice; in statistical quality control for manufacturing, for example.
However, it works only when you can repeat a process many times, identically. That is usually expensive when it is even possible. In many situations where we use probabilities, it is not possible even in principle. What is the probability that you will die in a freak eel accident on Friday the 13th of April, 2029? Either you will or you won’t; you can’t repeat the experiment the next day.
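The counting procedure for dice can be sketched in a few lines of Python. This is only an illustration, not anything from the book: the function name `estimate_face_frequencies`, the seed, and the optional `weights` parameter are all made up for this example. A pseudorandom generator stands in for a physical die, which quietly assumes the very randomness that is so hard to define.

```python
import random
from collections import Counter


def estimate_face_frequencies(num_rolls, weights=None, seed=0):
    """Simulate num_rolls rolls of a six-sided die and return the
    relative frequency of each face.

    weights=None means all faces are equally likely (an assumption;
    a physical die is never perfectly symmetric).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    faces = [1, 2, 3, 4, 5, 6]
    rolls = rng.choices(faces, weights=weights, k=num_rolls)
    counts = Counter(rolls)
    # Counting outcomes and dividing by the number of repetitions
    return {face: counts[face] / num_rolls for face in faces}


freqs = estimate_face_frequencies(1_000_000)
for face, freq in freqs.items():
    print(face, round(freq, 4))
```

With equal weights, each relative frequency should come out close to 1/6 without being exactly equal to it. Passing unequal `weights` mimics a slightly asymmetric die. Note that nothing here helps with a one-off event like the eel accident: there is no process to repeat.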
Back to dice. Newtonian physics isn’t random; it’s deterministic. In principle, you could predict the outcome of a die roll with perfect certainty if you knew exactly how fast it was spinning in which direction, how far above the tabletop it was, the table and die’s quantitative elasticity and coefficients of friction, and so forth. But you don’t know.
According to the “Bayesian” interpretation, to say that a die roll is “random” is not stating a fact about the world, but about your mind. “Random” means that you are ignorant.

Nevertheless, I can say that it’s awfully unlikely that I’ll get shocked to death by an electric eel, and pretty likely that I’ll need a new toothbrush soon. I’m somewhat ignorant, but I can bring relevant evidence to bear. There are no electric eels where I live, I have only ever seen them a couple times in public aquariums, and I haven’t visited one of those in decades. I have a super vague memory that their shocks are rarely fatal? On the other hand, a couple weeks after my last toothbrush’s power switch started flaking, it wouldn’t turn off no matter what I did. The noise was annoying and I had to leave it in the garage overnight to run down before chucking it in the E-waste box.
The Bayesian theory says that “awfully unlikely” and “pretty likely” are really numbers, or at any rate should be treated as numbers. If I knew the numerical probability that I would go boating in Guyana on that fateful day in 2029, and the fatality rate for eel shocks, and a slew of other relevant factors, I could calculate my probability of death. However, I can’t guess even to within an order of magnitude, and that is common in cases of uncertainty. Unfortunately, the Bayesian interpretation has nothing to say about how to get your numbers! So it’s useful only in certain atypical sorts of situations.
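To see how little of the difficulty is in the arithmetic, here is a minimal sketch of a single Bayesian update using Bayes' rule. Everything specific in it is an assumption invented for illustration: the function name `bayes_posterior`, the hypothesis ("this toothbrush fails soon"), the evidence ("its switch is flaky"), and every number.

```python
def bayes_posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule for one hypothesis and one piece of evidence.

    prior: probability of the hypothesis before seeing the evidence
    p_evidence_if_true: probability of the evidence if the hypothesis holds
    p_evidence_if_false: probability of the evidence if it does not
    """
    numerator = prior * p_evidence_if_true
    evidence = numerator + (1.0 - prior) * p_evidence_if_false
    return numerator / evidence


# Made-up likelihoods: a flaky switch is assumed to be much more common
# in toothbrushes that are about to fail than in ones that are not.
P_FLAKY_IF_FAILING = 0.9   # hypothetical
P_FLAKY_IF_HEALTHY = 0.1   # hypothetical

# Three made-up priors, each an order of magnitude apart.
for prior in (0.001, 0.01, 0.1):
    posterior = bayes_posterior(prior, P_FLAKY_IF_FAILING, P_FLAKY_IF_HEALTHY)
    print(f"prior={prior}  posterior={posterior:.4f}")
```

The function is three lines of arithmetic. The posterior it returns moves with whatever prior and likelihoods are fed in, so if those can't be pinned down to within an order of magnitude, neither can the result.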
“Interpretations” as categories of circumrational work
“As to what probability is and how it is connected with statistics, there has seldom been such complete disagreement and breakdown of communication since the Tower of Babel.” — Leonard Savage, founder of the “subjective Bayesian” school of interpretation
Many rationalists take the interpretations as mutually exclusive alternatives. Arguments about what probability “really means” are often bitter. This is generally true of metaphysical arguments, because metaphysics is pure fantasy. Each metaphysical interpretation is nonsense; none of them are True. Mathematical systems have no inherent meaning, in the sense of interpretability outside mathematics itself. Unconstrained by evidence, disagreement proceeds by content-free vituperation.
However, I’ve pointed out how each interpretation can be understood instead as describing a pragmatic category of uses. We can choose when, whether, and how to apply a mathematical system in the actual world. In this alternative understanding, none of the interpretations is True, but each can be a useful way of thinking and acting in some contexts for some purposes. (To be fair, most rationalists do recognize this, implicitly in their technical practice at least, even if not explicitly.)

Rationality requires circumrational work, as well as formal calculations. In our earlier example, integer arithmetic is not inherently about anything in particular. It has many practical uses, all of which involve circumrational work which relates numbers with aspects of the actual world, such as sheep. The type of work can be highly varied: dropping pebbles in a bucket, counting out loud, or gluing a digital-optical device in place and connecting it to your wifi network.
You can also maintain any of several quite different sorts of relationships between numbers and things. For example, using natural numbers as “cardinals” treats the items counted as interchangeable. For financial purposes, it may only matter that you own 37 sheep, and it doesn’t matter which ones. Alternatively, you can use natural numbers as “ordinals”: you assign one to each sheep as a unique label that distinguishes it. Sheep #17 is looking poorly; you need to remember to check on it tomorrow morning. Ordinal usage also establishes a priority relationship. If you always want to sell your oldest sheep when you go to market, numbering them individually at birth would do the trick.
The “interpretations” could be taken as describing broad classes of circumrational ways of getting numerical probabilities. “Meaning is use” is a meta-rational maxim.[3] The frequentist interpretation recommends counting things. The Bayesian interpretation recommends gathering heterogeneous bits of evidence to combine their significance.
Actually doing that can be complicated and difficult. The interpretations give little guidance. Instead, this is a matter of circumrational methods, transmitted within communities of practice. Often those are informal, and sometimes tacit. Sometimes they are highly technical and laid out in detail in procedure manuals.
As in relating numbers with sheep, different interpretations of probability describe different sorts of relationships between numbers and uncertain things, useful for different sorts of purposes. The nature and function of “fair coins have a 0.5 probability of coming up heads” and “the probability that my toothbrush will die tomorrow is .047502” are quite different.
Meta-rational choices when using probability theory in the actual world
Using probability in technical work is typically not meta-rational. The meta-rational question is: In this situation, what ways can we use probability to get work done that’s valuable for our purposes? That’s rarely asked. Instead, you adopt the methods of your community of practice, without considering whether they are a good fit.
You do “what is done” in your subfield of chemical engineering, investment management, or experimental psychology. You collect the data in the way people doing that kind of work do it, and you feed it to the same statistical routine everyone does. If you did something different, your boss and peers wouldn’t know how to evaluate it. They would reject your project proposal or journal submission. Anyway, presumably there’s some good reason for the standard methods, which got worked out by someone famous years ago, and it’s not your place to buck against the system.
This is efficient so long as there is a good reason it works. “Shut up and calculate” is often right; but wrong in cases where meanings are unclear and consequential. When a community of practice loses meta-rational contact with actuality, catastrophe can ensue: as in the science replication crisis, and the Great Financial Crisis.
Meta-rationality asks:
What do we mean by “probability” here? Specifically how do these numbers relate to the actual world?
Are the ways we’re getting them reliable? Are better circumrational methods possible here?
What are these probabilities of? Are those the best things to consider probabilistically? (“Events” and “beliefs” are not objectively-given items. You have to choose which nebulous phenomena to treat as those for analytical purposes.)
Do these probabilities mean what we want them to? What inferences are we drawing from them, and why do we think that’s valid?
Are the actions our formal analysis recommends sane, all things considered? What risks might it be overlooking?
Can probability theory help us do the actual thing, or are we just performing rationality theater? Is it the best approach here, or is some other way of dealing with uncertainty better?
The answers to these questions are matters of meta-rational judgement. Skill in answering depends on numerous, diverse considerations, maxims, and methods. There are no generally correct answers, and no generally correct methods for getting answers.
Metaphorical use of “probability”
This is common in everyday talk, including by people who have no formal knowledge of probability theory.
“Are you going to the party tonight?”
“Probably not.”
Such uses are perfectly cromulent in many informal contexts, even though what they mean is highly nebulous (and certainly not numerical).
People who do know probability theory often use its technical terms informally as well:
“You know that Lisa is going?”
“Oh! Ah. Hmm. Updating my priors!”
“Updating priors” is a technical term from the Bayesian interpretation. It’s used here metaphorically, to mean “Now I’m thinking about changing my mind, and going after all!” This is fine as a joke, or as in-group slang, or as a reminder to oneself to consider the matter further.
It can become dysfunctional if you refuse to admit that it’s metaphorical. If you pretend you are using probability theory when you’re just waving your hands, and start doing the calculations with numbers you just made up, you may get spurious confidence in meaningless results. Acting on that may be a big mistake.
What you are actually doing when “waving your hands” to “update your priors” is interesting and worth investigating. It’s mostly not currently understood. I recommend an excellent Twitter thread by Adam Strandberg exploring these points.
A current highly-consequential example is “p(doom).” This is an enjoyable party game in which everyone sits in a circle and announces their estimate of the probability that artificial intelligence will soon kill all human beings. Then all participants perform a Bayesian update, and go around the circle again giving their new numbers.
The numbers announced by famous, highly-placed experts in artificial intelligence vary wildly, and after many years have not converged, despite the party having gone around the circle many times. This suggests that something is wrong. Maybe lots of people with mathematics PhDs are failing to perform the simple arithmetic correctly? That seems quite improbable.
Or else, probability theory is not the right tool for the job. These numbers are, in fact, meaningless. They have nearly no relationship with reality, and actions based on them are likely to go badly. I’ve discussed this in “New evidence that AI predictions are meaningless” and “How not to predict the future.”
There are a couple of ways this goes wrong. One is obvious. AI doom scenarios are stories in which a series of events occur, culminating in robots killing everyone, or an AI-engineered virus killing everyone, or whatever. Given probabilities for the individual steps toward doom, you can do a trivial calculation to get the probability of doom itself. But there’s no good way of getting the probabilities of those prerequisites.
A less obvious problem is that many different paths to doom are imaginable (plus probably ones we haven’t thought of yet). Unenumerably many factors are relevant to the probability of each step in each path. The most serious p(doom) analyses consider only a handful. As Molly Hickman explained in her own “How not to predict the future,” the subset of factors you take into account radically affects your p(doom), even holding their individual probabilities constant.
Relevance is a central issue in meta-rationality. Choosing which factors to include in a probabilistic analysis is called “making a small world idealization.” (I explained those in “The probability of green cheese”; Hickman has a fine general discussion in her most recent post, “What we’re looking at and what we’re paying attention to.”) A small world idealization is always necessary, implicitly or explicitly, when using probability theory. It can be done more or less skillfully. When it can’t be done well, probability theory is worse than useless.
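The “trivial calculation” for a single scenario is just a product of conditional probabilities, and a short sketch shows how much the answer depends on which steps you decide to include. The function name `scenario_probability` and the step values are hypothetical, chosen only to make the arithmetic visible. They are not taken from Hickman’s posts or from any actual p(doom) analysis.

```python
from math import prod


def scenario_probability(step_probabilities):
    """Probability that every step in a chain happens, where each entry is
    the probability of that step given that all earlier steps happened."""
    return prod(step_probabilities)


# Made-up values: every prerequisite step is given the same probability,
# so the only thing that varies below is how many steps are included.
steps = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]

for n in (2, 4, 6):
    print(n, "steps:", scenario_probability(steps[:n]))
# 2 steps: 0.25
# 4 steps: 0.0625
# 6 steps: 0.015625
```

The individual probabilities never change, yet the result ranges over more than an order of magnitude depending on where the small world idealization draws its boundary. This sketch covers only one path; it says nothing about how to combine many paths, or where the step probabilities would come from.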
So what’s up with all the experts giving wildly different values for p(doom)? Many of them seem not to have done any math at all! (Probably because it’s obviously pointless.) Instead, their numbers are based on feels alone.
In practice, p(doom) means “how worried I feel, as an embodied emotion.” This is conceptually interesting, as another “interpretation” of probability theory which has been overlooked by philosophers!
It’s analogous to the Bayesian interpretation, often described as “strength of belief.” That “strength” is vaguely assumed to be a bloodless number in your head somehow. But often confidence and conviction, doubt and incredulity, are embodied sensations in your chest. Then “probability” lives in there, not in your head.
Most people would agree that human extinction would be quite bad, and therefore worth putting some effort into avoiding. The uncertainty is consequential; but probability theory is not a good way to address it. My book Better without AI suggests pragmatic alternatives.
Alternatives to probability theory
Sometimes using probability theory works. Sometimes it doesn’t. Sometimes there’s a better alternative!
Sometimes when I say this, probabilists (friendly but puzzled, or hostile and dismissive) ask:
OK, so what alternative mathematical system do you advocate?
The implicit assumption is that any alternative must be some similar formalism. That’s partly due to the systematic confusion of uncertainty and probability. It’s also partly because probabilistic rationality is misunderstood as only the formal part, because that’s all that gets taught explicitly.
Cox’s Theorem says there can’t be any similar alternative to standard probability theory. (I wrote about this in “Probability theory does not extend logic.”) So this can’t be the right question to ask in uncertain situations in which probability theory isn’t working well.
Probability theory is a way of solving a type of formal problem that may not accurately model the type of real-world uncertainty you face. If you can only make up numbers, you are probably in this sort of situation! You don’t need a different solution to this formal problem. You need a different understanding of your real-world situation.
Rather than asking “Why does probability sometimes not work,” ask:
Why does it work, when it does? (“Because it’s formally correct” is a wrong answer.)
What real-world conditions make probabilistic methods likely to work well enough in practice?
When and how can one ensure that those adequacy conditions hold?
For instance, how can we alter this particular bit of the world to better fit probability theory?
What other, non-probabilistic approaches to uncertainty might work better here?
These are meta-rational questions. Some answers to the last one include:
Reengineer the situation to reduce uncertainty
By shielding equipment, for example
Prefer actions whose outcomes are less dependent on what you are uncertain about
Or whose failure modes are the least bad
Anticipate as many plausible outcomes as seems feasible and prepare contingency plans for dealing with them if they happen
Or, if none of them are catastrophic, wait to figure out how to cross the river until you get there
There’s a huge amount to say about each of these. The specifics depend on the type of situation you are in and what you are trying to do. They may be extensively technical. So, no general explanation is possible!
However, Part Four discusses many of these in moderate detail.
[1] There’s the high school version of probability theory, which assumes a finite set of things assigned probabilities that sum to 1.0. That’s very simple; it’s just arithmetic. If you want to assign probabilities to continuously varying quantities, like how much it will rain tomorrow, you need the full-strength version. That’s only slightly more complicated, but you do need calculus to understand it.
[2] If you’d like to read more about the metaphysical interpretations of probability theory, the Wikipedia article is a good place to start. For more detail, I recommend David R. MacIver’s “Probably enough probability for you.” His take on probability theory is meta-rational and quite close to mine. His discussion is clear, a good length, and somewhat more technical than mine here. I usually recommend the Stanford Encyclopedia of Philosophy, but its article is very long and very technical and probably way more detailed than you want.
[3] The maxim is due originally to Ludwig Wittgenstein, considering linguistic meaning. As in my example of “The eggplant is a straw hat, and the spinach is yelling about politics.” But it applies to meanings generally: they are tools for getting purposive work done in context.



In antiquity, a similar thing happened, in a more primitive form. The Aristotelians thought tremendous progress had been made. Pyrrho came back from India and refuted Aristotelian metaphysics. In the field of medicine, this philosophical dispute created two camps: the rationalist/dogmatist camp and the skeptic/empiric camp.
Plato's Academy then re-interpreted Socrates to be like Pyrrho. For a while, the Academy was essentially Pyrrhonist. Then Carneades swapped pithanon for epoche - i.e., he retreated from suspension of judgment and allowed that one could choose on the basis of what seemed most plausible. This is the system Cicero studied and wrote about. When Cicero translated "pithanon" into Latin he used "probabilis" - the same word we derive "probability" from.
Could you give a pointer to any further reading on the connection to the 2008 financial crisis? Here, you just link to another page of yours where all that is said on the matter is "The belief that such methods and guarantees do exist has been a major cause of the 2008 financial crisis and the science replication crisis, among other catastrophes.", and the interested reader can only take that on faith :(