Category Archives: Probability

Why a density matrix is not a probability distribution

I’m back! And I’m ready to hit you with some really heavy thoughts that have been weighing me down, because I need to get them off my chest.

Years ago, Rob Spekkens and Matt Leifer published an article in which they tried to define a “causally neutral theory of quantum inference”. Their starting point was an analogy between a density matrix (actually the Choi-Jamiołkowski matrix of a CPT map, but hey, let’s not split hairs) and a conditional probability distribution. They argued that in many respects, this “conditional density matrix” could be used to define equations of inference for density matrices, in complete analogy with the rules of Bayesian inference applied to probability distributions.

At the time, something about this idea just struck me as wrong, although I couldn’t quite put my finger on what it was. The paper is not technically wrong, but the idea just felt like the wrong approach to me. Matt Leifer even wrote a superb blog post explaining the analogy between quantum states and probabilities, and I was kind of half convinced by it. At least, my rational brain could find no flaw in the idea, but some deep, subterranean sense of aesthetics could not accept this analogy.

In my paper with Časlav Brukner on quantum causal models, we took a diametrically opposite approach. We refused to deal directly with quantum states, and instead tried to identify the quantum features of a system by looking only at the level of statistics, where the normal rules of inference would apply. The problem is, the price that you pay in working at the level of probabilities is that the structure of quantum states slips through your fingers and gets buried in the sand.

(I like to think of probability distributions as sand. You can push them around any which way, but the total amount of sand stays the same. Underneath the sand, there is some kind of ontological structure, like a dinosaur skeleton or an alien space-ship, whose ridges and contours sometimes show through in places where we brush the sand away. In quantum mechanics, it seems that we can never completely reveal what is buried, because when we clear away sand from one part, we end up dumping it on another part and obscuring it.)

Buried spaceship from the storyboard of The Thing (1982).

One problem I had with this probability-level approach was that the quantum structure did not emerge the way I had hoped. In particular, I could not find anything like a quantum Reichenbach Principle to replace the old classical Reichenbach Principle, and so the theory was just too unconstrained to be interesting. Other approaches along the same lines tended to deal with this by putting in the quantum stuff by hand, without actually `revealing’ it in a natural way. So I gave up on this for a while.

And what became of Leifer and Spekkens’ approach, the one that I thought was wrong? Well, it also didn’t work out. Their analogy seemed to break down when they tried to add the state of the system at more than two times. To put the last nail in, last year Dominic Horsman presented some work that showed that any approach along the lines of Leifer and Spekkens would run into trouble, because quantum states evolving in time just do not behave like probabilities do. Probabilities are causally neutral, which means that when we use information about one system to infer something about another system, it really doesn’t matter if the systems are connected in time or separated in space. With quantum systems, on the other hand, the difference between time and space is built into the formalism and (apparently) cannot easily be got rid of. Even in relativistic quantum theory, space and time are not quite on an equal footing (and Carlo Rovelli has had a lot to say about this in the past).

Still, after reading Horsman et al.’s paper, I felt that it was too mathematical and didn’t touch the root of the problem with the Leifer-Spekkens analogy. Again, it was a case of technical accuracy but without conveying the deeper, intuitive reasons why the approach would fail. What finally reeled me back in was a recent paper by John-Mark Allen et al. in which they achieve an elegant definition of a quantum causal model, complete with a quantum Reichenbach Principle (even for multiple systems), but at the expense of essentially giving up on the idea of defining a quantum conditional state over time, and hence forfeiting the analogy with classical Bayesian inference. To me, it seemed like a masterful evasion, or like one of those dramatic Queen sacrifices you see in chess tournaments. They realized that what was standing in the way of quantum causal models was the desire to represent all of the structure of conditional probability distributions, but that this was not necessary for defining a causal model. So they were able to achieve a quantum causal model with a version of Reichenbach’s Principle, but at the price of retaining only a partial analog of classical Bayesian inference.

This left me pondering that old paper of Leifer and Spekkens. Why did they really fail? Is there really no way to salvage a causally neutral theory of quantum inference? I will do my best to answer only the first question here, leaving the second one open for the time being.

One strategy to use when you suspect something is wrong with an idea, is to imagine a world in which the idea is true, and then enter that world and look around to see what is wrong with it. So let us imagine that, indeed, all of the usual rules of Bayesian inference for probabilities have an exact counterpart in terms of density matrices (and similar objects). What would that mean?

Well, if it looks like a duck and quacks like a duck … we could apply Ockham’s razor and say that a density matrix must actually represent a duck probability distribution. This is great news! It means that we can argue that density matrices are actually epistemic objects — they represent our ignorance about some underlying reality. This could be the chance we’ve been waiting for to sweep away all the sand and reveal the quantum skeleton underneath!

The problem is that this is too good to be true. Why? Because it seems to imply that a density matrix uniquely corresponds to a probability distribution over the elements of reality (whatever they are). In jargon, this means that the resulting model would be preparation non-contextual. But this is impossible because — as Spekkens himself proved in a classic paper — any ontological model of quantum mechanics must be preparation contextual.

Let me try to simplify that. It turns out that there are many different ways to prepare a quantum system, such that it ends up being described by the same statistics (as determined by its density matrix). For example, if a machine prepares one of two possible pure states based on the outcome of a coin flip (whose outcome is unknown to us), this can result in the same density matrix for the system as a machine that deterministically entangles the quantum system with another system (which we don’t have access to). These details about the preparation that don’t affect the density matrix are called the preparation context.
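
To make the idea of a preparation context concrete, here is a minimal sketch in plain Python (exact arithmetic with `fractions`, no quantum library). The helper names `projector`, `mixture` and `discard_second_qubit` are my own, and the specific states (a coin flip between |0> and |1>, a coin flip between |+> and |->, and one half of the entangled state (|00> + |11>)/sqrt(2)) are illustrative choices rather than anything fixed by the argument above. All three preparations should come out as the same density matrix.

```python
from fractions import Fraction

def projector(psi):
    """Return |psi><psi| / <psi|psi> for a real, unnormalised state vector."""
    norm_sq = sum(a * a for a in psi)
    return [[Fraction(a * b) / norm_sq for b in psi] for a in psi]

def mixture(weighted_states):
    """Return sum_i p_i |psi_i><psi_i| for a list of (probability, state) pairs."""
    dim = len(weighted_states[0][1])
    rho = [[Fraction(0)] * dim for _ in range(dim)]
    for p, psi in weighted_states:
        proj = projector(psi)
        for i in range(dim):
            for j in range(dim):
                rho[i][j] += p * proj[i][j]
    return rho

def discard_second_qubit(rho4):
    """Partial trace over the second qubit (basis order |00>, |01>, |10>, |11>)."""
    return [[sum(rho4[2 * i + k][2 * j + k] for k in range(2))
             for j in range(2)] for i in range(2)]

half = Fraction(1, 2)

# Context 1: a hidden fair coin selects |0> or |1>.
rho_coin_z = mixture([(half, [1, 0]), (half, [0, 1])])

# Context 2: a hidden fair coin selects |+> or |-> (unnormalised amplitudes).
rho_coin_x = mixture([(half, [1, 1]), (half, [1, -1])])

# Context 3: deterministically prepare (|00> + |11>)/sqrt(2),
# then lose access to the second qubit.
rho_entangled = discard_second_qubit(projector([1, 0, 0, 1]))

for name, rho in [("coin, z basis", rho_coin_z),
                  ("coin, x basis", rho_coin_x),
                  ("entangled", rho_entangled)]:
    print(name, rho)
```

The three contexts are physically quite different, yet the density matrix alone cannot tell them apart. That is exactly the information a preparation-contextual model is allowed to use and a bare density matrix throws away.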

The thing is, if a density matrix is really interpretable as a probability distribution, then which distribution it represents has to depend on the preparation context (that is what Spekkens proved). Since Leifer and Spekkens are only looking at density matrices (sans context), we should not expect them to behave just like probability distributions — the analogy has to break somewhere.

Now this is nowhere near a proof — that’s why it is appearing here on a dodgy blog instead of in a scientific publication. But I think it does capture the reason why I felt that the approach of Leifer and Spekkens was just `wrong’: they seemed to be placing density matrices into the role of probabilities, where they just don’t fit.

Now let me point out some holes in my own argument. Although a density matrix can’t be thought of as a single probability distribution, it can perhaps be thought of as representing an equivalence class of distributions, and maybe these equivalence classes could turn out to obey all the usual laws of classical inference, thereby rescuing the analogy. However, there is absolutely no reason to expect this to be true; on the contrary, one would expect it to be false. To me, this would be almost like if you tried to model a single atom as if it were a whole gas of particles, and found that it works. Or, to get even more Zen, it would be like an avalanche consisting of one grain of sand.

“To see a World in a Grain of Sand  And a Heaven in a Wild Flower        Hold Infinity in the palm of your hand And Eternity in an hour …”       –William Blake

It is interesting that the analogy can be carried as far as it has been. Perhaps this can be accounted for by the fact that, even though they can’t serve as replacements for probability distributions, density matrices do have a tight relationship with probabilities through the Born rule (the rule that tells us how to predict probabilities for measurements on a quantum system). So maybe we should expect at least some of the properties of probabilities to somehow rub off on density matrices.

Although it seems that a causally neutral theory of Bayesian inference cannot succeed using just density matrices (or similar objects), perhaps there are other approaches that would be more fruitful. What if one takes an explicitly preparation-contextual ontological model (like the fascinating Beltrametti-Bugajski model) and uses it to supplement our density matrices with the context that they need in order to identify them with probability distributions? What sort of theory of inference would that give us? Or, what if we step outside of the ontological models framework and look for some other way to define quantum inference? The door remains tantalizingly open.



The trouble with Reichenbach

(Note: this blog post is vaguely related to a paper I wrote. You can find it on the arXiv here. )

Suppose you are walking along the beach, and you come across two holes in the rock, spaced apart by some distance; let us label them ‘A’ and ‘B’. You observe an interesting correlation between them. Every so often, at an unpredictable time, water will come spraying out of hole A, followed shortly after by a spray of water out of hole B. Given our day-to-day experience of such things, most of us would conclude that the holes are connected by a tunnel underneath the rock, which is in turn connected to the ocean, such that a surge of water in the underground tunnel causes the water to spray from the two holes at about the same time.

Now, therein lies a mystery: how did our brains make this deduction so quickly and easily? The mere fact of a statistical correlation does not tell us much about the direction of cause and effect. Two questions arise. First, why do correlations require explanations in the first place? Why can we not simply accept that the two geysers spray water in synchronisation with each other, without searching for explanations in terms of underground tunnels and ocean surges? Secondly, how do we know in this instance that the explanation is that of a common cause, and not that (for example) the spouting of water from one geyser triggers some kind of chain reaction that results in the spouting of water from the other?

The first question is a deep one. We have in our minds a model of how the world works, which is the product partly of history, partly of personal experience, and partly of science. Historically, we humans have evolved to see the world in a particular way that emphasises objects and their spatial and temporal relations to one another. In our personal experience, we have seen that objects move and interact in ways that follow certain patterns: objects fall when dropped and signals propagate through chains of interactions, like a series of dominoes falling over. Science has deduced the precise mechanical rules that govern these motions.

According to our world-view, causes always occur before their effects in time, and one way that correlations can arise between two events is if one is the cause of the other. In the present example, we may reason as follows: since hole B always spouts after A, the causal chain of events, if it exists, must run from A to B. Next, suppose that I were to cover hole A with a large stone, thereby preventing it from emitting water. If the occasion of its emission were the cause of hole B’s emission, then hole B should also cease to produce water when hole A is covered. If we perform the experiment and we find that hole B’s rate of spouting is unaffected by the presence of a stone blocking hole A, we can conclude that the two events of spouting water are not connected by a direct causal chain.

The only other way in which correlations can arise is by the influence of a third event — such as the surging of water in an underground tunnel — whose occurrence triggers both of the water spouts, each independently of the other. We could promote this aspect of our world-view to a general principle, called the Principle of the Common Cause (PCC): whenever two events A and B are correlated, then either one is a cause of the other, or else they share a common cause (which must occur some time before both of these events).

The Principle of Common Cause tells us where to look for an explanation, but it does not tell us whether our explanation is complete. In our example, we used the PCC to deduce that there must be some event preceding the two water spouts which explains their correlation, and for this we proposed a surge of water in an underground tunnel. Now suppose that the presence of water in this tunnel is absolutely necessary in order for the holes to spout water, but that on some occasions the holes do not spout even though there is water in the tunnel. In that case, simply knowing that there is water in the tunnel does not completely eliminate the correlation between the two water spouts. That is, even though I know there is water in the tunnel, I am not certain whether hole B will emit water, unless I happen to know in addition that hole A has just spouted. So, the probability of B still depends on A, despite my knowledge of the ‘common cause’. I therefore conclude that I do not know everything that there is to know about this common cause, and there is still information to be had.


It could be, for instance, that the holes will only spout water if the water pressure is above a certain threshold in the underground tunnel. If I am able to detect both the presence of the water and its pressure in the tunnel, then I can predict with certainty whether the two holes will spout or not. In particular, I will know with certainty whether hole B is going to spout, independently of A. Thus, if I had stakes riding on the outcome of B, and you were to try and sell me the information “whether A has just spouted”, I would not buy it, because it does not provide any further information beyond what I can deduce from the water in the tunnel and its pressure level. It is a fact of general experience that, conditional on complete knowledge of the common causes of two events, the probabilities of those events are no longer correlated. This is called the principle of Factorisation of Probabilities (FP). The union of FP and PCC together is called Reichenbach’s Common Cause Principle (RCCP).


In the above example, the complete knowledge of the common cause allowed me to perfectly determine whether the holes would spout or not. The conditional independence of these two events is therefore guaranteed. One might wonder why I did not talk about the principle of predetermination: conditional on complete knowledge of the common causes, the events are determined with certainty. The reason is that predetermination might be too strong; it may be that there exist phenomena that are irreducibly random, such that even a full knowledge of the common causes does not suffice to determine the resulting events with certainty.

As another example, consider two river beds on a mountain slope, one on the left and one on the right. Usually (96% of the time) it does not rain on the mountain and both rivers are dry. If it does rain on the mountain, then there are four possibilities with equal likelihood: (i) the river beds both remain dry, (ii) the left river flows but the right one is dry, (iii) the right river flows but the left is dry, or (iv) both rivers flow. Thus, without knowing anything else, the fact that one river is running makes it more likely that the other one is. However, conditional on it having rained on the mountain, if I know that the left river is flowing (or dry), this does not tell me anything about whether the right river is flowing or dry. So, it seems that after conditioning on the common cause (rain on the mountain) the probabilities factorise: knowing about one river tells me nothing about the other.
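
The numbers in the river example are simple enough to check by brute force. The sketch below enumerates the joint distribution over (rain, left river flows, right river flows) using the figures quoted above; the function names `joint` and `prob` are just my own scaffolding. It compares the probability that the right river flows with and without knowledge of the left river, first unconditionally and then conditional on rain.

```python
from fractions import Fraction
from itertools import product

P_RAIN = Fraction(4, 100)  # it rains 4% of the time

def joint(rain, left, right):
    """Joint probability of (rain, left river flows, right river flows)."""
    if not rain:
        # No rain: both rivers are certainly dry.
        return 1 - P_RAIN if not (left or right) else Fraction(0)
    # Rain: the four flow patterns are equally likely.
    return P_RAIN * Fraction(1, 4)

WORLDS = list(product([False, True], repeat=3))

def prob(event, given=lambda rain, left, right: True):
    """Conditional probability P(event | given) by direct enumeration."""
    numerator = sum(joint(*w) for w in WORLDS if given(*w) and event(*w))
    denominator = sum(joint(*w) for w in WORLDS if given(*w))
    return numerator / denominator

right_flows = lambda rain, left, right: right

# Without conditioning on rain, the left river is informative about the right one.
p_right = prob(right_flows)
p_right_given_left = prob(right_flows, lambda rain, left, right: left)

# Conditional on rain, learning about the left river changes nothing.
p_right_given_rain = prob(right_flows, lambda rain, left, right: rain)
p_right_given_rain_and_left = prob(right_flows, lambda rain, left, right: rain and left)

print(p_right, p_right_given_left)
print(p_right_given_rain, p_right_given_rain_and_left)
```

The first pair of numbers differs (the rivers are correlated), while the second pair agrees (the correlation is screened off by the rain), which is the factorisation property in miniature.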


Now we have a situation in which the common cause does not completely determine the outcomes of the events, but where the probabilities nevertheless factorise. Should we then conclude that the correlations are explained? If we answer ‘yes’, we have fallen into a trap.

The trap is that there may be additional information which, if discovered, would make the rivers become correlated. Suppose I find a meeting point of the two rivers further upstream, in which sediment and debris tend to gather. If there is only a little debris, it will be pushed to one side (the side chosen effectively at random), diverting water to one of the rivers and blocking the other. Alternatively, if there is a large build-up of debris, it will either dam the rivers, leaving them both dry, or else be completely destroyed by the build-up of water, feeding both rivers at once. Now, if I know that it rained on the mountain and I know how much debris is present upstream, knowing whether one river is flowing will provide information about the other (e.g. if there is a little debris upstream and the right river is flowing, I know the left must be dry).


Before I knew anything, the rivers seemed to be correlated. Conditional on whether it rained on the mountain-top, the correlation disappeared. But now, conditional on it having rained on the mountain and on the amount of debris upstream, the correlation is restored! If the only tools I had to explain correlations were the PCC and the FP, then how could I ever be sure that the explanation is complete? Unless the information of the common cause is enough to predetermine the outcomes of the events with certainty, there is always the possibility that the correlations have not been explained, because new information about the common causes might come to light which renders the events correlated again.
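
Here is a small sketch of how the correlation can come back. The post does not fix the probabilities for the debris, so I am assuming, purely for illustration, that given rain the debris is "little" or "large" with equal probability and that each branch has two equally likely outcomes. This choice keeps the four flow patterns equally likely given rain, as in the original example.

```python
from fractions import Fraction

quarter = Fraction(1, 4)

# Assumed distribution over (debris, left flows, right flows), conditional on rain.
GIVEN_RAIN = {
    ("little", True, False): quarter,   # little debris: exactly one river flows
    ("little", False, True): quarter,
    ("large", False, False): quarter,   # large debris: the dam holds, both dry
    ("large", True, True): quarter,     # large debris: the dam bursts, both flow
}

def prob(event, given=lambda debris, left, right: True):
    """Conditional probability P(event | given, rain) by direct enumeration."""
    numerator = sum(p for w, p in GIVEN_RAIN.items() if given(*w) and event(*w))
    denominator = sum(p for w, p in GIVEN_RAIN.items() if given(*w))
    return numerator / denominator

right_flows = lambda debris, left, right: right

# Conditioning on rain alone: the left river tells us nothing about the right one.
p_right = prob(right_flows)
p_right_given_left = prob(right_flows, lambda d, left, right: left)

# Conditioning on rain and the debris level: the rivers become perfectly (anti)correlated.
p_right_little = prob(right_flows, lambda d, left, right: d == "little")
p_right_little_left = prob(right_flows, lambda d, left, right: d == "little" and left)
p_right_large_left = prob(right_flows, lambda d, left, right: d == "large" and left)

print(p_right, p_right_given_left)
print(p_right_little, p_right_little_left, p_right_large_left)
```

Averaging over the debris level hides the structure completely, which is why factorisation given rain alone could not have warned us that the explanation was incomplete.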

Now, at last, we come to the main point. In our classical world-view, observations tend to be compatible with predetermination. No matter how unpredictable or chaotic a phenomenon seems, we find it natural to imagine that every observed fact could be predicted with certainty, in principle, if only we knew enough about its relevant causes. In that case, we are right to say that a correlation has not been fully explained unless Reichenbach’s principle is satisfied. But this last property is now just seen as a trivial consequence of predetermination, implicit in our world-view. In fact, Reichenbach’s principle is not sufficient to guarantee that we have found an explanation. We can only be sure that the explanation has been found when the observed facts are fully determined by their causes.

This poses an interesting problem to anyone (like me) who thinks the world is intrinsically random. If we give up predetermination, we have lost our sufficient condition for correlations to be explained. Normally, if we saw a correlation, after eliminating the possibility of a direct cause we would stop searching for an explanation only when we found one that could perfectly determine the observations. But if the world is random, then how do we know when we have found a good enough explanation?

In this case, it is tempting to argue that Reichenbach’s principle should be taken as a sufficient (not just necessary) condition for an explanation. Then, we know to stop looking for explanations as soon as we have found one that causes the probabilities to factorise. But as I just argued with the example of the two rivers, this doesn’t work. If we believed this, then we would have to accept that it is possible for an explained correlation to suddenly become unexplained upon the discovery of additional facts! Short of a physical law forbidding such additional facts, this makes for a very tenuous notion of explanation indeed.

The question of what should constitute a satisfactory explanation for a correlation is, I think, one of the deepest problems posed to us by quantum mechanics. The way I read Bell’s theorem is that (assuming that we accept the theorem’s basic assumptions) quantum mechanics is either non-local, or else it contains correlations that do not satisfy the factorisation part of Reichenbach’s principle. If we believe that factorisation is a necessary part of explanation, then we are forced to accept non-locality. But why should factorisation be a necessary requirement of explanation? It is only justified if we believe in predetermination.
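
For readers who want to see the tension in numbers, here is a rough sketch using the CHSH form of Bell's theorem, which is one standard way to state it and not something specific to this post. Models that are local and factorising can be written as mixtures of deterministic local assignments, so it is enough to check those; the measurement angles and helper names below are my own choices. The code computes the CHSH combination for a singlet state and compares it with the largest value any deterministic local assignment can reach.

```python
import math
from itertools import product

def spin(theta):
    """Spin observable along angle theta in the x-z plane (outcomes +1 and -1)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]

def kron(a, b):
    """Tensor product of two 2x2 matrices as a 4x4 nested list."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]

# Singlet state (|01> - |10>)/sqrt(2) in the basis |00>, |01>, |10>, |11>.
r = 1 / math.sqrt(2)
SINGLET = [0.0, r, -r, 0.0]

def correlation(angle_a, angle_b):
    """Expectation value <singlet| spin(a) (x) spin(b) |singlet>."""
    m = kron(spin(angle_a), spin(angle_b))
    return sum(SINGLET[i] * m[i][j] * SINGLET[j]
               for i in range(4) for j in range(4))

def chsh(e):
    """CHSH combination from a table e[(x, y)] of correlations, x and y in {0, 1}."""
    return e[0, 0] - e[0, 1] + e[1, 0] + e[1, 1]

# Measurement angles for the two parties (an illustrative, commonly used choice).
a = [0.0, math.pi / 2]
b = [math.pi / 4, 3 * math.pi / 4]
quantum_value = abs(chsh({(x, y): correlation(a[x], b[y])
                          for x in (0, 1) for y in (0, 1)}))

# Deterministic local assignments: each side fixes +1 or -1 for each of its settings.
local_bound = max(
    abs(chsh({(x, y): A[x] * B[y] for x in (0, 1) for y in (0, 1)}))
    for A in product([1, -1], repeat=2)
    for B in product([1, -1], repeat=2)
)

print(quantum_value, local_bound)
```

The quantum value exceeds the local bound, so under the theorem's assumptions something has to give: either locality or the factorisation part of Reichenbach's principle.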

A critic might try to argue that, without factorisation, we have lost all ability to explain correlations. But I’m saying that this is true even for those who would accept factorisation but reject predetermination. I say, without predetermination, there is no need to hold on to factorisation, because it doesn’t help you to explain correlations any better than the rest of us non-determinists! So what are we to do? Maybe it is time to shrug off factorisation and face up to the task of finding a proper explanation for quantum correlations.

Stop whining and accept these axioms.

One of the stated goals of quantum foundations is to find a set of intuitive physical principles, that can be stated in plain language, from which the essential structure of quantum mechanics can be derived.

So what exactly is wrong with the axioms proposed by Chiribella et al. in arXiv:1011.6451? Loosely speaking, the principles state that information should be localised in space and time, that systems should be able to encode information about each other, and that every process should in principle be reversible, so that information is conserved. The axioms can all be explained using ordinary language, as demonstrated in the sister paper arXiv:1209.5533. They all pertain directly to the elements of human experience, namely, what real experimenters ought to be able to do with the systems in their laboratories. And they all seem quite reasonable, so that it is easy to accept their truth. This is essential, because it means that the apparently counterintuitive behaviour of QM is directly derivable from intuitive principles, much as the counterintuitive aspects of special relativity follow as logical consequences of its two intuitive axioms, the constancy of the speed of light and the relativity principle. Given these features, maybe we can finally say that quantum mechanics makes sense: it is the only way that the laws of physics can lead to a sensible model of information storage and communication!

Let me run through the axioms briefly (note to the wise: I take the `causality’ axiom as implicit, and I’ve changed some of the names to make them sound nicer). I’ll assume the reader is familiar with the distinction between pure states and mixed states, but here is a brief summary. Roughly, a pure state describes a system about which you have maximum information, whereas a mixed state can be interpreted as uncertainty about which pure state the system is really in. Importantly, a pure state does not need to determine the outcomes to every measurement that could be performed on it: even though it contains maximal information about the state, it might only specify the probabilities of what will happen in any given experiment. This is what we mean when we say a theory is `probabilistic’.

First axiom (Distinguishability): if a mixed state rules out at least one pure state, meaning that the system has zero probability of being in that pure state, then the mixed state must be perfectly distinguishable from some other state (presumably, the one that was ruled out). It is hard to imagine how this rule could fail: if I have a bag that contains either a spider or a fly with some probability, I should have no problem distinguishing it from a bag that contains a snake. On the other hand, I can’t so easily tell it apart from another bag that simply contains a fly (at least not in a single trial of the experiment).

Second axiom (Compression): If a system contains any redundant information or `extra space’, it should be possible to encode it in a smaller system such that the information can be perfectly retrieved. For example, suppose I have a badly edited book containing multiple copies of some pages, and a few blank pages at the end. I should be able to store all of the information written in the book in a much smaller book, without losing any information, just by removing the redundant copies and blank pages. Moreover, I should be able to recover the original book by copying pages and adding blank pages as needed. This seems like a pretty intuitive and essential feature of the way information is encoded in physical systems.

Third axiom (Locality of information): If I have a joint system (say, of two particles) that can be in one of two different states, then I should be able to distinguish the two different states over many trials, by performing only local measurements on each individual particle and using classical communication. For example, we allow the local measurements performed on one particle to depend on the outcomes of the local measurements on the other particle. On the other hand, we do not need to make use of any other shared resources (like a second set of correlated particles) in order to distinguish the states. I must admit, out of all the axioms, this one seems the hardest to justify intuitively. What indeed is so special about local operations and classical communication that it should be sufficient to tell different states apart? Why can’t we imagine a world in which the only way to distinguish two states of a joint system is to make use of some other joint system? But let us put this issue aside for the moment.

Fourth axiom (Locality of ignorance): If I have two particles in a joint state that is pure (i.e. I have maximal information about it) and if I measure one of them and find it in a pure state, the axiom states that the other particle must also be in a pure state. This makes sense: if I do a measurement on one subsystem of a pure state that results in still having maximal information about that subsystem, I should not lose any information about the other subsystems during the process. Learning new information about one part of a system should not make me more ignorant of the other parts.

So far, all of the axioms described above are satisfied by classical and quantum information theory. Therefore, at the very least, if any of these axioms do not seem intuitive, it is only because we have not sufficiently well developed our intuitions about classical physics, so it cannot really be taken as a fault of the axioms themselves (which is why I am not so concerned about the detailed justification for axiom 3). The interesting axiom is the last one, `purification’, which holds in quantum physics but not in probabilistic classical physics.

Fifth axiom (Conservation of information) [aka the purification postulate]: Every mixed state of a system can be obtained by starting with several systems in a joint pure state, and then discarding or ignoring all except for the system in question. Thus, the mixedness of any state can be interpreted as ignorance of some other correlated states. Furthermore, we require that the purification be essentially unique: all possible pure states of the total set of systems that do the job must be convertible into one another by reversible transformations.
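
As a minimal illustration of the quantum case, the sketch below takes one particular mixed state of a qubit and writes down two different pure states of two qubits that both reduce to it when the second qubit is discarded. The target state diag(9/25, 16/25) is an arbitrary choice of mine, picked only so that the amplitudes 3/5 and 4/5 are rational; the helper names are likewise hypothetical.

```python
from fractions import Fraction

def pure_density(psi):
    """Density matrix |psi><psi| for a normalised state with real amplitudes."""
    return [[a * b for b in psi] for a in psi]

def discard_second(rho4):
    """Partial trace over the second qubit (basis order |00>, |01>, |10>, |11>)."""
    return [[sum(rho4[2 * i + k][2 * j + k] for k in range(2))
             for j in range(2)] for i in range(2)]

def purity(rho):
    """Tr(rho^2): equal to 1 for pure states, less than 1 for mixed states."""
    n = len(rho)
    return sum(rho[i][j] * rho[j][i] for i in range(n) for j in range(n))

# Target mixed state of the system qubit.
rho_mixed = [[Fraction(9, 25), Fraction(0)],
             [Fraction(0), Fraction(16, 25)]]

a, b = Fraction(3, 5), Fraction(4, 5)
purification_1 = [a, 0, 0, b]   # a|00> + b|11>
purification_2 = [0, a, b, 0]   # a|01> + b|10>: the same state with the second qubit flipped

reduced_1 = discard_second(pure_density(purification_1))
reduced_2 = discard_second(pure_density(purification_2))

print(reduced_1 == rho_mixed, reduced_2 == rho_mixed)
print(purity(pure_density(purification_1)), purity(rho_mixed))
```

The two purifications differ only by a reversible operation (a bit flip) on the discarded qubit, which is the sense in which the purification is "essentially unique" in this small example.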

As stated above, it is not so clear why this property should hold in the world. However, it makes more sense if we consider one of its consequences: every irreversible, probabilistic process can be obtained from a reversible process involving additional systems, which are then ignored. In the same way that statistical mechanics allows us to imagine that we could un-scramble an egg, if only we had complete information about its individual atoms and the power to re-arrange them, the purification postulate says that everything that occurs in nature can be un-done in principle, if we have sufficient resources and information. Another way of stating this is that the loss of information that occurs in a probabilistic process is only apparent: in principle the information is conserved somewhere in the universe and is never lost, even though we might not have direct access to it. The `missing information’ in a mixed state is never lost forever, but can always be accessed by some observer, at least in principle.

It is curious that probabilistic classical physics does not obey this property. Surely it seems reasonable to expect that one could construct a probabilistic classical theory in which information is ultimately conserved! In fact, if one attempts this, one arrives at a theory of deterministic classical physics. In such a theory, having maximal knowledge of a state (i.e. the state is pure) further implies that one can perfectly predict the outcome of any measurement on the state, but this means the theory is no longer probabilistic. Indeed, for a classical theory to be probabilistic in the sense that we have defined the term, it necessarily allows processes in which information is irretrievably lost, violating the spirit of the purification postulate.
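
The classical failure can be seen in a toy case. In the sketch below I assume the simplest setting: two classical bits, where the "pure" states are the point distributions that assign probability 1 to a single configuration. Ignoring the second bit of any such state always leaves a point distribution on the first bit, so a fair coin can never arise this way. The function names are my own.

```python
from fractions import Fraction
from itertools import product

CONFIGS = list(product([0, 1], repeat=2))  # joint configurations of two bits

def point_distribution(x):
    """Classical pure state: probability 1 on configuration x, 0 elsewhere."""
    return {c: Fraction(int(c == x)) for c in CONFIGS}

def marginal_first(joint):
    """Distribution of the first bit after ignoring the second bit."""
    return [sum(p for (first, _), p in joint.items() if first == v) for v in (0, 1)]

def is_pure(dist):
    """A classical distribution is pure when one outcome has probability 1."""
    return max(dist) == 1

# Every pure joint state has a pure marginal.
all_marginals_pure = all(
    is_pure(marginal_first(point_distribution(x))) for x in CONFIGS
)

# So a fair coin is not the marginal of any pure joint state.
fair_coin = [Fraction(1, 2), Fraction(1, 2)]
coin_has_purification = any(
    marginal_first(point_distribution(x)) == fair_coin for x in CONFIGS
)

print(all_marginals_pure, coin_has_purification)
```

Contrast this with the quantum case, where discarding half of a pure entangled state does leave a genuinely mixed state behind.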

In conclusion, I’d say this is pretty close to the mystical “Zing” that we were looking for: quantum mechanics is the only reasonable theory in which processes can be inherently probabilistic while at the same time conserving information.

The Zen of the Quantum Omelette

“[Quantum mechanics] is not purely epistemological; it is a peculiar mixture describing in part realities of Nature, in part incomplete human information about Nature, all scrambled up by Heisenberg and Bohr into an omelette that nobody has seen how to unscramble. Yet we think that the unscrambling is a prerequisite for any further advance in basic physical theory. For, if we cannot separate the subjective and objective aspects of the formalism, we cannot know what we are talking about; it is just that simple.” [1]

— E. T. Jaynes

Note: this post is about foundational issues in quantum mechanics, which means it is rather long and may be boring to non-experts (not to mention a number of experts). I’ve tried to use simple language so that the adventurous layman can nevertheless still get the gist of it, if he or she is willing (hey, fortune favours the brave).

As I’ve said before, I think research on the foundations of quantum mechanics is important. One of the main goals of work on foundations (perhaps the main goal) is to find a set of physical principles that can be stated in common language, but can also be implemented mathematically to obtain the model that we call `quantum mechanics’.

Einstein was a big fan of starting with simple intuitive principles on which a more rigorous theory is based. The special and general theories of relativity are excellent examples. Both are based on the `Principle of Relativity’, which states (roughly) that motion between two systems is purely relative. We cannot say whether a given system is truly in motion or not; the only meaningful question is whether the system is moving relative to some other system. There is no absolute background space and time in which objects move or stand still, like actors on a stage. In fact there is no stage at all, only the mutual distances between the actors, as experienced by the actors themselves.

The way I have stated the principle is somewhat vague, but it has a clear philosophical intention which can be taken as inspiration for a more rigorous theory. Of particular interest is the identification of a concept that is argued to be meaningless or illusory — in this case the concept of an object having a well-defined motion independent of other objects. One could arrive at the Principle of Relativity by noticing an apparent conspiracy in the laws of nature, and then invoking the principle as a means of avoiding the conspiracy. If we believe that motion is absolute, then we should find it mighty strange that we can play a game of ping-pong on a speeding train, without getting stuck to the wall. Indeed, if it weren’t for the scenery flying past, how would we know we were traveling at all? And even then, as the phrasing suggests, could we not easily imagine that it is the scenery moving past us while we remain still? Why, then, should Nature take such pains to hide from us the fact that we are in motion? The answer is the Zen of relativity — Nature does not conceal our true motion from us, instead, there is no absolute motion to speak of.

A similar leap is made from the special to the general theory of relativity. If we think of gravity as being a field, just like the electromagnetic field, then we notice a very strange coincidence: the charge of an object in the gravitational field is exactly equal to its inertial mass. By contrast, a particle can have an electric charge completely unrelated to its inertia. Why this peculiar conspiracy between gravitational charge and inertial mass? Because, quoth Einstein, they are the same thing. This is essentially the `Principle of Equivalence’ on which Einstein’s theory of gravity is based.


These considerations tell us that to find the deep principles in quantum mechanics, we have to look for seemingly inexplicable coincidences that cry out for explanation. In this post, I’ll discuss one such possibility: the apparent equivalence of two conceptually distinct types of probabilistic behaviour, that due to ignorance and that due to objective uncertainty. The argument runs as follows. Loosely speaking, in classical physics, one does not seem to require any notion of objective randomness or inherent uncertainty. In particular, it is always possible to explain observations using a physical model that is ontologically within the bounds of classical theory and such that all observable properties of a system are determined with certainty. In this sense, any uncertainty arising in classical experiments can always be regarded as our ignorance of the true underlying state of affairs, and we can perfectly well conceive of a hypothetical perfect experiment in which there is no uncertainty about the outcomes.

This is not so easy to maintain in quantum mechanics: any attempt to conceive of an underlying reality without uncertainty seems to result in models of the world that violate dearly-held principles, like the idea that signals cannot propagate faster than light, and the idea that experimenters have free will. This has prompted many of us to allow some amount of `objective’ uncertainty into our picture of the world, where even the best conceivable experiments must have some uncertain outcomes. These outcomes are unknowable, even in principle, until the moment that we choose to measure them (and the very act of measurement renders certain other properties unknowable). The presence of these two kinds of randomness in physics (the subjective kind, which can always be removed by some hypothetical improved experiment, and the objective kind, which cannot be so removed) leads us into another dilemma: where is the boundary that separates these two kinds of uncertainty?

E.T. Jaynes
“Are you talkin’ to me?”

Now at last we come to the `omelette’ that badass statistician and physicist E.T. Jaynes describes in the opening quote. Since quantum systems are inherently uncertain objects, how do we know how much of that uncertainty is due to our own ignorance, and how much of it is really `inside’ the system itself? Views range from the extreme subjective Bayesian (all uncertainty is ignorance) to various other extremes like the many-worlds interpretation (in which, arguably, the opposite holds: all uncertainty is objective). But a number of researchers, particularly those in the quantum information community, opt for a more Zen-like answer: the reason we can’t tell the difference between objective and subjective probability is that there is no difference. Asking whether the quantum state describes my personal ignorance about something, or whether the state “really is” uncertain, is a meaningless question. But can we take this Zen principle and turn it into something concrete, like the Relativity principle, or are we just using semantics to avoid the problem?

I think there might be something to be gained from taking this idea seriously and seeing where it leads. One way of doing this is to show that the predictions of quantum mechanics can be derived by taking this principle as an axiom. In this paper by Chiribella et al., the authors use the “Purification postulate”, plus some other axioms, to derive quantum theory. What is the Purification postulate? It states that “the ignorance about a part is always compatible with a maximal knowledge of the whole”. Or, in my own words, the subjective ignorance of one system about another system can always be regarded as the objective uncertainty inherent in the state that encompasses both.
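To make the postulate a little more tangible, here is a minimal sketch in plain Python. It is only an illustration of the standard textbook example, not the construction used by Chiribella et al.: the `whole' is assumed to be a two-qubit Bell state, the `part' is one of its qubits, and the helper names (`density_matrix`, `reduced_state_of_first_qubit`, `purity`) are made up for this example. Purity, Tr(ρ²), is used as a crude measure of knowledge: it equals 1 when the state is pure and drops to 1/2 for a maximally mixed qubit.

```python
import math

# Hypothetical helpers for illustration only. Amplitudes are kept real
# so that no complex conjugation is needed.

def density_matrix(psi):
    """Return |psi><psi| for a real state vector psi."""
    return [[a * b for b in psi] for a in psi]

def reduced_state_of_first_qubit(rho):
    """Trace out the second qubit of a two-qubit density matrix.

    Basis ordering is assumed to be |00>, |01>, |10>, |11>,
    so the index of |a b> is 2*a + b.
    """
    return [[sum(rho[2 * a + b][2 * c + b] for b in range(2))
             for c in range(2)]
            for a in range(2)]

def purity(rho):
    """Tr(rho^2): 1 for a pure state, 1/d for a maximally mixed one."""
    n = len(rho)
    return sum(rho[i][j] * rho[j][i] for i in range(n) for j in range(n))

# The "whole": the Bell state (|00> + |11>)/sqrt(2).
bell = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]
rho_whole = density_matrix(bell)

# The "part": the first qubit on its own.
rho_part = reduced_state_of_first_qubit(rho_whole)

purity_whole = purity(rho_whole)
purity_part = purity(rho_part)
```

Up to floating-point rounding, `purity_whole` comes out as 1 while `purity_part` comes out as 1/2: we know everything there is to know about the pair, and yet each qubit taken alone looks like a fair coin. That is the sense in which ignorance about a part is compatible with maximal knowledge of the whole.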

There is an important side comment to make before examining this idea further. You’ll notice that I have not restricted my usage of the word `ignorance’ to human experimenters, but that I take it to apply to any physical system. This idea also appears in relativity, where an “observer in motion” can refer to any object in motion, not necessarily a human. Similarly, I am adopting here the viewpoint of the information theorists, which says that two correlated or interacting systems can be thought of as having information about each other, and the quantification of this knowledge entails that systems — not just people — can be ignorant of each other in some sense. This is important because I think that an overly subjective view of probabilities runs the risk of concealing important physics behind the definition of the `rational agent’, which to me is a rather nebulous concept. I prefer to take the route of Rovelli and make no distinction between agents and generic physical systems. I think this view fits quite naturally with the Purification postulate.

In the paper by Chiribella et al., the postulate is given a rigorous form and used to derive quantum theory. This alone is not quite enough, but it is, I think, very compelling. To establish the postulate as a physical principle, more work needs to be done on the philosophical side. I will continue to use Rovelli’s relational interpretation of quantum mechanics as an integral part of this philosophy (for a very readable primer, I suggest his FQXi essay).

In the context of this interpretation, the Purification postulate makes more sense. Conceptually, the quantum state does not represent information about a system in isolation, but rather it represents information about a system relative to another system. It is as meaningless to talk about the quantum state of an isolated system as it is to talk about space-time without matter (i.e. Mach’s principle [2]). The only meaningful quantities are relational quantities, and in this spirit we consider the separation of uncertainty into subjective and objective parts to be relational and not fundamental. Can we make this idea more precise? Perhaps we can, by associating subjective and objective uncertainty with some more concrete physical concepts. I’ll probably do that in a follow-up post.

I conclude by noting that there are other aspects of quantum theory that cry out for explanation. If hidden variable accounts of quantum mechanics imply elements of reality that move faster than light, why does Nature conspire to prevent us from using them to send signals faster than light? And since the requirement of no faster-than-light signalling still allows correlations that are stronger than entanglement, why does entanglement stop short of that limit? I think there is still a lot that could be done in trying to turn these curious observations into physical principles, and then trying to build models based on them.

Halloween special: Boltzmann Brains

Author’s note: I wanted to wait before doing another post on anthropic reasoning, but this topic was just too good to pass up just after Halloween [1].

The Incredible Hercules #133

1: Are You A Disembodied Brain?

Our story begins with Ludwig Boltzmann’s thermodynamic solution to the arrow-of-time problem. The problem is to explain why the laws of physics at the microscopic scale appear to be reversible, but the laws as seen by us seem to follow a particular direction from past to future. Boltzmann argued that, provided the universe started in a low entropy state, the continual increase in entropy due to the second law of thermodynamics would explain the observed directionality of time. He thereby reduced the task to the lesser problem of explaining why the universe started in a low entropy state in the first place (incidentally, that is pretty much where things stand today, with some extra caveats). Boltzmann had his own explanation for this, too: he argued that if the universe were big enough, then even though it might be in a maximum entropy state, there would have to be random fluctuations in parts of the universe that would lead to local low-entropy states. Since human beings could only have come to exist in a low-entropy environment, we should not be surprised that our part of the universe started with low entropy, even though this is extremely unlikely within the overall model. Thus, one can use an observer-selection effect to explain the arrow of time.

Sadly, there is a crucial flaw in Boltzmann’s argument: it doesn’t explain why we find ourselves in such a large region of low entropy as the observable universe. The conditions for conscious observers to exist could have occurred in a spontaneous fluctuation much smaller than the observable universe – so if the total universe was indeed very large and in thermal equilibrium, we should expect to find ourselves in just a small bubble of orderly space, outside of which is just featureless radiation, instead of the stars and planets that we actually do see. In fact, the overwhelming number of conscious observers in such a universe would just be disembodied brains that fluctuated into existence by pure chance. It is extremely unlikely that any of these brains would share the same experiences and memories as real people born and raised on Earth within a low-entropy patch of the universe, so Boltzmann’s argument seems to be unable to account for the fact that we do have experiences consistent with this scenario, rather than with the hypothesis that we are disembodied brains surrounded by thermal radiation.

Matters, as always, are not quite so simple. It is possible to rescue Boltzmann’s argument by the following rationale. Suppose I believe it possible that I could be a Boltzmann Brain. Clearly, my past experiences exhibit a level of coherence and order that is not typical of your average Boltzmann Brain. However, there is still some subset of Boltzmann Brains which, by pure chance, fluctuated into existence with an identical set of memories and past experiences encoded into their neurons so as to make their subjective experiences identical to mine. Even though they are a tiny fraction of all Boltzmann Brains, there are still vastly more of them than there are `really human’ versions of me that actually evolved within a large low-entropy sub-universe. Hence, conditional on my subjective experience thus far, I am still forced to conclude that I am overwhelmingly more likely to be a Boltzmann Brain, according to this theory.
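The counting behind this conclusion can be sketched in a few lines of Python. Every number below is a made-up placeholder, chosen only to show the structure of the argument and not taken from any cosmological model; the function names are likewise hypothetical. The sketch also assumes equal credence for every observer whose experiences match mine. Everything is done in powers of ten so that the huge numbers do not overflow.

```python
# All three numbers are hypothetical placeholders, NOT estimates from any model.
LOG10_TOTAL_BRAINS = 1000.0        # log10 of the number of Boltzmann Brains ever formed
LOG10_MATCHING_FRACTION = -500.0   # log10 of the fraction whose memories match mine
LOG10_REAL_COPIES = 0.0            # log10 of the number of "really human" versions of me (one)

def log10_odds_of_being_boltzmann_brain(log10_total, log10_fraction, log10_real):
    """log10 of (number of matching Boltzmann Brains) / (number of real copies).

    Assumes equal credence for each observer with experiences identical to mine.
    """
    log10_matching = log10_total + log10_fraction
    return log10_matching - log10_real

def probability_real(log10_odds):
    """P(I am real) = 1 / (1 + odds), guarded against float overflow."""
    if log10_odds > 300:
        return 0.0  # smaller than about 1e-300; treated as zero here
    return 1.0 / (1.0 + 10.0 ** log10_odds)

log10_odds = log10_odds_of_being_boltzmann_brain(
    LOG10_TOTAL_BRAINS, LOG10_MATCHING_FRACTION, LOG10_REAL_COPIES)
p_real = probability_real(log10_odds)
```

With these placeholder values, the matching brains outnumber the real me by a factor of 10^500, even though only one brain in 10^500 matches. The tiny fraction loses to the sheer size of the population, which is the whole point of the rescue argument.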

2: Drama ensues

Quite recently, Sean Carroll wrote a paper (publicized on his blog) in which he and co-author Kim Boddy use the Higgs mechanism to “solve the problem of Boltzmann Brains” in cosmology. The setting is the ΛCDM model of cosmology, which is a little different to Boltzmann’s model of the universe, but suffers a similar problem: in the case of ΛCDM, the universe keeps expanding forever, eventually reaching a maximum entropy state (aka “heat death”), after which Boltzmann Brains have as much time as they need to fluctuate randomly out of the thermal noise. Such a model, argues Carroll, would imply that it is overwhelmingly likely that we are Boltzmann Brains.

Why is this a problem? Carroll puts it down to what he calls “cognitive instability”. Basically, the argument goes like this. Suppose you believe in a model of cosmology that has Boltzmann Brains. Then you should believe that you are most likely to be one of them. But this means that your reasons for believing in the model in the first place cannot be trusted, since they are not based on actual scientific evidence, but instead simply fluctuated into your brain at random. In essence, you are saying `based on the evidence, I believe that I am an entity that cannot believe in anything based on what it thinks is evidence’. A cognitively unstable theory therefore cannot both be true and be justified by observed evidence. Carroll’s solution to this problem is to reject the model in favor of one that doesn’t allow for the future existence of Boltzmann Brains.

Poor Carroll has taken a beating over at other blogs. Luboš Motl provided a lengthy response filled with the usual ad-hominems:

`…I really think that Carroll’s totally wrong reasoning is tightly linked to an ideology that blinds his eyes. As a hardcore leftist […] he believes in various forms of egalitarianism. Every “object” has the same probability.’

Jacques Distler wrote:

`…This is plainly nuts. How can a phase transition that may or may not take place, billions of years in the future, affect anything that we measure in the here-and-now? And, if it doesn’t affect anything in the present, why do I &#%@ care?’

Only Mark Srednicki seemed able to disagree with Carroll without taking the idea as a personal affront; his level-headed discussions with Carroll and Distler helped to clarify the issue significantly. Ultimately, Srednicki agrees with the conclusions of both Motl and Distler, but for slightly different reasons. The ensuing discussion can be summarized something like this:

Distler: A model of the universe does not allow you to make predictions by itself. You also need to supply a hypothesis about where we exist within the universe. And any hypothesis in which we are Boltzmann Brains is immediately refuted by empirical evidence, namely when we fail to evaporate into radiation in the next second.

Motl: Yep, I basically agree with Distler. Also, Boddy and Carroll are stupid Marxist idiots.

Srednicki: Hang on guys, it’s more subtle than that. Carroll is simply saying that, in light of the evidence, a cosmological model without Boltzmann Brains is better than one that has Boltzmann Brains in it. Whether this is true or not is a philosophical question, not something that is blindingly obvious.

Distler: Hmmpf! Well I think it is blindingly obvious that the presence or absence of Boltzmann Brains has no bearing on choosing between the two models. Your predictions for future events would be the same in both.

Srednicki: That’s only true if, within the Boltzmann Brain model, you choose a xerographic distribution that ensures you are a non-Boltzmann-brain. But the choice of xerographic distribution is a philosophical one.

Distler: I disagree – Bayesian theory says that you should choose the prior distribution that converges most quickly to the `correct’ distribution as defined by the model. In this case, it is the distribution that favours us not being Boltzmann Brains in the first place.

(Meanwhile, at Preposterous Universe…)

Carroll: I think that it is obvious that you should give equal credence to yourself being any one of the observers in a model that have the same previous experiences as you. It follows that a model without Boltzmann Brains is better than a model with Boltzmann Brains due to cognitive instability.

Srednicki: Sorry Carroll – your claim is not at all obvious. It is a philosophical assumption that cannot be derived from any laws within the model. Under a different assumption, Boltzmann Brains aren’t a problem.

There is a potential flaw in the Distler/Motl argument: it rests on the premise that, if you are indeed a Boltzmann Brain, this can be taken as a highly falsifiable hypothesis which is falsified one second later when you fail to evaporate. But strictly speaking, that only rules out a subset of possible Boltzmann Brain hypotheses – there is still the hypothesis that you are a Boltzmann Brain whose experiences are indistinguishable from yours right up until the day you die, at which point they go `poof’, and the hypothesis that you are one of these brains is not falsifiable. Sure, there are vastly fewer Boltzmann Brains with this property, but in a sufficiently long-lived universe there are still vastly more of them than the `real you’. Thus, the real problem with Boltzmann Brains is not that they are immediately falsified by experience, but quite the opposite: they represent an unfalsifiable hypothesis. Of course, this also immediately resolves the problem: even if you were a Boltzmann Brain, your life will proceed as normal (by definition, you have restricted yourself to BBs whose subjective experiences match those of a real person’s life), so this belief has no bearing on your decisions. In particular, your subjective experience gives you no reason to prefer one model to another just because the former contains Boltzmann Brains and the latter doesn’t. One thereby arrives at the same conclusion as Distler and Motl, but by a different route. However, this also means that Distler and Motl cannot claim that they are not Boltzmann Brains based on observed evidence, if one assumes a BB model. Either they think Boddy and Carroll’s proposed alternative is not a viable theory in its own right, or else they don’t think that a theory containing a vast number of unfalsifiable elements is any worse than a similar theory that doesn’t need such elements, which to me sounds absurd.

I think that Mark Srednicki basically has it right: the problem at hand is yet another example of anthropic reasoning, and the debate here is actually about how to choose an appropriate `reference class’ of observers. The reference class is basically the set of observers within the model that you think it is possible that you might have been. Do you think it is possible that you could have been born as somebody else? Do you think you might have been born at a different time in history? What about as an insect, or a bacterium? In a follow-up post, I’ll discuss an idea that lends support to Boddy and Carroll’s side of the argument.

[1] If you want to read about anthropic reasoning from somebody who actually knows what they are talking about, see Anthropic Bias by Nick Bostrom. Particularly relevant here is his discussion of `freak observers’.

The Adam and Eve Paradox

One of my favourite mind-bending topics is probability theory. It turns out that, for some reason, human beings are very bad at grasping how probability works. This is evident in many phenomena: why do we think the roulette wheel is more likely to come up black after a long string of reds? Why do people buy lottery tickets? Why is it so freakin’ hard to convince people to switch doors in the famous Monty Hall Dilemma?

Part of the problem is that we seem to think we understand probability much better than we actually do. This is why card sharks and dice players continue to make a living by swindling people who fall into common traps. Studying probability is one of the most humbling things a person can do. One area that has particular relevance to physics is the concept of anthropic reasoning. We base our decisions on prior knowledge that we possess. But it is not always obvious which prior knowledge is relevant to a given problem. There may be some cases where the mere knowledge that you exist (in this time, as yourself) might conceivably tell you something useful.

The anthropic argument in cosmology and physics is the proposal that some observed facts about the universe can be explained simply by the fact that we exist. For example, we might wonder why the cosmological constant is so small. In 1987, Steven Weinberg argued that if it were any bigger, it would not have been possible for life to evolve in the universe; hence, the mere fact that we exist implies that the value of the constant is below a certain limit. However, one has to be extremely careful about invoking such principles, as we will see.

This blog post is likely to be the first among many, in which I meditate on the subtleties of probability. Today, I’d like to look at an old chestnut that goes by many names, but often appears in the form of the `Adam and Eve’ paradox.

(Kunsthistorisches Wien)
Spranger – Adam and Eve

Adam finds himself to be the first human being. While he is waiting around for Eve to turn up, he is naturally very bored. He fishes around in his pocket for a coin. Just for a laugh, he decides that if the coin comes up heads, he will refuse to procreate with Eve, thereby dooming the rest of the human race to non-existence (Adam has a sick sense of humour). However, if the coin comes up tails, he will conceive with Eve as planned and start the chain of events leading to the rest of humanity.

Now Adam reasons as follows: `Either the future holds a large number of my future progeny, or it holds nobody else besides myself and Eve. If indeed it holds many humans, then it is vastly more likely that I should have been born as one of them, instead of finding myself rather coincidentally in the body of the first human. On the other hand, if there are only ever going to be two people, then it is quite reasonable that I should find myself to be the first one of them. Therefore, given that I already find myself in the body of the first human being, the coin is overwhelmingly likely to come up heads when I flip it.’ Is Adam’s reasoning correct? What is the probability of the coin coming up heads?

As with many problems of a similar ilk, this one creates confusion by leaving out certain crucial details that are needed in order to calculate the probability. Because of the sneaky phrasing of the problem, however, people often don’t notice that anything is missing – they bring along their own assumptions about what these details ought to be, and are then surprised when someone with different assumptions ends up with a different probability, using just as good a logical argument.

Any well-posed problem has an unambiguous answer. For example, suppose I tell you that there is a bag of 35 marbles, 15 of which are red and the rest blue. This information is now sufficient to state the probability that a marble taken from the bag is red. But suppose I posed the same problem without specifying the total number of marbles in the bag. So you know that 15 are red, but there could be any number of additional blue marbles. In order to figure out the probability of getting a red marble, you first have to guess how many blue marbles there are, and in this case (assuming the bag can be arbitrarily large) a guess of 20 is as good as a guess of 20000, but the probability of drawing a red marble is quite different in each case. Basically, two different rational people might come up with completely different answers to the question because they made different guesses, but neither would be any more or less correct than the other person: without additional information, the answer is ambiguous.
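For the record, here is the arithmetic as a minimal Python sketch, assuming each marble is equally likely to be drawn. The function name `probability_red` is just a label for this example.

```python
from fractions import Fraction

def probability_red(red, blue):
    """Chance of drawing a red marble, assuming every marble is equally likely."""
    return Fraction(red, red + blue)

RED = 15

# Well-posed version: 35 marbles in total, so 20 of them are blue.
well_posed = probability_red(RED, 35 - RED)

# Ill-posed version: the number of blue marbles has to be guessed.
guess_small = probability_red(RED, 20)
guess_large = probability_red(RED, 20000)
```

The well-posed bag gives 3/7 (about 0.43). Guessing 20 blue marbles reproduces that answer, while guessing 20000 gives 3/4003 (about 0.00075). Both guesses are equally defensible, and the answers differ by a factor of more than 500.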

In the case of Adam’s coin, the answer depends on things like: how do souls get assigned to bodies? Do you start with one soul for every human who will ever live and then distribute them randomly? If so, then doesn’t this imply that certain facts about the future are pre-determined, such as Adam’s decision whether or not to procreate? We will now see how it is possible to choose two different contexts such that in one case, Adam is correct, and in the other case he is wrong. But just to avoid questions of theological preference, we will rephrase the problem in terms of a more real-world scenario: actors auditioning for a play.

Imagine a large number of actors auditioning for the parts in the Play of Life. Their roles have not yet been assigned. The problem is that the director has not yet decided which version of the play he wishes to run. In one version, he only needs two actors, while in the other version there is a role for every applicant.

In the first version of the play, the lead actor flips a coin and it comes up heads (the coin is a specially designed stage-prop that is weighted to always come up heads). The lead actress then joins the lead actor onstage, and no more characters are required. In the second version of the play, the coin is rigged to come up tails, and immediately afterwards a whole ensemble of characters comes onto the scene, one for every available actor.

The director wishes to make his decision without potentially angering the vast number of actors who might not get a part. Therefore he decides to use an unconventional (and probably illegal) method of auditioning. First, he puts all of the prospective actors to sleep; then he decides by whatever means he pleases which version of the play to run. If it is the first version, he randomly assigns the roles of the two lead characters and has them dressed up in the appropriate costumes. As for all the other actors who didn’t get a part, he has them loaded into taxis and sent home with an apologetic letter. If he decides on the second version of the play, then he assigns all of the roles randomly and has the actors dressed up in the costumes of their characters, ready to go onstage when they wake up.

Now imagine that you are one of the actors, and you are fully aware of the director’s plan, but you do not know which version of the play he is going to run. After being put to sleep, you wake up some time later dressed in the clothing of the lead role, Adam. You stumble on stage for the opening act, involving you flipping a coin. Of course, you know the coin is rigged to either land heads or tails depending on which version of the play the director has chosen to run. Now you can ask yourself what the probability is that the coin will land heads, given that you have been assigned the role of Adam. In this case, hopefully you can convince yourself with a bit of thought that your being chosen as Adam does not give you any information about the director’s choice. So guessing that the coin will come up heads is equally justified as guessing that it will come up tails.
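If a bit of thought does not do the trick, Bayes' rule will. The sketch below assumes a hypothetical pool of 1000 actors and a 50/50 prior over the director's choice; neither number is part of the story, and the function name is made up for this example. The key observation is that with random casting, exactly one of the actors ends up as Adam in either version, so your chance of being Adam is the same in both.

```python
from fractions import Fraction

def posterior_two_person(prior_two_person, p_adam_given_two, p_adam_given_many):
    """Bayes' rule: P(two-person version | I wake up as Adam)."""
    joint_two = prior_two_person * p_adam_given_two
    joint_many = (1 - prior_two_person) * p_adam_given_many
    return joint_two / (joint_two + joint_many)

N_ACTORS = 1000          # hypothetical number of auditioning actors
PRIOR = Fraction(1, 2)   # assumed prior: no idea which version the director prefers

# Random casting: in either version, each actor has a 1-in-N chance of being Adam.
random_casting = posterior_two_person(
    PRIOR, Fraction(1, N_ACTORS), Fraction(1, N_ACTORS))
```

The posterior equals the prior, 1/2, and this holds for any prior and any number of actors, because the two likelihoods are identical and cancel. Waking up as Adam tells you nothing about which way the coin is rigged.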

Let us now imagine a slight variation in the process. Suppose that, just before putting everyone to sleep, the director takes you aside and confides in you that he thinks you would make an excellent Adam. He likes you so much, in fact, that he has specially pre-assigned you the role of Adam in the case that he runs the two-person version of the play. However, he feels that in the many-character version of the play it would be too unfair not to give one of the other actors a chance at the lead, so in that case he intends to cast the role randomly as usual.

Given this extra information, you should now be much less surprised at waking up to find yourself in Adam’s costume. Indeed, your lack of surprise is due to the fact that your waking up in this role is a strong indication that the director went with his first choice – to run the two-person version of the play. You can therefore predict with confidence that your coin is rigged to land heads, and that the other actors are most probably safely on their way home with apologetic notes in their jacket pockets.
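Here is a Bayes' rule sketch of this variation, again assuming a hypothetical pool of 1000 actors and a 50/50 prior over the director's choice (both numbers are mine, as is the function name). The only change from purely random casting is that you are now certain to be Adam if the two-person version is run.

```python
from fractions import Fraction

def posterior_two_person(prior_two_person, p_adam_given_two, p_adam_given_many):
    """Bayes' rule: P(two-person version | I wake up as Adam)."""
    joint_two = prior_two_person * p_adam_given_two
    joint_many = (1 - prior_two_person) * p_adam_given_many
    return joint_two / (joint_two + joint_many)

N_ACTORS = 1000          # hypothetical number of auditioning actors
PRIOR = Fraction(1, 2)   # assumed prior over the director's choice

# Pre-assigned casting: you are Adam for certain in the two-person version,
# but only with probability 1/N in the many-character version.
favoured_casting = posterior_two_person(PRIOR, Fraction(1), Fraction(1, N_ACTORS))
```

With these numbers the posterior for the two-person version is 1000/1001, or about 0.999. The story is the same as before apart from the casting process, and that one change in the process is what turns Adam's reasoning from wrong to right.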

What is the moral of this story? Be suspicious of any hypothetical scenario whose answer depends on mysterious unstated assumptions about how souls are assigned to bodies, whether the universe is deterministic, etc. Different choices of the process by which you find yourself in one situation or another will affect the extent to which your own existence informs your assignment of probabilities. Specifying these details means asking the question: what process determines the state of existence in which I find myself? If you want to reason about counterfactual scenarios in which you might have been someone else, or not existed at all, then you must first specify a clear model of how such states of existence come about. Without that information, you cannot reliably invoke your own existence as an aid to calculating probabilities.