# Why a density matrix is not a probability distribution

I’m back! And I’m ready to hit you with some really heavy thoughts that have been weighing me down, because I need to get them off my chest.

Years ago, Rob Spekkens and Matt Leifer published an article in which they tried to define a “causally neutral theory of quantum inference”. Their starting point was an analogy between a density matrix (actually the Choi-Jamiołkowski matrix of a CPT map, but hey, let’s not split hairs) and a conditional probability distribution. They argued that in many respects, this “conditional density matrix” could be used to define equations of inference for density matrices, in complete analogy with the rules of Bayesian inference applied to probability distributions.

At the time, something about this idea just struck me as wrong, although I couldn’t quite put my finger on what it was. The paper is not technically wrong, but the idea just felt like the wrong approach to me. Matt Leifer even wrote a superb blog post explaining the analogy between quantum states and probabilities, and I was kind of half convinced by it. At least, my rational brain could find no flaw in the idea, but some deep, subterranean sense of aesthetics could not accept this analogy.

In my paper with Časlav Brukner on quantum causal models, we took a diametrically opposite approach. We refused to deal directly with quantum states, and instead tried to identify the quantum features of a system by looking only at the level of statistics, where the normal rules of inference would apply. The problem is, the price that you pay in working at the level of probabilities is that the structure of quantum states slips through your fingers and gets buried in the sand.

(I like to think of probability distributions as sand. You can push them around any which way, but the total amount of sand stays the same. Underneath the sand, there is some kind of ontological structure, like a dinosaur skeleton or an alien space-ship, whose ridges and contours sometimes show through in places where we brush the sand away. In quantum mechanics, it seems that we can never completely reveal what is buried, because when we clear away sand from one part, we end up dumping it on another part and obscuring it.)

One problem I had with this probability-level approach was that the quantum structure did not emerge the way I had hoped. In particular, I could not find anything like a quantum Reichenbach Principle to replace the old classical Reichenbach Principle, and so the theory was just too unconstrained to be interesting. Other approaches along the same lines tended to deal with this by putting in the quantum stuff by hand, without actually ‘revealing’ it in a natural way. So I gave up on this for a while.

And what became of Leifer and Spekkens’ approach, the one that I thought was wrong? Well, it also didn’t work out. Their analogy seemed to break down when they tried to add the state of the system at more than two times. To put the last nail in, last year Dominic Horsman et al. presented some work that showed that any approach along the lines of Leifer and Spekkens would run into trouble, because quantum states evolving in time just do not behave like probabilities do. Probabilities are causally neutral, which means that when we use information about one system to infer something about another system, it really doesn’t matter if the systems are connected in time or separated in space. With quantum systems, on the other hand, the difference between time and space is built into the formalism and (apparently) cannot easily be got rid of. Even in relativistic quantum theory, space and time are not quite on an equal footing (and Carlo Rovelli has had a lot to say about this in the past).

Still, after reading Horsman et al.’s paper, I felt that it was too mathematical and didn’t touch the root of the problem with the Leifer-Spekkens analogy. Again, it was a case of technical accuracy but without conveying the deeper, intuitive reasons why the approach would fail. What finally reeled me back in was a recent paper by John-Mark Allen et al. in which they achieve an elegant definition of a quantum causal model, complete with a quantum Reichenbach Principle (even for multiple systems), but at the expense of essentially giving up on the idea of defining a quantum conditional state over time, and hence forfeiting the analogy with classical Bayesian inference. To me, it seemed like a masterful evasion, or like one of those dramatic Queen sacrifices you see in chess tournaments. They realized that what was standing in the way of quantum causal models was the desire to represent all of the structure of conditional probability distributions, but that this was not necessary for defining a causal model. So they were able to achieve a quantum causal model with a version of Reichenbach’s Principle, but at the price of retaining only a partial analog of classical Bayesian inference.

This left me pondering that old paper of Leifer and Spekkens. Why did they really fail? Is there really no way to salvage a causally neutral theory of quantum inference? I will do my best to answer only the first question here, leaving the second one open for the time being.

One strategy to use when you suspect something is wrong with an idea, is to imagine a world in which the idea is true, and then enter that world and look around to see what is wrong with it. So let us imagine that, indeed, all of the usual rules of Bayesian inference for probabilities have an exact counterpart in terms of density matrices (and similar objects). What would that mean?

Well, if it looks like a duck and quacks like a duck … we could apply Ockham’s razor and say that a density matrix must actually represent a duck probability distribution. This is great news! It means that we can argue that density matrices are actually epistemic objects — they represent our ignorance about some underlying reality. This could be the chance we’ve been waiting for to sweep away all the sand and reveal the quantum skeleton underneath!

The problem is that this is too good to be true. Why? Because it seems to imply that a density matrix uniquely corresponds to a probability distribution over the elements of reality (whatever they are). In jargon, this means that the resulting model would be preparation non-contextual. But this is impossible because — as Spekkens himself proved in a classic paper — any ontological model of quantum mechanics must be preparation contextual.

Let me try to simplify that. It turns out that there are many different ways to prepare a quantum system, such that it ends up being described by the same statistics (as determined by its density matrix). For example, if a machine prepares one of two possible pure states based on the outcome of a coin flip (whose outcome is unknown to us), this can result in the same density matrix for the system as a machine that deterministically entangles the quantum system with another system (which we don’t have access to). These details about the preparation that don’t affect the density matrix are called the preparation context.
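To make this concrete, here is a minimal sketch in plain Python. The specific choices are assumptions made purely for illustration: the coin-flip machine prepares |0⟩ or |1⟩ with equal probability, and the entangling machine prepares the Bell state (|00⟩ + |11⟩)/√2 and keeps the second qubit out of our reach. The helper names (`outer`, `mix`, `reduced_state_first_qubit`, `same_matrix`) are hypothetical and do not come from any library.

```python
from math import sqrt

def outer(psi):
    """Return the projector |psi><psi| as a nested list."""
    return [[a * b.conjugate() for b in psi] for a in psi]

def mix(weights, states):
    """Density matrix of a classical mixture of pure states."""
    dim = len(states[0])
    rho = [[0j] * dim for _ in range(dim)]
    for w, psi in zip(weights, states):
        proj = outer(psi)
        for i in range(dim):
            for j in range(dim):
                rho[i][j] += w * proj[i][j]
    return rho

def reduced_state_first_qubit(psi):
    """Trace out the second qubit of a two-qubit pure state.

    The amplitudes are indexed as psi[2*a + b], where a labels our
    qubit and b labels the partner we have no access to.
    """
    rho = [[0j, 0j], [0j, 0j]]
    for a in range(2):
        for a2 in range(2):
            for b in range(2):
                rho[a][a2] += psi[2 * a + b] * psi[2 * a2 + b].conjugate()
    return rho

def same_matrix(x, y, tol=1e-12):
    """Entry-wise comparison up to a small numerical tolerance."""
    n = len(x)
    return all(abs(x[i][j] - y[i][j]) < tol for i in range(n) for j in range(n))

# Preparation 1 (assumed): a hidden fair coin selects |0> or |1>.
ket0 = [1 + 0j, 0j]
ket1 = [0j, 1 + 0j]
rho_coin = mix([0.5, 0.5], [ket0, ket1])

# Preparation 2 (assumed): the Bell state (|00> + |11>)/sqrt(2),
# of which we only ever see the first qubit.
amp = complex(1 / sqrt(2))
bell = [amp, 0j, 0j, amp]
rho_entangled = reduced_state_first_qubit(bell)

print(same_matrix(rho_coin, rho_entangled))  # True
```

Both machines hand us the maximally mixed state, so no measurement on the system alone can tell them apart, even though the stories behind the two preparations are completely different.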

The thing is, if a density matrix is really interpretable as a probability distribution, then which distribution it represents has to depend on the preparation context (that is what Spekkens proved). Since Leifer and Spekkens are only looking at density matrices (sans context), we should not expect them to behave just like probability distributions — the analogy has to break somewhere.

Now this is nowhere near a proof, which is why it is appearing here on a dodgy blog instead of in a scientific publication. But I think it does capture the reason why I felt that the approach of Leifer and Spekkens was just ‘wrong’: they seemed to be placing density matrices into the role of probabilities, where they just don’t fit.

Now let me point out some holes in my own argument. Although a density matrix can’t be thought of as a single probability distribution, it can perhaps be thought of as representing an equivalence class of distributions, and maybe these equivalence classes could turn out to obey all the usual laws of classical inference, thereby rescuing the analogy. However, there is absolutely no reason to expect this to be true; on the contrary, one would expect it to be false. To me, this would be almost as if you tried to model a single atom as if it were a whole gas of particles, and found that it works. Or, to get even more Zen, it would be like an avalanche consisting of one grain of sand.

It is interesting that the analogy can be carried as far as it has been. Perhaps this can be accounted for by the fact that, even though they can’t serve as replacements for probability distributions, density matrices do have a tight relationship with probabilities through the Born rule (the rule that tells us how to predict probabilities for measurements on a quantum system). So maybe we should expect at least some of the properties of probabilities to somehow rub off on density matrices.
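For reference, the Born rule can be written as follows, in notation introduced here only for illustration ($\rho$ is the density matrix and $E_k$ is the measurement operator associated with outcome $k$):

$$p(k) = \mathrm{Tr}(\rho\, E_k), \qquad E_k \geq 0, \qquad \sum_k E_k = I.$$

Because this expression is linear in $\rho$, a mixture $\rho = \sum_i w_i \rho_i$ produces the same mixture of outcome probabilities, $p(k) = \sum_i w_i\, p_i(k)$. That is at least one property of probabilities that density matrices inherit directly.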

Although it seems that a causally neutral theory of Bayesian inference cannot succeed using just density matrices (or similar objects), perhaps there are other approaches that would be more fruitful. What if one takes an explicitly preparation-contextual ontological model (like the fascinating Beltrametti-Bugajski model) and uses it to supplement our density matrices with the context that they need in order to identify them with probability distributions? What sort of theory of inference would that give us? Or, what if we step outside of the ontological models framework and look for some other way to define quantum inference? The door remains tantalizingly open.

# The trouble with Reichenbach

(Note: this blog post is vaguely related to a paper I wrote. You can find it on the arXiv here.)

Suppose you are walking along the beach, and you come across two holes in the rock, spaced apart by some distance; let us label them ‘A’ and ‘B’. You observe an interesting correlation between them. Every so often, at an unpredictable time, water will come spraying out of hole A, followed shortly after by a spray of water out of hole B. Given our day-to-day experience of such things, most of us would conclude that the holes are connected by a tunnel underneath the rock, which is in turn connected to the ocean, such that a surge of water in the underground tunnel causes the water to spray from the two holes at about the same time.

Now, therein lies a mystery: how did our brains make this deduction so quickly and easily? The mere fact of a statistical correlation does not tell us much about the direction of cause and effect. Two questions arise. First, why do correlations require explanations in the first place? Why can we not simply accept that the two geysers spray water in synchronisation with each other, without searching for explanations in terms of underground tunnels and ocean surges? Secondly, how do we know in this instance that the explanation is that of a common cause, and not that (for example) the spouting of water from one geyser triggers some kind of chain reaction that results in the spouting of water from the other?

The first question is a deep one. We have in our minds a model of how the world works, which is the product partly of history, partly of personal experience, and partly of science. Historically, we humans have evolved to see the world in a particular way that emphasises objects and their spatial and temporal relations to one another. In our personal experience, we have seen that objects move and interact in ways that follow certain patterns: objects fall when dropped and signals propagate through chains of interactions, like a series of dominoes falling over. Science has deduced the precise mechanical rules that govern these motions.

According to our world-view, causes always occur before their effects in time, and one way that correlations can arise between two events is if one is the cause of the other. In the present example, we may reason as follows: since hole B always spouts after A, the causal chain of events, if it exists, must run from A to B. Next, suppose that I were to cover hole A with a large stone, thereby preventing it from emitting water. If the occasion of its emission were the cause of hole B’s emission, then hole B should also cease to produce water when hole A is covered. If we perform the experiment and we find that hole B’s rate of spouting is unaffected by the presence of a stone blocking hole A, we can conclude that the two events of spouting water are not connected by a direct causal chain.

The only other way in which correlations can arise is by the influence of a third event — such as the surging of water in an underground tunnel — whose occurrence triggers both of the water spouts, each independently of the other. We could promote this aspect of our world-view to a general principle, called the Principle of the Common Cause (PCC): whenever two events A and B are correlated, then either one is a cause of the other, or else they share a common cause (which must occur some time before both of these events).

The Principle of Common Cause tells us where to look for an explanation, but it does not tell us whether our explanation is complete. In our example, we used the PCC to deduce that there must be some event preceding the two water spouts which explains their correlation, and for this we proposed a surge of water in an underground tunnel. Now suppose that the presence of water in this tunnel is absolutely necessary in order for the holes to spout water, but that on some occasions the holes do not spout even though there is water in the tunnel. In that case, simply knowing that there is water in the tunnel does not completely eliminate the correlation between the two water spouts. That is, even though I know there is water in the tunnel, I am not certain whether hole B will emit water, unless I happen to know in addition that hole A has just spouted. So, the probability of B still depends on A, despite my knowledge of the ‘common cause’. I therefore conclude that I do not know everything that there is to know about this common cause, and there is still information to be had.

It could be, for instance, that the holes will only spout water if the water pressure is above a certain threshold in the underground tunnel. If I am able to detect both the presence of the water and its pressure in the tunnel, then I can predict with certainty whether the two holes will spout or not. In particular, I will know with certainty whether hole B is going to spout, independently of A. Thus, if I had stakes riding on the outcome of B, and you were to try and sell me the information “whether A has just spouted”, I would not buy it, because it does not provide any further information beyond what I can deduce from the water in the tunnel and its pressure level. It is a fact of general experience that, conditional on complete knowledge of the common causes of two events, the probabilities of those events are no longer correlated. This is called the principle of Factorisation of Probabilities (FP). The union of FP and PCC together is called Reichenbach’s Common Cause Principle (RCCP).

In the above example, the complete knowledge of the common cause allowed me to perfectly determine whether the holes would spout or not. The conditional independence of these two events is therefore guaranteed. One might wonder why I did not talk about the principle of predetermination: conditional on complete knowledge of the common causes, the events are determined with certainty. The reason is that predetermination might be too strong; it may be that there exist phenomena that are irreducibly random, such that even a full knowledge of the common causes does not suffice to determine the resulting events with certainty.
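In symbols, and in notation introduced here only for illustration ($C$ stands for complete knowledge of the common causes), factorisation reads

$$P(A, B \mid C) = P(A \mid C)\, P(B \mid C),$$

while predetermination is the stronger statement that $P(A \mid C)$ and $P(B \mid C)$ are each either 0 or 1. A short sketch of why the second guarantees the first:

$$P(A \mid C) = 0 \;\Rightarrow\; P(A, B \mid C) \leq P(A \mid C) = 0 = P(A \mid C)\, P(B \mid C),$$

$$P(A \mid C) = 1 \;\Rightarrow\; P(A, B \mid C) = P(B \mid C) - P(\neg A, B \mid C) = P(B \mid C) = P(A \mid C)\, P(B \mid C),$$

where the last case uses $P(\neg A, B \mid C) \leq P(\neg A \mid C) = 0$. The implication does not run the other way: probabilities can factorise without being 0 or 1.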

As another example, consider two river beds on a mountain slope, one on the left and one on the right. Usually (96% of the time) it does not rain on the mountain and both rivers are dry. If it does rain on the mountain, then there are four possibilities with equal likelihood: (i) the river beds both remain dry, (ii) the left river flows but the right one is dry, (iii) the right river flows but the left is dry, or (iv) both rivers flow. Thus, without knowing anything else, the fact that one river is running makes it more likely that the other one is. However, conditional on it having rained on the mountain, if I know that the left river is flowing (or dry), this does not tell me anything about whether the right river is flowing or dry. So, it seems that after conditioning on the common cause (rain on the mountain) the probabilities factorise: knowing about one river tells me nothing about the other.
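The numbers in this example are easy to check by brute force. The following is a minimal sketch using exact fractions; the outcome encoding `(rain, left_flows, right_flows)` and the helper `prob` are hypothetical names chosen for this illustration.

```python
from fractions import Fraction

P_RAIN = Fraction(4, 100)  # it rains 4% of the time

# Joint distribution over outcomes (rain, left_flows, right_flows).
# No rain: both rivers are dry. Rain: four equally likely cases.
joint = {(False, False, False): 1 - P_RAIN}
for left in (False, True):
    for right in (False, True):
        joint[(True, left, right)] = P_RAIN * Fraction(1, 4)

def prob(event, given=lambda o: True):
    """Conditional probability P(event | given) under `joint`."""
    num = sum(p for o, p in joint.items() if given(o) and event(o))
    den = sum(p for o, p in joint.items() if given(o))
    return num / den

rain = lambda o: o[0]
left_flows = lambda o: o[1]
right_flows = lambda o: o[2]

# Without conditioning on rain, the rivers are correlated.
print(prob(right_flows))              # 1/50
print(prob(right_flows, left_flows))  # 1/2

# Conditional on rain, the left river carries no extra information.
print(prob(right_flows, rain))                                   # 1/2
print(prob(right_flows, lambda o: rain(o) and left_flows(o)))    # 1/2
```

Seeing the left river flow raises the probability of the right one flowing from 1/50 to 1/2, but once we know it rained, the probability is 1/2 either way.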

Now we have a situation in which the common cause does not completely determine the outcomes of the events, but where the probabilities nevertheless factorise. Should we then conclude that the correlations are explained? If we answer ‘yes’, we have fallen into a trap.

The trap is that there may be additional information which, if discovered, would make the rivers become correlated. Suppose I find a meeting point of the two rivers further upstream, in which sediment and debris tend to gather. If there is only a little debris, it will be pushed to one side (the side chosen effectively at random), diverting water to one of the rivers and blocking the other. Alternatively, if there is a large build-up of debris, it will either dam the rivers, leaving them both dry, or else be completely destroyed by the build-up of water, feeding both rivers at once. Now, if I know that it rained on the mountain and I know how much debris is present upstream, knowing whether one river is flowing will provide information about the other (e.g. if there is a little debris upstream and the right river is flowing, I know the left must be dry).
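Here is a minimal sketch of the debris story, restricted to the days on which it rains. The probabilities are assumptions: the text above does not specify them, so each branch (little versus large debris, which side gets blocked, dam versus burst) is given weight 1/2, chosen so that the four river outcomes come out equally likely. The names `given_rain` and `prob` are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# Outcomes on a rainy day: (debris, left_flows, right_flows).
# All 1/2 weights are assumed, not taken from the text.
given_rain = {
    ("little", True, False): HALF * HALF,   # debris pushed right
    ("little", False, True): HALF * HALF,   # debris pushed left
    ("large", False, False): HALF * HALF,   # debris dams both rivers
    ("large", True, True): HALF * HALF,     # debris bursts, both flow
}

def prob(event, given=lambda o: True):
    """Conditional probability P(event | given) on a rainy day."""
    num = sum(p for o, p in given_rain.items() if given(o) and event(o))
    den = sum(p for o, p in given_rain.items() if given(o))
    return num / den

little = lambda o: o[0] == "little"
left_flows = lambda o: o[1]
right_flows = lambda o: o[2]

# Knowing only that it rained: the right river says nothing about the left.
print(prob(left_flows))               # 1/2
print(prob(left_flows, right_flows))  # 1/2

# Knowing also that there is little debris: perfect anti-correlation.
print(prob(left_flows, little))                                   # 1/2
print(prob(left_flows, lambda o: little(o) and right_flows(o)))   # 0
```

Under these assumed weights, the probabilities factorise when we condition on rain alone, and stop factorising as soon as the amount of debris is added to what we know.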

Before I knew anything, the rivers seemed to be correlated. Conditional on whether it rained on the mountain-top, the correlation disappeared. But now, conditional on it having rained on the mountain and on the amount of debris upstream, the correlation is restored! If the only tools I had to explain correlations were the PCC and the FP, then how could I ever be sure that the explanation is complete? Unless the information of the common cause is enough to predetermine the outcomes of the events with certainty, there is always the possibility that the correlations have not been explained, because new information about the common causes might come to light which renders the events correlated again.

Now, at last, we come to the main point. In our classical world-view, observations tend to be compatible with predetermination. No matter how unpredictable or chaotic a phenomenon seems, we find it natural to imagine that every observed fact could be predicted with certainty, in principle, if only we knew enough about its relevant causes. In that case, we are right to say that a correlation has not been fully explained unless Reichenbach’s principle is satisfied. But this last property is now just seen as a trivial consequence of predetermination, implicit in our world-view. In fact, Reichenbach’s principle is not sufficient to guarantee that we have found an explanation. We can only be sure that the explanation has been found when the observed facts are fully determined by their causes.

This poses an interesting problem to anyone (like me) who thinks the world is intrinsically random. If we give up predetermination, we have lost our sufficient condition for correlations to be explained. Normally, if we saw a correlation, after eliminating the possibility of a direct cause we would stop searching for an explanation only when we found one that could perfectly determine the observations. But if the world is random, then how do we know when we have found a good enough explanation?

In this case, it is tempting to argue that Reichenbach’s principle should be taken as a sufficient (not just necessary) condition for an explanation. Then, we know to stop looking for explanations as soon as we have found one that causes the probabilities to factorise. But as I just argued with the example of the two rivers, this doesn’t work. If we believed this, then we would have to accept that it is possible for an explained correlation to suddenly become unexplained upon the discovery of additional facts! Short of a physical law forbidding such additional facts, this makes for a very tenuous notion of explanation indeed.

The question of what should constitute a satisfactory explanation for a correlation is, I think, one of the deepest problems posed to us by quantum mechanics. The way I read Bell’s theorem is that (assuming that we accept the theorem’s basic assumptions) quantum mechanics is either non-local, or else it contains correlations that do not satisfy the factorisation part of Reichenbach’s principle. If we believe that factorisation is a necessary part of explanation, then we are forced to accept non-locality. But why should factorisation be a necessary requirement of explanation? It is only justified if we believe in predetermination.
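For readers who want to see the tension in numbers, here is a minimal sketch based on the CHSH form of Bell’s theorem, which is one standard way of stating it and not necessarily the one I have in mind above. It assumes two parties with two measurement settings each and outcomes ±1. Local models whose probabilities factorise are bounded by the best local deterministic strategy, whereas the standard quantum prediction for a spin singlet, E = −cos(a − b) for measurement angles a and b, exceeds that bound. The function names and the particular angles are choices made for this illustration.

```python
from itertools import product
from math import cos, pi, sqrt

def chsh(E):
    """CHSH combination for a correlation function E(x, y), x, y in {0, 1}."""
    return E(0, 0) + E(0, 1) + E(1, 0) - E(1, 1)

# Local deterministic strategies: each party fixes an outcome (+1 or -1)
# in advance for each of its two settings.
best_local = max(
    abs(chsh(lambda x, y: a[x] * b[y]))
    for a in product((-1, 1), repeat=2)
    for b in product((-1, 1), repeat=2)
)

# Quantum singlet correlations at an assumed, well-known choice of angles.
alice_angles = (0.0, pi / 2)
bob_angles = (pi / 4, -pi / 4)

def singlet(x, y):
    """Singlet-state correlation for settings x (Alice) and y (Bob)."""
    return -cos(alice_angles[x] - bob_angles[y])

quantum = abs(chsh(singlet))

print(best_local)            # 2
print(quantum, 2 * sqrt(2))  # both approximately 2.828
```

The gap between 2 and 2√2 is what forces the choice described above: keep locality and give up factorisation, or keep factorisation and give up locality.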

A critic might try to argue that, without factorisation, we have lost all ability to explain correlations. But I’m saying that this is true even for those who would accept factorisation but reject predetermination. I say, without predetermination, there is no need to hold on to factorisation, because it doesn’t help you to explain correlations any better than the rest of us non-determinists! So what are we to do? Maybe it is time to shrug off factorisation and face up to the task of finding a proper explanation for quantum correlations.

# Death to Powerpoint!

There is one thing that has always baffled me about academia, and theoretical physics in particular. Here we have a community of people whose work — indeed, whose very careers — depend on their ability to communicate complex ideas to each other and to the broader public in order to secure funding for their projects. To be an effective working physicist, you basically have to do three things: publish papers, go to conferences, and give presentations. LOTS of presentations. In principle, this should be easy; we are usually talking to a receptive audience of our peers or educated outsiders, we presumably know the subject matter backwards and many of us have had years of experience giving public talks. So can someone please tell me why the heck so many physicists are still so bad at it?

Now before you start trying to guess if I am ranting about anyone in particular, let me set your mind at ease — I am talking about everybody, probably including you, and certainly including myself (well, up to a point). I except only those few speakers in physics who really know how to engage their audience and deliver an effective presentation (if you know any examples, please post names or links in the comments, I want to catalog these guys like rare insects). But instead of complaining about it, I am going to try and perpetuate a solution. There is an enemy in our midst: slide shows. We are crippling our communication skills by our unspoken subservience to the idea that a presentation that doesn’t contain at least 15 slides with graphs and equations does not qualify as legitimate science.

*(I am guilty of this, but I balance it out by asking an equal number of really dumb questions).

I don’t want questions from people who have understood my talk perfectly and are merely demonstrating this fact to everyone else in the room: I want dumb questions, obvious questions, offensive questions, real questions that strike at the root of what is going on. Life is too short to beat around the bush, let’s just cut to the chase and do some damn physics! You don’t know what that symbol means? Ask me! If I’m wrong I’m wrong, if your question is dumb, it’s dumb, but I’ll answer it anyway and we can move on like adults.

Today I trialed a new experiment of mine: I call it the “One Slide Wonder”. I gave a one hour presentation based on one slide. I think it was a partial success, but needs refinement. For anyone who wants to get on board with this idea, the rules are as follows:

1. Thou shalt make thine presentation with only a single slide.

2. The slide shalt contain things that stimulate discussions and invite questions, or serve as handy references, but NOT detailed proofs or lengthy explanations. These will come from your mouth and chalk-hand.

3. The time spent talking about the slide shalt not exceed the time that could reasonably be allotted to a single slide, certainly not more than 10-15 minutes.

4. After this time, thou shalt invite questions, and the discussion subsists thereupon for the duration of the session or until such a time as it wraps up in a natural way.

To some people, this might seem terrifying: what if nobody has any questions? What if I present my one slide, everyone coughs in awkward silence, and I still have 45 minutes to fill? Do I have to dance a jig or sing aloud for them? It is just like my childhood nightmares! To those who fear this scenario, I say: be brave. You know why talks always run overtime? Because the audience is bursting with questions and they keep interrupting the speaker to clarify things. This is usually treated like a nuisance and the audience is told to “continue the discussion in question time”, except there isn’t any question time because there were too many fucking slides.

So let’s give them what they want: a single slide that we can all discuss to our heart’s content. You bet it can take an hour. Use your power as the speaker to guide the topic of discussion to what you want to talk about. Use the blackboard. Get covered in chalk, give the chalk to the audience, get interactive, encourage excitement — above all, destroy the facade of endless slides and break through to the human beings who are sitting there trying to talk back to you. If you want to be sure to incite discussion, just write some deliberately provocative statement on your slide and then stand there and wait. No living physicist can resist the combined fear of an awkward silence, coupled to the desire to challenge your claim that the many-worlds interpretation can be tested. And finally, in the absolute worst case scenario, nobody has any questions after your one slide and then you just say “Thank you” and take a seat, and you will go down in history as having given the most concise talk ever.