A paradox about two boxes

Most of the best probability paradoxes are set in game-shows. Perhaps, like the clandestine communication^[1] of Alice and Bob in cryptography, it's just a conventional way of providing an almost-plausible set of motivations to give people a better intuitive feel for the problem at hand. (The wonderful example of the Monty Hall Problem could have started the trend.) Or perhaps the garish colours typical of the sets (not to mention the presenters' clothes) directly attack the part of the brain responsible for considering certainty and uncertainty, which, when it emits the resulting paradox, naturally frames it with a story based on whatever it was viewing at the time.

One such paradox goes as follows^[2]. You are a contestant on a game-show, where there are two closed boxes of money. You are told that one box contains twice as much money as the other, but you are not told which is which. As a reward for your knowledge of 1970s popular music, you are allowed to look inside one box, and then decide which box you would like to receive and keep. The question for the contestant is then this: do you keep the box that you've opened, or swap? And can it matter which you do? The intuitively-appealing answer is that it can't matter, as you no more know which box has more money now than you did before opening one, so you've got no reason to prefer the other. On the other hand, a student of probability theory can reason like this: let the amount of money seen in the opened box be M. Then our expected return from not swapping boxes is M. But if we swap boxes we are equally likely to get M/2 or 2M, so our expected return is one half of M/2 plus one half of 2M, which is 5M/4, greater than if we'd stuck with the first box. Hence it looks as if we should swap, even though it also looks as if it can't matter. Here lies the paradox.
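Written out, the student's calculation is:

$$
\mathbb{E}[\text{swap}] = \tfrac{1}{2}\cdot\tfrac{M}{2} + \tfrac{1}{2}\cdot 2M = \tfrac{M}{4} + M = \tfrac{5M}{4} > M = \mathbb{E}[\text{keep}].
$$

With ten pounds in the opened box, that is twelve pounds fifty against ten pounds.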

Let's examine the second approach first. Suppose that the opened box contains ten pounds. Then if we believe it to be equally likely that the other box contains five pounds or twenty pounds, it must pay, on average, to swap. It would, after all, be a fairly-priced double-or-nothing sort of arrangement if the box were equally likely to contain twenty pounds or no money, and we know that it's better than that^[3]. This solution, though, is deeply unsatisfactory, because we could have known, even before we had opened the box, that we were going to apply this reasoning, so we could—if it's true—have known before opening the first box that we ought to prefer the second. But that argument could just as well apply to the second box: if we pretend that we're to open that, we should prefer the first. We're left with an absurd approach to maximizing our expected return: first select one at random (the one that we would have opened, but now needn't) and then choose the other one to keep. Clearly something has gone wrong, and the only thing that can have gone wrong is this: the assumption that, on seeing ten pounds in the opened box, the other box was equally likely to contain five pounds or twenty pounds.
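A small exact calculation shows where the "equally likely" assumption gives way once the prior is proper. Here the producer is assumed, purely for illustration, to choose the smaller amount uniformly from whole pounds 1 to `n_max` (the function name and parameter are hypothetical). Swapping then gains a quarter of the amount seen, on average, for every even amount up to `n_max`; it gains the whole amount for every odd one, which must be the smaller; and it loses half the amount for everything above `n_max`, which must be the larger. Averaged over everything the contestant might see, the gain is exactly zero, as the symmetry argument says it must be.

```python
from collections import defaultdict
from fractions import Fraction


def swap_gains(n_max):
    """Exact expected gain from swapping, for each amount the contestant might see.

    Hypothetical producer: the smaller amount x is a whole number of pounds
    chosen uniformly from 1..n_max, and the other box holds 2x.
    Returns (gain_given_seen, overall_gain).
    """
    p_pair = Fraction(1, n_max)
    prob_seen = defaultdict(Fraction)      # seen amount -> probability of seeing it
    weighted_gain = defaultdict(Fraction)  # seen amount -> probability-weighted gain
    for x in range(1, n_max + 1):
        # Opened the smaller box (probability 1/2): see x, swapping gains x.
        prob_seen[x] += p_pair / 2
        weighted_gain[x] += p_pair / 2 * x
        # Opened the larger box (probability 1/2): see 2x, swapping loses x.
        prob_seen[2 * x] += p_pair / 2
        weighted_gain[2 * x] -= p_pair / 2 * x
    gain_given_seen = {m: weighted_gain[m] / prob_seen[m] for m in prob_seen}
    overall_gain = sum(weighted_gain.values())
    return gain_given_seen, overall_gain


gains, overall = swap_gains(100)
print(gains[10], gains[7], gains[150], overall)  # prints: 5/2 7 -75 0
```

The positive gains in the middle of the range are paid for by the losses at the top, which is exactly where "equally likely to be M/2 or 2M" stops being true.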

The purpose of any good probability paradox is to teach us that Bayes was right^[4], so it shouldn't be surprising that we need to consider the prior probability distribution. In other words (and here, as always, the Bayesian recipe arises naturally from taking into account things that we must take into account), we need to think about how the game-show producer selects how much money to put in the boxes. The implicit assumption that we've been making is that he's equally likely to have chosen M and 2M (say ten pounds and twenty pounds) as to have chosen M and M/2 (say ten pounds and five pounds), but we've shown above that this can't be true, or at least can't be true for all M. The assumption is what is known as a uniform prior. Let's suppose that the amount of money can be any positive real number^[5]: in this case, a uniform prior means a probability density function for (say) the smaller amount of money that is constant.
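In symbols: write w(x) for the prior weight the producer gives to x as the smaller amount (notation introduced here for illustration), and treat the two pairs consistent with seeing M, namely (M, 2M) and (M/2, M), as discrete hypotheses. Each box is opened with probability one half, so

$$
P(\text{opened box is the smaller} \mid M) = \frac{\tfrac{1}{2}\,w(M)}{\tfrac{1}{2}\,w(M) + \tfrac{1}{2}\,w(M/2)} = \frac{w(M)}{w(M) + w(M/2)}.
$$

The implicit assumption is w(M) = w(M/2), which gives exactly one half, and hence the 5M/4 of the naive calculation.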

It can be tempting to take uniform priors in all cases to express complete ignorance as to possible values, but it often isn't the right thing to do, and we know that it can't be right in this case. One alternative starting point is the log-uniform prior. This is the assumption that values in the range 1 to 10 are as likely as values in the range 10 to 100, and so on. This would look like a uniform distribution if plotted on log-linear graph paper, hence the name. The cumulative distribution function must be proportional to the logarithm of the random variable, so the probability density function—its derivative—is proportional to the reciprocal of the random variable. A log-uniform prior makes sense if we not only don't know the value of a number, but don't even know what sort of order of magnitude to expect, as it allows all orders of magnitude to be equally likely. This perhaps fits better with our game-show situation, so let's apply it. Seeing an amount of money M, a log-uniform prior tells us that M and 2M in the two boxes is half as likely as M and M/2. Since it must be one or the other, we assign probabilities of 1/3 and 2/3 respectively. Then the expected return if we swap is 1/3 of 2M plus 2/3 of M/2, giving M, as we'd hoped all along.
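The arithmetic of the last two paragraphs can be reproduced in a few lines. This sketch assumes, as the calculation above does, that the two candidate pairs are compared directly by the prior weight each gives to its smaller amount, treated as discrete hypotheses (exact, for instance, for amounts on an evenly spaced grid such as whole pence, when both candidates lie on it). The function names are illustrative.

```python
from fractions import Fraction


def p_opened_smaller(seen, weight):
    """P(the opened box holds the smaller amount | we see `seen`).

    The two candidate pairs, (seen, 2*seen) and (seen/2, seen), are treated as
    discrete hypotheses whose prior weights are weight(seen) and weight(seen/2).
    Each box is equally likely to be opened, so that factor of 1/2 cancels.
    """
    w_small = Fraction(weight(seen))      # we opened the smaller box
    w_large = Fraction(weight(seen / 2))  # we opened the larger box
    return w_small / (w_small + w_large)


def expected_swap(seen, weight):
    """Expected contents of the other box, given what we saw."""
    p = p_opened_smaller(seen, weight)
    return p * 2 * seen + (1 - p) * seen / 2


def uniform(x):
    return 1


def log_uniform(x):
    return 1 / x


ten_pounds = Fraction(10)
for name, weight in (("uniform", uniform), ("log-uniform", log_uniform)):
    print(name, p_opened_smaller(ten_pounds, weight), expected_swap(ten_pounds, weight))
# uniform 1/2 25/2
# log-uniform 1/3 10
```

Using `Fraction` keeps the answers exact, so the 1/3 and the expected return of exactly M come out without rounding.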

It's tempting at this point to imagine that the problem is solved, and that the correct prior is a log-uniform one. It is true that, if the producer does choose his amounts according to a log-uniform distribution over some range, then seeing an amount well within that range, in the opened box, leaves the contestant in a position where they neither gain nor lose, on average, from swapping. However, there's nothing forcing the producer to choose his numbers this way, and even the log-uniform distribution generates a paradox: it's just not the same paradox as before. For if the probability, on seeing M, of having M and M/2 in the two boxes is 2/3, this means that as soon as we open the box we have reason to think that it's more likely that we've got the larger amount of the two. Just as before, this argument applies even before we've opened the box, as, whatever it contains, we could call it M. Hence we claim to be able to select the box containing the larger amount two-thirds of the time simply by pointing at it. And this cannot be.
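An exact enumeration with a hypothetical proper stand-in for the log-uniform prior shows how the two-thirds figure coexists with holding the larger box exactly half the time. The smaller amount is assumed to be a whole number from 1 to `n_max`, chosen with probability proportional to 1/x. Even amounts up to `n_max` do give 2/3, but odd amounts (certainly the smaller) and amounts above `n_max` (certainly the larger) bring the overall figure back to 1/2.

```python
from collections import defaultdict
from fractions import Fraction


def holding_larger(n_max):
    """Exact P(opened box holds the larger amount), overall and given the amount seen.

    Hypothetical proper stand-in for the log-uniform prior: the smaller amount
    x is a whole number 1..n_max chosen with probability proportional to 1/x.
    """
    norm = sum(Fraction(1, x) for x in range(1, n_max + 1))
    p_seen = defaultdict(Fraction)         # seen amount -> P(see it)
    p_seen_larger = defaultdict(Fraction)  # seen amount -> P(see it and it is the larger)
    for x in range(1, n_max + 1):
        p_half = Fraction(1, x) / norm / 2  # each box is opened with probability 1/2
        p_seen[x] += p_half                 # opened the smaller box
        p_seen[2 * x] += p_half             # opened the larger box
        p_seen_larger[2 * x] += p_half
    overall = sum(p_seen_larger.values())
    given_seen = {m: p_seen_larger[m] / p_seen[m] for m in p_seen}
    return overall, given_seen


overall, given_seen = holding_larger(100)
print(overall, given_seen[10], given_seen[7], given_seen[150])  # prints: 1/2 2/3 0 1
```

Pointing at a box, before opening it, still finds the larger one exactly half the time; the 2/3 is a statement about particular amounts, not about boxes in general.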

The uniform and log-uniform priors, then, merely lead to different paradoxes. With the former we admit to being equally likely to have the larger or smaller amount, but must on average gain by switching; in the latter, we are no better or worse off by switching, but are more likely to have picked the box with more inside. Neither is possible, so neither prior is possible. What should we use as a prior? Ideally we want to know what probability distribution the producer is using, and employ that as a prior. Perhaps we could study previous episodes of the game-show assiduously. We know, though, that he isn't using an infinite uniform or log-uniform distribution, because they can't be normalized, meaning that the probability of selecting a number bigger than any given number (say a trillion) is arbitrarily close to one. In many applications of uniform and log-uniform priors this isn't a problem, as we're only interested in relative probabilities, and have to normalize our posterior probability distribution in any case^[6]. It was in this spirit that I assigned the probabilities 1/3 and 2/3 above. However, in any real problem, there must be some range outside which the uniformity (or log-uniformity) breaks down, and in some cases, this one included, the theory only works if we take this into account.
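The claim about normalization can be checked numerically: cut either prior off at an upper limit and compute the probability of exceeding a trillion; as the limit recedes, that probability tends to one. The function names and the lower limit of one are assumptions for illustration. The log-uniform version takes base-10 exponents so that the cut-off can be far larger than a float can hold.

```python
def tail_uniform(threshold, upper, lower=1.0):
    """P(X > threshold) when X is uniform on [lower, upper]."""
    if threshold >= upper:
        return 0.0
    return (upper - max(threshold, lower)) / (upper - lower)


def tail_log_uniform(log10_threshold, log10_upper, log10_lower=0.0):
    """P(X > threshold) when log10(X) is uniform on [log10_lower, log10_upper]."""
    if log10_threshold >= log10_upper:
        return 0.0
    return (log10_upper - max(log10_threshold, log10_lower)) / (log10_upper - log10_lower)


TRILLION = 1e12
for upper in (1e13, 1e15, 1e18):
    print(f"uniform up to {upper:.0e}: P(> a trillion) = {tail_uniform(TRILLION, upper):.6f}")
for log10_upper in (24, 120, 1200):
    print(f"log-uniform up to 1e{log10_upper}: P(> a trillion) = {tail_log_uniform(12, log10_upper):.4f}")
```

The log-uniform tail approaches one much more slowly, but it still gets there: no finite threshold survives the limit.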

As soon as we admit upper and lower limits, however extreme, to the amounts of money in the boxes, the worst aspect of the paradox, the idea that we know something about a box before we've opened it, goes away. Suppose that we think that there's a uniform distribution of possible amounts from one pound and two pounds to one billion and two billion. Then on opening the first box, provided we see an amount smaller than a billion pounds, we know that we can increase our expected return by swapping, as argued above. However, if we see a larger amount than that, we could not possibly gain by swapping, as it can't be the smaller amount, so we shouldn't swap. It's clear that we can do rather better than this if we make a more realistic^[7] assumption about the upper limit: if we can't watch any previous editions, we should at least compare the prize budgets of similar game-shows. As so often with the construction of priors, we might not be able to do it as scientifically as we'd like, but we'll do better by making some guesses based on such information as we do have than by trying, in the interests of some spurious generality, to pretend that we don't have any information at all.
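A Monte Carlo sketch of the bounded game, assuming (as the text supposes) that the smaller amount is uniform between one pound and a billion pounds. Three hypothetical rules are compared: never swap, always swap, and swap only when the amount seen is below a billion. The first two should agree, both coming to 1.5 times the mean smaller amount (about 750 million pounds), while the third should come to about 937.5 million, a quarter more.

```python
import random


def average_winnings(should_swap, n_games=100_000, low=1.0, high=1e9, seed=2013):
    """Monte Carlo estimate of average winnings for a swapping rule.

    Hypothetical producer, as in the text: the smaller amount is uniform on
    [low, high] pounds and the other box holds twice as much.
    should_swap(seen) returns True if the contestant swaps after seeing `seen`.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_games):
        smaller = rng.uniform(low, high)
        boxes = (smaller, 2 * smaller)
        opened = rng.randrange(2)  # either box is opened with probability 1/2
        seen, other = boxes[opened], boxes[1 - opened]
        total += other if should_swap(seen) else seen
    return total / n_games


def never_swap(seen):
    return False


def always_swap(seen):
    return True


def swap_below_a_billion(seen):
    # Anything above the top of the smaller-amount range must be the larger prize.
    return seen < 1e9


for rule in (never_swap, always_swap, swap_below_a_billion):
    print(f"{rule.__name__:>22}: {average_winnings(rule):,.0f} pounds")
```

Because every rule is run with the same seed, the three averages come from the same sequence of games, so the difference between never and always swapping is pure sampling noise.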

Matthew Smith, 3.i.2013

[1] ^ I nearly wrote "clandestine passion", but it strikes me that we can't be sure. They do, after all, use excellent cryptography. Those of narrow, conventional minds, who have suspected them of plotting infidelity, will perhaps be very sorry when it turns out that Alice and Bob have really been working out, between them, how to summon dragons.

[2] ^ It turns out that this paradox is what Wikipedia calls the two envelopes problem, and that a great deal of ink has been spilt on it. Having written the above I wondered whether it was too long and rambling to be of any use; further discovering that it was discussed elsewhere indicated that someone must have done so more clearly and concisely. Such does not, however, appear to be the case, and learned authorities have even managed to disagree, so perhaps I'll let it stand.

[3] ^ I'm assuming a limitless appetite in the contestant for gambling when the odds are in their favour, i.e. that you are just looking to maximize your expected return, where I'm using "expected" in the technical, probability-theoretic sense. This is realistic for a rational person who likes money if the amounts are small and the situation, or others like it, oft-repeated. It needn't be if the amounts are large and the situation infrequent, as, for example, on a game-show. No rational person, otherwise, would ever buy insurance: the expected payout must be less than the premium, otherwise the actuaries would starve. However, we'll ignore this effect for the moment. It may have some relevance in a later footnote.

[4] ^ The purpose of any good ethical paradox, on the other hand, is to show that Bentham was wrong.

[5] ^ In reality it must be a whole number of pence, because Lady Thatcher, in one of her less evil acts, abolished the ha'penny in 1984. Then there's the danger that contestants, seeing an odd number of pence, will know that they must gain by swapping, spoiling the surprise. The wily producer, then, will use only even numbers of pence. However, if contestants start to notice that, they will elect to swap whenever faced with a number of pence which is not a multiple of four, and so on. The issue isn't entirely solved even by using amounts of pence which are exact powers of two, on account of the question of where to start, though it can be rendered arbitrarily infrequent. This, though, isn't the heart of the paradox, which would apply to quantities that could take arbitrary real values, so I shan't consider it further.
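The parity argument can be made mechanical. Given the set of amounts, in whole pence, that the producer might use for the smaller prize, an observed amount gives the game away whenever only one of the two readings (an allowed smaller prize, or double an allowed smaller prize) is possible. The function and its labels are illustrative.

```python
def what_it_reveals(seen, allowed_smaller):
    """Classify an observed amount (in pence) given the producer's allowed smaller amounts."""
    could_be_smaller = seen in allowed_smaller
    could_be_larger = seen % 2 == 0 and seen // 2 in allowed_smaller
    if could_be_smaller and could_be_larger:
        return "ambiguous"
    if could_be_smaller:
        return "smaller: swap"
    if could_be_larger:
        return "larger: keep"
    return "impossible"


any_pence = set(range(1, 1001))             # any whole number of pence up to £10
even_pence = set(range(2, 1001, 2))         # the wily producer's even amounts
powers_of_two = {2**k for k in range(11)}   # 1, 2, 4, ..., 1024 pence

print(what_it_reveals(37, any_pence))        # odd, so it must be the smaller prize
print(what_it_reveals(38, even_pence))       # 38 is not a multiple of four
print(what_it_reveals(1, powers_of_two))     # the bottom of the ladder
print(what_it_reveals(2048, powers_of_two))  # the top of the ladder
```

With powers of two, every rung except the first and the last is ambiguous, which is the sense in which the giveaway can be made arbitrarily infrequent but never removed.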

[6] ^ The question of whether we are allowed to use a prior that cannot be normalized, known as an improper prior, turns out to be one of the key points of discussion amongst more serious people. Any good Bayesian must know that we can't, because the prior is really a probability distribution. The use of an improper prior really means that we want to use a prior that is uniform (or log-uniform, or something) over some large range, and cut off outside of that, possibly (for want of any preferable assumption) by a Gaussian envelope. If the cut-off is sufficiently far away from the region in which we're interested, then for many problems the result (in terms of posterior probabilities) is independent of quite where we imagine that cut-off to be, being the one that we'd predict from the improper prior. That doesn't mean, though, that we are really positing an improper prior, just that it's a convenient way of avoiding having to find an estimate for something that we don't in fact need to know. In problems, like ours, where the position of the cut-off does matter, it isn't allowed^[8]. As mentioned above, the improper prior, if taken seriously as a limiting form, would lead to a negligible probability that the number in the box was smaller than some arbitrarily given number, which means that no possible representational scheme could ever describe the numbers produced, making it impossible to run the game even with real numbers that we have a way of representing, let alone amounts of money.
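A sketch of the cut-off idea, assuming a uniform base weight multiplied by a Gaussian envelope of width `scale` (a hypothetical choice, as the footnote suggests, for want of a better one), and comparing the two candidate pairs by prior weight as in the 1/3 and 2/3 calculation. Far below the cut-off the answer is one half whatever the scale; near the cut-off it depends strongly on where the cut-off is.

```python
import math


def p_opened_smaller(seen, scale, base_weight=lambda x: 1.0):
    """P(opened box holds the smaller amount | seen), comparing the two candidate
    pairs by prior weight. Assumed prior weight on the smaller amount x:
    base_weight(x) * exp(-x**2 / (2 * scale**2)), i.e. a Gaussian cut-off at `scale`.
    """
    def weight(x):
        return base_weight(x) * math.exp(-x * x / (2 * scale * scale))

    w_small = weight(seen)      # pair (seen, 2 * seen)
    w_large = weight(seen / 2)  # pair (seen / 2, seen)
    return w_small / (w_small + w_large)


for scale in (1e6, 1e8):
    print(scale, p_opened_smaller(10, scale), p_opened_smaller(3e6, scale))
```

For amounts many times larger than `scale` both weights underflow to zero and the ratio is undefined; the sketch does not guard against that.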

[7] ^ I can only hope that it remains unrealistic; but the Bank has now discovered its ex-nihilo-generation lever, so who can tell whither inflation?

[8] ^ All of this rather assumes that the paradox is fully resolved once we disallow improper priors. If we insist on a hard cut-off for the prior, as must happen when the game is run with actual money on an actual game show, or even a Gaussian-tail cut-off, then the resolution stands; but apparently one John Broome has discussed an example with a proper prior (a probability distribution for the smaller amount of money whose probabilities sum to one) which still generates the paradox. The details are on the Wikipedia page^[2], which also gives some ideas towards a resolution. The essential point is that the expected return from either envelope is undefined (infinite, if you like), meaning that it doesn't make sense to use an expectation-based test to compare possible returns from the two envelopes before opening them. In a sense, though, the worrying part of the paradox isn't about expected values, a mathematical abstraction about which one may be prepared to have one's intuition refined, but about people following the best strategies for getting the most money. Introduce, then, a rational, money-loving individual (let us call him Fred) who is allowed to play the game lots of times, and for whom the amounts of money involved are fairly small^[9]. For a given amount of money that Fred sees in an envelope, it is easy to show that he will choose to switch on seeing it, as this increases his expected return. It's tempting, then, to say that Fred will switch envelopes even before opening one, restoring the paradox. However, the specification of Fred required that he regarded the amounts of money as small and could make the relevant choice lots of times, and this will only apply to amounts of money up to a certain value. However many times Fred gets to play the game as a whole, there is an amount of money that he will on average see only once, and at this point mere rationality doesn't oblige him to take the gamble. Perhaps he will and perhaps he won't, but in any case we can't be sure that he will before he's even opened the envelope.
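An exact sketch of a proper prior of this kind. The prior is an assumption for illustration: the boxes hold 2^n and 2^(n+1) pounds with probability 2^n/3^(n+1), which is the form usually quoted for Broome's example (check it against the source). Every observation makes swapping look worthwhile (11/10 of the amount seen, or double at the very bottom), yet the expected amount in a box diverges, so comparing expectations before opening has no meaning.

```python
from fractions import Fraction


def prior(n):
    """Assumed proper prior: the boxes hold 2**n and 2**(n + 1) pounds
    with probability 2**n / 3**(n + 1), for n = 0, 1, 2, ..."""
    return Fraction(2**n, 3**(n + 1))


def expected_other(n):
    """Exact expected contents of the other box after seeing 2**n in the opened one."""
    seen = 2**n
    p_opened_smaller = prior(n) / 2                               # pair n, smaller box opened
    p_opened_larger = prior(n - 1) / 2 if n > 0 else Fraction(0)  # pair n - 1, larger box opened
    total = p_opened_smaller + p_opened_larger
    return (p_opened_smaller * 2 * seen + p_opened_larger * Fraction(seen, 2)) / total


def partial_expected_box(n_terms):
    """Expected amount in a randomly chosen box, summing only pairs n < n_terms."""
    return sum(prior(n) * Fraction(3 * 2**n, 2) for n in range(n_terms))


print(sum(prior(n) for n in range(40)) == 1 - Fraction(2, 3)**40)  # the prior sums towards one
print([expected_other(n) / 2**n for n in range(5)])                # swap-to-keep ratio per amount
print([float(partial_expected_box(k)) for k in (10, 20, 40)])      # grows without bound
```

Each term of the partial sum is half of (4/3)^n, which is why the expectation has no finite value even though the probabilities add up to one.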

[9] ^ If you like to model all rational behaviour as maximizing the value of some utility function which is a suitably-behaved function of the amount owned of each commodity (such as money), then this amounts to saying that we are zoomed into a small enough portion of Fred's utility curve that it appears linear. It seems to me, though, that the behaviour posited for Fred is a weaker assumption than that built into maximizing utility, because if the amounts are small enough (for Fred) and the repetitions frequent enough then his probability of missing out on an amount of money significant to him becomes negligible.
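To see what "zoomed in" means, take a hypothetical Fred whose utility is the logarithm of his total wealth (an assumption; nothing above fixes the utility function). Faced with ten pounds in hand and a swap that is equally likely to yield five or twenty, his certainty equivalent for swapping rises from exactly ten (indifference) towards the expected value of twelve pounds fifty as his existing wealth grows relative to the stakes.

```python
import math


def certainty_equivalent(wealth, outcomes, probs):
    """Sure amount worth the same as the gamble to a hypothetical log-utility agent
    whose utility is the logarithm of total wealth (existing wealth plus winnings)."""
    expected_utility = sum(p * math.log(wealth + x) for x, p in zip(outcomes, probs))
    return math.exp(expected_utility) - wealth


# Swapping a box holding ten pounds, with the other equally likely to hold 5 or 20.
for wealth in (0, 100, 10_000, 1_000_000):
    print(wealth, round(certainty_equivalent(wealth, (5, 20), (0.5, 0.5)), 4))
```

The curvature that makes the gamble merely break even at zero wealth becomes invisible once the stakes are a small slice of what Fred already has.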