# On Understanding Probability Puzzles

The theory of probability tries to assign a number to our belief that something is “true”. Having its roots in gambling, probability is an intricate part of everyone’s life in this age. The insurance industry, finance industry, casinos, health care industry, double blind experiments, election exit polling etc are just a few instances where probability plays a crucial role. Before going into the abstract theory, one of the best ways to understand probability is to see it from an experimental perspective. Let’s consider the following concrete example.

— 1. Girl born on Tuesday problem —

Problem 1 If a couple has their first girl child born on a Tuesday, what is the probability that their second child is also a girl?

You might wonder the presence of Tuesday in the question. Just like most other probability puzzles, this one is ill posed. And just like most other probability puzzles, there are numerous ways in which to make the problem well posed. Once this is done, will land up with a unique answer. And different people, depending upon how they make the problem well posed typically land up with different and correct answers. The funny part is I have seen people arguing very passionately about how their solution is the “correct” one. Let’s first simulate a thought experiment in our mind. We need to generate an experiment whose outcomes contain our events of interest. Let’s ignore the Tuesday information for a moment and assume for simplicity that everyone has just two children. Consider the following experiment.

There are four outcomes of the above experiments which are ${(B,B),(B,G),(G,B),(G,G)}$. Therefore, for this interpretation of Problem 1, the solution for the puzzle without the Tuesday information is ${\frac{1}{4}}$. Now, let’s put back in the Tuesday for our puzzle. The setup for our thought experiment is as follows.

Solution: For this new experiment, the outcomes ${(C[1],D[1],C[2],D[2])}$ corresponding to two pairs of gender and day of the week can take ${2\times 7 \times 2 \times 7 = 196}$ values. Now, we are already told that the first child is a girl born on Tuesday. Therefore, our sample space size is reduced to ${1\times 1\times 2\times 7}$. The event of interest to us, i.e., both girls, can take ${1\times 1\times 1 \times 7}$ values corresponding to ${(Girl,Tuesday,Girl, Anyday)}$. Therefore, under this interpretation, our answer is ${1/2}$. $\Box$

I am sure there are other ways to interpret the question. But once you have made the interpretation and set up your experiment, you should get a unique answer. Let’s change the problem a bit as follows.

Problem 2 If a couple has a girl child born on a Tuesday, what is the probability that their second child is also a girl?

Solution: In this case, we are only told that the couple has a girl child born on a Tuesday. It could be the first girl child born on a Tuesday or the second girl child born on a Tuesday. Let’s compute the sample space size first. Out of all the 196 outcomes, only ${2\times 7 + 2\times 7 -1 = 27}$ outcomes have a girl child born on a Tuesday. And out of all the 196 possible outcomes, only ${1\times 7 + 1\times 7 -1 = 13}$ outcomes have both girls, of which one is born on a Tuesday. Therefore, the required probability is ${13/27}$. $\Box$

— 2. The Month Hall problem —

One of my favorite probability puzzles is the Monty Hall problem. Here is the description.

Problem 3 Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?

Solution: Let’s set up our experiment for this problem as follows.

The above experiment returns the probability of winning. If one is not switching doors after being shown a goat door, then one sees that you win if ${InitialPick = CarDoor}$ which happens with probability ${1/3}$. Therefore, with switching strategy, the probability of winning is the complementary event with value ${2/3}$. In other words, with the switching strategy, you win if you made the bad choice initially. And with probability ${2/3}$, you are bound to make a bad choice at the beginning. To appreciate this statement better, consider an extreme Monty Hall problem with 1000 doors and the host shows you 998 goats. You are asked if you want to change your initial pick. In this case, your chances of making a bad choice initially is ${999/1000}$, i.e., your chances of winning with the swap strategy is ${999/1000}$. The point being that the host is not opening a door for you at random. S/he knows what is behind each door and one is essentially exploiting this information with the swap strategy. $\Box$

— 3. The concentric circles and chord problem —

Let’s consider a more serious looking problem. I have posed this problem to a couple of my friends and have never got the same answer.

Problem 4 Consider two concentric circles of radii 1 and 2 respectively. If you pick a chord on the outer circle at random, what is the probability that the chord will touch or intersect the inner circle?

Solution: Here is one way to make the problem well posed.

For this interpretation of Problem 4, one gets a unique answer which is ${1/3}$. I am aware of at least two other ways to make the problem well posed and which will give you (correct) answers ${1/2}$ and ${1/4}$. The corresponding experiments are different ways to make the phrase “pick at random” precise. For example, one can pick a number ${L}$ uniformly from the interval ${\left[ 0,4 \right]}$, an angle ${\theta}$ uniformly from ${\left[ 0,\pi \right]}$ and consider a chord of length ${L}$ and orientation ${\theta}$. Or one can sample a point ${P}$ uniformly inside the disc of radius ${2}$ and consider the (almost) unique chord for which ${P}$ is the mid point. $\Box$

To summarize, do not panic if a probability puzzle seems out of reach for you. Stay calm and run a thought simulation of the problem in your mind and things will become clear. As an exercise, try to solve the following problems by making them well-posed.

— 4. Exercise —

Problem 5 Let ${A}$ be the ordered array of integers ${\{1,\dots,N\}}$ where ${N}$ is unknown but surely between 1 and 100. A person picks up an element of ${A}$ at random and it turns out to be 10. What is the expected value of ${N}$?

As slightly related problem is the following.

Problem 6 Let ${A}$ be the ordered array of integers ${\{1,\dots,N\}}$ where ${N\ge 1}$. A person picks up an element of ${A}$ at random and it turns out to be 10. What is the expected value of ${N}$?

Problem 7 If the probability of seeing a bus at a particular intersection in a 20 minute window is 0.9, what is the probability of seeing a bus at the same intersection in a 5 minute window?

Problem 7 seems to be very popular in the puzzle world. It is quite interesting and most people who ask this don’t realize the subtlety because they themselves heard it from someone else or saw it in a book and never thought deeply about it. Again, think of an experiment and simulate it in your mind. You can come with with at least three different interpretations for the underlying experiment and hence at least three correct answers.

The following two sections are a bit more technical and not really in the puzzle category. I will assume knowledge of some basic results in probability.

— 5. The Brownian motion —

Let’s try to understand the one-dimensional Brownian motion, denoted by ${X_t}$, from an experimental point of view. The formal construction for the Brownian motion can be found here for example. And you will get a good historical perspective of the Brownian motion here. One of the basic results of Paley, Wiener and Zygmund in Brownian motion is the following

Theorem 1 (Paley, Wiener, Zygmund) Almost surely, the Brownian motion is nowhere differentiable, i.e., the map ${t\rightarrow X_t}$ is almost surely nowhere differentiable.

We will take this fact for granted but lets see what it means experimentally. Lets have two indices, ${\omega}$ and ${t}$. The index ${\omega }$ represents the index of our experiment output and the index ${t}$ represents the time index for for an instance of the experiment. For example, say I have 100 petri dishes with water and I place a pollen grain in the middle of each one of them at ${t=0}$ and let it go. In this case, my ${\omega }$ ranges from 1 to 100 and for each ${\omega }$, the time ranges from 0 to ${\infty}$. There are various interesting questions which one could ask. For example,

• What is the probability that the pollen grain will hit the boundary of the petri dish?
• What is the expected time for the pollen grain to hit the boundary?
• What is the probability that the pollen grain, starting from the middle of the petri dish, will hit a particular point on the boundary?
• What is the mean and variance of the location of the pollen grain in time?

One could of course conduct millions of physical or numerical experiments and empirically compute these numbers. The theory of Brownian motion is an attempt to model this experimental setup into a rigorous mathematical framework so that these numbers can be computed without too much effort. The way this is achieved is by considering (uncountably large) number of petri dishes, placing a pollen grain in the middle of each one of them and letting it go. If ${\Sigma = \left[ 0,\infty \right)}$ and ${\omega \in \mathbb{R}}$ is the index for the ${\omega^{\rm th}}$ petri dish, then the underlying experimental space is of size ${\mathbb{R}^{\Sigma}}$. Now, it is a highly nontrivial task to show that one can indeed think of ${\mathbb{R}^{\Sigma}}$ as a probability space, i.e., a measure space with a finite measure called the Wiener measure. That is the content of Kolmogorov Extension theorem which can be found in any graduate level probability textbook. For the Brownian motion, one can consider a single experimental output, ${X_t}$, to be an element of ${\mathbb{R}^{\Sigma}}$ for some index ${\omega }$. In other words, ${B(\omega ,t)\in \mathbb{R}^{\Sigma}}$ denotes the position of the pollen grain in the ${\omega^{\rm th}}$ petri dish at time ${t}$.

So the statement ${t\rightarrow X_t}$ is almost surely nowhere differentiable has the following experimental setup. Let ${N}$ be a large integer and let ${\omega_1,\dots,\omega_N}$ be ${N}$ petri dishes with a pollen grain in the middle of each one of them at ${t=0}$. Let the experiment run for time ${t\in \Sigma}$. Let ${D_N}$ be the number of petri dishes such that the corresponding pollen grain trajectory ${X_t}$ is differentiable at time ${t}$. Then, ${D_N/N\rightarrow 0}$ as ${N\rightarrow \infty}$.

Construct an experiment to solve the following.

Problem 8 We saw above that for each ${t}$, the brownian motion ${X_t}$ is almost surely not differentiable. Does this imply that almost surely every ${t}$ is a point of nondifferentiability? In other words, does { for all t } and { almost surely } commute?

— 6. Compressive Sensing —

Let me conclude by discussing the experimental setup in the compressive sensing (CS) scenario. I will stick to the notation used in this Candes-Tao paper. In compressive sensing, the underlying setup is the following. One has a signal known to be sparse in some particular basis. The question is then to quantify the minimum number of “random” samples that need to be taken to guarantee that one can reconstruct the original signal with high “probability”. The problem reduces to “solving” an equation of the form ${Ax=B}$. See these slides of Terry Tao for a high level overview. The nontrivial thing about CS is in making all the above quotes mathematically precise. The two versions of results in the CS literature are the uniform results and the non-uniform results. I will now explain the difference when seen from an experimental point of view. Here is the statement taken from Tao’s blog. In the theorem below, the signal ${f:\mathbb{Z}/N\mathbb{Z}\rightarrow \mathbb{C}}$, thought of as a complex vector of size ${N}$. And ${\Lambda\subset \mathbb{Z}/N\mathbb{Z}}$ is the set of observed frequencies.

Theorem 2 Suppose ${\Lambda}$ is a random set with ${\Lambda\gg S\log N}$. Then any given ${S}$-sparse function ${f}$ will be recovered exactly by ${l^1}$ minimisation with overwhelming probability. If one makes the stronger hypothesis ${|{\Lambda}|\gg S\log^4 N}$ , then with overwhelming probability {\rm all ${S}$-sparse} functions will be recovered exactly by ${l^1}$ minimisation. (Again, ${N}$ is not required to be prime.)

The big difference between these two results from an experimental point of view is the following. In the stronger hypothesis case, one allows for both ${A}$ and ${x}$ to be random. In the weaker hypothesis case, ${x}$ is fixed and ${A}$ is chosen to be random. The weaker hypothesis experiment is

The stronger hypothesis experiment is

As one can see, the underlying sample space has dimensions

• ${\sum_{i=1}^k{N\choose i}\leq k{N\choose k}}$ in the weaker hypothesis case
• ${{N\choose |{\Lambda}|}\left(\sum_{i=1}^k{N\choose i}\right)\leq {N\choose |{\Lambda}|}k{N\choose k}}$ in the stronger hypothesis case

Therefore, the sample space is exponential in ${k}$ for the weak case and is exponential in ${|{\Lambda}|+k}$ for the strong case. The proof for the strong case for Gaussian ensembles is relatively easy because of exponential tail bounds for minimum and maximum singular values. A simple union bound argument suffices. See Section 4.1 of this paper. But because of this large dimensionality, the proof for the Fourier measurement case becomes quite nontrivial. See CT and RV for more technical details.