The law of large numbers and the central limit theorem


This blog entry is mostly a precursor to my next one, which will be a review of various Monte Carlo (MC) and quasi-Monte Carlo (QMC) methods used to compute integrals of functions of varying smoothness. To understand the technical details behind MC techniques, one first needs to understand two basic results in probability theory: the law of large numbers (LLN) and the central limit theorem (CLT). There are excellent resources on the net for the LLN and the CLT; for example, this and this are highly recommended readings. This blog plays a complementary role, using figures and animations to build an intuitive understanding.

— 1. The Law of Large Numbers —

Let {X} be a real-valued random variable and let {X_1,X_2,\dots} be an infinite sequence of independent and identically distributed copies of {X}. Let

\displaystyle \bar X_n = \frac{X_1+\dots+X_n}{n} \ \ \ \ \ (1)

be the empirical average of this sequence at stage {n}. Then, we have

Theorem 1 (Weak law of large numbers (WLLN)) Let {X} have finite first moment, i.e., {\mathbb{E}|X|<\infty}. Then the random variable {\bar X_n} converges in probability to {\mathbb{E}X}. In other words,

\displaystyle \lim_{n\rightarrow \infty}\mathbb{P} \left( |\bar X_n - \mathbb{E}X|\ge \varepsilon \right) = 0 \ \ \ \ \ (2)

for all {\varepsilon>0}.

Theorem 2 (Strong law of large numbers (SLLN)) Let {X} have finite first moment, i.e., {\mathbb{E}|X|<\infty}. Then the random variable {\bar X_n} converges almost surely to {\mathbb{E}X}. In other words,

\displaystyle \mathbb{P} \left( \lim_{n\rightarrow \infty}{\bar X_n = \mathbb{E}X}\right) = 1 \ \ \ \ \ (3)

Proofs of both theorems can be found in the resources linked above. The SLLN implies the WLLN, but not the other way around.

[Figure: Convergence in probability]

The figure above illustrates a sequence of random variables {F_n} that converges in probability to zero but does not converge almost surely to zero. (The classic example of this phenomenon is the "typewriter" sequence of indicator functions of ever-shorter subintervals sweeping repeatedly across {[0,1]}: each indicator is nonzero only on a small set, so {F_n\rightarrow 0} in probability, yet every point is covered infinitely often, so {F_n(\omega)} fails to converge for every {\omega}.)

[Figure: Convergence almost everywhere]

The figure above shows an example of a sequence of random variables {G_n} that converges almost everywhere to zero.

Under a stronger moment assumption, a much simpler proof can be obtained.
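To get a feel for such arguments (this is a standard computation, not specific to this post): if {X} additionally has finite variance {\sigma^2}, then {\mathrm{Var}(\bar X_n) = \sigma^2/n}, and Chebyshev's inequality gives

\displaystyle \mathbb{P} \left( |\bar X_n - \mathbb{E}X|\ge \varepsilon \right) \le \frac{\sigma^2}{n\varepsilon^2} \rightarrow 0,

which already proves the WLLN under a second-moment assumption. With a fourth moment, the analogous bound is {O(1/n^2)}, which is summable in {n}; that is what lets the Borel-Cantelli lemma deliver the almost sure statement in the exercise below.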

Exercise 1 Using Chebyshev’s inequality, the Borel-Cantelli lemma, and assuming {\mathbb{E}|{X}|^4<\infty}, prove the strong law of large numbers.

As in my previous blog, let’s try to understand the WLLN and SLLN from an experimental point of view. The way to interpret the LLN is as follows. The underlying sample space {\mathbb{S}} is the space of all infinite sequences {Z = (X_1,X_2,\dots)}; in other words, {\mathbb{S}} is the product space consisting of all possible infinite sequences of outcomes. Given values of {n} and {\varepsilon}, let {V_{n,\varepsilon} = \left\{ Z\in\mathbb{S} : \left|\frac{1}{n}\sum_{i=1}^n Z(i)-\mathbb{E}X \right|\ge\varepsilon \right\}}. Then the WLLN says that the measure of {V_{n,\varepsilon}} goes to zero as {n\rightarrow\infty} for all {\varepsilon>0}. And the SLLN says that the set of all {Z\in \mathbb{S}} such that {\lim_{n\rightarrow \infty}\frac{1}{n}\sum_{i=1}^n Z(i)\neq \mathbb{E}X} (or the limit fails to exist) has measure zero in {\mathbb{S}}.

Example 1 Let each {X_i\in\left\{ 0,1 \right\}} with equal probability (representing tails and heads of a coin flip). Then the set of all possibilities can be identified with the unit interval {\left[ 0,1 \right]} via binary representation (the sequence {(X_1,X_2,\dots)} maps to the number {0.X_1X_2\dots}). The interpretation of the SLLN is that almost every real number in {[0,1]} has 0s and 1s appearing with equal asymptotic frequency in its binary representation.
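This is easy to see numerically. Below is a minimal MATLAB sketch (my own, not from the original post) that plots the running empirical average of simulated coin flips; the curve settles down to {\mathbb{E}X = 0.5} as the SLLN predicts.

% Running empirical average of n fair coin flips (X_i in {0,1}).
% The SLLN says this converges to E[X] = 0.5 almost surely.
n = 1e5;
X = rand(1, n) < 0.5;            % i.i.d. Bernoulli(1/2) coin flips
Xbar = cumsum(X) ./ (1:n);       % empirical average at each stage
semilogx(1:n, Xbar); hold on;
plot([1 n], [0.5 0.5], 'k--');   % the limit E[X] = 0.5
xlabel('n'); ylabel('empirical average');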


— 2. The Central Limit Theorem —

We need the following definition before stating the CLT.

Definition 3 (Convergence in distribution) The sequence of random variables {X_1,X_2,\dots} converges in distribution to a random variable {X} if the cumulative distribution functions {F_n} of {X_n} converge pointwise to the cumulative distribution function {F} of {X} at every {\lambda\in \mathbb{R}} at which {F} is continuous.

In other words,

\displaystyle \lim_{n\rightarrow \infty}F_n(\lambda) = F(\lambda) \ \ \ \ \ (4)

for all {\lambda} at which {F} is continuous.

[Figure: Convergence in distribution does not imply convergence of the pdfs]

As the figure above illustrates, convergence of random variables in distribution does not imply that their probability density functions converge in any sense; convergence in distribution is a very weak notion of convergence. In the figure, the green curves are the pdf and the cdf of the uniform distribution on the interval {[0,1]}, respectively. The blue curves are the analogous curves for the sequence of random variables on {[0,1]} with pdfs given by

\displaystyle f_n(x) = 1 - \cos(2\pi nx), \qquad x\in[0,1].

Each {f_n} integrates to one, and the corresponding cdf is {F_n(x) = x - \frac{\sin(2\pi nx)}{2\pi n}}, which converges uniformly to {F(x) = x}, the uniform cdf. The pdfs, on the other hand, oscillate faster and faster and do not converge. Thus we have an example where the cdfs converge but the pdfs do not.
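A quick plot makes this concrete. Here is a minimal MATLAB sketch (my own, using the formulas above) comparing the pdf and cdf of {f_n} against the uniform ones for a moderately large {n}.

% pdfs f_n(x) = 1 - cos(2*pi*n*x) oscillate and do not converge,
% while the cdfs F_n(x) = x - sin(2*pi*n*x)/(2*pi*n) converge to x.
x = linspace(0, 1, 2000);
n = 10;
subplot(1, 2, 1);
plot(x, 1 - cos(2*pi*n*x), x, ones(size(x)), '--');   % pdf f_n vs uniform pdf
title('pdf: f_n oscillates');
subplot(1, 2, 2);
plot(x, x - sin(2*pi*n*x)/(2*pi*n), x, x, '--');      % cdf F_n vs uniform cdf
title('cdf: F_n \rightarrow x');

We are now in a position to state the Central Limit Theorem.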

Theorem 4 (Central Limit Theorem) Let {X} be a real-valued random variable with mean {\mu} and finite, nonzero variance {\sigma^2}. Let {X_1,X_2,\dots} be an infinite sequence of independent and identically distributed copies of {X}, and let {S_n = X_1 + \dots + X_n}. Then, if

\displaystyle Z_n = \frac{S_n - n\mu}{\sqrt n \sigma} \ \ \ \ \ (5)

we have that {Z_n} converges in distribution to the normal distribution with mean zero and variance one.

Note that using the i.i.d. assumption, the mean and variance of {Z_n} are 0 and 1, respectively: indeed, {\mathbb{E}S_n = n\mu} and {\mathrm{Var}(S_n) = n\sigma^2}, so {\mathbb{E}Z_n = 0} and {\mathrm{Var}(Z_n) = \frac{n\sigma^2}{n\sigma^2} = 1}. What the CLT guarantees is that {Z_n} in fact converges in distribution to {N(0,1)}, the standard normal distribution. Before proceeding, consider the following figure consisting of histograms of {Z_{n,\alpha} = \frac{S_n}{n^{\alpha}}} for various values of {\alpha}, with {X} having mean zero and variance one.

On the top left is the histogram for {\frac{S_n}{n^{0.25}}}, on the top right for {\frac{S_n}{n^{0.5}}}, on the bottom left for {\frac{S_n}{n^{0.75}}}, and on the bottom right for {\frac{S_n}{n}}. The figure illustrates that when {\alpha>0.5}, {Z_{n,\alpha}} converges to a deterministic number (namely zero), while when {\alpha<0.5}, the variance of {Z_{n,\alpha}}, which equals {n^{1-2\alpha}}, keeps growing. At {\alpha=0.5} we have the balanced situation where {Z_{n,0.5}} neither collapses nor blows up, and the CLT tells us that it converges in distribution to the normal distribution. It is also true that the CLT cannot be made stronger, i.e., {Z_n} does not converge in probability or almost surely (see Exercise 1 of Tao’s blog).
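The four-panel experiment is easy to reproduce. Below is a minimal MATLAB sketch (my own reconstruction, not the original script), using uniform {X} standardized to mean zero and variance one.

% Histograms of S_n / n^alpha for i.i.d. mean-zero, variance-one X.
% Uniform on [-sqrt(3), sqrt(3)] has mean 0 and variance 1.
n = 1000; trials = 1e4;
X = sqrt(12) * (rand(trials, n) - 0.5);   % standardized uniform samples
S = sum(X, 2);                            % one sample of S_n per row
alphas = [0.25 0.5 0.75 1.0];
for k = 1:4
    subplot(2, 2, k);
    histogram(S / n^alphas(k), 50);
    title(sprintf('alpha = %.2f', alphas(k)));
end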

Here is an illustration of the CLT working in practice. The random variables are chosen to be uniformly distributed (shifted and scaled to have mean zero and variance one).

Before concluding, we note that there is also a more quantitative version of the CLT, called the Berry-Esseen theorem, which tells us the rate at which {Z_n} converges to {N(0,1)}. To be precise, the Berry-Esseen theorem states:

Theorem 5 (Berry-Esseen Theorem) Let {X} be a mean-zero, variance-one random variable with finite third moment {\mathbb{E}|X|^3<\infty}, and let {X_i} be i.i.d. copies of {X}. Let {Z_n = \frac{X_1+\dots+X_n}{\sqrt n}} and let {\mu_n} be the distribution of {Z_n}. Then

\displaystyle \int_{-\infty}^a d\mu_n = \int_{-\infty}^a \frac{1}{\sqrt{2\pi}} \exp(-x^2/2)\,dx + O\left( \frac{\mathbb{E}|X|^3}{\sqrt n} \right) \ \ \ \ \ (6)

uniformly in {a\in\mathbb{R}}.


Therefore, we see that the rate of convergence of {Z_n} to {N(0,1)} is governed by the third absolute moment {\mathbb{E}|X|^3}, i.e., roughly by how skewed or heavy-tailed {X} is. The Berry-Esseen effect can be seen in practice, as shown below.


In the above figure, the random variables X are chosen to have high skewness. The Xs are generated using the MATLAB command pearsrnd with parameters chosen to be

mu = 0, sigma = 1, skew = 3, kurt = 12
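For completeness, here is a sketch of how such an experiment might look (my own reconstruction; the original script is not shown in the post). pearsrnd requires the Statistics and Machine Learning Toolbox.

% Standardized partial sums of skewed Pearson-family samples,
% compared against the standard normal density.
mu = 0; sigma = 1; skew = 3; kurt = 12;
n = 1000; trials = 1e4;
X = pearsrnd(mu, sigma, skew, kurt, trials, n);   % trials-by-n skewed samples
Zn = sum(X, 2) / sqrt(n);                         % Z_n = (X_1+...+X_n)/sqrt(n)
histogram(Zn, 50, 'Normalization', 'pdf'); hold on;
x = linspace(-4, 4, 200);
plot(x, exp(-x.^2 / 2) / sqrt(2*pi));             % standard normal pdf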

Compared to the CLT illustration for the uniform distribution, we see that for the skewed distribution the number of samples needed before the CLT visibly kicks in is larger. We again remind the reader that the convergence of the pdfs shown in the lower plots of the two figures above is incidental and not guaranteed by the CLT, which only speaks about cdfs.
