This blog entry is mostly a precursor to my next one, which will be a review of various Monte Carlo (MC) and quasi-Monte Carlo (qMC) methods used to compute integrals of functions of varying smoothness. To understand the technical details behind MC techniques, one first needs to understand two basic results in probability theory: the law of large numbers (LLN) and the central limit theorem (CLT). There are excellent resources on the net for the LLN and CLT. For example, this and this are highly recommended readings. This blog will play a complementary role, using figures and animations to build an intuitive understanding.
— 1. The Law of Large Numbers —
Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables with finite mean $\mu$, and let

$\displaystyle S_n = \frac{1}{n}\sum_{i=1}^{n} X_i$

be the empirical averages of this sequence at stage $n$. Then, we have

Theorem 1 (Weak Law of Large Numbers) $S_n$ converges to $\mu$ in probability, i.e., for every $\epsilon > 0$, $\lim_{n \rightarrow \infty} \mathbb{P}(|S_n - \mu| > \epsilon) = 0$.

Theorem 2 (Strong Law of Large Numbers) $S_n$ converges to $\mu$ almost surely, i.e., $\mathbb{P}(\lim_{n \rightarrow \infty} S_n = \mu) = 1$.
Click here for the proofs of the above theorems. The SLLN implies the WLLN but not the other way around.
The figure above illustrates a sequence of random variables which converges in probability to zero but does not converge almost everywhere to zero.
The figure above is an example of a sequence of random variables which converges almost everywhere to zero.
Under a stronger moment assumption (e.g., a finite fourth moment), a much simpler proof of the SLLN can be obtained.
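For completeness, here is a sketch of that simpler argument under the (stronger than necessary) assumption $\mathbb{E}[X_i^4] < \infty$; this is the standard fourth-moment proof, filled in here since the details are not spelled out above. Assume without loss of generality that $\mu = 0$.

```latex
% Expanding the fourth power and using independence and \mathbb{E}[X_i] = 0,
% only the n terms \mathbb{E}[X_i^4] and the 3n(n-1) cross terms
% \mathbb{E}[X_i^2]\mathbb{E}[X_j^2] survive:
\mathbb{E}\!\left[\Big(\sum_{i=1}^{n} X_i\Big)^{\!4}\right]
  = n\,\mathbb{E}[X_1^4] + 3n(n-1)\,\big(\mathbb{E}[X_1^2]\big)^2
  \le C n^2 .
% Hence, by Markov's inequality applied to the fourth power,
\mathbb{P}\big(|S_n| > \epsilon\big)
  = \mathbb{P}\!\left(\Big|\sum_{i=1}^{n} X_i\Big| > n\epsilon\right)
  \le \frac{C n^2}{n^4 \epsilon^4}
  = \frac{C}{n^2 \epsilon^4},
% which is summable in n, so the Borel--Cantelli lemma gives
% S_n \rightarrow 0 almost surely.
```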
As in my previous blog, let's try to understand the WLLN and SLLN from an experimental point of view. The way to interpret the LLN is as follows. The underlying sample space $\Omega$ is the space of all sequences $\omega = (x_1, x_2, \ldots)$ of infinite length. In other words, $\Omega$ is the common extension consisting of all possible outcomes of all possible lengths. Given a value of $\epsilon > 0$, let $A_{n, \epsilon} = \{ \omega \in \Omega : |S_n(\omega) - \mu| > \epsilon \}$. Then, the WLLN says that the measure of $A_{n, \epsilon}$ goes to zero for all $\epsilon > 0$ as $n \rightarrow \infty$. And the SLLN says that the set of all $\omega$ such that $S_n(\omega) \not\rightarrow \mu$ has measure zero in $\Omega$.
Example 1 Let each $X_i \in \{0, 1\}$ with equal probability (representing heads and tails of a coin flip). Then the set of all possibilities can be identified with the unit interval $[0,1]$ using binary representation (for example, the sequence $(1, 0, 1, 1, \ldots)$ corresponds to the real number $0.1011\ldots$ in base 2). The interpretation of the SLLN is that, almost surely, any real number you pick will have 0s and 1s distributed equally in its binary representation.
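This interpretation is easy to check numerically. The short script below (a sketch of my own, not from any of the linked references) simulates fair coin flips and tracks the running fraction of heads, which the SLLN says should approach 1/2:

```python
import random

random.seed(0)

n = 100_000
heads = 0
running_means = []
for i in range(1, n + 1):
    heads += random.randint(0, 1)    # one fair coin flip: 0 = tails, 1 = heads
    running_means.append(heads / i)  # empirical average S_i at stage i

# Early averages fluctuate wildly, but the tail settles near 1/2.
print(running_means[9], running_means[-1])
```

Plotting `running_means` against the stage index reproduces the staircase-like convergence seen in the figures above.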
— 2. The Central Limit Theorem —
We need the following definition before stating the CLT.
Definition 3 (Convergence in distribution) The sequence of random variables $X_n$ converges in distribution to a random variable $X$ if the cumulative distribution functions $F_{X_n}$ converge pointwise to the cumulative distribution function $F_X$ of $X$ at all points $x$ at which $F_X$ is continuous.
As the figure above illustrates, convergence of random variables in distribution does not imply that the probability density functions of the random variables converge in any sense. Convergence in distribution is a very weak notion of convergence. In the above figure, the green curves correspond to the pdf and the cdf of the uniform distribution on the interval $(0,1)$ respectively. The blue curves are the analogous curves for the sequence of random variables $X_n$ on the interval $(0,1)$ with pdf given by

$\displaystyle f_n(x) = 1 - \cos(2 \pi n x), \quad x \in (0,1).$
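Assuming densities of the form $f_n(x) = 1 - \cos(2\pi n x)$ on $(0,1)$ (the standard textbook example of this phenomenon; the formula is an assumption here, not taken from the figure), the convergence of the cdfs and the non-convergence of the pdfs can be checked numerically:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 2001)

def pdf(n, x):
    # Oscillating density on (0, 1): f_n(x) = 1 - cos(2*pi*n*x)
    return 1.0 - np.cos(2 * np.pi * n * x)

def cdf(n, x):
    # Antiderivative of f_n: F_n(x) = x - sin(2*pi*n*x) / (2*pi*n)
    return x - np.sin(2 * np.pi * n * x) / (2 * np.pi * n)

for n in (1, 10, 100):
    cdf_gap = np.max(np.abs(cdf(n, x) - x))    # sup distance to uniform cdf
    pdf_gap = np.max(np.abs(pdf(n, x) - 1.0))  # sup distance to uniform pdf
    print(n, cdf_gap, pdf_gap)
```

The cdf gap shrinks like $1/(2\pi n)$ while the pdf gap stays at 1, exactly the behavior the figure illustrates.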
Thus we see an example where the cdfs converge but the pdfs do not. We are now in a position to state the Central Limit Theorem.

Theorem 4 (Central Limit Theorem) Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables with mean $\mu$ and finite variance $\sigma^2$, and let

$\displaystyle Z_n = \frac{1}{\sigma \sqrt{n}} \sum_{i=1}^{n} (X_i - \mu).$

Then $Z_n$ converges in distribution to $N(0,1)$, the standard normal distribution.
Note that using the i.i.d. assumption, we see that the mean and variance of $Z_n$ are 0 and 1 respectively. What the CLT guarantees is that $Z_n$ in fact converges in distribution to $N(0,1)$, the standard normal distribution. Before proceeding, consider the following figure consisting of histograms of $\frac{1}{n^{\alpha}} \sum_{i=1}^{n} X_i$ for various values of $\alpha$, with the $X_i$ having mean zero and variance one.
The four panels (top left, top right, bottom left, bottom right) show histograms for four different values of $\alpha$. This figure illustrates the fact that when $\alpha > 1/2$, $\frac{1}{n^{\alpha}} \sum_{i=1}^{n} X_i$ converges to a deterministic number (zero). When $\alpha < 1/2$, the variance of $\frac{1}{n^{\alpha}} \sum_{i=1}^{n} X_i$ keeps growing, and when $\alpha = 1/2$, we have a balanced situation: the CLT tells us that $\frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i$ converges in distribution to the normal distribution. It is also true that the CLT cannot be made stronger, i.e., $Z_n$ cannot converge in probability or in the almost sure sense. (See Exercise 1 of Tao's blog.)
Here is an illustration of the CLT working in practice. The random variables $X_i$ are chosen to be uniformly distributed (shifted and scaled to have mean zero and variance one).
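A simulation along those lines is easy to reproduce. The sketch below (my own code, not the original experiment) draws uniform variables on $[-\sqrt{3}, \sqrt{3}]$, which have mean zero and variance one, and checks that the standardized sums look standard normal:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200          # number of summands per sample
trials = 20_000  # number of independent samples of Z_n

# Uniform on [-sqrt(3), sqrt(3)] has mean 0 and variance 1.
half_width = np.sqrt(3.0)
x = rng.uniform(-half_width, half_width, size=(trials, n))

# Z_n = (X_1 + ... + X_n) / sqrt(n)
z = x.sum(axis=1) / np.sqrt(n)

# Sanity checks against N(0, 1): mean ~ 0, std ~ 1,
# and about 68.3% of the mass within one standard deviation.
print(z.mean(), z.std(), np.mean(np.abs(z) < 1.0))
```

A histogram of `z` overlaid on the standard normal pdf reproduces the kind of agreement shown in the figure.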
Before concluding, we note that there is also a more quantitative version of the CLT, called the Berry-Esseen Theorem, which also tells us the rate at which the cdf of $Z_n$ converges to that of $N(0,1)$. To be precise, the Berry-Esseen Theorem states that if $\rho = \mathbb{E}|X_i - \mu|^3 < \infty$, then

$\displaystyle \sup_{x} \left| F_{Z_n}(x) - \Phi(x) \right| \le \frac{C \rho}{\sigma^3 \sqrt{n}},$

where $\Phi$ is the cdf of $N(0,1)$ and $C$ is an absolute constant.
Therefore, we see that the rate of convergence of the cdf of $Z_n$ to $\Phi$ is governed by the third absolute moment, and hence the skewness, of the $X_i$. The Berry-Esseen effect can be seen in practice as shown below.
In the above figure, the random variables $X_i$ are chosen to have high skewness. The $X_i$s are generated using the MATLAB command pearsrnd with parameters chosen to be
mu = 0, sigma = 1, skew = 3, kurt = 12
Compared to the CLT illustration for the uniform distribution, we see that for the skewed distribution, the number of iterations needed before one starts seeing the CLT at work is larger. We again remind the reader that the convergence of the pdfs shown in the lower plots of the above two figures is just a coincidence, not something guaranteed by the CLT.
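The Berry-Esseen effect can also be seen without MATLAB. The rough sketch below is my own substitute for the pearsrnd experiment: it uses centered exponential summands (skewness 2) versus symmetric uniform summands (skewness 0), and estimates the Kolmogorov distance between the empirical cdf of $Z_n$ and $\Phi$ for each:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)

def kolmogorov_distance(z):
    # sup_x |F_empirical(x) - Phi(x)| estimated from a sample of Z_n values.
    z = np.sort(z)
    phi = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in z])
    k = np.arange(1, len(z) + 1)
    return max(np.max(np.abs(k / len(z) - phi)),
               np.max(np.abs((k - 1) / len(z) - phi)))

def z_samples(draw, n, trials):
    # draw(size) must return standardized (mean 0, variance 1) samples.
    x = draw((trials, n))
    return x.sum(axis=1) / np.sqrt(n)

uniform = lambda size: rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size)
exponential = lambda size: rng.exponential(1.0, size) - 1.0  # mean 0, var 1, skew 2

n, trials = 10, 20_000
d_uniform = kolmogorov_distance(z_samples(uniform, n, trials))
d_skewed = kolmogorov_distance(z_samples(exponential, n, trials))
print(d_uniform, d_skewed)
```

At the same $n$, the skewed summands leave $Z_n$ noticeably farther from normal, in line with the $\rho / (\sigma^3 \sqrt{n})$ bound.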