Several sources (starting with Wikipedia) state that the Galton box is a (visual) demonstration of the Central Limit Theorem. This claim bothers me a little, because that result is only incidental.
Indeed, the Galton box simulates the outcomes of a binomial variable by dropping many balls through an interleaved grid of pegs, showing that when the number of balls becomes large, their arrangement at the bottom yields a very good approximation of the binomial distribution. In other words, it is an empirical demonstration of the Law of Large Numbers.
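To make this concrete, here is a minimal sketch of a simulated Galton box in Python, assuming each peg sends a ball to the right with probability p independently of the others. The function names (galton_box, binomial_pmf, max_abs_gap) and the choice of 10 peg levels are mine, for illustration only.

```python
import random
from math import comb

def galton_box(n_balls, n_levels, p=0.5, seed=0):
    """Drop n_balls through n_levels of pegs; return the relative frequency of each bin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = [0] * (n_levels + 1)
    for _ in range(n_balls):
        # The bin index is the number of bounces to the right.
        bin_index = sum(rng.random() < p for _ in range(n_levels))
        counts[bin_index] += 1
    return [c / n_balls for c in counts]

def binomial_pmf(m, p):
    """Probabilities P(X = k), k = 0..m, for X ~ Binomial(m, p)."""
    return [comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(m + 1)]

def max_abs_gap(freqs, pmf):
    """Largest absolute difference between observed frequencies and the pmf."""
    return max(abs(f - q) for f, q in zip(freqs, pmf))

if __name__ == "__main__":
    pmf = binomial_pmf(10, 0.5)
    for n in (100, 1_000, 10_000, 100_000):
        gap = max_abs_gap(galton_box(n, 10), pmf)
        print(f"{n:>7} balls: max |frequency - binomial pmf| = {gap:.4f}")
```

With the number of levels held fixed, the printed gap should tend to shrink as more balls are dropped, which is the Law of Large Numbers at work; the target is the binomial distribution, not the normal one.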
On the other hand, my previous visualization shows that, as the number of trials increases in a sequence of theoretical binomial distributions, the normal one comes into view very quickly. This is the real sense of the Central Limit Theorem in its simplest version, and the reason why the Galton machine works.
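A rough numerical check of this statement, with no sampling involved: the sketch below compares the exact Binomial(m, p) cumulative distribution with the normal one having the same mean and variance, and reports the largest gap at the integer points k = 0..m. The helper names (binomial_cdf, normal_gap) are hypothetical, and this particular gap measure is just one convenient choice among many.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(m, p):
    """Cumulative probabilities P(X <= k), k = 0..m, for X ~ Binomial(m, p)."""
    total, cdf = 0.0, []
    for k in range(m + 1):
        total += comb(m, k) * p**k * (1 - p) ** (m - k)
        cdf.append(total)
    return cdf

def normal_gap(m, p=0.5):
    """Largest gap, over k = 0..m, between the Binomial(m, p) CDF and the CDF
    of the normal with mean m*p and variance m*p*(1-p). Requires 0 < p < 1."""
    normal = NormalDist(mu=m * p, sigma=sqrt(m * p * (1 - p)))
    return max(abs(b - normal.cdf(k)) for k, b in enumerate(binomial_cdf(m, p)))

if __name__ == "__main__":
    for m in (1, 5, 25, 100, 400):
        print(f"m = {m:>3}: gap = {normal_gap(m):.4f}")
```

Here the gap depends only on the number of trials m: it shrinks as m grows and stays put if m is held fixed, however many samples one might draw.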
Let me restate the above distinction formally. If Fm(Xn) is the sample distribution of a collection of n binomial variables, each counting the number of successful events with constant probability p in m trials, then the following holds, with some abuse of notation:
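One way to write the two relations in LaTeX, as a sketch under my own notational assumptions: F_m(X_n) stands for Fm(Xn), B(m, p) for the binomial distribution with m trials and success probability p, and N(mu, sigma^2) for the normal distribution. The arrows are meant loosely (this is the abuse of notation), and \xrightarrow needs the amsmath package.

```latex
\[
F_m(X_n) \xrightarrow[n \to \infty]{} B(m, p),
\qquad
B(m, p) \xrightarrow[m \to \infty]{} N\bigl(mp,\; mp(1-p)\bigr)
\]
```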
The first relation describes the Law of Large Numbers; the second one the Central Limit Theorem.
If you think about them statically, the Galton box points out that for a large value of n, even a moderate value for m is enough to obtain a sample distribution very close to the normal one.
But here I insist that the key word for really appreciating the meaning of the two laws is convergence. From this point of view, in my opinion, there is a misunderstanding about the Galton box: it illustrates the former law but is credited to the latter. As the number of balls increases, their distribution gets closer and closer to the binomial distribution, whose approximation to the normal one is good but cannot get any better, because the number of peg levels doesn’t change.
Is it possible to imagine a different mechanism to highlight the distinction between the two convergence laws?
This is the challenge I have decided to face after writing my first visual presentation of the Central Limit Theorem. So, I’ve worked out the animated process which I am going to describe.
At the beginning, a collection of 5000 squares is arranged in a grid of 25 rows and 200 columns, matched by as many underlying buffers. The squares are randomly given a dark or light color, and therefore they can be viewed as the outcomes of just as many Boolean (Bernoulli) random variables.
Then the matrix is deconstructed: the light squares vanish, while the dark ones fall downward, each pushing the buffer below it down one step. When all the squares of every column have gone, the buffers fall leftward, building the distribution of the number of dark squares in the matrix columns.
So, the initial utter randomness of the matrix gives way to the final regularity of the normal distribution.
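The end state of this process can be sketched in a few lines of Python. This is not the animation's code, only an illustration under my own assumptions: the grid is a list of rows of booleans (True for dark), and make_grid, dark_per_column and histogram are hypothetical names.

```python
import random
from collections import Counter

ROWS, COLS = 25, 200  # grid size given in the text

def make_grid(p_dark=0.5, seed=1):
    """ROWS x COLS grid; True means dark, drawn independently with probability p_dark."""
    rng = random.Random(seed)
    return [[rng.random() < p_dark for _ in range(COLS)] for _ in range(ROWS)]

def dark_per_column(grid):
    """Number of dark squares in each column (how far each buffer is pushed down)."""
    return [sum(column) for column in zip(*grid)]

def histogram(counts):
    """For each possible count 0..ROWS, the number of columns with that count."""
    tally = Counter(counts)
    return [tally[k] for k in range(ROWS + 1)]

if __name__ == "__main__":
    # Text rendering of the final picture: one bar per possible count.
    for k, height in enumerate(histogram(dark_per_column(make_grid()))):
        print(f"{k:2d} {'#' * height}")
```

The printed bars play the role of the buffers after they have fallen leftward: a bell-like shape built from 5000 independent dark-or-light draws.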
Now, the crucial part: the matrix deconstruction can be done column-by-column or row-by-row.
In the first case, the matrix represents a sample of 200 columns, each of which has 25 outcomes. Throughout the fall, the buffers count the dark squares one column at a time, and hence the process simulates a sample of increasing size, from 1 to 200 observations, from a binomial distribution of 25 trials. In other words it is a Monte Carlo simulation of the convergence described by the Law of Large Numbers.
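As a sketch of this first mode, the code below adds one column count at a time and tracks how far the running relative frequencies are from the Binomial(25, p) probabilities. The names (column_counts, gaps_by_sample_size) and the gap measure are my own assumptions, not part of the animation.

```python
import random
from math import comb

ROWS, COLS = 25, 200  # grid size given in the text

def column_counts(p_dark=0.5, seed=1):
    """Dark-square count of each column: COLS draws from Binomial(ROWS, p_dark)."""
    rng = random.Random(seed)
    return [sum(rng.random() < p_dark for _ in range(ROWS)) for _ in range(COLS)]

def binomial_pmf(m, p):
    """Probabilities P(X = k), k = 0..m, for X ~ Binomial(m, p)."""
    return [comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(m + 1)]

def gaps_by_sample_size(counts, p_dark=0.5):
    """After each new column, the largest gap between the running relative
    frequencies and the Binomial(ROWS, p_dark) pmf."""
    pmf = binomial_pmf(ROWS, p_dark)
    tally = [0] * (ROWS + 1)
    gaps = []
    for n, c in enumerate(counts, start=1):
        tally[c] += 1
        gaps.append(max(abs(t / n - q) for t, q in zip(tally, pmf)))
    return gaps

if __name__ == "__main__":
    gaps = gaps_by_sample_size(column_counts())
    for n in (1, 10, 50, 100, 200):
        print(f"n = {n:>3} columns: gap = {gaps[n - 1]:.3f}")
```

The number of trials stays at 25 throughout; only the sample size n grows, so the gap measures sampling noise and tends to decrease, though not monotonically.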
In the second case, the matrix represents a sample of 25 rows, each of which has 200 outcomes. Throughout the fall, the buffers count the dark squares one row at a time, and hence the process describes a sequence of samples of 200 observations from binomial variables with an increasing number of trials, from 1 to 25: this is a finite simulation of the convergence stated by the Central Limit Theorem.
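The second mode can be sketched in the same spirit: the 200 counts are updated one row at a time, so that after m rows each of them is a draw from Binomial(m, p). The names (make_grid, counts_after_each_row) are hypothetical, and the order in which rows are processed is my assumption; it does not affect the final counts.

```python
import random
from statistics import mean, pvariance

ROWS, COLS = 25, 200  # grid size given in the text

def make_grid(p_dark=0.5, seed=1):
    """ROWS x COLS grid; True means dark, drawn independently with probability p_dark."""
    rng = random.Random(seed)
    return [[rng.random() < p_dark for _ in range(COLS)] for _ in range(ROWS)]

def counts_after_each_row(grid):
    """Yield (m, counts), where counts[j] is the number of dark squares
    among the first m rows of column j."""
    counts = [0] * len(grid[0])
    for m, row in enumerate(grid, start=1):
        counts = [c + dark for c, dark in zip(counts, row)]  # True adds 1
        yield m, counts

if __name__ == "__main__":
    p = 0.5
    for m, counts in counts_after_each_row(make_grid(p)):
        print(f"m = {m:>2}: mean = {mean(counts):6.3f} (m*p = {m * p:5.2f}), "
              f"variance = {pvariance(counts):6.3f} (m*p*(1-p) = {m * p * (1 - p):5.2f})")
```

Here the sample size stays at 200 while the number of trials m grows from 1 to 25. After the last row the counts coincide with the column totals, so both modes end in the same picture.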
Of course, the two ways of deconstructing the initial grid of squares give the same result, even though it represents a finite state of two different convergence processes, which can be described formally as below:
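A possible way to write the two finite processes in LaTeX, again as a sketch under my own notational assumptions (F_m(X_n) for the sample distribution, B(m, p) for the binomial, N for the normal, arrows read loosely as "approaches"; \xrightarrow needs the amsmath package):

```latex
\[
F_{25}(X_n) \xrightarrow[n \,=\, 1, \dots, 200]{} B(25, p),
\qquad
F_m(X_{200}) \xrightarrow[m \,=\, 1, \dots, 25]{} N\bigl(mp,\; mp(1-p)\bigr)
\]
```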
As an animated picture is worth a thousand words, I invite you to play the animation on its GitHub page. It’s possible to vary the frequency of dark squares in the grid and also to choose among three different speed levels. I hope you enjoy it.
In my first version, in addition to column-by-column deconstruction (Law of Large Numbers) and row-by-row deconstruction (Central Limit Theorem), I added a third mode, mainly to complete the animation more quickly than in the first two cases. That mode has no specific meaning, and therefore I have dropped it from my definitive canvas version.
Some vague information about the (il)logic of the code is available here.