The Lebesgue Integral: a Curious Tea Party (Part 2)

In our last post, we described an interesting Tea Party with rather strange rules governing the use of sugar. As a quick reminder, the setup was:

Each table has two sugar bowls. The first contains 5 sugar cubes (each made of one teaspoon of sugar) and the second contains 5 teaspoons of infinitesimally small granules of sugar. The rules of the party state that each table must use all 5 cubes before using any of the granulated sugar.

The interesting distribution for the usage of this sugar can be seen below:

A distribution that is both discrete and continuous

What makes this situation interesting is that this probability problem consists of both discrete and continuous parts. For all of the probability problems we have covered so far, we used one way of thinking about discrete probability and another for continuous probability, but now what do we do?

The Integral Thing!

We want a unified view of both discrete and continuous probability. To do this we need to solve both discrete and continuous probability problems in such a way that we can solve them individually or together at the same time. When it comes down to it, almost everything we do in Probability is really "summing things up". If we have a consistent way to add up parts of our probability distribution, then we're done!

For Discrete Probability Distributions, we solve this problem by adding up each of our discrete possibilities leading to a Probability Mass Function, and for Continuous Probability Distributions we just integrate over a Probability Density Function. So how are we going to combine these two? What we need is a more powerful Integral that can seamlessly combine these two forms of adding things together.
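To make the two styles of "summing things up" concrete, here is a minimal sketch of each. The fair die, the uniform density on [0, 5], and the helper `integrate` are all illustrative assumptions, not distributions taken from the tea party.

```python
# Hypothetical discrete distribution: a fair six-sided die.
pmf = {face: 1 / 6 for face in range(1, 7)}

# Discrete case: P(X <= 3) is a plain sum over the Probability Mass Function.
p_discrete = sum(p for face, p in pmf.items() if face <= 3)


# Hypothetical continuous distribution: a uniform density on [0, 5].
def pdf(x):
    return 1 / 5 if 0 <= x <= 5 else 0.0


def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


# Continuous case: P(X <= 3) is an integral of the Probability Density Function.
p_continuous = integrate(pdf, 0, 3)

print(p_discrete, p_continuous)
```

Notice that the two answers come from two different pieces of machinery, which is exactly the split we would like to get rid of.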

We've already discussed how awesome the Riemann Integral is for solving problems that the Fundamental Theorem of Calculus can't, so maybe it can solve our problem!

Limitations of the Riemann Integral

In our post on the Riemann Integral we discussed how we can integrate certain functions, like the absolute value function, by approximating the Area Under the Curve. We can imagine this integration as building a bunch of towers under a function.

The Riemann Integral can be visualized as building towers under the curve

If we better approximate the true area under the curve as we shrink the width of the towers, then our function is Riemann integrable. It is important to remember that everything that we can integrate using the Fundamental Theorem of Calculus we can also integrate using the Riemann Integral. If the Riemann Integral can't solve our problem, then we know the Fundamental Theorem of Calculus won't help us either.
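The "tiny towers" idea can be sketched in a few lines. This is only an illustration: the interval [-1, 2] and the helper name `left_riemann_sum` are choices made here, and the tower heights are read off at each tower's left edge.

```python
def left_riemann_sum(f, a, b, n):
    """Area of n equal-width towers whose heights are f at each left edge."""
    width = (b - a) / n
    return sum(f(a + i * width) for i in range(n)) * width


# Exact area under |x| on [-1, 2]: two triangles, 0.5 + 2.0.
exact_area = 2.5

errors = []
for n in (3, 30, 300, 3000):
    approx = left_riemann_sum(abs, -1.0, 2.0, n)
    errors.append(abs(approx - exact_area))
    print(f"{n:>5} towers: area ~ {approx:.4f}, error {errors[-1]:.4f}")
```

Each time the towers get ten times narrower, the error shrinks along with them, which is the behavior we expect from a Riemann integrable function.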

Using the "tiny towers" understanding of the Riemann Integral it seems like we're in luck! After all, look at this image from the Wikipedia article on Discrete Probability Distributions: it's literally a tower of dice!

We frequently represent Discrete distributions as bar charts, and if we can build a bar chart then clearly we can build a tower! But before we accept this conclusion we have to think about what's really happening here. When we picture Discrete distributions as bar charts, we're pretending that they behave like a Continuous Function. We do this by looking at each discrete value as a range of values and treating them all the same.

Sometimes pretending discrete distributions are continuous works....

In the image above we're really saying that all values in \([0,1)\) are treated the same as \(P(0)\), all values in \([1,2)\) the same as \(P(1)\), etc. We can use the Riemann Integral to take the width of each step multiplied by its height, \(P(x)\), and then we can integrate just like a typical continuous function! For convenience we're being a little sloppy with what we're saying, but it seems to work. This blog is hardly one to complain about taking some shortcuts for a clear understanding, so what's the problem?
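Here is a small sketch of that bar chart trick. The Binomial(5, 0.5) PMF is a made-up stand-in for the number of cubes used, and `bar_height` and `midpoint_riemann_sum` are hypothetical helper names.

```python
import math

# Hypothetical PMF for the number of cubes used (0 through 5): Binomial(5, 0.5).
pmf = [math.comb(5, k) / 32 for k in range(6)]


def bar_height(x):
    """Step function that gives every x in [k, k+1) the height P(k)."""
    return pmf[math.floor(x)]


def midpoint_riemann_sum(f, a, b, n):
    """Area of n equal-width towers whose heights are f at each midpoint."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


# P(X <= 2) two ways: summing the PMF, and integrating the bars over [0, 3).
by_sum = sum(pmf[:3])
by_integral = midpoint_riemann_sum(bar_height, 0.0, 3.0, 3000)

print(by_sum, by_integral)
```

The two numbers agree, but only because each bar is exactly one unit wide, so "width times height" quietly reproduces each \(P(k)\).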

We've already agreed that we need to have some very clearly defined ideas of Events \(\mathcal{F}\) and Sample Spaces \(\Omega\) in order to make sure we're talking about something sane. If we pretend Discrete functions are continuous in this way, it only works when all of the values in \([0,5]\) are part of our Event Space \(\mathcal{F}\). However, we went through a lot of trouble to specifically say that this is not the case!

To stick with our example of the tea party: to use the continuous approximation trick we have to say that 3.25 teaspoons is a possible amount of sugar, but our house rules explicitly forbid this! We can't build towers under our function, because the contribution of each discrete value would approach 0 as the base of the towers shrinks, since our rules disallow the in-between values from being counted. This would lead to very wrong answers if we used the Riemann Integral to reason about our Probability Space.
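We can watch this failure happen. The sketch below assumes the same made-up Binomial(5, 0.5) PMF for the cubes and follows the house rules strictly: the function has height \(P(k)\) only at the whole numbers and height 0 everywhere in between. Exact fractions are used so that the tower edges land precisely on the whole numbers.

```python
from fractions import Fraction
from math import comb

# Hypothetical PMF for the number of cubes used (0 through 5): Binomial(5, 0.5).
pmf = {k: Fraction(comb(5, k), 32) for k in range(6)}


def point_mass(x):
    """Height P(k) exactly at a whole number k, and 0 at every in-between value."""
    if x.denominator == 1:
        return pmf.get(int(x), Fraction(0))
    return Fraction(0)


def left_riemann_sum(n):
    """Build n towers on [0, 6), each as tall as point_mass at its left edge."""
    width = Fraction(6, n)
    return sum(point_mass(i * width) * width for i in range(n))


sums = {n: float(left_riemann_sum(n)) for n in (6, 60, 600)}
print(sums)
```

The total probability should be 1, yet the towers only report that when they are exactly one unit wide. As they narrow, the same six point masses are multiplied by a smaller and smaller base, and the "area" drains away toward 0.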

Finally the Lebesgue Integral

The Lebesgue Integral is a very sophisticated way to arrive at the straightforward conclusion: "Hey let's add the discrete parts up and then integrate over the continuous parts and put them together!"

"Ever get the feeling you've been cheated?"

This solution might almost seem too easy and obvious! For all the build-up, this is the heart of how the Lebesgue Integral handles this problem. The hard work in coming up with the Lebesgue Integral is the idea of a Measurable Space in the first place. We know that our \(\Omega\) (the possible uses of sugar) is measurable because we can build \(\mathcal{F}\). Because \(\Omega\) is measurable, the pieces that make it up must also be measurable. Once again, mathematically saying "yes we can measure this" has some very practical implications: it allows us to combine our intuition of how to solve this problem with mathematical formalism. If we can measure, in the most practical sense, the distribution of the cube sugar bowl and the distribution of the granular sugar bowl, then by combining both measurements we can measure them together.
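As a rough sketch of "add the discrete parts up, integrate the continuous parts, and put them together", consider the code below. The numbers are invented for illustration and are not the distribution pictured earlier: a table stops somewhere on the cubes with total probability 0.6, and otherwise uses some amount of granules spread uniformly over (5, 10] teaspoons. The names `masses`, `density`, `integrate`, and `prob` are likewise hypothetical.

```python
# Hypothetical point masses: probability of stopping at exactly k teaspoons of cubes.
masses = {0: 0.05, 1: 0.10, 2: 0.15, 3: 0.15, 4: 0.10, 5: 0.05}  # sums to 0.6


def density(x):
    """Hypothetical density for the granules: the remaining 0.4 spread over (5, 10]."""
    return 0.4 / 5 if 5 < x <= 10 else 0.0


def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


def prob(a, b):
    """P(a <= X <= b): sum the point masses, integrate the density, combine."""
    discrete = sum(m for x, m in masses.items() if a <= x <= b)
    lo, hi = max(a, 5.0), min(b, 10.0)  # the density only lives on (5, 10]
    continuous = integrate(density, lo, hi) if hi > lo else 0.0
    return discrete + continuous


print(prob(0, 10))  # everything that can happen
print(prob(0, 7))   # all the cube outcomes plus two teaspoons' worth of granules
print(prob(0, 3))   # purely discrete: no granules involved
```

Nothing clever is going on in `prob`, and that is the point: each part is measured in the way that suits it, and the two measurements are simply added.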

If you are a dedicated reader of the blog, you might recall we touched on the Lebesgue Integral many posts ago when covering Expectation and Variance from High School to Grad School. In that post we ended up with the Lebesgue Integral as our final model of Expectation:
$$E[X] =\int_{\Omega} X(\omega)P(d\omega)$$
This formula has some slightly different notation (though this notation is not necessarily universal) than we are used to when dealing with Integration. For starters, there's just the \(\Omega\) at the bottom, rather than lower and upper bounds. This serves to remind us that we are looking at a set and not simply a function. Next we have that \(P(d\omega)\). Why do we have that little \(d\) there? My reasoning is that this is a reminder that not every "little piece" here means the same thing as it does when we're dealing with a traditional notion of Integration. Normally \(dx\) can be imagined as an infinitesimally small piece of \(x\), but for our discrete values this clearly doesn't work. Not all \(d\)s are alike, so notationally we have a reminder that when we're integrating we have to be careful.
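For a distribution like the tea party's, this formula can be unpacked into the two familiar pieces. This is only a sketch under assumed notation that the post itself does not use: \(x_k\) are the discrete values with point masses \(p_k = P(X = x_k)\), and \(f\) is a density covering the continuous part, so that \(\sum_k p_k + \int f(x)\,dx = 1\).
$$E[X] = \int_{\Omega} X(\omega)P(d\omega) = \sum_{k} x_k\, p_k + \int x\, f(x)\, dx$$
With no continuous part this collapses to the usual discrete sum for Expectation, and with no point masses it is the usual integral over a density.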


While the intuition behind the Lebesgue Integral may seem rather simple, the mathematics building it up are not. As with all of this blog's posts regarding Measure Theory we've tried to convey the heart of complex mathematical ideas without getting lost in the details. If you would like a more formal dive into the topic, I can't recommend Lebesgue Integration on Euclidean Space enough! 

For those not interested in the rigorous Mathematics, this whole journey may seem a bit silly. Summing over the discrete parts and integrating the continuous parts seems obvious, so why make it hard at all? Making the less elegant but functional solution hold up under rigorous mathematics is in fact very difficult, but the value is that we now have a formal way to use this more obvious notion of the integral. The deeper into abstract math one goes, the less raw calculation matters and the more robust generalizations are needed. The Lebesgue Integral gives us an extraordinarily powerful formalism that allows us to reason about Integration in many places where it would formerly have been quite difficult, including some not particularly hard to find problems in Probability.

If you enjoyed this post please subscribe to keep up to date and follow @willkurt!