In my theoretical statistics class we’re covering tail bounds. Tail bounds are inequalities that give an upper or lower bound on the probability that a random variable X is larger than some value, or the probability that X deviates from its mean by some value. These are useful for proving other inequalities and when we don’t know/want to compute an exact tail probability.
Tail bounds are tricky because there often isn’t much intuition put into deriving them; it’s simply algebra and optimization. The professor keeps throwing around words like “squared exponential decay” as though he has some intuition for the behavior of different tail bounds, so I wanted to actually visualize them and see how they behave myself.
The most simple tail bounds are based on the moments of a random variable. Most people learn these in elementary probability. They include Markov’s inequality for non-negative random variables,
Chebyshev’s inequality which is obtained by manipulating Markov’s,
and Chernoff’s bound, which uses the moment generating function (MGF) instead of the raw moments
Notice that the Chernoff bound involves an optimization problem. If we know something about the MGF of X, we can solve for a good choice of to make the bound tighter (closer to the truth).
Sub-Gaussian random variables are those whose MGF is dominated by the MGF of a Gaussian random variable. Alternatively, you can think of it as a random variable whose tails decay faster than the normal distribution or whose distribution can be dominated by a normal distribution with some correctly-specified mean and variance. There are special tail bounds for sub-Gaussians: see the sub-Gaussian bound, Hoeffding bound, etc. Sub-exponential random variables are a relaxation of sub-Gaussianity; their MGF only needs to be dominated by a Gaussian MGF in a finite interval, rather than on the whole real line. They too have their own special tail bounds: the sub-exponential bound, Bernstein bound, etc. What it all boils down to is that sub-Gaussian tails decay faster than sub-exponentials: on the order of versus . (I will refer you to our class text if you want details on any of these bounds).
The first plot compares a sub-Gaussian distribution (normal) with a sub-exponential (Cauchy). The comparison of densities shows that the Cauchy distribution has heavier tails and we’re more likely to sample from the tail of the Cauchy than the normal. The tail probability drops off sharply for the normal – this is the “squared exponential decay”.
I further looked at two-sided tail bounds for a normal distribution (sub-Gaussian) and a chi-squared distribution (sub-exponential). The Chernoff bound in these two examples is tighter as we go further out into the tails – wonder if this is a general result. (code for the plots is available on GitHub)