Apr 11, 2023

Concentration Bound

Credit: https://en.wikipedia.org/wiki/Chernoff_bound

Concentration bound studies the probability that a random variable \( X \) is bounded within a certain range (concentration). Namely, it studies the probability \( \mathrm{P}(X\le a) \) and \( \mathrm{P}(X\ge a) \) for some constant \( a \).

Markov's Inequality

The Markov's Inequality claims that for non-negative random variable \( X \) with expectation \( \mathrm{E}(X) \) and some \( a>0 \), the probability we seek is bounded by

\[ \mathrm{P}(X\ge a)\le\frac{\mathrm{E}(X)}{a}. \]

The intuition is that to maximize \( \mathrm{P}(X\ge a) \) for some non-negative \( X \) with expectation \( \mathrm{E}(X) \), the best approach is to make \( X=a \) with probability \( \frac{\mathrm{E}(X)}{a} \) and \( X=0 \) otherwise. This gives an expectation of \( \mathrm{E}(X) \).

Chebyshev's Inequality

When \( X \) is not non-negative, a easy way to work around is to notice that \( X^2 \) is non-negative and

\[ \mathrm{P}(X^2\ge a^2)\le\frac{\mathrm{E}(X^2)}{a^2}. \]

If we center \( X \) with respect to its expectation \( \mathrm{E}(X) \), we get the Chebyshev's Inequality

\[ \mathrm{P}\left(\left|X-\mathrm{E}(X)\right|\ge a\right)=\mathrm{P}\left(\left(X-\mathrm{E}(X)\right)^2\ge a^2\right)\le\frac{\mathrm{Var}(X)}{a^2}. \]

Chernoff Bound

Since exponential functions are always non-negative, a more advanced technique to derive a more precise bound is to substitute \( X\ge a \) with \( e^{tX}\ge e^{ta} \) for some \( t>0 \). This give the following bound:

For every \( t>0 \)

\[ \mathrm{P}(X\ge a)=\mathrm{P}(e^{tX}\ge e^{ta})\le\frac{\mathrm{E}(e^{tX})}{e^{ta}}. \]

In practice this means we can pick some \( t \) such that this inequality gives the tightest bound.

Practice Problem

Assume I keep tossing a coin until I see \( n \) heads. Denote \( S_n \) as the number of coin tosses I make. Upper bound

\[ \mathrm{P}(S_n-2n\ge l) \]

for some \( l>0 \).

Solution

We start with the Chernoff Bound

\[ \mathrm{P}(S_n-2n\ge l)\le\frac{\mathrm{E}(e^{tS_n})}{e^{t(l+2n)}}. \]

Observe that

\[ \mathrm{E}(e^{tS_n})=\mathrm{E}(e^{tS_1})^n=\left(\sum_{i=1}^\infty 2^{-i}e^{ti}\right)^n=(\frac{e^t}{2-e^t})^n .\]

Now plug the result back and let \( y(t)=\frac{\mathrm{E}(e^{tS_n})}{e^{t(l+2n)}}=e^{-t(l+n)}(2-e^t)^{-n} \).

We want to find \( \min_t y(t) \), so it makes sense to solve for \( y'(t)=0 \). We can do so in Mathematica:

Reduce[D[E^(-t (l + n)) (2 - E^t)^(-n), t] == 0 && l > 0 && n >= 1 && 0 <= t <= Log[2], t]

We obtain the result \( t=\log \left(\frac{2 (l+n)}{l+2 n}\right) \). Plug it back and we get the result

\[ \mathrm{P}(S_n-1.5n\ge l)\le 2^{-l-n} \left(\frac{l+n}{l+2 n}\right)^{-l-n} \left(2-\frac{2 (l+n)}{l+2 n}\right)^{-n}. \]