Apr 11, 2023
Credit: https://en.wikipedia.org/wiki/Chernoff_bound
Concentration bound studies the probability that a random variable \( X \) is bounded within a certain range (concentration). Namely, it studies the probability \( \mathrm{P}(X\le a) \) and \( \mathrm{P}(X\ge a) \) for some constant \( a \).
The Markov's Inequality claims that for non-negative random variable \( X \) with expectation \( \mathrm{E}(X) \) and some \( a>0 \), the probability we seek is bounded by
\[ \mathrm{P}(X\ge a)\le\frac{\mathrm{E}(X)}{a}. \]The intuition is that to maximize \( \mathrm{P}(X\ge a) \) for some non-negative \( X \) with expectation \( \mathrm{E}(X) \), the best approach is to make \( X=a \) with probability \( \frac{\mathrm{E}(X)}{a} \) and \( X=0 \) otherwise. This gives an expectation of \( \mathrm{E}(X) \).
When \( X \) is not non-negative, a easy way to work around is to notice that \( X^2 \) is non-negative and
\[ \mathrm{P}(X^2\ge a^2)\le\frac{\mathrm{E}(X^2)}{a^2}. \]If we center \( X \) with respect to its expectation \( \mathrm{E}(X) \), we get the Chebyshev's Inequality
\[ \mathrm{P}\left(\left|X-\mathrm{E}(X)\right|\ge a\right)=\mathrm{P}\left(\left(X-\mathrm{E}(X)\right)^2\ge a^2\right)\le\frac{\mathrm{Var}(X)}{a^2}. \]Since exponential functions are always non-negative, a more advanced technique to derive a more precise bound is to substitute \( X\ge a \) with \( e^{tX}\ge e^{ta} \) for some \( t>0 \). This give the following bound:
For every \( t>0 \)
\[ \mathrm{P}(X\ge a)=\mathrm{P}(e^{tX}\ge e^{ta})\le\frac{\mathrm{E}(e^{tX})}{e^{ta}}. \]In practice this means we can pick some \( t \) such that this inequality gives the tightest bound.
Assume I keep tossing a coin until I see \( n \) heads. Denote \( S_n \) as the number of coin tosses I make. Upper bound
\[ \mathrm{P}(S_n-2n\ge l) \]for some \( l>0 \).
We start with the Chernoff Bound
\[ \mathrm{P}(S_n-2n\ge l)\le\frac{\mathrm{E}(e^{tS_n})}{e^{t(l+2n)}}. \]Observe that
\[ \mathrm{E}(e^{tS_n})=\mathrm{E}(e^{tS_1})^n=\left(\sum_{i=1}^\infty 2^{-i}e^{ti}\right)^n=(\frac{e^t}{2-e^t})^n .\]Now plug the result back and let \( y(t)=\frac{\mathrm{E}(e^{tS_n})}{e^{t(l+2n)}}=e^{-t(l+n)}(2-e^t)^{-n} \).
We want to find \( \min_t y(t) \), so it makes sense to solve for \( y'(t)=0 \). We can do so in Mathematica:
We obtain the result \( t=\log \left(\frac{2 (l+n)}{l+2 n}\right) \). Plug it back and we get the result
\[ \mathrm{P}(S_n-1.5n\ge l)\le 2^{-l-n} \left(\frac{l+n}{l+2 n}\right)^{-l-n} \left(2-\frac{2 (l+n)}{l+2 n}\right)^{-n}. \]