somewhere near the beginning.

One direction in Khintchine’s inequality for Rademacher sums

Filed under: Mathematics — Alex @ 12:11 am 9/26/2008

The Khintchine inequality for Rademacher sums says that given a real vector a = (a_1, \ldots, a_n), the p-th moment of the random sum  \sum_{i=1}^n \epsilon_i a_i is equivalent to the length of a:

 \displaystyle A_p \|a\|_2 \leq \left( \mathbb{E} \left| \sum_{i=1}^n \epsilon_i a_i \right|^p \right)^{1/p} \leq B_p \|a\|_2,

where A_p, B_p are constants which depend on p. Here the expectation is taken w.r.t to the \epsilon_i, which are i.i.d Bernoulli \pm 1 r.v.s — i.e. Rademacher r.v.s

Today I’ll derive the upper bound; when/if I figure out the lower bound, I’ll post on it later. There are two approaches to the upper bound that I know of. The simplest cheats, in that it appeals to Khintchine’s inequality for Gaussian sums, but at the same time, it illustrates a connection between gaussian r.v.s and Rademacher r.v.s that is often useful. Khintchine’s inequality for Gaussian sums is

 \displaystyle A_p^\prime \|a\|_2 \leq \left( \mathbb{E} \left| \sum_{i=1}^n \nu_i a_i \right|^p \right)^{1/p} \leq B_p^\prime \|a\|_2,

where the \nu_i are \mathcal{N}(0,1).

Note that since Gaussian r.v.s are symmetric, \nu_i and \epsilon_i |\nu_i| have the same distribution; therefore we can replace each \nu_i with \epsilon_i |\nu_i| in the inequality, and apply Jensen’s inequality to bring the conditional expectation with respect to the |\nu_i| inside the absolute values. That is,

 \displaystyle \mathbb{E} \left| \sum_{i=1}^n \nu_i a_i \right|^p = \mathbb{E}_\epsilon \mathbb{E}_{|\nu|} \left| \sum_{i=1}^n \epsilon_i |\nu_i| a_i \right|^p \geq \sqrt{\pi/2} \mathbb{E} \left| \sum_{i=1}^n \epsilon_i a_i \right|^p ,

where we use the fact that \mathbb{E} |\nu_i| = \sqrt{\pi/2} . This implies that the upper Khintchine bound holds for Rademacher variables with B_p at most \sqrt{2/\pi} B_p^\prime. The Rademacher-Gaussian connection is a powerful tool: e.g. Latala uses it to extend a bound on the norm of a random matrix with independent mean-zero gaussian entries to one that applies to virtually *any* random matrix with independent mean-zero entries.

I suspect, but haven’t given it any thought, that Khintchine’s inequality probably holds for any sequence of i.i.d. subgaussian variables.

The second approach, the one from scratch, is to note that you have a Chernoff bound which is a differential form of the upper Khintchine bound (I think this form is also sometimes called a Khintchine inequality):

P\left(\left|\sum_{i=1}^n \epsilon_i a_i \right| > C\right) \leq 2 \exp(-C^2/(2 \|a\|_2^2))

The derivation of this inequality is standard: note P\left(\left|\sum_{i=1}^n \epsilon_i a_i \right| > C\right) = 2 P\left(\sum_{i=1}^n \epsilon_i a_i > C\right) then use Markov’s inequality to bound this by  2 \mathbb{E} \exp\left( \lambda \left(\sum_{i=1}^n \epsilon_i a_i\right)\right)/ \exp(\lambda C) where the \lambda is to be determined. By independence, the numerator factors:

 2 \displaystyle \prod_{i=1}^n \mathbb{E} \exp(\lambda \epsilon_i a_i) = 2 \prod_{i=1}^n \cosh(\lambda a_i) \leq 2 \prod_{i=1}^n (1+(\lambda a_i)^2/2)

which we dominate by \exp(\lambda^2 \|a\|_2^2). Plugging this back in and finding the \lambda which minimizes the bound gives us the desired Chernoff bound.

From this Chernoff bound, we have

P\left(\left|\sum_{i=1}^n \epsilon_i a_i \right|^p > C\right) \leq 2 \exp(-C^{2/p}/(2 \|a\|_2^2)),

so using the integral expression for expectation, \mathbb{E}|X|^p = \int_0^\infty P(|X|^p > C) \,dC, and doing some simplifications of the integral, we wind up with

 \left( \mathbb{E} \left| \sum_{i=1}^n \epsilon_i a_i \right|^p \right)^{1/p} \leq 2^{1/2} \left( \Gamma(1+p/2) \right)^{1/p} \|a\|_2.

Possibly relevant posts:

1 Comment »

RSS feed for comments on this post. TrackBack URL

Leave a comment