One direction in Khintchine’s inequality for Rademacher sums
The Khintchine inequality for Rademacher sums says that given a real vector
the
-th moment of the random sum
is equivalent to the length of
:
where
are constants which depend on
. Here the expectation is taken w.r.t to the
, which are i.i.d Bernoulli
r.v.s — i.e. Rademacher r.v.s
Today I’ll derive the upper bound; when/if I figure out the lower bound, I’ll post on it later. There are two approaches to the upper bound that I know of. The simplest cheats, in that it appeals to Khintchine’s inequality for Gaussian sums, but at the same time, it illustrates a connection between gaussian r.v.s and Rademacher r.v.s that is often useful. Khintchine’s inequality for Gaussian sums is
where the
are
.
Note that since Gaussian r.v.s are symmetric,
and
have the same distribution; therefore we can replace each
with
in the inequality, and apply Jensen’s inequality to bring the conditional expectation with respect to the
inside the absolute values. That is,
,
where we use the fact that
. This implies that the upper Khintchine bound holds for Rademacher variables with
at most
The Rademacher-Gaussian connection is a powerful tool: e.g. Latala uses it to extend a bound on the norm of a random matrix with independent mean-zero gaussian entries to one that applies to virtually *any* random matrix with independent mean-zero entries.
I suspect, but haven’t given it any thought, that Khintchine’s inequality probably holds for any sequence of i.i.d. subgaussian variables.
The second approach, the one from scratch, is to note that you have a Chernoff bound which is a differential form of the upper Khintchine bound (I think this form is also sometimes called a Khintchine inequality):
The derivation of this inequality is standard: note
then use Markov’s inequality to bound this by
where the
is to be determined. By independence, the numerator factors:
which we dominate by
Plugging this back in and finding the
which minimizes the bound gives us the desired Chernoff bound.
From this Chernoff bound, we have
so using the integral expression for expectation,
and doing some simplifications of the integral, we wind up with
of bits
At each time
the forecaster first makes his guess
for
Then the true bit is revealed and the forecaster finds out whether his prediction was correct. To compute
the forecaster listens to the advice of
experts. This advice takes the form of a binary vector
where
is the prediction that expert
makes for the next bit. Assume we are told in advance that, on this particular sequence of outcomes, there is an expert
for some
but we do not know for which
that is, to bound the number of mistakes made by the forecaster.
mistakes, you know who the prescient expert is, so it is less than this. In fact, it is also much less than
– as one of the people I shared this problem with said, that really only leaves one choice
approximates 
the
norms approach the
here’s a quick justification.
approximately one; I took ‘approximate’ to mean bounded above and below by constants. Each summand is lesser than or equal to one, so the most the quotient can be is
and the least it can be is one. So really, we just want to find a small
is a bounded by a constant; the easiest choice is
, in which case we have 

matrix
, we call the directed graph
on
which contains edge
iff
the structure of
This notion is useful in designing algorithms for sparse matrix computation: given only the structure of the input, you’d like to determine the worst-case structure of the output, so you know which entries you need to compute. Or in graph theoretical terms, given
and a function
on matrices, you’d like to compute
,
is the complete graph, which means
of
iff there is a path from
in
.
. Just from simulations, I’m getting that the transitive closure is with high probability the complete directed graph.
people are removed from
be an increasing, convex function,
be a compact set, and
be real contractions such that
, then
with the identity, this says that if we shrink the coordinates and then take random Bernoulli averages, the expected value of the maximum deviation is going to be less than that for the original coordinates. Therefore, it makes sense to expect that an appropriately modified version of this theorem holds for the case where
and the
are complex contractions.
which satisfies, for a given 
exists without explicitly specifying it, using the marriage theorem.
.) But the question is, is it even reasonable to expect this?
, is
, where
is a positive diagonal matrix satisfying
?
, the inequality is true, which is why it might be reasonable to expect it in the general case.