This post is a brief overview of the technique introduced in Joel’s User Friendly technical report (the User Friendly preprint is more intricate, but also more powerful, because it generalizes the results in the technical report to the case of martingales). I realize I’ve mentioned this paper in passing in several posts. I’m pretty obsessed with it at this point— not only is it the major tool in my current research project, but it’s a really powerful and simple idea.
Recall the Laplace transform method (aka the Chernoff bounding method) for bounding sums of scalar random variables: use Markov’s inequality and the fact that the moment generating function of the sum splits into the product of the moment generating functions of the individual summands:
Assuming that the calculations are tractable, from here you simply optimize over
to get the best such tail bound.
Attempting to generalize this to the case where the random variables are independent random self-adjoint matrices, and considering events of the type
, you immediately run into the problem that the ersatz moment generating function
does not turn sums into products.
Where do we go from here?
Well, it turns out that the right thing to do is apply the spectral mapping theorem to see that
and then dominate the maximum eigenvalue with the trace; this gives
At this point, the clever realization of the paper is that we can get good results by invoking a deep technical result due to Lieb (see Appendix A of Rusaki’s review of inequalities for quantum entropy for a flawed but conceptually enlightening proof) : if
is a fixed self-adjoint matrix then the function
is concave on the positive definite cone, so if
is a random positive-definite matrix, then
Of course this fact remains true if we use conditional expectation.
Now assume we have bounds on the cumulant generating functions of the summands, of the form
, where
is a psd matrix for each
. This is equivalent to having bounds on the moment generating functions of the summands,
.
Using conditional expectation and the technical result above, Joel shows that we have
: let
denote expectation conditional on
. Then
If we overestimate the trace with the maximum eigenvalue and apply the spectral mapping theorem once more, we get the generic bound
where
is the size of the summands.
To apply this generic bound, you have to find
– a simple procedure for the cases considered in the technical report–, which depend upon assumptions you make about the summands: e.g. if they’re bounded or how their moments grow. Then you can optimize over
to find the best such bounds.
See the technical report for the generalization of several standard scalar probability inequalities, like upper and lower Chernoff bounds for sums of bounded random matrices, and the Bernstein bound for sums of bounded random matrices with finite “variance”.
Possibly relevant posts:
No tags for this post.