The Noncommutative Bernstein inequality: the right tool for the job
I spent several days trying to compute how many random rank one sign matrices (each entry in
) you need to ensure that the projection of a fixed matrix onto their span,
satisfies
with high probability for some
fixed.
First I noticed that the easiest way to approach this is to ask what
needs to be for
to be full rank; equivalently, take
Then I noticed that since
(by the SVD), this is equivalent to asking when
is full rank.
My first approach, to bound the moments of
using symmetrization and the trace method to reduce to estimation of
, then Markov's inequality to get a tail bound, was promising at first. Since each
where
and
are uniformly distributed random sign vectors, a lot of cancellation occurs, and basically you're left with a counting problem (count which terms in the expectation don't vanish). Unfortunately, this counting problem's a tough nut to crack, at least for me--- attempting to do so is what led me to the nasty recursion I mentioned in a previous post.
Then I came across Recht's paper, "A Simpler Approach to Matrix Completion", in which he proves the following Noncommutative Bernstein inequality:
Let
be independent zero-mean random matrices of dimension
. Suppose
and
almost surely for all
Then for any
![]()
(apparently related Operator Bernstein and Vector Bernstein inequalities are provided in Gross’s “Recovering low-rank matrices from few coefficients in any basis.”)
With this tool, the problem’s trivial: take
. Then
and by plugging through the calculations, when 
is small, so
is full rank with high probability.
I’m satisfied with the sharpness of this estimate for
: the smallest any collection of matrices spanning
could be is
, and it seems unlikely that with the restriction of them being rank sign matrices, you’d easily find such a collection of random matrices. The
factor seems like a reasonable price to pay; in fact, because of the coupon collector phenomena, maybe it’s the same price you’d need to pay for a collection of random matrices selected from the standard euclidean basis matrices. In that light, this is an unexpectedly good result.
Possibly relevant posts:
- One direction in Khintchine’s inequality for Rademacher sums (9/26/2008)
- A lower bound on the
norm of a useful class of matrices (1/12/2009) - Is uniqueness important? (2/17/2010)
be independent zero-mean random matrices of dimension
. Suppose
and
almost surely for all
Then for any 
, the sum of the top
eigenvalues of
, as the SDP
is convex (since a function of the form
, i.e. the supremum of linear functions, is convex). Since a convex function on an open set is continuous, we see that
is continuous on the positive semidefinite cone, so
is continuous on the same set also.
norm of a PSD matrix
of having a new roommate, so naturally I’m planning what I’ll do with the extra $625/month I’ll be saving. Guffaw. The first two months are spoken for already— it’s past time for new glasses, and I haven’t paid off my Caltech dining account in several months—, then I’ll start saving up to get a car, since it looks like I’m going to be here another 3 years or so. After that, I dream of actually adding to my savings account for the first time in over a year.
which realizes
and then round the entries of
to signs somehow, take
to be an appropriate scaling matrix, and use
as your approximate sign decomposition of
)– I implemented this method, simply replacing the entries of
with reasonable probability at each step (take this with a grain of salt: I think this is easy to show, but I've not checked that it's true), which is all that you can hope to say even if you use the
decomposition. That is to say, there's nothing inherently special or useful about the
, so
. Therefore we can find a wide range of factorizations achieving
: take
to be any orthonormal matrix. So there's definitely not anything special about the factors, or the signs you'd round them to, in the general case.
can be far from sparse), although the factorizations I've seen returned with my particular greedy scheme have reasonably many terms, there's no kind of guarantee of sparseness.
norm ball
can be more simply written as
subject to
,
which solves this seems to be in the
1/2-ball. It’s easy to see that this ball is included in the feasible points, but no reason to think that it is the entire set of feasible points.
s.t.
,
.
is the SDP
s.t.
.
,

,
, then
and its components break down as follows:
,
,
,
,
, and 
is a matrix in the unit
norm