Eigenvalues of sums of random matrices

Say I have a sum of random Hermitian matrices  X = \sum_j X_j . I’d like to know what the k-th eigenvalue of X tends to look like. In the case that the X_j are positive-semidefinite, the upper and lower Chernoff tail bounds in User Friendly Tail Bounds bound the relative deviation of \lambda_1(X) from \lambda_1(\mathbb{E}X) and \lambda_n(X) from \lambda_n(\mathbb{E}X) respectively, so you might expect that \lambda_k tends to look like \lambda_k(\mathbb{E}X). However, it’s easy to see that this isn’t the case, even for \lambda_n and \lambda_1: assume \mathbb{E}X has all eigenvalues equal to c. Then it seems that an instance of X is likely to have \lambda_1 > c and \lambda_n < c . More formally, we expect the same results from Jensen's inequality.

To illustrate my point graphically, I took A to be the first m rows of an m^2 \times m^2 Fourier matrix; let a_j refer to the j-th column of A, and take X = \sum_j \delta_j a_ja_j^t where \delta_j are i.i.d Bernoulli(p) random variables. Then the eigenvalues of X are the squares of the singular values of a matrix you get by tossing out each of the columns of A independently with probability 1-p. On average, the singular values of this column subsampled version of A are all \sqrt{p}, but you can see that empirically the extreme singular values do not cluster around \sqrt{p} (indicated by the red lines).

empirical singular value distribution of column subsampled fat submatrix of a fourier matrix

empirical singular value distribution of a column subsampled matrix with 30 orthogonal rows

It seems that whatever \lambda_k(X) clusters around is comparable to \lambda_k(\mathbb{E}X) from the bounds in User Friendly, but it’s not clear that one can’t do much better.

Possibly relevant posts:

Tags: , , , , , ,

Positivity of a “Matrix Grammian” and other such quantities

Let A_1, \ldots, A_m be conformal matrices, then it’s clear that the matrix


\displaystyle
[\text{Tr}(A_i^\star A_j)]_{i,j=1,\ldots,m}

is positive semidefinite, because if you consider A_1, \ldots, A_m as vectors, it is their Grammian.

It is not true in general that for a fixed positive integer k


\displaystyle
[\text{Tr}((A_i^\star A_j)^k)]_{i,j=1,\ldots,m}

is positive semidefinite. However, it might be true that if we add these matrices for each k together that the result is positive. So the question is, if A_1, \ldots, A_m are strictly contractive, is


\displaystyle
\begin{gathered}
\sum\nolimits_{k=0}^\infty [ \text{Tr}((A_i^\star A_j)^k)]_{i,j=1,\ldots,m} = \\
\left[\text{Tr}\left(\sum\nolimits_{k=0}^\infty (A_i^\star A_j)^k \right)\right]_{i,j=1,\ldots,m} = \\
\left[\text{Tr}\left(I - A_i^\star A_j)^{-1} \right)\right]_{i,j=1,\ldots,m}
\end{gathered}

positive semidefinite?

One reason to believe this may be true is that it is true if the A_i are scalars in the open unit disc. In that case, we are looking at the matrix


\displaystyle
\left[\frac{1}{1-A_i^\star A_j} \right]_{i,j=1,\ldots,m} = \sum_{k=0}^\infty [ (A_i^\star A_j)^k]_{i,j=1,\ldots,m}.

This is the sum of Hadamard product powers of the grammian of the vector (A_1, \ldots, A_m). By the Schur Product Theorem (which states that the Hadamard product of two positive semidefinite matrices is positive semidefinite), we see that the matrix is positive semidefinite.

Possibly relevant posts:

Tags: , ,
Aug 3rd, 2010 | Filed under Mathematics

Failed WordPress automatic updates

For the past couple weeks, my WordPress installation hasn’t been able to automatically update itself via ftp. I got messages like “Error: There was an error connecting to the server, Please verify the settings are correct.” I didn’t find any useful entries in the wordpress.org forums.

It turns out the problem was with the ProFTP setup on my sever. It was configured in standalone mode when it should have been in inetd mode — I have no idea how this got changed, maybe through an automatic upgrade? At any rate, it’s easy enough to check if this is your problem, and then to fix it. To see if this is your problem, check your ProFTP log, in my case, in /var/log/proftpd/proftpd.log; if you see messages like

Aug 02 17:45:06 tangentspace.net proftpd[21060] biharmonic.org: Failed binding to ::, port 21: Address already in use
Aug 02 17:45:06 tangentspace.net proftpd[21060] biharmonic.org: Check the ServerType directive to ensure you are configured correctly.

then you need to open up your proftp configuration file, in my case /etc/proftpd/proftpd.conf, and set your ServerType to ‘inetd’ instead of ‘standalone’. As I understand it, the issue is that inetd has a lock on port 21, so it has to call ProFTP when a call comes in, instead of ProFTP trying to control the port itself.

Possibly relevant posts:

No tags for this post.
Aug 2nd, 2010 | Filed under General, Meta
Tags:

Spectral spread of Graphs

Nikiforov’s July 2010 arXiv paper on the application of Ky Fan norms to graphs poses, among others, the following question: what is the maximal spectral spread of a simple graph of order n?

Here when we refer to the spectrum of a graph, we mean the spectrum of the adjacency matrix of that graph, so we are interested in finding

 \displaystyle
\chi(n) = \max_{|v(G)| = n} \mu_1(G) - \mu_n(G).

The adjacency matrix of a graph is symmetric, so its singular values are simply the absolute values of its eigenvalues. Recall that the Ky Fan k-norm of a matrix is the sum of its top k singular values; it follows that the Ky Fan k-norm of a graph is the sum of the largest k moduli of its eigenvalues. In particular, the Ky Fan two-norm of a graph is given by

 \displaystyle
\|G\|_{(2)} = \max\{\mu_1(G) + |\mu_2(G)|, \mu_1(G) + |\mu_n(G)| \}.

The paper shows that in fact, if n is sufficiently large, then in the graph which achieves \chi(n), |\mu_n(G)| > |\mu_2(G)| so that \|G\|_{(2)} = \mu_1(G) - \mu_n(G) , and the spectral spread of the graph is given by the Ky-fan two norm. So whatever we know about Ky Fan norms can be used.

Here’s a possible avenue towards answering this question, almost entirely due to the observations of one of my colleagues. First, it turns out that the Ky Fan k-norm is the infimal convolution of the trace norm (aka the Ky Fan n-norm) and a scaled spectral norm (aka the Ky Fan 1-norm):


\displaystyle
\|A\|_{(k)} = \inf \{ \|U\|_{(n)} + k \|V\|_{(1)} \,:\, A = U + V \}.

Some results on infimal convolutions  f = \|\cdot\|_1 \diamond \|\cdot\|_2 of norms:

  1. f is a norm
  2. \displaystyle f^\star = \|\cdot\|_1^\star + \|\cdot\|_2^\star = I_{B_{\|\cdot\|_1^\star}} + I_{B_{\|\cdot\|_2^\star}} = I_{B_{\|\cdot\|_1^\star} \cap B_{\|\cdot\|_2^\star}}
  3. where I_A(x) = \begin{cases} 0 & x \in A \\ \infty & x \not\in A \end{cases} is the convex indicator function of a set.

which imply that the unit ball of the norm (trace) dual to f is B_{\|\cdot\|_1^\star} \cap B_{\|\cdot\|_2^\star}. The spectral and trace norms are trace dual, so we have


\displaystyle
B_{\|\cdot\|_{(k)}^\star} = B_{\|\cdot\|_{(2)}} \cap \left(k B_{\|\cdot\|_{(n)}}\right).

Pulling these results together, for n sufficiently large,


\displaystyle
\begin{aligned}
\chi(n) & = \max_{|v(G)| = n} \mu_1(G) - \mu_n(G) = \max_{A} \|A\|_{(2)} = \max_{A} \max_{D \in B_{\|\cdot\|_{(2)}^\star}} \langle A, D \rangle \\
 & = \max_{D \in B_{\|\cdot\|_{(2)}^\star}} \max_A \langle A, D \rangle,
\end{aligned}

where A is in the set of symmetric hollow 0-1 matrices. Since our ambient space is the set of symmetric matrices, the D are also symmetric. It follows that we can achieve the maximum over A by choosing A_{ij} to be 0 zero when D_{ij} is negative, and 1 otherwise. Recalling that A is also hollow, this gives


\displaystyle
\chi(n) = \max_{D \in B_{\|\cdot\|_{(2)}^\star}} \sum_{i\neq j} D_{ij}^{+} = \max_{\{D \,:\, \|D\|_{(1)} \leq 1 \text { and } \|D\|_{(n)} \leq 2\}} \sum_{i \neq j} D_{ij}^+.

This seems like a more geometric reformulation of the problem. Potentially, this bound could let us find lower bounds for \chi(n) by choosing particular choices of D, and could even lead to a direct resolution of the problem.

A side note: if J : X \hookrightarrow Y is the inclusion operator between X, the space of hollow symmetric matrices with the \|\cdot\|_{1 \rightarrow \infty} norm, and Y, the space of symmetric matrices with the Ky Fan 2-norm, then \|J\| \leq 2 \chi(n). I doubt this will ever be a useful observation :)

Possibly relevant posts:

Tags: , , , , , , ,
Jul 30th, 2010 | Filed under Mathematics

Sharpness of Moment bounds, exponential example

Just for fun, let’s compare the tail bounds you get from the moment method with the explicit tail bound of the exponential distribution.

First, we need the moments: if X \sim \text{Exp}(\alpha), then \mathbb{E} X^n = \tfrac{n!}{\alpha^n} for n \geq 1. You can compute these using induction and integration by parts. By Markov’s inequality and Stirling’s approximation,


\displaystyle
\begin{aligned}
\mathbb{P}(X \geq t) & = \mathbb{P}(X^n \geq t^n) \leq \frac{\mathbb{E}X^n}{t^n} = \frac{n!}{(\alpha t)^n} \\ &< \sqrt{2\pi n} \left(\frac{n}{\alpha t e}\right)^n = e^{[\log(2\pi) + \log(n)]/2 + n\log(n) - n\log(\alpha t e)}
\end{aligned}

for n sufficiently large.

The derivative of the final quantity is


\displaystyle
\left(\frac{1}{2n} + \log(n) - \log(\alpha t) \right) e^{[\log(2\pi) + \log(n)]/2 + n\log(n) - n\log(\alpha t e)};

if we consider only n \geq 1, the quantity in the parentheses is monotonic:


\displaystyle
\frac{\partial}{\partial n} \left(\frac{1}{2n} + \log(n) \right - \log(\alpha t)) = \frac{1}{n} - \frac{1}{2n^2} = \frac{1}{n}\left(1 - \frac{1}{2n} \right) > 0.

It follows that if \log(\alpha t) is sufficiently large that the quantity in the parentheses is negative for some n \geq 1, then we can optimize the approximate tail bound by setting n to be the root of the quantity in the parentheses.

If 2 \alpha t > e, then the root is given by the Lambert W function:

 \displaystyle
n = \frac{-1}{2W\left(\frac{-1}{2\alpha t}\right)}.

Putting it all together, the moment method gives the following tail bound for a random variable X \sim \text{Exp}(\alpha):


\displaystyle
\mathbb{P}(X > t) \leq \sqrt{\frac{-\pi}{W\left(\frac{-1}{2\alpha t} \right)} } \left(-2 \alpha t e W\left(\frac{-1}{2\alpha t}\right) \right)^{\frac{-1}{2W\left(\frac{-1}{2 \alpha t}\right)}}, \quad \text{for } t > \frac{e}{2\alpha}.

How does this compare to the actual bound \mathbb{P}(X > t) \leq e^{-\alpha x}? Just eyeballing it, it looks like asymptotically, the moment bound gives you an exponential tail bound with a worse constant. If true, that’s pretty amazing!

Semilogy plot of the analytic and moment method tail bounds for an Exp(1/2) random variable

Semilogy plot of the analytic and moment method tail bounds for an Exp(1/2) random variable

Hmmm. On second thought, this is only amazing because I forgot I already worked this out in a more general setting.

Possibly relevant posts:

Tags: ,
Jul 29th, 2010 | Filed under Mathematics

Online note taking?

Occasionally I’ll find myself looking for some way to pass a note to myself. Say I’ll come across a reference to a paper while using the internet from home, and want to store the link until I can access the paper through the school’s online subscriptions. When it was free, I used backpackit.com, although it was a hassle to have to login and deal with the structure they imposed on my notes. Lately I’ve been using google docs, but that involves a lot of overhead to login and then navigate to the documents I use for note taking, and the interface doesn’t make note taking a comfortable experience.

Tonight I discovered a new site, www.aypwip.org/webnote, which is almost perfect. To access your note workspace, you simply enter the name at the front page; there’s no validation needed, since you could ostensibly give your workspace an obscure or random name. If the site supported embedded pictures, live links, and LaTeX, it would be perfect for my needs. The source code is also available, so you can roll your own modifications in.

Possibly relevant posts:

No tags for this post.
Jul 14th, 2010 | Filed under General
Tags:

Question about change in the location of the spectrum of a fixed complex matrix, under a random pinching

Let \sigma(A) denote the spectrum of an invertible matrix. Here’s the question: if \sigma(A) is contained in the upper open half plane and P is a projection matrix, then is \sigma(PAP) contained in the upper closed half plane?

This is turning out not to be a straightforward problem (or maybe I just don’t see the easy way to go about this), so I ran a dozen or so Mathematica experiments with random A and P. The results suggest that the implication is true, but it may be the case that it’s simply true with high probability for random matrices:

when A is a 1000x1000 random matrix and P is rank 500, you see the eigenvalues remain in the upper half plane.

In this example A is 1000×1000, P is rank 500, red points are in the spectrum of PAP, and blue are in the spectrum of A. For smaller rank projections, the red rectangle is still centered, but smaller. It’d be interesting to quantify the behavior of the spectrum for random P but fixed A … but let’s not get off track.

BTW, I *did* run the simulation for smaller sized A in case the ‘everything performs better with higher dimensional random matrices’ deal was going on, and it doesn’t look like that’s the case.

Possibly relevant posts:

Tags: , , ,
Jul 14th, 2010 | Filed under Mathematics

Puzzled by holomorphic functions of a matrix

Ok, explain this to me: Epstein gives the following definition of what it means to take f(A) when A is a symmetric matrix whose spectrum lies in the domain of holomorphiticity of f (if that’s the correct phrase to use)

 \displaystyle f(A) = \frac{1}{2\pi i} \oint_{\mathcal{C}} f(z) (z-A)^{-1} \,dz,

where \mathcal{C} is a contour surrounding the spectrum of A.

Then he defines z^\alpha for 0 < \alpha <1 on the cut plane \{z \,:\, \text{Im }z \neq 0 \text{ or } \text{Re } z > 0\} by the formula


\displaystyle
z^\alpha = \frac{\sin \alpha\pi}{\pi} \int_0^\infty t^\alpha \left(\frac{1}{t} - \frac{1}{t+z} \right)\, dt

and immediately turns around and states that this implies that if C is positive then


\displaystyle
C^\alpha = \frac{\sin \alpha\pi}{\pi} \int_0^\infty t^\alpha \left(\frac{1}{t} - (t+C)^{-1} \right)\, dt.

How, if at all, does this follow from his original definition? And where can I read more about the details of this kind of complex analysis on C^\star algebras? ( I’m only interested in the matrix case, but general theory is better than none)

Update:
I think I get it now. I think the issue is that z^\alpha is not uniquely defined in general, so the integral expression is used to get a unique analytic continuation on the cut plane, then you can proceed as usual:


\displaystyle
f(A) = U f(\lambda(A)) U^\star =
 U \begin{pmatrix}
\lambda_1^\alpha & & \\
 & \ddots & \\
 & & \lambda_n^\alpha
\end{pmatrix} U^\star

\displaystyle
= U \begin{pmatrix}
\frac{\sin \alpha\pi}{\pi} \int_0^\infty t^\alpha \left(\frac{1}{t} - (t+\lambda_1)^{-1} \right)\, dt & & \\
 & \ddots & \\
 & & \frac{\sin \alpha\pi}{\pi} \int_0^\infty t^\alpha \left(\frac{1}{t} - (t+\lambda_n)^{-1} \right)\, dt
\end{pmatrix} U^\star

\displaystyle
= \frac{\sin \alpha\pi}{\pi} \int_0^\infty t^\alpha \left(\frac{1}{t} - (t + A)^{-1} \right)\, dt

The only noteworthy part of this is that the definition of f(A) he gives is equivalent to the usual diagonalization definition which is easier to work with.

Possibly relevant posts:

No tags for this post.
Jul 13th, 2010 | Filed under Mathematics
Tags:

Cumulants of projected random matrices are projected matrices

Consider a fixed projection P and a random matrix X, then you’d expect that \log(\mathbb{E}\exp(PXP)) = P\tilde{X} P for some \tilde{X}. Here’s a proof:

First, use a taylor series expansion to get the recursive relationship


\displaystyle
\begin{aligned}
\mathbb{E} \exp(PXP) & = I + \sum_{k=1}^\infty \mathbb{E} \frac{(PXP)^k}{k} \\
&  = I + \sum_{k=1}^\infty P \mathbb{E} \frac{(PXP)^k}{k!} P \\
& = (I-P) + \sum_{k=0}^\infty P \mathbb{E} \frac{(PXP)^k}{k!} P \\
& = (I-P) + P \mathbb{E} \exp(PXP) P \\
& = P_\perp I P_\perp + P X_p P \\
& = \beta_\perp \beta_\perp^\star + \beta\beta^\star X_p \beta \beta^\star.
\end{aligned}

In the final line, X_p = \mathbb{E} \exp(PXP) and P = \beta \beta^\star where \beta is an n \times \text{rank P} matrix with orthonormal columns, given by the SVD of P. Likewise P_\perp = \beta_\perp \beta_\perp^\star.

This decomposition implies that \mathbb{E} \exp(PXP) is block diagonal in the basis given by \beta, \beta_\perp:


\displaystyle
\mathbb{E} \exp(PXP) = \begin{pmatrix} \beta & \beta_\perp \end{pmatrix} \begin{pmatrix} \beta^\star X_p \beta & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} \beta^\star \\ \beta_\perp^\star \end{pmatrix}.

Take the log of both sides and factor out the unitary matrices to see


\displaystyle
\begin{aligned}
\log(\mathbb{E} \exp(PXP)) & = \begin{pmatrix} \beta & \beta_\perp \end{pmatrix} \begin{pmatrix} \log(\beta^\star X_p \beta) & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \beta^\star \\ \beta_\perp^\star \end{pmatrix} \\
 & = \beta \log(\beta^\star X_p \beta) \beta^\star,
\end{aligned}

so


\displaystyle
\begin{aligned}
P \log(\mathbb{E} \exp(PXP)) P & = \beta \beta^\star \beta \log(\beta^\star X_p \beta) \beta^\star \beta \beta^\star \\
& = \beta \log(\beta^\star X_p \beta) \beta^\star \\
& = \log(\mathbb{E} \exp(PXP)).
\end{aligned}

Possibly relevant posts:

Tags: , , ,
Jul 12th, 2010 | Filed under Mathematics