spectral graph theory and two CS applications

May 4th, 2008

I just came across the initial lecture for a course on spectral graph theory that did a great job of illustrating how interesting the graph laplacian is. You might’ve heard of the Fiedler vector– its magical properties are what I was searching for an explanation for–, but did you know that the top two eigenvectors of planar graphs (sometimes, most of the time?) form the coordinates of nice planar embeddings? In the same vein, the bottom eigenvectors can provide good colorings. Two incredible facts that have yet to be explained.

The video lectures that inspired.me to look up spectral graph theory are pretty interesting in their own right. The first explained a simple but apparently powerful technique for edge-preserving image smoothing based on the heat equation on graphs. The second was about a seed-based technique for image segmentation based on random walks on graphs– this looks very powerful, but the lecturer is awfully boring; it may be more interesting to just peruse his publications.

Research agenda

May 2nd, 2008

Yesterday Tropp gave me three problems to work on over the summer. The first one concerns sparse approximation: there’s general interest in finding probabilistic schemes for coming up with a sparse, quantized matrix X which approximates a given matrix A in some sense; he gave me two papers to read on previous approaches– A Fast Random Sampling Algorithm for Sparsifying Matrices, and Fast Computations of Low-Rank Matrix Approximations– and a preprint of a result he derived which gives a bound on the approximation error for any general scheme. My job is mainly to compare and contrast the performance of these difference schemes and the attendant error bounds, to see for what schemes and properties of A his bound is tighter than the others. This will be my main concern for the next couple months.

The second and third are more theoretical. One is to find a bound on the expected Ky Fan k-norm of a standard Gaussian matrix, and the other is to find a bound on the expected value of \| R_k F R_l \| where F is a DFT matrix and R_k,R_l are random restrictions to k,l coordinates. Sweet stuff!

Random Matrix studies

April 21st, 2008

I’ve been studying (nonasymptotic) random matrix theory from Vershynin’s lecture notes, which are pretty awesome despite the typos and at least one flawed argument. After just the first three lectures, I was able to almost construct a nice proof of the Johnson-Lindenstrauss lemma — there was one detail that eluded me.

The techniques used are simple:

  • Markov’s inequality in its various guises (especially as the “Laplace transform method”);
  • the interchangeability of tail bounds, moment bounds, and existence of characteristic functions;
  • union bounds;
  • concentration of measure on the sphere and in gaussian space;
  • and, the most sophisticated (if you take the isoperimetric arguments that give you concentration of measure as givens), chaining arguments that exploit the link between metric spaces and the deviations of random variables attached to points in the spaces.

If you’re interested in seeing how some of those magical ‘with high probability’ results are derived, I strongly recommend those notes. In fact, now that I’ve seen some of these types of results, a little bit of the mystery of Vapnik’s results linking training error to overall risk using the VC dimension has evaporated– it seems like the type of result you could get using these techniques.

Overall, the magic of these methods is two-fold: first, that you can get any practical results from Markov’s inequality, and second, that concentration of measure exists. It’s fascinating and beautiful stuff!

As a concrete example of these principles in application, here’s a problem I’m working on today:

Let \Phi be an m \times N random gaussian matrix. Show that \frac{1}{\sqrt{m}} \Phi has the restricted isometry property of order s, with \delta_s = \frac{1}{2}, when m \geq s \log\left(\frac{N}{s}\right).

The RIP is useful in compressed sensing applications, where it says that \Phi will act as an almost-isometry on vectors that are s/2 sparse.

I Like You

April 4th, 2008

Amy Sedaris, from Strangers with Candy, wrote a book on hospitality– who would have guessed that she is a frequent hostess?– and etiquette. I Like You: Hospitality under the Influence isn’t your conventional how-to-be-a-host guide: it doesn’t suggest you put on airs, subtly or otherwise; she talks about hospitality from her personal experiences… she lives in an apartment with a rabbit, and keeps a tips jar present at her parties, so that’s saying a lot.

Even more surprising, it actually offers good advice, along with the expected humor; sometimes she even manages to mix the two in what might be a worthy idea dressed up as deadpan humor, or just good deadpan humor.

A good trick is to fill your medicine cabinet with marbles. Nothing announces a nosey guest better than an avalanche of marbles hitting a porcelain sink. Plus you’ll know which guest is a junkie whore or gutter hype, and you’ll know what else to hide. Count your stash or remove the labels from your prescription bottles.

Farewell 202

April 2nd, 2008

Ha ha! I just finished my comment for the student evaluation survey for CDS202, the course I’ve been complaining about for way too long. I spent 45 minutes crafting this, because I’m sure he’s heard essentially the same complaints the entire time he’s been teaching the course, and yet, from what I can determine, has done nothing to address them. Hopefully I got my message across.

I struggled with this course, almost entirely due to the choice of text: for several reasons, MTA was a bad choice.

MTA’s most egregious deficiency, given the introductory nature of the course, is that the intuition behind the definitions and theorems are unexamined, making it hard to integrate the new concepts. While Marsden did an excellent job of motivating what concepts he covered in class, there was too much material for him to provide such context for most of the reading– that made it a bitch trying to absorb the 80 or so new pages of material each week.

Also, MTA provides more generality (e.g. splitting subspaces and other infinite-dimensional considerations) than is optimal given the pace and length of the course– ultimately, I ended up referring to several of the recommended texts to get a handle on the ideas, and only looking in MTA for the notation conventions and the homeworks.

I really can’t stress how much of a chore it was to read MTA; the combination of high generality, absence of intuition, and brevity of exposition– which make for a good encyclopedic reference– made it a monotonous, unrewarding choice as an introductory text.

If Marsden wrote a book that captured the same kind of intuition that made his lectures a pleasure to attend, it would be a good choice for the course text. Until then, I think MTA should be relegated to a supplementary role; students would better gain an appreciation for and facility with differential geometry from a more introductory book, such as Warner’s.

I’m stumped– you give it a try

April 1st, 2008

I’m going to add problems as I get stumped.

  1. A bilinear functional on a hilbert space is said to be elementary if there are vectors x,y such that F(u,v) = \langle x, u \rangle \langle y, v \rangle. Suppose x,y,z are linearly independent vectors, what are necessary and sufficient conditions on w so that F(u,v) = \langle x, u \rangle \langle y, v \rangle + \langle z, u \rangle \langle w, v \rangle is elementary?
  2. Let \{ A_\alpha \} be a family of mutually commuting operators. Then there is a common Schur basis for \{ A_\alpha \}– that is, there exists a unitary Q such that for all \alpha, Q^\star A_\alpha Q is upper triangular.
  3. The elementary tensors x\ocirc \cdots \ocirc x, with all factors equal, are all in the subspace \vee^k H. Do they span it?
  4. If \dim H = 3, then \dim \otimes^3 H = 27, \dim \wedge^3 H = 1 and \dim \vee^3 H = 10 . In terms of an orthonormal basis of H, write an element of (\wedge^3 H \oplus \vee^3 H)^\perp

On the 4th question, I could construct one such element, but I’m trying to get the form of a general element. I’m stuck on finding a basis for (\wedge^3 H)^\perp– so far I know e_i \otimes e_j \otimes e_k are in it if i,j,k aren’t distinct, and e_i \otimes e_j \otimes e_k - e_j \otimes e_k \otimes e_i are in it if i,j,k are distinct–, and I imagine finding a basis for (\vee^3 H)^\perp is even more painful.

Convex Optimization and the Barrier Method

April 1st, 2008

I wrote this attempt at an equationless explanation of the principles of convex optimization a while back, partly because I promised it, and partly because I’d intended to contribute to the Carnival of Mathematics. I had it sitting in the queue, intending to revise and fact check it before posting, but … it seems like I’m going to be shitting work for a while, so here it goes with caveats. It’s based on an optimization course I took two terms ago.

“…in fact, the great watershed in optimization isn’t between linearity and nonlinearity, but convexity and nonconvexity” - R. Tyrrell Rockafellar, in SIAM Review, 1993

Convex optimization problems take the form of minimizing some convex cost function f_0 over a feasible set defined as the points which satisfy m inequalities f_i(x) \leq 0 where f_i are convex, and p affine equalities a_i^Tx + b_i = 0. Standard linear programs are one example of convex optimization problems, and the most powerful general class of convex optimization problem that I saw as a standard ‘class’ of programs in the course is the semidefinite program, to minimize an affine cost function given linear equalities Ax = b and the component wise inequalities F(x) = F_0 + x_1 F_1 + \ldots + x_n F_n \leq 0 where the F_i are symmetric matrices. Semidefinite programs (SDPs) subsume several popular classes of convex optimization problems.

Newton’s algorithm for the unconstrained minimization of differentiable objective functions is extremely easy to use: you construct a minimizing sequence x_{k+1} = x_k - t_k \left(\nabla^2 f\right)^{-1} \nabla f(x_k) , and as  k \rightarrow \infty, x_k \rightarrow x^\star, a minimizer of the cost. Of course, this is given some limiting assumptions on your function. The first cool thing about convex problems is that convergence to such a minimizer is guaranteed, and under some stronger but reasonable conditions, the minimizer is unique.

Constrained convex optimization problems, say minimizing f_0 subject to the inequality constraints f_i(x) \leq 0 can be approximated by unconstrained convex problems of the form f_0(x) + 1/t \sum_i I(f_i(x)) where I/t is a smooth convex barrier function which approximates the indicator function  f(x) = 0 when x \leq 0 and  f(x) = \infty when  x > 0, and such that the approximation improves as t \rightarrow \infty. Intuitively, then, as t \rightarrow \infty , the solution to the unconstrained convex optimization problem will approach that of the original. The problem is, given that you want an answer that is accurate to within \epsilon of the true optimal cost, how do you determine t? Is it even possible?

Digression to cool fact number two about convex optimization problems– there exists a dual problem to the primal problem, namely the problem of maximizing a dual function g. This dual function is concave, so in the same way that a convex function always has a minimum value, g always has a maximal value. Furthermore, this maximal value is pretty darn useful, in that it provides a lower bound on the optimal value of the primal problem. If certain technical conditions are satisfied (and they usually are, from what I’ve seen, in practical problems), the optimal values of the primal and dual problems are identical.

Now we have all the pieces to construct an interior point solver for convex optimization problems. First, we need a starting value of t and a strictly feasible (all the inequality constraints are strictly satisfied) starting guess for x, x_0, then we solve the first approximating unconstrained optimization problem using Newton’s method. We then increase t and solve the next approximate optimization problem, and repeat until we decide that the solution to the approximation problem is close enough to the optimal value of the original problem. How do we do this? Well, we choose our I cleverly so that each guess x_k also gives us a guess for a feasible point in the dual problem \lambda_k. Since we know that g(\lambda_k) is an underestimate of the optimal value and f_0(x_k) is an overestimate, we can bound the error at each step, and simply stop when this error has decreased sufficiently. This barrier method is one of a class of interior point methods– so called because the constructed minimizing sequence is strictly in the interior of the feasible regions, as opposed to say the simplex method, where the minimizing sequence is on the vertices of some polyhedron. The path x_k traced out by the successive approximations is known as the central path.

Algorithmically and theoretically, interior point methods are impressive. Theoretically because they show that not only can a linear program be solved in polynomial time, all convex optimization problems can (at least in principle– I imagine it might be difficult to construct appropriate barrier functions for different classes of programs). Algorithmically because they allow you to solve convex optimization problems in just roughly 40 times the amount of time it’d take to solve a least squares problem of equivalent size (the cost of the algorithm is dominated by computation of the product \left(\nabla^2 f\right)^{-1} \nabla f, and it takes only about 40 newton steps to solve a convex problem, largely independent of the problem size).

Pretty incredible! For more details, check out Boyd and Vandenberghe’s freely available textbook on convex optimization.

Progress on the advisor front

April 1st, 2008

I finally sent an email off to Schroeder, who I had expressed an interest in working with before I decided to work with Tropp. He gave me a paper to read and invited me to two of his group meetings– in the first he developed a notion of discrete conformal mappings, and in the second a visitor gave a lecture on an ingenious geometric method for constructing approximate isometries (for applications like texture mapping, or position interpolation in animations). Unfortunately, I could not follow the paper, and the point of it bored me– something about constructing bases for forms on discrete manifolds so exterior differentiation and subdivision commute?– even though a lot of what his group does is right up my alley. Luck of the draw.

I could have inquired after other potential research directions (something like the approximate isometries idea would really capture my interest), but I think that working with Tropp is better in the long run– because the mathematics involved is foundational (in the sense of arising in the posing of many problems), there’s more potential for engaging deep but tractable mathematical problems, and a wider field that I could tap for future avenues of research. I feel the mathematics Schroeder’s group uses is, while slightly more asthetically pleasing to me, somehow less essential.

Anyhow, today I hope to get the final details of my courses for this term settled. Most importantly, when my reading course with Tropp is scheduled, and what I should get started on reading.

Bhatia’s Matrix Analysis, Chapter 1

March 31st, 2008

I’ve been reading the first chapter of Bhatia’s “Matrix Analysis” to review (multi)linear algebra, and it’s been a pleasure. In just four pages or so, he proves the existence of QR decompositions and Schur decompositions, gives the Cholesky Decomposition as an exercise, proves the Singular Value Decomposition, and gives the Polar decomposition and its equivalence to the SVD as an exercise. The brevity of the proofs are the consequence of them being the most straightforward arguments I’ve seen for proving these decompositions. The exercises which occur inline with the text are challenging, but doable; as an examples

If A is a contraction, show that A^\star (I - AA^\star)^{1/2} = (I-A^\star A)^{1/2}A^\star.

All in all, this is a great chapter– I can’t wait to work through the problems at the end.