Archive for July, 2006

The one-month gap

Thursday, July 27th, 2006

This Friday I officially stop working with Papadakis’ group. It’s been fun, and I’ll miss it, but I also recognize the need to move on, to clear my palate and rest it before the next meal. In fact, I probably should have made the cut sooner than this. I’ll still be around, since it seems I’m the only one who wants to prepare the Markov Chain Monte Carlo seminar :) Sometimes it seems the seminar is continuing solely based on inertia, but it’s fun to teach, and getting to help Shikha understand something is a wonderful feeling (because she’s so damn smart, otherwise). Since I read and reread, then write up notes to keep the presentation organized, and all the while worry about maintaining a motivation for every new development, I’m retaining more of this material than I would otherwise. As Papadakis pointed out, this is knowledge that can only serve me well in grad school. We’re up to the good stuff now: I believe I can cover simulated annealing and constrained optimization in the next seminar.

So, the question is: how am I going to spend my first and only month of free time for the past five years, and possibly the next five? I actually want to *not* do any math, or even think about it, but alas that is unrealistic. I will limit myself instead to reviewing vanilla material I should know for grad school– and considering where I’m going and what I’m majoring in, I have a pretty good idea of what that entails: ODEs and PDEs, complex analysis, probability and stochastic processes, real and functional analysis. These are the topics I will tango with for the next month. Other than that, I plan on exercising, doing a lot of hanging out with my friends before I leave, and very little spending of money. Oxymoronic? Maybe, but since I’m not working anymore, this is going to be a low-budget vacation. And of course, I will be planning for CalTech: mostly this entails figuring out what I’ll be eating, and deciding which books to take with me.

Diffusion maps and wavelets

Friday, July 21st, 2006

I just took a look at the latest issue of the Applied and Computational Harmonic Analysis Journal, and was surprised to see a seemingly out of place emphasis on Markov kernels. This month’s release is a special issue focusing on diffusion maps and diffusion wavelets, new research directions which hold promise for dimensional reduction, pattern recognition, and general dealing with high dimensional data. I couldn’t get any further than the first pages of the first paper, but the ideas sound genuinely new, exciting, and above all useful. The ingenious idea that seems to be at the heart of these methods is to represent your data using a symmetric graph structure with a symmetric, positive weighting kernel k(x,y) which encodes the affinity of your data points. Then you can associated a Markov kernel P with a random walk on this graph, where P(x,y) = k(x,y)/d(x) and d(x) = \sum_{y \in G} k(x,y) . Then P encodes information on the local geometry of the graph, and this information can be ‘diffused’ by taking successive powers of P: the authors point out this can be viewed as an analog of the process of integrating a differential equation to obtain global information from local information. Analysis of the eigenfunctions and values of this kernel gives you useful information on your data. As an example of the power of this methodology, consider the interpretation of clustering on a graph in the context of a random walk: a cluster is a region which the walk leaves with low probability. So if we use P^t to determine the subgraphs of our data sets which can be considered clusters, increasing t gives us larger(?) clusters. This seems like a generalization of the concept of MRAs.

Sweet stuff. See http://www.math.yale.edu/~sl349/tutorials/diffusion/diffusion_top.html for an online synopsis of the article.

Things I have forgotten

Tuesday, July 18th, 2006

You know how people sometimes say ‘I have forgotten more about X than you’ll ever know!’? I was just looking over my previous blog entries and realized that I’ve forgotten more over the course of my undergraduate career than I’ve retained. Which isn’t to say that I’ve learned nothing, just that of every 10 things I’ve learned I’ve forgotten about 7.

Examples: I used to love Perl and Scheme, but I’ve completely forgotten Perl and I would say I’ve completely forgotten Scheme, except its syntax is too simple to forget. I was once much more facile with reasoning about sequences and series, but now I tend to struggle with them. Most alarmingly, I just took a real analysis course a year ago, and I forgot about half of it already… most of the measure theory, and a good bit of the functional analysis/topological vector space stuff.

The problem as I see it is that I need to keep actively using knowledge, more so than most people I’ve met, or I’ll lose it. Probably a side effect of the amount of reading I do. Apparently, reading math books can be bad for you.

A Scientist’s faith

Thursday, July 13th, 2006

Where someone failed, another has succeeded; what was unknown in one century, the next has discovered; science and the arts do not grind themselves into uniformity, but gain shape and regularity by carving and polishing repeatedly […] What my own strength has not been able to uncover, I cease not from working at and trying out and, by reshaping and solidifying this new material, in moulding and heating it, I bequeathe to him who follows some facility and make it the more supple and malleable for him. The second will do the same for the third, which is why difficulty does not make me despair, nor of my own weakness…

Montaigne, Les Essais, Livre II, Chapitre XII.

A role model

Wednesday, July 12th, 2006

I watched Nova last night: the episode was about Einstein’s search for the grand unified theory and how quantum theory pretty much invalidated his approach to finding a theory of everything, and how string theory represents a possible answer to the question of how to make gravity, which dominates the universe on the large scale, compatible with quantum theory, which dominates the universe on the small scale. I have to interject at this point– why is it that we celebrate Einstein’s theory of gravitation, saying it provides a ‘true’ understanding of gravity, when it doesn’t work at the subatomic level? How is that any more a true model of gravitation than Newton’s, when it too is just an approximation, albeit a much more accurate one? And does that mean that the spacetime continuum established in his theories are in turn just an approximation of some even more strange phenomenon? On an unrelated note, it seems that Einstein’s wife was also a brilliant physicist, and may have played a nontrivial supporting role in the development of his theories.

But, these points and a reawakened interest in physics aside, there was another interesting aspect to that episode: one of the talking heads was an articulate, black, theoretical physicist. How many black physicists have you seen lately, much less theoretical physicists who moonlight as popular science expositors? I tip my hat to Dr. Sylvester James Gates, Jr. of the University of Maryland, a truly bad-ass black man.

Cholesky decomposition

Tuesday, July 11th, 2006

I just learned about the Cholesky algorithm for testing whether a matrix  A \in M_n is positive/ factoring a positive matrix in terms of a lower triangular matrix  L \in M_n as  A = LL^\star . This is something we should have learned in numerical methods for engineers– it seems extremely useful from what I’ve seen from just browsing the internet, and listening to the lecturer. In particular, the matrix that I was trying to show is positive definite in an earlier post:

 \displaystyle A = \begin{pmatrix} t_1 & t_1 & \ldots & t_1 \\ t_1 & t_2 & \ldots & t_2 \\ &  & \vdots & \\ t_1 &  t_2 & \ldots & t_n \end{pmatrix},

in the special but representative case where t_i = i, has a very simple Cholesky decomposition: L is the lower triangular matrix of ones (with ones on the diagonal also).

2d convolution via frequency domain multiplication

Tuesday, July 11th, 2006

I spent this morning tangoing with convolution and the discrete fourier transform. My goal when I first came in was to see if applying a linear filter to a Markov field gives you another Markov field (intuitively, it seems like it should give you one with neighborhoods of larger width). Since applying a linear filter is the same as convolution, I needed a nice way to handle the application of an energy function to convolution products– this seemed like the good time to apply the oft quoted ‘convolution is multiplication in the frequency domain’ trick: since we’re dealing with finite everything (the field is on a finite graph, and has finite states), I could use matrix multiplication to express the convolution.

Surprisingly, I couldn’t find a single reference which had a matrix formulation of the DFT; I’m recording it here for anyone who might find it useful. Let \omega = exp^{-j \frac{2\pi}{n}} be the n-th primitive root of unity, and define the Fourier matrix \Omega by \Omega_{ik} = \omega^{(i-1)(k-1)}, then the (2d) DFT of the matrix A \in M_n(\C) is

 \hat{A} = \Omega A \Omega

and the IDFT is

 \check{A} =  \frac{1}{n^2} \overline{\Omega}A\overline{\Omega}.

That is one step in finding a matrix formulation of convolution. Assume we have convolution kernel F \in M_m and we’d like to calculate F \star A where A \in M_n is an image, and  n > m as is usual. If we naively take the DFTs, then \hat{F} and \hat{A} are not compatible, but notice that convolving with F is the same as convolving with a bigger kernel \mathcal{Z}_n(F), where we define the zero-padding operator to satisfy

\mathcal{Z}_n(F)} = \begin{pmatrix} F & 0_{n-m} \\ 0_{n-m} & 0_{n-m} \end{pmatrix} \in M_n.

A further point to notice is that we need to worry about aliasing, since we are dealing with discrete sums of frequencies. In particular, if when we multiply in the frequency domain we get frequencies (u,v) that don’t fit in the range {0, \ldots, n-1}^2, when we attempt to reconstruct, we’ll get higher frequencies masquerading as lower frequencies. The solution is to see that the highest frequencies you can get are the sum of the highest frequencies in both the kernel and the matrix. So if you zeropad the two matrices appropriately and then perform the frequency domain multiplication, you should be set.

Programming language semantics

Monday, July 10th, 2006

I have been trying to read some stuff on exact computation with real numbers, mostly stuff by Edalat and Potts, off and on for the past couple of years. My latest attempt was Potts’ PhD thesis, which uses some concepts from category theory and domain theory. So I figured I might as well bite the bullet and pick that stuff up. Hence the last post on category theory. I’ve since admitted to myself that pure category theory is entirely too abstract for me at this point, so I’m approaching it in a piecemeal way through slightly less esoteric applications, at least until the subject becomes more ‘naturally motivated’ to me. The best such approach seems to be through computer science, specifically in the application of category theory to the study of the semantics of computer languages (i.e. in forming formal models of the meanings of computer programs).

I have virtually no knowledge in the area of programming language theory– the award for most theoretical computer science book I’ve ever read goes to either the Dragon book or Dick Grune’s parser book; neither of which I’ve made significant dents in– but so far, I’m not doing too bad at understanding this semantics business. I almost read the entire first chapter last night, just two pages remaining. As usual with unfamiliar material however, the early stuff is fuzzy, so I’m going to reread the chapter at least once before moving on.

This promises to be a long detour from my main intention, but it might turn out to be scenic. And who knows, maybe even useful in the long run.

Haggstrom vs. Winkler

Wednesday, July 5th, 2006

I’ve suddenly noticed that the ‘London Mathematical Society Student Texts’ are really quite good. As one case of particular relevance to me at this time, Olle Haggstrom’s ‘Finite Markov Chains and Algorithmic Applications’ is a superb introduction to markov chains. It is serving as a good supplement to Winkler’s more focused ‘Image Analysis, Random Fields and Markov Chain Monte Carlo Methods’. Haggstrom’s proof of the existence and uniqueness of invariant distributions is not as concise as I’d like– he uses coupling arguments instead of contraction coeffecient arguments– so I skipped it and referred to Winkler’s book for that instead. But by far that is made up for by his presentation of the Propp-Wilson algorithm for exact sampling. Winkler also covers that material, but I couldn’t understand even his motivating comments; Haggstrom does a much better job of making the ideas transparent.

Winkler bandies about the terms coupling from the past and coalescing in his introductory comments on exact sampling, but this just confuses the reader, because he/she hasn’t yet seen a concrete example of what coalescing or coupling means. I got the impression that he thought using these terms conveyed a sense of the motivation of the algorithm, but it didn’t– you can’t appeal to people’s intuitions by introducing terminology. On the other hand, Haggstrom introduces the subject without using any terminology. This is a general rule of good mathematical writing that other authors would do well to remember: introduce terminology only after the ideas have already been absorbed; generalize only after the specific is understood.