Archive for June, 2006

Category Theory: useful or not?

Monday, June 26th, 2006

What is category theory good for, practically speaking? From what I understand of it, it allows you to make overarching statements about mathematical structures that are in a sense just semigroups: sets with associative binary relations. Probably there are some other technical requirements, but that’s the essence, right? So you can now treat Top, the category of topological spaces with homeomorphisms as the binary relation, as a category, and Grp, the category of groups with group morphism as the binary relation, as a category– look at them as different instantiations of the same type.

Why do you care? It seems like there’s a slight chance you could find some categorical theorem that could be translated into a nontrivial result in multiple categories, but I’ve never heard of it happening; I suspect there are two reasons for that: 1) recognizing a useful category theoretical result is probably hard, given the abstractness of the concepts involved, and 2) useful results seem (in my naivete, perhaps) to depend on the features that are abstracted away by category theory. Not that I’m saying such a useful result hasn’t been found, just I haven’t heard of it, which leads me to think if such results exist, they are rather technical and nonprofound.

Consider the first line from the introduction to Saunders Mac Lane’s ‘Categories for the Working Mathematician’:

Category theory starts from the observation that many properties of mathematical systems can be unified and simplified by a presentation with diagrams of arrows.

Is that supposed to motivate further study? … because I’m not feeling it.

Markov Random Fields

Monday, June 26th, 2006

I’ve been reading a lot on stochastic processes lately, markov random fields in particular. This morning I started reading the seminal paper in using Gibbs sampling / simulated annealing for image processing, by Geman and Geman; it’s available for download from one of the authors’ site (entitled ‘Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images’). The first several sections do a good job of explaining what all the terms involved mean, and their usefulness– I’d say it’s on par with any modern introductory book in the area (at least the ones I’ve come across). But that’s the simple stuff, and this paper quickly jumps to the convoluted mathematics…

Positive definiteness of a certain matrix

Thursday, June 15th, 2006

Let  0 < t_1 < \cdots < t_n . Then why is A positive definite when A_{ij} = \min(t_i, t_j) = t_{\min(i,j)}? I think it is ‘because’

\displaystyle A =
\begin{pmatrix} t_1 & t_1 & \ldots & t_1 \\
                           t_1 & t_2 & \ldots & t_2 \\
                                 &  & \vdots & \\
                           t_1 &  t_2 & \ldots & t_n
\end{pmatrix}

can be written as the sum of an upper triangular matrix that is positive definite and a strictly lower triangular matrix that is positive semidefinite. But then I have to prove those statements…

Musings on the matrix problem

Thursday, June 15th, 2006

In the last post, I asked if there exist square matrices A,B such that AB-BA = I. This came up in class just after we discussed the spectral mapping theorem, so I was attempting to use that to attack this, but it turns out (as Shikha pointed out) the easiest way is to just look at the trace of both sides. I still haven’t figured out a proof using spectrums, but Dr. Singh assures us the proof will be straightforward once we see a little more machinery.

I found out something interesting this morning, while browsing through ‘Vector Calculus, Linear Algebra, and Differential Forms’ by Hubbard and Hubbard. Recall that the Frechet derivative of a function f : X \rightarrow Y at x\in X, where X,Y are normed linear spaces, is the unique linear transformation Df(x) : X \rightarrow Y satisfying

\displaystyle \lim_{h\rightarrow 0} \frac{\|f(x+h)-f(x)-Df(h)\|_Y}{\|x\|_X} = 0.

You can show that if you consider M_n (the set of square matrices of order n) as a normed space and look at the squaring function S: M_n \rightarrow M_n defined by S(A)=A^2, then

 [DS(A)](B) = AB - BA.

This shines a new light on the fact that there are no A,B such that AB-BA = I, but what are the consequences of this fact on DS and S?

On a not-so-related note, the same book shows a nice generalization of the derivation of the convergence of geometric series to matrices. If \|A\| < 1 and S_n = \sum_{k=0}^n A^k, then S_n \rightarrow (I-A)^{-1}. Pretty neat!

(In all the arguments above, they used the Frobenius norm– I don’t think that matters, however, since M_n is finite-dimensional, so all norms are equivalent.)

Spectral Mapping Theorem

Wednesday, June 14th, 2006

Let A \in M_n(\C) be a square matrix of order n, and p a polynomial, then show \sigma(p(A)) = \{p(\lambda) : \lambda \in \sigma(A)\}, where \sigma(A) is the spectrum of A– the set of eigenvalues. This result is known as the spectral mapping theorem, and can be written more concisely as

\sigma(p(A)) = p(\sigma(A))

One inclusion is trivial (\Leftarrow), the other is trickier.

Now use the spectral mapping theorem to characterize the spectrum of an idempotent matrix A (i.e., A^2 = A).

Here’s a harder one: can we have matrices A,B \in M_n(\C) such that AB - BA = I?

Markov Chains

Friday, June 9th, 2006

I gave a mini-seminar today on stochastic processes, and was surprised at how we got hung up on the very concept that we’re studying stochastic processes for: Markov processes. To be more specific, Markov chains; the question was, does the Markovianity property P(X_n | X_0, \ldots, X_{n-1}) = P(X_n | X_{n-1}) imply that X_n is independent of X_k for k = 0, \ldots n-2? I don’t think so…

The next thing I am working on is showing that stationarity and Markovianity do not contradict each other. This is obvious for one-dimensional finite state Markov chains– take the initial distribution to be the stationary distribution for the chain, but I don’t really understand anything but one-dimensional Markov chains.

Inner products

Wednesday, June 7th, 2006

I’m auditing a matrix theory class this summer. The first problem, assigned yesterday, was to find non-trivial examples of inner products in \R^2 and \R^3.

Recall a (real-valued) inner product on a vector space is a function \langle \cdot, \cdot \rangle: V \times V \rightarrow \R with the following properties:

  1. \langle x, y \rangle = \langle y, x \rangle (symmetry)
  2. \langle x, x \rangle \geq 0 and \langle x, x \rangle = 0 \Leftrightarrow x = 0 (positive definiteness)
  3. \langle x + y, z \rangle = \langle x, z\rangle + \langle y, z \rangle (linearity)
  4. \langle \alpha x, y \rangle = \alpha \langle x, y \rangle (linearity).

The canonical inner product in \R^n is \langle x, y \rangle = \sum_{i=1}^n x_i y_i ; you can show that in the case n=2, this satisfies the relationship \cos \theta_{xy} = \frac{\langle x, y \rangle}{\|x\| \|y\|} where \|x\| denotes the usual length of a vector. Similarly, for any inner product, if we define the norm of a vector appropriately, \|x\| = \sqrt{\langle x, x \rangle}, then we have  \left | \frac{\langle x, y \rangle}{\|x\| \|y\|} \right| \leq 1 , so the inner product supplies a concept of the angle between two vectors in an arbitrary vector space (you can take the angle to be arccos of that quotient).

The first result that came to me is that inner products on finite dimensional vector spaces are the bilinear maps determined by positive definite matrices, in the sense that if we take \mathcal{B} = \{ e_1, \ldots, e_n\}[tex] to be a basis of [tex]V, a real vector space, then by linearity, we have

\langle x, y \rangle = \sum_{i=1}^n \sum_{j=1}^n x_i y_j \langle e_i, e_j \rangle = x^t Ey,

where in the last equality, I’m identifying x and y with their representations as column vectors relative to \mathcal{B}, and  E_{ij} = \langle e_i, e_j \rangle . The fact that E is positive definite follows from property 2 of inner products. It is easy to see that a positive definite matrix A determines an inner product, hence the two objects (inner products and positive definite matrices) are in a sense equivalent.

I don’t know much about positive definite matrices in general, so I couldn’t find any more concrete conditions on matrices that would allow you to construct an inner product, but I was able to find a sufficient condition in the case of \R^2:
if A = \begin{pmatrix}a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, then it is sufficient that a_{11}, a_{22} > 0 and a_{11},a_{22} > |a_{12} + a_{21}| for A to be positive definite and therefore determine an inner product.

But that is a pretty strong condition, in my opinion, and I don’t see any obvious way to extend it to n>2. Shihka had the idea of just modifying the arguments to the standard inner product by applying an invertible linear transformation to get a new inner product:

\langle x, y \rangle_\prime = \langle Ax, Ay \rangle = x^tA^tAy

As you can see, this neat idea has the advantage of showing rather simply that A^tA is positive definite (is this easier than the standard proof?), and gives a quick way to generate nontrivial inner products in any finite dimensional vector space.

Now, the question is, is every positive definite matrix of the form A^tA for some invertible matrix A? That is, are there exactly as many inner products as invertible linear transformations, or more?

Probabilistic word problem

Wednesday, June 7th, 2006

I’ve been quiet for a while because I’ve been studying probability, with an eye towards learning Markov fields and the Gibbs sampler, so I’ll be able to contribute to the new tissue model. This is not as easy as I thought– definitely not just another branch of measure theory– so I’ve been consulting several different books. Here’s a problem from one that I posted to the math group’s mailing list:

Mrs. Jones has made a steak and kidney pie for her sons. Eating more
than half of it will give indigestion to anyone. While she is away
having tea with a neighbor, the older son helps himself to a piece of
the pie. Then, the younger one comes and has a piece of what is left.
When Mrs. Jones returns, she finds that more than half of the pie is
gone.

Assuming that the size of each of the two pieces eaten is random and
uniformly distributed over what is currently available, what is the
probability that neither of Mrs. Jones’ sons will get indigestion?

If we let X be the amount of pie the older brother eats, and Y the amount the younger brother eats, the hard part of this problem is seeing that the density p(X=x,Y=y) is not constant.