Archive for November, 2006

Probability tricks

Tuesday, November 28th, 2006

I’ve been working my butt off for the past several days trying to complete the latest stochastic processes homework, but that looks like a losing fight, so I’ve pretty much relaxed. Now I’m just doing it because every so often I learn a way to do something I thought was incredibly hard.

Example #1: Let V be a continuous random variable, and U be a random variable uniformly distributed on [0,t]; assume V and U are independent of each other. Calculate P[V(t-U) \leq x] where x is a fixed number. That is, find the cdf of the random variable V(t-U). I’m surprised it’s even possible to get a nice closed form solution to this, and sad to say the particular trick used to get it isn’t from me. It’s from one of the guys in the class, either Nathaniel or Yuan, or both. The basic idea is to:

  • Find the pdf of the random variable t-U
  • Use the independence of t-U and V to find the joint pdf of t-U,V
  • The clever idea is here: use a transformation of random variables argument to find the joint pdf of V(t-U), V
  • Integrate out V to get the marginal distribution for V(t-U)

The answer, after working out all this stuff, is P[V(t-U) \leq x] = \frac{1}{t} E[\frac{1}{V} \min(x, Vt)]. Amazing. What’s even more amazing is that this is only one incidental calculation on the way to solving a larger problem. Perhaps there was a way to avoid this unpleasantness altogether?

Example #2: If U_1, \ldots, U_k are independent uniformly distributed random variables on [0,1], then the pdf of the random variable \min(U_1, \ldots, U_k) is f(t) = k(1-t)^{k-1}. This one is easy to show, but nonetheless a neat result. In particular, it means the expected value of the random variable is \frac{1}{1+k}– not terribly intuitive.

Winter term courses

Sunday, November 19th, 2006

The registration period for winter term courses has begun. Here’s a list of the courses that I’m considering, in no particular order:

  • ACM 216: Markov chains, discrete stochastic processes, and applications.
  • ACM 118: Methods in Applied Statistics and Data Analysis
  • ACM 104: Linear Algebra
  • ACM 210: Numerical Methods for PDEs
  • EE 163: Communication Theory
  • CDS 202: Geometry of Nonlinear Systems
  • APh/EE 131: Optical Wave Propagation

I’m aiming to take all the probability related ACM courses, because I started getting interested in serious probability and stochastic processes over the summer, and what I’ve seen in class and read on my own this term has made it seem like an interest worth pursuing. The PDE course I would take, just because I know nothing about PDEs and they seem useful and interesting– plus there’s the possibility of me eventually working with stochastic pdes, which I know absolutely nothing about except that the idea is awfully cool. However, I don’t know if the numerical methods course will be an effective intro to the subject; I’ll have to ask around.

The linear algebra course is required, unfortunately. It puzzles me that you take numerical linear algebra before the linear algebra course. Admitted, the linear algebra course is almost definitely more technical, but the ordering still seems weird. The communication theory course and the optics course I’m not too jazzed about either, but I need to take an applied course, and those were the closest to interesting. The geometry course I would love to take to fulfill my applied course requirement, but I’m not sure I can slip that one by: even though it’s being offered by the control and dynamical systems option, it’s essentially a differential geometry class… not incredibly applied.

I did note as I was looking for classes that an awful lot of excellent, interesting courses that I’d love to take aren’t being offered next term.

Stochastic Integration

Wednesday, November 15th, 2006

Integration is one of those mathematical operations that should conform to common-sense: no matter what method of integration you’re using, no matter how high-falutin’, the area of a square should be the square of the length of its sides. And that’s the way its been up to now: Riemann, Lebesgue-Stieltjes, and Lebesgue integration… they’re all cut from the same cloth, all succeedingly more careful generalizations of the intuitive notion of measure.

Yesterday we were briefly introduced to a form of stochastic integration, namely that of integration of L^2(\R) functions with respect to Brownian noise:

 \int_0^t f(s) dB_s = \sum_{i=1}^\infty \langle \chi_{[0,t]} f, \varphi_i \rangle \xi_i

where \varphi is an orthonormal basis of L^2 and \xi_i is a sequence of i.i.d. N(0,1) r.v.s, so the integral is a Gaussian r.v. for each t. This definition does not seem to have much to do with the concept of measurement. Is this even well-defined? By the Riemann-Lebesgue lemma the sum converges, but apparently any o.n.b. can be used, so does changing it affect the integral? Despite the obscurity of its origins, I like the simple form of this integral.

Today, I found an online book which does a great job of motivating stochastic integrals. To abbreviate the example he gives, consider the problem of sending a probe to the moon. We’ll describe the state of the system as X(t) = (\ell, v, \rho)^t \in \R^9, where \ell, v, \rho are the location, velocity, and momentum of the probe. In the ideal case, we know X’(t) = (v, d\rho/dt, F(\ell, v, \rho))^t where F is the force applied to the probe, so we can solve for

 X(t) = X(0) + \int_0^t X’(s) \: ds.

In the real world, there will be random effects we can’t account for deterministically, such as minor fluctuations in the gravitational field, collision with debris, etc., all of which we will call noise Z(t) (note that this model doesn’t account for noise which is partially caused by the state X(t), noise is a function of t– I don’t know if this is a limitation of the theory, or just this example). Then we’d like to say that how much the noise affects the state of the system varies depending on the state of the system, giving the new equation

 X(t) = X(0) + \int_0^t X’(s) \: ds + \int_0^t b(X(s)) \: dZ(s).

Since the noise is random, Z(s) = Z(s,\omega) is in fact a stochastic process. Each elementary event \omega determines a different sample path. Therefore if Z is such that the Lebesgue-Stieltjes integral is defined on each sample path (cadlag?), we can actually solve the above equation, and use it to calculate useful quantities. For instance, if T(\omega) represents the time when the probe reaches the moon (\infty if it misses), then we can calculate the expected distance from the target point s, given a particular noise model:

 \mathbb{E}[\|s - X(T)\|].

Pretty cool!

The problem is, apparently in general the sample paths of stochastic processes are not nice, so Lebesgue-Stieltjes integration cannot be done. Then you need to break out the power tools

Eikonal equation

Tuesday, November 14th, 2006

The eikonal equation is a non-linear PDE related to the least action principle and the Hamilton-Jacobi equations of mechanics:

 |\nabla u(x)| = \frac{1}{F(x)}.

One application of the equation is describing the propagation of a wavefront through a medium where the local speed is a function of location F(x). In this case, the level sets of u correspond to the wavefronts at different times; that is, the set \{x : u(x)=t} is the wavefront at time t. The derivation of this equation is straightforward: let \sigma(x) represent the differential path of the light ray emitted from point x at time t, which reaches the new wavefront at time t + dt. This path is perpendicular to the wavefront it starts from, and has length |F(x) dt|, so

dt = \left|\frac{du}{d\sigma}(x)\right| = \left| \langle \nabla u(x) , \sigma \rangle\right| = |\nabla u(x)|\cdot |\sigma(x)| = |\nabla u(x)| \cdot |F(x)| \cdot dt \Rightarrow |\nabla u(x)| = \frac{1}{|F(x)|}

yields the eikonal equation.

Note that if F=1 identically and u=0 on the boundary of a region \Omega, then for x outside \Omega, the solution u(x) to the eikonal equation measures the distance from x to \Omega. At least, I think it should :)

Challenge

Friday, November 10th, 2006

It’s known that every square matrix is unitarily similar to an upper triangular matrix: A \in \C^{n \times n} \Rightarrow \exists Q, T: A = QTQ^\star . Prove, using linear algebra, that this decomposition cannot in general be computed with a finite number of basic matrix operations (mult, add).

This result is known, but I believe only as a corollary of a theorem in another branch of mathematics.

Another fall from grace

Monday, November 6th, 2006

Apparently it’s true: Ted Haggard is a polesmoking hypocrite. What kind of twisted mind does it take to make a profession out of self-hatred? Or perhaps he never really believed anything he said?

Republicans are really starting to make homosexuals look bad… just not quite in the way their fundie base would like. Who I’d really like to see fall flat on his face is James Dobson– the self-righteous, smarmy pseudo-psychologist who advocates the ex-gay movement and patriarchal marriage, among other wrong-headed notions.

Information Theory

Sunday, November 5th, 2006

I started reading Shannon’s A Mathematical Theory of Communication today. Who knew a seminal paper could be this well-written? It is much better for gaining an intuitive view of the subject than the information theory text I picked up earlier. Also, just about every page contains some titillating connection to a field outside of information theory proper; for example, he brings into play the characteristic equation from finite differences, a graph theoretic reformulation of the necessary conditions for a process to be ergodic, and statements about how the amount of redundancy in a language determines the possibility of forming large crossword puzzles. Apparently English has just enough redundancy at 50% to make large crossword puzzles feasible.

Full-rank underdetermined systems

Thursday, November 2nd, 2006

Usually when we think of non-square systems of linear equations, we think of skinny full rank matrices: i.e., of systems of the form

\displaystyle Ax = b, \quad A \in \C^{m \times n}, m > n, \text{rank}(A)=n.

One way of solving these systems is to use the (reduced) QR factorization, which decomposes A into a matrix Q \in \C^{m \times n} which has orthonormal columns, and an upper-triangular invertible matrix R \in \C^{n \times n}. Intuitively, the Q matrix is the result of applying the Gram-Schmidt orthonormalization process to the columns of A, and R combines the columns of Q to get back A. There is also a full QR factorization, A= Q_f R_f where Q_f \in \C^{m \times m} is unitary, and R_f \in \C^{m \times n} is upper triangular. A full QR factorization can be obtained from a reduced one by completing the basis in Q, and adding m-n zeros to the bottom of R.

Something really cool, that I learnt today while doing my linear algebra homework, is that the QR factorization is useful for solving full-rank underdetermined systems as well. That is, suppose your problem is now
\displaystyle Ax = b, \quad A \in \C^{m \times n}, m < n, \text{rank}(A)=m.

Then this has at least one solution, but there may be multiple ones. What if you’d like to select the solution with minimum Euclidean norm? The idea is that A^\star is a full-ranked skinny matrix, so has a QR decomposition: A^\star = QR.

Then minimizing \|x\| given Ax = b is the same as minimizing it given x^\star Q R = b^\star, which is the same as minimizing it given Q(R^\star)^{-1} b = QQ^\star x (since Q^\star Q = I, remember Q has orthonormal columns). But then QQ^\star is a projection, so the x with minimum length that satisfies this equation is just QQ^\star (Q(R^\star)^{-1}b) = Q(R^\star)^{-1}b . So the minimum length vectors that satisfies Ax = b is simply Q(R^\star)^{-1}b . Pretty neat.

Real Analysis

Wednesday, November 1st, 2006

Today we started the second part of the applied functional analysis class: real analysis in \R^n. We jumped right in at Lebesgue outer measure, but considering that it’s on the syllabus, hopefully we’ll come back later after we develop Lebesgue theory to develop a more general theory of measure and integration. I like this approach: start with Lebesgue integration in \R^n, then do the abstract stuff. This is more effective than say Foland’s approach: do abstract integration, use it to establish Lebesgue integration in \R^1, and have an optional section on the theory in \R^n.

Now the notes from both parts of the course (1: functional analysis, 2: real analysis) are available– cf. the Lecture Notes section. They are comprehensive enough that I didn’t refer to the book at all for the first part, but I actually like the book for this part (Wheeden and Zygmund), so I’ll probably read along.