Graph layout by Stress Majorization by Optimization of A Complicated Function
I haven’t had much to talk about lately– the Probabilistic Graphical Models course is eating up my time like crazy: I haven’t had time to do anything but that and prepare my lectures and homeworks for the Matlab/Mathematica course– but I have been doing a lot of coding for the course I’m teaching and the course I’m taking. Most of it isn’t fit for public consumption, but some of it is worth sharing/archiving here.
The code below takes the weighted adjacency matrix of a graph and returns a planar embedding using stress majorization by optimization of a complicated function (a simple algorithm with a fancy name, originally developed by multidimensional scaling people). The idea is that ideally you want an embedding with the property that the distance between the points of the embedding approximates the distance between the corresponding vertices in the graph. This concept gives rise to a stress function, which compares these two distances. The stress function isn’t convex, so you majorize it at each iteration by a convex quadratic and use the minimizer of that quadratic as the next embedding. Continue until the quality of the embedding doesn’t change much.
Surprisingly, Matlab doesn’t seem to have built in graph layout routines (I still don’t get why gplot doesn’t have one built in), except for trees, so I find this routine useful when I need to quickly visualize a graph. You can also graph the m-file: smoacp.m
%SMOACP finds a layout for a graph that as much as possible is an isometry %from the graph to the plane (local minimum of the stress function). % % [P, SIGMA] = SMOACP(A, EPSILON) A is the weighted adjacency matrix for % the graph, and EPSILON is a stopping tolerance. Once the relative % decrease in the stress drops below EPSILON, terminate. P is the n-by-2 % matrix of vertex coordinates and SIGMA is the stress of P % % [P, SIGMA] = SMOACP(A, EPSILON, PGUESS) uses PGUESS as the initial guess % for P % % See Also SPECTRALGRAPHLAYOUT % function [P, sigma] = smoacp(A, epsilon, P) disp('Computing a graph layout that minimizes stress'); tic % useful stuff n = length(A); D = floydwarshall(A); remove1strow = @(M) M(2:end, :); % calculate the LGhat matrix Dinv2 = D.^-2; Dinv2(1:size(Dinv2,2)+1:end) = 0; % reset the diagonals to 0; LG = diag(sum(Dinv2)) - Dinv2; LGhat = LG(2:n, 2:n); % inner function to calculate the stress; note that it can access D, % the matrix of intervertex distances, and n even though they're % defined outside of it function sigma = stress(P) sigma = 0; for i = 1:n for j = i+1:n sigma = sigma + D(i,j)^2*(norm(P(i,:) - P(j,:)) - D(i,j))^2; end end end % inner function to calculate L^Z(P) function M = LZP(Z, P) % fill in the off diagonal terms for i = 1:n for j = 1:n if i==j continue; end M(i,j) = -D(i,j)^-1*pinv(norm(Z(i,:) - Z(j,:))); end end % fill in the diagonal terms M = M - diag(sum(M)); M = M*P; end % initialize if guess not provided if nargin == 2 P = randn(n, 2); end % recursively minimize the quadratic stress majorizer while true fprintf('Current layout has stress %f\n', stress(P)); oldP = P; % slightly inefficient solve of linear system: could do a little % better with linsolve and a precomputed cholesky decomposition P = LGhat\remove1strow(LZP(oldP, oldP)); P = [0, 0; P]; % check the two termination conditions if norm(P - oldP) < epsilon disp('Terminating because embedding didn''t change much'); break; end if (stress(oldP) - stress(P))/stress(oldP) < epsilon disp('Terminating because relative stress didn''t change much'); %break; end end sigma = stress(P); fprintf('Took %f seconds to find stress-minimizing layout', toc); end %FLOYDWARSHALL calculates the distances between vertices in weighted graph % % D = FLOYDWARSHALL(A) given an n-by-n weighted adjacency graph A, returns % the n-by-n matrix of distances between vertices. % function Dc = floydwarshall(A) disp('Computing the intervertex distances') tic n = length(A); % initialize d(0,i,j) Dp = A; Dp(A == 0) = Inf; A(1:size(A,2):end) = 0; % zero diagonals % recursively calculate d(k, i, j) for k=1:n for i=1:n for j=1:n if i == j continue; end Dc(i,j) = min(Dp(i,j), Dp(i,k) + Dp(k, j)); end end Dp = Dc; end fprintf('Took %f seconds to calculate distances\n', toc); end
Possibly relevant posts:
- K-means as a matrix optimization problem (1/9/2010)
- Connectedness of the set of feasible
-colorings (2/21/2007) - What good are eigensystems? (9/16/2005)