ID3 and audio fingerprinting

Today I read a brief synopsis of the ID3v2 tagging technology (ID3 tags are the embedded metadata that let your MP3 player determine the name of a song in a file, the artist, album name, etc.). I was surprised to see that ID3v2 supports embedding time-stamped lyrics. None of the free tagging software I looked at provides a way to edit or even view lyric data, though I did see a couple of commercial programs with those capabilities. I haven’t found any lyrics databases on the net, however.
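For the curious, here is a minimal sketch of how a program might recognize an ID3v2 tag at the start of a file. It relies only on the fixed 10-byte header: the bytes "ID3", a two-byte version, a flags byte, and a four-byte "synchsafe" size (7 useful bits per byte). The function names and the hand-built example header are my own, made up for illustration; this is not how any particular tagger does it, and it does not read the frames (such as lyrics) inside the tag.

```python
def decode_synchsafe(data: bytes) -> int:
    """Decode a synchsafe integer: 7 useful bits per byte, high bit always 0."""
    value = 0
    for byte in data:
        if byte & 0x80:
            raise ValueError("not a synchsafe integer")
        value = (value << 7) | byte
    return value


def parse_id3v2_header(header: bytes) -> dict:
    """Parse the fixed 10-byte header at the start of an ID3v2 tag."""
    if len(header) < 10 or header[:3] != b"ID3":
        raise ValueError("no ID3v2 header found")
    return {
        "version": (header[3], header[4]),  # (major, revision), e.g. (3, 0) for ID3v2.3
        "flags": header[5],
        "tag_size": decode_synchsafe(header[6:10]),  # size of the tag, excluding this header
    }


# A hand-built header for illustration: ID3v2.3, no flags, 257 bytes of tag data.
example_header = b"ID3" + bytes([3, 0, 0]) + bytes([0, 0, 2, 1])
print(parse_id3v2_header(example_header))
```

With a real file you would read the first 10 bytes in binary mode and pass them to parse_id3v2_header.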

I was also thinking about audio fingerprinting: ever since I saw a commercial advertising a cell phone that could identify a song based on a clip, I’ve been wondering how you’d go about designing such a system. One obvious thought that came to mind was using time-frequency decompositions, like wavelets, to compress the data in some way consonant with human auditory perception… but how do you translate this broad idea into a specific algorithm? I chanced across libFooID, an open-source implementation of a fingerprinting algorithm; the linked page has a nice synopsis of the algorithm. This algorithm isn’t aimed at identifying songs based on arbitrary fragments, but it’s impressive and useful as is.
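To make the "time-frequency decomposition" idea a little more concrete, here is a toy sketch of one way it could be turned into a fingerprint. To be clear, this is not libFooID's algorithm and not what the phone in the commercial does; it is a simplified band-energy-difference scheme I am assuming purely for illustration. The signal is cut into frames, each frame's spectrum is summed into a few frequency bands, and each bit records whether the energy difference between two neighbouring bands grew from one frame to the next. The frame length, band count, and function names are all made up, and the naive DFT is only suitable for tiny inputs.

```python
import cmath
import math


def band_energies(frame, n_bands):
    """Energy in n_bands equal-width frequency bands, computed with a naive DFT."""
    n = len(frame)
    half = n // 2
    power = []
    for k in range(half):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        power.append(coeff.real ** 2 + coeff.imag ** 2)
    width = half // n_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]


def fingerprint(samples, frame_len=64, n_bands=5):
    """Toy fingerprint: one tuple of bits per pair of consecutive frames."""
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    energies = [band_energies(samples[i:i + frame_len], n_bands) for i in starts]
    bits = []
    for prev, cur in zip(energies, energies[1:]):
        row = []
        for b in range(n_bands - 1):
            # 1 if the gap between adjacent bands grew since the previous frame.
            diff = (cur[b] - cur[b + 1]) - (prev[b] - prev[b + 1])
            row.append(1 if diff > 0 else 0)
        bits.append(tuple(row))
    return bits


def hamming_distance(fp_a, fp_b):
    """Number of differing bits between two fingerprints of equal shape."""
    return sum(a != b for row_a, row_b in zip(fp_a, fp_b) for a, b in zip(row_a, row_b))


# A synthetic rising tone stands in for real audio.
samples = [math.sin(2 * math.pi * (0.05 + 0.0005 * t) * t) for t in range(256)]
louder = [2.0 * x for x in samples]
print(fingerprint(samples))
print(hamming_distance(fingerprint(samples), fingerprint(louder)))
```

Because each bit depends only on the sign of an energy difference, turning the volume up or down leaves the fingerprint unchanged. Matching a clip against a library would then come down to finding stored fingerprints within a small Hamming distance, which this sketch does not attempt.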

On a related note, some researchers at HP wrote a short report on using semantic analysis of song lyrics to identify similar music. The conclusion is that automated auditory similarity analysis is more accurate, but the two approaches are potentially complementary. The report is worth reading just as a showcase application of a probabilistic version (ps) of latent semantic indexing (pdf), a technique used in automatic document indexing. Just scanning those papers on semantic analysis has given me a contact high: beautiful math concepts (like principal component analysis and relative entropy) find natural applications here.
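As a small taste of how relative entropy can apply to lyrics, here is a minimal sketch that compares the word distributions of two texts. This is not the method from the HP report (which works with latent topics rather than raw word counts); the lyric snippets are made up, and the add-one smoothing is my own assumption to keep every probability nonzero.

```python
import math
from collections import Counter


def word_distribution(text, vocabulary, smoothing=1.0):
    """Word probabilities over a shared vocabulary, with additive smoothing."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocabulary) + smoothing * len(vocabulary)
    return {w: (counts[w] + smoothing) / total for w in vocabulary}


def relative_entropy(p, q):
    """D(p || q) = sum over words of p(w) * log2(p(w) / q(w)), in bits."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)


# Made-up lyric fragments, purely for illustration.
song_a = "love love me do you know i love you"
song_b = "rain on the road and rain on the river"
vocabulary = sorted(set(song_a.split()) | set(song_b.split()))

p = word_distribution(song_a, vocabulary)
q = word_distribution(song_b, vocabulary)
print(relative_entropy(p, p))  # a distribution compared with itself
print(relative_entropy(p, q))  # two songs with different vocabularies
```

Relative entropy is zero when the two distributions are identical and grows as they diverge. It is not symmetric, so D(p || q) and D(q || p) generally differ.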
