Monday, January 31, 2011

mp3collect.go -- reorganizing MP3 files by hashes of their MPEG-1 content

A friend asked me a couple weeks ago for a sample of what a "real" Go program looks like. I have been using Go quite a bit for fuzzers and analysis packages at IOActive for the last few months, but I obviously can't share those with anyone else. On the flight back from Shmoocon, I decided to write a Go program to solve a problem that has been slowly building up in my ~/music directory.

It's a real trainwreck: between cycles of using iTunes and copying my music between devices, I now have a mass of duplicated songs whose ID3 tags have been tweaked, so I cannot simply de-duplicate them by hashing the whole files. The solution is to calculate a hash over the actual MPEG frames in each file and ignore all the helpful metadata.

The program does just that by hard-linking each file to a name derived from the hash of its media contents; duplicates are reported and left intact. The plan is to go back through those files and normalize their ID3 metadata using a program that doesn't try to "organize" my music -- Quod Libet. (Or the Android music player, which is too dumb to attempt any of this.) There is, of course, room for improvement -- it does not handle FLAC, OGG or M4A files, which do occur in my library because certain sources use non-MP3 formats. (Trent Reznor, Rhythmbox and iTunes, respectively.) It should also handle collections that span filesystems properly, by copying the file instead of hard-linking it.