Monday, October 17, 2011

Big Acoustic Fingerprints

A few days ago, I found myself in a situation that begged for automation. As Larry Wall says, for a software developer, laziness is a virtue (good programmers will learn or invent shortcuts to save time). I had recorded many .mov files from my digital camera, and wanted to extract the audio track. I wanted to create one .mp3 file for each .mov file with its corresponding audio.

The shareware program Total Recorder, which I recommend, can record any audio played by your computer to a .wav or .mp3 file on disk. With this and Windows 7 Media Player's ability to play .mov files, I was already halfway to the solution. I could start Total Recorder, put all of the .mov files in a playlist, and wait while the audio tracks were recorded in real time.

The result, though, would be one large .mp3 file that would have to be cut manually into pieces. For dozens of tracks, this would be a tedious process.

My first thought was to write down the list of track lengths and use that list to cut the .mp3 file: the first track is 1:21, the second is 1:06, and so on. This approach, though, was prone to error. The times were rounded to the nearest second, so the errors would accumulate by the end of the recording. Also, Media Player itself might add slight pauses between tracks.

I then realized that I could put a 'marker' sound between each track. After recording, I could look for the markers, and split the audio precisely.

One way to do this would be to use a generated combination of tones. I could pick a certain number of pitches that would be rare to encounter in the real audio data, and create a marker based on these frequencies. When processing afterwards, I would split the audio data into small fragments, run an FFT on each fragment, and search for fragments that contained just those frequencies (allowing some margin for error). This approach would be robust to mp3 audio compression, and even to playback on different speakers.
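To make the tone-marker idea concrete, here is a minimal sketch of the per-fragment check. It is not code from this project (I ended up not needing this approach), and it makes several assumptions: a naive O(n^2) DFT stands in for a real FFT so that only the standard library is needed, and the names `is_marker_fragment`, `tolerance_bins`, and `min_share` are hypothetical, as are the thresholds.

```python
import cmath
import math


def dft_energy(fragment):
    """Energy per frequency bin (0 .. n/2), computed with a naive DFT."""
    n = len(fragment)
    energy = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(fragment))
        energy.append(abs(acc) ** 2)
    return energy


def is_marker_fragment(fragment, sample_rate, marker_freqs,
                       tolerance_bins=1, min_share=0.9):
    """True if the fragment contains essentially only the marker tones.

    Assumed rule: every marker frequency must be clearly present, and at
    least min_share of the non-DC energy must lie within tolerance_bins
    of some marker frequency.
    """
    n = len(fragment)
    energy = dft_energy(fragment)
    energy[0] = 0.0  # ignore any DC offset
    total = sum(energy)
    if total == 0:
        return False  # silence is not a marker

    marker_bins = set()
    for freq in marker_freqs:
        centre = round(freq * n / sample_rate)
        low = max(1, centre - tolerance_bins)
        high = min(len(energy) - 1, centre + tolerance_bins)
        band = sum(energy[b] for b in range(low, high + 1))
        # Each tone must carry a reasonable part of the energy on its own.
        if band < 0.5 * min_share * total / len(marker_freqs):
            return False
        marker_bins.update(range(low, high + 1))

    in_band = sum(energy[b] for b in marker_bins)
    return in_band >= min_share * total
```

The margin for error lives in `tolerance_bins` and `min_share`; lossy compression smears energy into neighbouring bins, so both would probably need tuning against real recordings.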

For this particular setup, though, I did not need to be as robust, because Total Recorder records the audio digitally before it is converted to analog, although the signal is still slightly altered by mp3 compression and the operating system. As a half-hour hack, I made an audio marker that has an unusual profile:

Over the course of one second, the signal value steadily increases. Played on speakers, the marker is silent except for a click at the end, when the signal suddenly drops to 0. I left the click in intentionally so that I could hear the marker. This is very different from a typical audio signal, which fluctuates up and down rapidly; those quick fluctuations are what produce an audible pitch.
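A marker like this can be generated in a few lines. The sketch below is an illustration rather than my original code, and it assumes things the description above does not pin down: a linear ramp starting at 0, 16-bit mono audio at 44100 Hz, and a peak value of 30000. The function names are hypothetical.

```python
import struct
import wave


def ramp_samples(sample_rate=44100, seconds=1.0, peak=30000):
    """Sample values rising linearly from 0 to peak (linear shape assumed)."""
    n = int(sample_rate * seconds)
    return [round(peak * i / (n - 1)) for i in range(n)]


def write_marker_wav(path, sample_rate=44100, seconds=1.0, peak=30000):
    """Write the rising ramp followed by a single 0 sample to a .wav file."""
    samples = ramp_samples(sample_rate, seconds, peak)
    samples.append(0)  # the sudden drop back to 0 is what makes the click
    with wave.open(path, "wb") as out:
        out.setnchannels(1)   # mono
        out.setsampwidth(2)   # 16-bit samples
        out.setframerate(sample_rate)
        out.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

Calling `write_marker_wav("marker.wav")` produces a file of roughly one second that can be dropped into a playlist between tracks.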

It was trivial for me to find areas in the audio data where the signal value increases for more than half a second. To make the system more robust, I looked at the averages of blocks of 16 samples, and looked for long sequences where these averages were always increasing. (Note that this block size can't safely be set to a large value, because a low sound that increases in volume and has a DC offset could trigger false positives.) After finding a match, I could skip forward several seconds, knowing that no track would be shorter than 10 seconds.

It's a good thing I had my own audio library: the .wav files were too big to realistically hold in memory, but because I was familiar with reading raw .wav data, I could just scan through the stream of bytes. My program wrote the cut points, in milliseconds, to a plain text file, adjusting the times to trim away the audio marker.
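The scan described above can be sketched roughly as follows. This is a reconstruction for illustration, not my actual program (which used my own audio library); it assumes 16-bit mono .wav input, strictly increasing block averages, and a 5-second skip, and the names `read_samples` and `find_markers` are hypothetical.

```python
import struct
import wave
from itertools import chain


def read_samples(path, chunk_frames=65536):
    """Yield 16-bit mono samples chunk by chunk, never loading the whole file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("this sketch assumes 16-bit mono audio")
        while True:
            data = wav.readframes(chunk_frames)
            if not data:
                break
            yield from struct.unpack("<%dh" % (len(data) // 2), data)


def block_averages(samples, block):
    """Yield the mean of each consecutive full block of samples."""
    total = count = 0
    for sample in samples:
        total += sample
        count += 1
        if count == block:
            yield total / block
            total = count = 0


def find_markers(samples, sample_rate, block=16, min_rise_s=0.5, skip_s=5.0):
    """Return (start_ms, end_ms) for each long run of rising block averages."""
    min_blocks = int(min_rise_s * sample_rate / block)
    skip_blocks = int(skip_s * sample_rate / block)
    ms_per_block = 1000.0 * block / sample_rate
    markers = []
    prev = None
    run_start = 0   # index of the block where the current rising run began
    skip_until = 0  # runs starting before this block index are ignored
    # The trailing None is a sentinel that closes a run still open at the end.
    averages = chain(block_averages(samples, block), [None])
    for index, avg in enumerate(averages):
        rising = prev is not None and avg is not None and avg > prev
        if not rising:
            if index - run_start >= min_blocks and run_start >= skip_until:
                markers.append((round(run_start * ms_per_block),
                                round(index * ms_per_block)))
                skip_until = index + skip_blocks
            run_start = index
        prev = avg
    return markers
```

Something like `find_markers(read_samples("recording.wav"), 44100)` would then give the marker positions, accurate to about one block; the cut points written to the text file can be derived from those start and end times.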

The final workflow:
1) A Python script creates a .wpl Windows Media Player playlist of the .mov files, separated by my marker audio.
2) Start Total Recorder, start the playlist, and wait while it plays through.
3) Save the recorded audio as .wav and run my program to determine the cut points.
4) Encode as mp3.
5) A Python script turns my text file into a .cue file, which the excellent mp3DirectCut uses to split the audio at those points.
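The two small scripts in steps 1 and 5 might look something like the sketch below. It is illustrative only and rests on assumptions: the playlist uses the SMIL-style layout that Windows Media Player writes for .wpl files as I understand it, the cue sheet uses the common `FILE` / `TRACK` / `INDEX` layout with times in mm:ss:ff at 75 frames per second, and the cut points arrive as a list of millisecond values. The function names are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def make_wpl(track_paths, marker_path, title="tracks with markers"):
    """Return .wpl playlist text with marker_path between consecutive tracks."""
    lines = ['<?wpl version="1.0"?>', "<smil>",
             "  <head>", "    <title>%s</title>" % escape(title), "  </head>",
             "  <body>", "    <seq>"]
    for position, path in enumerate(track_paths):
        if position > 0:
            lines.append("      <media src=%s/>" % quoteattr(marker_path))
        lines.append("      <media src=%s/>" % quoteattr(path))
    lines += ["    </seq>", "  </body>", "</smil>"]
    return "\n".join(lines) + "\n"


def ms_to_cue_time(ms):
    """Convert milliseconds to cue-sheet mm:ss:ff (75 frames per second)."""
    frames = round(ms * 75 / 1000)
    minutes, rest = divmod(frames, 75 * 60)
    seconds, frame = divmod(rest, 75)
    return "%02d:%02d:%02d" % (minutes, seconds, frame)


def make_cue(mp3_name, cut_points_ms):
    """Return cue-sheet text: one track from 0, then one per cut point."""
    lines = ['FILE "%s" MP3' % mp3_name]
    starts = [0] + list(cut_points_ms)
    for number, start in enumerate(starts, start=1):
        lines.append("  TRACK %02d AUDIO" % number)
        lines.append("    INDEX 01 %s" % ms_to_cue_time(start))
    return "\n".join(lines) + "\n"
```

For example, `ms_to_cue_time(81000)` gives `01:21:00`, so a first track lasting 1:21 puts the second track's `INDEX 01` at that position.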

And that's all. The system worked very well and produced no false positives. mp3DirectCut can (magically?) slice up an mp3 file without recompressing it, so the splitting step is lossless and very quick. I now have an automated way to cut a long mp3 file into pieces.
