Friday, December 23, 2011


I just wrote a little something to cut .wav files into smaller audio files. It doesn't load the whole file into memory, and so will support large files. Many other programs do this, even ffmpeg, but I wanted to write this myself, in part because I wanted to be able to specify the cutpoints very precisely (by sample number, in fact). I also wanted to be sure no audio data was lost or altered.

To confirm correct results, I wrote a simple .wav analyzer that prints information about a wav file. It confirms that information in the header is correct and outputs the riff chunks. Use the -check option to see this.

This is called WavcutHHH so that it's clearly better than any existing program called Wavcut.

WavcutHHH will split a large .wav file into several files based on a text file
that specifies cut points.

wavcut.exe input.wav input.txt

First line of the .txt file is 'wavcut'.
Remaining lines of the .txt file should be numbers, specifying where to cut.
Refer to wavcut.exe trombone.wav trombone.txt as an example.

To check wav contents, use

wavcut.exe -check input.wav

Here's what it looks like (these are ~800kb because of an example .wav):
C# source
Win32 binary

On an unrelated note, I'm still in the midst of reorganizing my music library and moving to mp4. There are more hacks in the pipeline.

Tuesday, October 18, 2011

Mandelbrot Set on a FPGA

In April 2009, I developed an original system to compute and display the Mandelbrot set on FPGA hardware. I used the capabilities of this platform to achieve speeds of more than 25 million results computed per second.

The FPGA, or field-programmable gate array, essentially consists of a large number of logic gates that can be configured to perform computation. Developing for this platform feels foreign at first. There is no operating system, or even a CPU. Consequently, one cannot in general use a simple model where execution steps from one instruction to the next. One instead uses the fabric spatially, setting up associations and connections between components and placing logic in parallel. Making use of parallelism can lead to excellent performance solving certain problems.

I created this system on my own in my spare time, outside of class. I thank Professor Mark Chang for providing equipment and some advice.


My goal was to use the unique advantages of embedded programming to make a system that can perform computations very quickly.

I chose to compute the Mandelbrot set (yes, the cliché image) on the FPGA platform, solving the problem of whether a particular x,y value was inside this set, to a given tolerance. My reasoning:
  • Nature of problem lends itself to parallelization
  • Problem involves arithmetic, has opportunities for coming up with optimization strategies.
  • Easy to visually verify correctness (and looks cool)

The configuration made by my design can only perform one task. A CPU, like the processor on the computer you are currently using, is general purpose, and reads from a list of instructions to perform. If given a new list of instructions, it is able to execute them. The 'program' I describe here, on the other hand, has no such list of instructions, it is physically only capable of this one task.

Like a processor, though, everything is still governed by a clock signal. In a more advanced system, the developer will divide the problem into different smaller operations that can be done per clock tick. Then, resource-expensive steps can be apportioned out (there are a limited number of hardware multipliers and other units available on the chip), and the overall clock speed can be set higher. In the design described here, though, I used a simpler approach where the same operations are done on every clock tick. While this means the system clock speed might have to be slower, it still means that during each tick of the clock, several multiplications and additions and memory reads are done at the same time -- very different than what occurs in a CPU.

I used four types of components to solve the problem. A 'Child' unit receives x,y coordinates and screen coordinates as input. It has an internal state machine. In `State_calc it performs one iteration of the Mandelbrot set computation. If the current value is too great (pixel is not in Mandelbrot set), or the number of iterations is sufficiently large (pixel is probably in Mandelbrot set), state moves to `State_write which will write the result to memory by setting a write-enable flag. `State_acqnew retrieves the next value that should be computed and resets computation. Each child had a separate RAM unit so that it could independentally read and write to memory. The 'Parent' unit contains several Child units and is responsible for dividing work among them. The current set of uncomputed values is given to the first child that is not busy. The ColorMap unit maps the number of iterations to an R,G,B color, using some shifting to unpack colors from distributed ROM. Because this isn't an art project, I just reverse-engineered and used colors from the shareware program UltraFractal. The VGA unit reads a value from memory, translates it to a color with the ColorMap unit, and sets output voltage on a physical pin. VGA scans through the image more quickly than values can be computed, which is why separate memory is required.

I could then connect these units. (I find it interesting that when on hardware, the units are literally physically connected to each other. Makes one realize that so much programming one does is very far from the hardware.) The design was intentionally flexible with regard to the number of Child units. This number can be increased until reaching the capacity of the fabric, so that my design will easily scale if a FPGA with more gates is available.


I first made a Verilog module called vga_simulator that could be used in ModelSim. This module received r, g, and b values with the same interface as an actual vga output. It wrote to a file on disk that could later be transcribed into a standard bitmap image.

I then wrote a C implementation of the Child algorithm. Because this prototype ran quickly, I could use it to experiment with the color table, number of iterations, and initial view.

I translated my design to Verilog, using the vga_simulator to test results. I used fixed point arithmetic, which was sufficient for the problem. The product of x and y is written (x*y)>>>`FIXPTSIZE. Floating point would not have been practical for the scope of my project. I started with just one Child, and black and white, to reduce possible points of error, before slowly re-adding functionality. One obstacle was that I needed to use both signed reg and signed shift (>>>) in a certain part of my arithmetic. Coming from C, I had not realized that I needed separate syntax for signed shift, and so it took some time to realize was incorrect.

(This picture shows the result of a bug in my scheduler. Sometimes it is broken code that makes the coolest-looking results.)

With a working simulation, and a highly-visible counter that showed how many clock ticks were necessary to complete computation, I could work on optimization.


Optimizing for this platform was a different experience than optimizing typical software. As well as taking less clock ticks to complete, the program also has to be optimized in space, to reduce the number of gates needed. This led to behavior that I did not expect: In typical software optimization, I will often check for conditions so that I can do less work if necessary, and skip going down an unneeded code path. In hardware, though, conditions are expensive, because each requires wiring. "Skipping a code path" is sometimes meaningless - the entire path is done in one clock tick anyways, and it is taking up space, so it is often better to --always-- be performing the computation, and just ignore the result if it happened to not be needed.

Straightforward optimizations included tuning constants so that iterations decreased while the image quality is still acceptable. I set up initial conditions and coordinates to values that the Parent scheduler happened to work more efficiently with. I tightened up the state machine in the child module, coming up with a way to need only two states. I rewrote the scheduler so that it did not need a barrel-shifter. For the sum of two fixed-point products, I could right-shift the sum instead of the individual products. I moved most logic to assign= statements.

I could save significant time by skipping computation inside the inner 'cardioid' shape. I soon learned, though, that multiplication was expensive due the design I chose. There were limited hardware multipliers, and I needed all children to perform at least 3 multiplies every clock tick. On the real hardware I had, there was no room for this shortcut.

Multiplication by a constant, though, I realized could be done very quickly. It was fun to re-write multiplication by 320, as n*320 == (n<<6) + (n<<8). Division by powers of 2, of course, I also converted into shifts. I aggressively re-used computed values, such as how x2 and y2 could be used in both iteration and checking magnitude. One attempt was to use an "approximation" abs(x)+abs(y) instead of the true magnitude, and to replace x2-y2 with (x+y)*(x-y), saving one multiply. This changed the visual appearance in an odd way, though.

I was pleasantly surprised to see in the logs that the Xilinx software understood that the Child module was using a state machine, it apparently could place the state reg in a more efficient way.


Developing with simulation can be misleading. You can use as much memory as you want, and it seems perfectly fine to multiply and even divide (!) numbers with just the symbols * and /. On real hardware, RAM is limited, and in my case, hardware multipliers even more limited. One makes tradeoffs between longest chain of logic and clock speed. Also, the software for synthesis has a learning curve, and synthesizing can take a significant amount of time.

I succeeded in synthesizing the project onto the hardware available to me, a Xilinx Spartan3 XC3s1500.

To make the image look interesting, I shifted the color palette over time, making the colors move. The entire image is recomputed every time the colors shift, an intentional way of demonstrating how fast my system is. As the video above shows, the results looked great, and the refresh rate was more than sufficient.

With the limitations of the specific hardware used, my final configuration was: 4 computing children 64 colors in palette 320x240 image 25Mhz clock speed 68,024 ticks to draw one frame at standard zoom

With a larger FPGA, I could have increased the number of children to drive these numbers higher. Even still, this math brought a smile to my face: 68,024 ticks to compute 76,800 pixels 68,024 ticks @ 25Mhz to compute 76,800 pixels 0.00272096 seconds to compute 76,800 pixels 1 second to compute 28,225,332 pixels

28 million computed values per second sounds pretty nice to me.

Monday, October 17, 2011

Big Acoustic Fingerprints

A few days ago, I found myself in a situation that begged for automation. As Larry Wall says, for a software developer, laziness is a virtue (good programmers will learn or invent shortcuts to save time). I had recorded many .mov files from my digital camera, and wanted to extract the audio track. I wanted to create one .mp3 file for each .mov file with its corresponding audio.

The shareware program Total Recorder, which I recommend, can record any audio played by your computer to a .wav or .mp3 file on disk. With this and Windows 7 Media Player's ability to play .mov files, I was already half-way to the solution. I could start Total Recorder, put all of the .mov files in a playlist, and wait while the audio tracks were recorded in real-time.

The result, though, would be one large .mp3 file that would have to be cut manually into pieces. For dozens of tracks, this would be a tedious process.

My first thought was to write down the list of lengths, and use that list to cut the .mp3 file. I know that the first track is 1:21, the second is 1:06, and so on. This approach, though, was prone to error. The times were rounded to the nearest second, and so by the end, errors would accumulate. Also, the Media Player itself might add slight pauses between tracks.

I then realized that I could put a 'marker' sound between each track. After recording, I could look for the markers, and split the audio precisely.

One approach to do this would be to use a generated combination of tones. I could pick a certain number of pitches that would be rare to encounter in the real audio data, and create a marker based on these frequencies. When processing afterwards, I would split the audio data into small fragments, run FFT on each fragment, and search for fragments that contained just those frequencies (allowing some margin for error). This approach would be robust to mp3 audio compression, and even playback on different speakers.

For this particular setup, though, I did not need to be as robust, because Total Recorder will record the audio digitally before it is converted to analog, although the signal is still slightly altered by mp3 compression and the operating system. As a half hour hack, I made an audio marker that has an unusual profile:

Over the course of one second, the signal value is increasing. Playing this on speakers, the sound is silent except for a click at the end when the signal suddenly drops to 0. I left the click intentionally in order to hear the sound. This is very different than a typical audio signal, which fluctuates up and down rapidly; the quick fluctuations causing an audible pitch:

It was trivial for me to find areas in the audio data where the signal value increases for more than half a second. To make the system more robust, I looked at the averages of blocks of 16 samples, and looked for long sequences where these averages were always increasing. (Note that this block size can't be safely set to a large value, because a low sound increasing in volume with DC offset could trigger false positives). After finding a match, I could skip forward several seconds, knowing that no track would be shorter than 10 seconds. It's a good thing I had my own audio library; the wav files were too big to realistically store in memory, but because I was familiar with reading raw .wav data I could just scan through the stream of bytes. My program output the cut points in milliseconds to a plain text file, adjusting the times to trim away the audio marker.

The final workflow:
1) A python script creates a .wpl Windows media player playlist of the .mov files separated by my marker audio.
2) Start Total Recorder and start the playlist, and wait for a while.
3) Save the recorded audio as .wav and run my program to determine cut points.
4) Encode as mp3.
5) I use the excellent mp3directcut to split the audio at those points. A python script turns my text file into a .cue file that mp3directcut uses to split the audio into pieces.

And that's all. The system worked very well and hit no false positives. mp3directcut can (magically?) slice up a mp3 file without recompressing, so the process is lossless and very quick. I now have an automated way to cut a long mp3 file into pieces.

Thursday, September 22, 2011

Beat Circles

Make beats. Tweak them by shifting each channel. Save and send to friends.

If you have Chrome or Firefox, try it! (Recent versions of Safari may also work).

My novel layout shows the repeating loop as an actual circular loop.

Some demo beats I came up with (but you really should put together your own):

Use the icons on the left for more adjustments:

Load a demo. Change the number of divisions, so that you can make the 7/4 beat you always wanted. A shortcut: after it is clicked, when you create a drum-hit in the upper-right quadrant, the corresponding drum-hit is also turned on in the other three quadrants.

Because the sound is generated in real time, you can edit the rhythm, and even tempo, during playback. If the volume of a channel is very high, I 'distort' the sample by intentionally causing clipping.

I made this for fun. Real software is more complicated.

Thursday, September 15, 2011

Laser Sounds

It's another real-time audio experiment, this one even less serious. Click the 'play' button and drag the dot across the screen to make "laser sounds".

Try it! (in a recent version of Chrome or Firefox).

An example - but there are other sounds to be made that I haven't discovered yet.

Moving left to right will increase frequency, moving downwards will increase harmonic content. Moving slowly to the left with a square wave can make 90s arcade-game like noises. Increase the 'sparkle' to create even older-sounding arcade game noises.

What's going on? I am playing one tone and quickly changing the frequency. When the frequency is changed slowly, one hears a typical siren like one would expect. If the frequency is changed more quickly, the result is a "tshu" pulse sound. Finally, if the frequency is changed very quickly, the entire quality of the sound can change. This is called FM synthesis, where FM stands for frequency modulation. Radio transmitters use Frequency Modulation, changing the pitch of a single note very quickly, to 'move' a signal in frequency to a very high frequency. Video games in the 90s also used FM synthesis; when putting this program in square wave mode and moving to the left, the sound resembles that of a DOS game or GameBoy handheld. Maybe I'll follow the work of some musicians (Bibio, Crystal Castles) in incorporating arcade-like noises into music.

Friday, September 2, 2011

Tremolo Machine

If you have Firefox: this is a machine that makes sounds. It's not a very serious project, but I find it entertaining. Click and drag the dots in order to move the levers, and click the play button to hear the result. (Further changes are not picked up in real time; you have to click play again.)

Try this:

Bubble music

Curious drill

Beach things

Click the eye icon for more examples (be prepared to turn down the volume).

'vib' is vibrato, causing the pitch to change pitch over time.
'trem' is tremolo, causing the sound to become quieter and louder over time.
'vib10' and 'trem10' are more potent versions of 'vib' and 'trem' (the frequency is 10 times faster).
trem10 with a high frequency can be an interesting effect. vib10 with a high frequency can in effect use frequency modulation to create another pitch.
vibc is the cooing of a bird; it plays one quarter of the vibrato and silences the rest.

The smaller control to the right of the modification type can adjust phase.

The button on the left, above the eye, will enable 'layer audio'. Now whenever you click play, you hear the previous results too, and can layer sounds upon sounds.

To make fine adjustments, click the textfield button to manually enter a value. Alternatively, click save and make adjustments to the data in the url.

Try it!
(Chrome or Firefox only)

Saturday, August 6, 2011


Over the years, I have accumulated a fair number of website URLs. In one form or another, I've kept links to music videos, webcomics, essays, computational complexity blogs, artwork, and anything else that I found to be worth saving. Some of these links I organized in a OneNote web app on SkyDrive (so that it could easily be accessed from many computers). However, I wanted my own system, so that I could easily include screenshots of each page (a crucial feature for me - even if the thumbnail is too small to see content, just the color scheme can sometimes trigger one's memory of the website).

My requirements:
-Thumbnail screenshots of the webpage
-Easy to organize bookmarks into folders, not just tags
-Ability to export bookmarks and titles to simple plain-text
-Ability to sync bookmarks between machines
-Quickly navigate, rename, open, delete

My first thought was to create some non-trivial web application, maybe with some new UI concepts to browse the data. I could use a service like thumbalizr to get thumbnail. However, as Larry Wall says, for a programmer, laziness is a virtue, and I didn't want to spend very much time on what would probably be a program only I cared about. (There being many existing apps like readitlater, etc).

Then came the idea.

I will use metadata embedded in a normal .jpg image as my file format, and organize the bookmarks as normal files in Windows Explorer! Each bookmark is represented by a .jpg image that is a screenshot of the website. The name of the file is the name of the site. The url is stored in exif metadata within the image, a standard format that can be read by many programs. Now 90% of the work is done for me - navigating, renaming, moving into folders. Syncing between machines can be done with Dropbox or Skydrive. Exporting to plain text can be done with a simple Python script. To open in a webpage, I drag the .jpg into the "bookmarkbuckets" window and the url will open in one's default browser.

1) Start with a plain text file with urls and titles on alternating lines, or a folder of .url files.
2) A short lnzscript file runs which opens each url in a browser and takes a screenshot as a .jpg image.
3) Bookmark buckets modifies the .jpg image to add the URL in exif data. It also adds the title.
Now, these .jpg files are all are needed. One can rename and categorize as needed.
To open the corresponding webpage, drag the .jpg and drop it into the "bookmarkbuckets" window.
(Bookmark buckets can also export all links back to a tab-delimited file).

It's worked well for me. I'll release to the public under a GPL license in the future.

It seems to me that there are (or were) 3 usages of bookmarks:
1) a shortcut to a commonly accessed page
2) a way to remember a less-commonly accessed page
3) a way to put aside a webpage in order to return to it later
(I made this list in 2007)
I note that these have changed over the past few years:
Shortcuts like the awesome bar, and Google instant have all but eliminated the need for 1.
Apps like instapaper and readitlater address 3.
I'll try to make Bookmarkbuckets effective for 2.

Tuesday, May 31, 2011

Waveform UI

I'm having fun writing code for real-time audio effects.

If you have Chrome or Firefox, you can play with this Waveform UI toy. Click the play button to start the audio, click and drag on the dots to change the sounds!

You can experiment with what different wave shapes sound like. In general, smoother curves have less high-pitched buzzing, because there are fewer harmonics needed to approximate that curve. I haven't found very many interesting sounds in the time domain like this, though, so you'll have to wait for further experiments to make a variety of textures.

The "cut" setting, if turned on, takes a part of the waveform and sets it to 0. (This isn't reflected in the waveform graphic.)

I made the interface using raphaeljs, which I recommend. This project is made possible by Firefox 4's cool raw audio data api. Thanks to those at Mozilla.

Monday, April 25, 2011

Visualising musical appetite

Last night I thought of a way to plot my music library over time. I quickly wrote a python script to make a histogram of when songs were added. Because my music is already sorted by genre, I added that to the visualization.

I call this plot "musical appetite", because that conveys both taste and quantity.

Larger render

I haven't really analyzed this, but see a few trends. Added a lot of rock in 5/2007. I think the upticks correspond with college spring and winter breaks :) Currently, nearly all growth is in 2000s and electronic. A project actually done in 30 minutes for a change.

*The music was manually sorted, as opposed to using an algorithm, but genres do overlap. My 19 categories are here combined to 11. To clarify, the category rock includes rock music from the 60s and 70s, and the decade categories "1960s" and "1970s" contain non-rock music from those times. Latin contains music from Cuba, Argentina, Brazil, and Spain. Classical here refers to composed music in general, from 1300 to the present. Other includes music from Egypt, Iraq, Bulgaria, Jamaica, some African countries, and soundtracks.

Friday, February 11, 2011

How to quickly multiply roman numerals

I used to think that it would be very slow to multiply roman numerals. A few days ago, I thought of an algorithm that is actually fairly quick.

The roman symbols themselves were slowing me down, so, my first step was to come up with a different set of numerals. One might call these "potato bug" numbers. (When converting from roman numerals, also expand any shorthand so that IV is expanded to IIII and so on). Each hatch mark means a factor of 10.

and so on. This system is easier to remember and also happens to be quick to write:

Multiplication is now simple. Anything multiplied by I is the number itself. Anything multiplied by ✝ is the number times 10, which is to say: draw another hatch line on everything. When multiplying by 5, one can use the shortcuts in this table:

To generalize the table - if either or both of the inputs have hatch marks, simply add that many hatch marks on the result. Easy to remember that multiplying by ✝ is just drawing another hatch mark on everything.

One can multiply two numbers by using a "FOIL" like method by the distributive law. To multiply two numbers, take the left-most symbol of the left number, and distribute it across the symbols in the right number. The intermediary results can just be appended in one long string (because there is no place system, a ✝ is 10 wherever it is written). Then take the next symbol in the left number, and again distribute it across the symbols in the right number, and repeat. Finally, simplify the answer by grouping symbols.

There are many operations, but each is trivial and done in a split second. When later "gathering", I would underline to help me remember what I had moved into a group of 5.

So there you have it. Multiplying two digit numbers in seconds with some circles and scratches. Not as slow as you might think.