Saturday, November 15, 2008

ISM, an intermediate level language

For our Computer Architecture class, a partner and I created a MIPS cpu in Verilog. This was a really cool project that deserves its own description, but for now I'll focus on something I made while testing the cpu. The cpu runs MIPS assembly code. Writing assembly is not bad for simple tests, but when I started to write more complicated programs, I got tired of remembering what I was using $s0 for and so on.

So, just for fun, I quickly wrote a Python script to translate assembly-like code into assembly. The small language I designed is an intermediate between higher-level code and assembly. The syntax is more like a higher level language, assigning values with "a=b" or adding with "c = a+b". It's not a compiler, but doesn't really need to be, and with too much abstraction one would be better off writing in C anyway.

The cool part about the project is how quickly and naturally the code turned out. It only took an hour to have a fully working version that understood the opcodes, and had useful error messages for syntax errors. The code just flowed quickly as I found ways to reuse code and simplify the problem.

For example, I needed to split each line of code by tokens. I came up with a creative way to do this - Python has a built-in way to split a string based on whitespace, " 'a b c    d e'.split() " evaluates to "['a','b','c','d','e']". So in order to split on both + and - tokens and whitespace, I did "input=input.replace('+',' + ').replace('-',' - '); result=input.split()". By surrounding each occurrance of '+' with whitespace, now the string is split by both whitespace and tokens, so that a+b and a + b are both recognized as ['a','+','b']. (This could also be done by splitting on a regex, but that would be slightly more complicated).

The first, one-hour version, understands code like:
//add numbers from 1 to 25 and put result in sum.

registers i, const_one,sum,highest
seti const_one=1
seti sum=0
seti highest=25
seti i=0
LOOP:
 add i=i+const_one
 add sum=sum+i
 bne (i!=highest) goto LOOP

Then, later, I started to add more features. I added syntax shortcuts like a++, and simulated more types of branching even though our cpu only supported bne. I also added inference, where each line is checked to infer which operation you are trying to do. For example, if the line is "c=a+b" it is clear that you are doing an add instruction, so you don't have to type "add". Also, it now understands simple loops and if structures, translating them to assembly. It looks like Python if you squint.
// find prime numbers from 2 to 100.
// Place the largest found in result.
registers i,tmp, base, limit, result

// clear memory
for i,300,0
  dataWordIndex[i] = zero
endfor

limit=100
for base,2,100
  // check if we've already seen this number come up
  tmp = dataWordIndex[base]
  (tmp != zero) goto CONTINUELOOP
  
  // check off multiples of this number.
  i=base
  INNERLOOP:
    i = i + base
    dataWordIndex[i] = one
    (i < limit) goto INNERLOOP
  
  CONTINUELOOP:
endfor

// put largest prime found into result
for i,100,2
  tmp = dataWordIndex[i]
  if (tmp==zero)
    result=i
    goto DONE
  endif
endfor

DONE:

It was fun to come up with syntax for this. Branches are done with the unique, yet readable syntax "(a==b) goto Label". Array-like syntax is used for accessing data memory, like 'dataWordIndex[i] = b'. (dataWordIndex means that the index is automatically multiplied by four, so that it can be used as an array of 32 bit integers.)


When the script runs, it also prints out Verilog $display statements, which can be placed in the Verilog code to see the values of the registers.

Here is the code. ism_simple.py is the first version. I realize that it won't be useful unless you have a mips assembler and cpu.

No comments: