So, just for fun, I quickly wrote a Python script to translate assembly-like code into assembly. The small language I designed is an intermediate between higher-level code and assembly. The syntax is more like a higher level language, assigning values with "a=b" or adding with "c = a+b". It's not a compiler, but doesn't really need to be, and with too much abstraction one would be better off writing in C anyway.
The cool part about the project is how quickly and naturally the code turned out. It only took an hour to have a fully working version that understood the opcodes, and had useful error messages for syntax errors. The code just flowed quickly as I found ways to reuse code and simplify the problem.
For example, I needed to split each line of code by tokens. I came up with a creative way to do this - Python has a built-in way to split a string based on whitespace, " 'a b c d e'.split() " evaluates to "['a','b','c','d','e']". So in order to split on both + and - tokens and whitespace, I did "input=input.replace('+',' + ').replace('-',' - '); result=input.split()". By surrounding each occurrance of '+' with whitespace, now the string is split by both whitespace and tokens, so that a+b and a + b are both recognized as ['a','+','b']. (This could also be done by splitting on a regex, but that would be slightly more complicated).
The first, one-hour version, understands code like:
//add numbers from 1 to 25 and put result in sum. registers i, const_one,sum,highest seti const_one=1 seti sum=0 seti highest=25 seti i=0 LOOP: add i=i+const_one add sum=sum+i bne (i!=highest) goto LOOP
Then, later, I started to add more features. I added syntax shortcuts like a++, and simulated more types of branching even though our cpu only supported bne. I also added inference, where each line is checked to infer which operation you are trying to do. For example, if the line is "c=a+b" it is clear that you are doing an add instruction, so you don't have to type "add". Also, it now understands simple loops and if structures, translating them to assembly. It looks like Python if you squint.
// find prime numbers from 2 to 100. // Place the largest found in result. registers i,tmp, base, limit, result // clear memory for i,300,0 dataWordIndex[i] = zero endfor limit=100 for base,2,100 // check if we've already seen this number come up tmp = dataWordIndex[base] (tmp != zero) goto CONTINUELOOP // check off multiples of this number. i=base INNERLOOP: i = i + base dataWordIndex[i] = one (i < limit) goto INNERLOOP CONTINUELOOP: endfor // put largest prime found into result for i,100,2 tmp = dataWordIndex[i] if (tmp==zero) result=i goto DONE endif endfor DONE:
It was fun to come up with syntax for this. Branches are done with the unique, yet readable syntax "(a==b) goto Label". Array-like syntax is used for accessing data memory, like 'dataWordIndex[i] = b'. (dataWordIndex means that the index is automatically multiplied by four, so that it can be used as an array of 32 bit integers.)
When the script runs, it also prints out Verilog $display statements, which can be placed in the Verilog code to see the values of the registers.
Here is the code. ism_simple.py is the first version. I realize that it won't be useful unless you have a mips assembler and cpu.