Page moved to: Strangeparse

Let's say I want to be able to parse an arbitrary expression like
f(f( a+f(a * b * f(c))))

where a, b,c are variables, and f is some function. I want to make a parse tree, and possibly to transform the tree. Last night I thought of a very unorthodox way to do this.

I've heard of yacc/lex, pyparsing, and Python's ast. However, for what I have in mind this is all I need:
- Python supports overloading operators.
- Python has an "eval" for dynamically running code.

My idea is to create Python objects for f, a, b,c, and then *eval* the expression as if it were Python code. The objects I create will have their operators overloaded so that a+b, for example, returns an expression like ['+', a, b]. Here's an example that supports addition, multiplication, and functions taking any number of arguments. (Adding the rest of the operators is simple).
class FunctionSymbol:
  def __init__(self, name):
    self.name = name
  def __call__(self, *args):
    return SubExp(self, list(args))
  def __str__(self):
    return 'FN_'+self.name

class VariableSymbol:
  def __init__(self, name):
    self.name = name
  def __add__(self, other):
    return SubExp('OP_ADD', [self, other])
  def __mul__(self, other): 
    return SubExp('OP_MULT', [self, other])
  def __str__(self):
    return 'VAR_'+self.name

class SubExp(VariableSymbol):
  def __init__(self, op, args):
    self.op = op; self.args= args
  def __str__(self):
    return str(self.op) + '(' + ','.join(map(str, self.args)) + ')'

def strangeparser(s):
  #parse something by evaluating it as if it were Python code
    
  symbols = {}
  #create objects for the symbols in the string 
  snospace = s.replace(' ','').replace('\t','') + ' '
  import re
  for match in re.finditer('[a-zA-Z_]+', snospace):
    strfound = match.group()
    if strfound not in symbols:
      #assume that if the next character is "(", then it is a function
      if snospace[match.end()]=='(':
        symbols[strfound] = FunctionSymbol(strfound)
      else:
        symbols[strfound] = VariableSymbol(strfound)
  # evaluate it
  try: 
    return eval( s , globals(), symbols )
  except Exception, e: 
    print 'Could not parse. %s' % str(e)
    return None
  
def main():
  tree = strangeparser('a+b+c')
  print tree
  # OP_ADD(OP_ADD(VAR_a,VAR_b),VAR_c)
  tree = strangeparser('f(f( a+f(a * b * f(c))))')
  print tree
  # FN_f(FN_f(OP_ADD(VAR_a,FN_f(OP_MULT(OP_MULT(VAR_a,VAR_b),FN_f(VAR_c))))))

Also, I've written code that turns this tree into a sequence of simple operations. Coincidentally, I think this could be used to write a compiler. I haven't read any books about that, though.
input:
 f(f( a+f(a * b * f(c))))

output:
 i1=OP_MULT(a,b);
 i2=FN_f(c);
 i3=OP_MULT(i1,i2);
 i4=FN_f(i3);
 i5=OP_ADD(a,i4);
 i6=FN_f(i5);
 i7=FN_f(i6);
 return i7;
 

I'm working on a way to conserve the temporary variables used here, because there is probably a way to reuse them after they aren't needed. What I do is go down to the bottom of the tree, find something that can be evaluated, and replace that deepest node with a temporary variable. I then repeat that process until the whole tree has been "flattened".