Let's say I want to be able to parse an arbitrary expression like
f(f( a+f(a * b * f(c))))
where a, b,c are variables, and f is some function. I want to make a parse tree, and possibly to transform the tree.
Last night I thought of a very unorthodox way to do this.
I've heard of yacc/lex, pyparsing, and Python's ast. However, for what I have in mind this is all I need:
- Python supports overloading operators.
- Python has an "eval" for dynamically running code.
My idea is to create Python objects for f, a, b,c, and then *eval* the expression as if it were Python code. The objects I create will have their operators overloaded so that a+b, for example, returns an expression like ['+', a, b].
Here's an example that supports addition, multiplication, and functions taking any number of arguments. (Adding the rest of the operators is simple).
class FunctionSymbol:
def __init__(self, name):
self.name = name
def __call__(self, *args):
return SubExp(self, list(args))
def __str__(self):
return 'FN_'+self.name
class VariableSymbol:
def __init__(self, name):
self.name = name
def __add__(self, other):
return SubExp('OP_ADD', [self, other])
def __mul__(self, other):
return SubExp('OP_MULT', [self, other])
def __str__(self):
return 'VAR_'+self.name
class SubExp(VariableSymbol):
def __init__(self, op, args):
self.op = op; self.args= args
def __str__(self):
return str(self.op) + '(' + ','.join(map(str, self.args)) + ')'
def strangeparser(s):
#parse something by evaluating it as if it were Python code
symbols = {}
#create objects for the symbols in the string
snospace = s.replace(' ','').replace('\t','') + ' '
import re
for match in re.finditer('[a-zA-Z_]+', snospace):
strfound = match.group()
if strfound not in symbols:
#assume that if the next character is "(", then it is a function
if snospace[match.end()]=='(':
symbols[strfound] = FunctionSymbol(strfound)
else:
symbols[strfound] = VariableSymbol(strfound)
# evaluate it
try:
return eval( s , globals(), symbols )
except Exception, e:
print 'Could not parse. %s' % str(e)
return None
def main():
tree = strangeparser('a+b+c')
print tree
# OP_ADD(OP_ADD(VAR_a,VAR_b),VAR_c)
tree = strangeparser('f(f( a+f(a * b * f(c))))')
print tree
# FN_f(FN_f(OP_ADD(VAR_a,FN_f(OP_MULT(OP_MULT(VAR_a,VAR_b),FN_f(VAR_c))))))
Also, I've written code that turns this tree into a sequence of simple operations. Coincidentally, I think this could be used to write a compiler. I haven't read any books about that, though.
I'm working on a way to conserve the temporary variables used here, because there is probably a way to reuse them after they aren't needed. What I do is go down to the bottom of the tree, find something that can be evaluated, and replace that deepest node with a temporary variable. I then repeat that process until the whole tree has been "flattened".