For an open-ended project in my Computer Science course, I wrote a program which will take any regular expression as an input and generate a NFA for recognizing that language. It can display this NFA graphically as well as generate Prolog code for simulating it and using it to match strings. (So, if you have Prolog, you could use this as an implementation of regular expressions.)
I generate dot format files and call GraphViz to draw the graph. I use the symbol ',' to mean concatenation and enter 'a,a,b,b' instead of the traditional regexp 'aabb'. In the graphs, '.' refers to a null transition. Here are some sample inputs and outputs:
(Note that my handling of * is slightly incorrect because the expression must be present at least once.) Disjunction uses the '|' symbol, for example 'a|b' means a or b. Unlimited levels of nested parenthesis are supported, thanks to recursion.
I process the expression recursively by at each point dividing the input string into two parts, if possible. The string '(a|b),b' is split into (a|b) and b with the operation ',' , and the first part is then split into a and b with the operation |. If the expression is a or a*, then it will return, and work its way down. To my knowledge all complicated expressions are recognized successfully.
The Python program then will output Prolog code that can be consulted. Once consulted, the predicate parse can be used with a list to see if the string is recognized by the language.