I spent this weekend writing a parser for a Mathematica-style grammar. I posted the files (scanner, parser, and grammar). They should be pretty interesting for anyone who is interested in implementing a non-trivial parser from scratch. I based them on the example calculator parser in Programming Python, but the complexity of the grammar I used, compounded with the fact that it doesn’t evaluate anything, makes it different enough to be interesting as is. The only big problem I have is that I haven’t been able to figure out how to implement unary minuses; everything else is licked, but that’s getting me. I’m going to implement the equivalent of Mathematica’s FullForm command for printing out the trees returned by the parser; currently, I’m using a crude tree dump that’s pretty hard to read (and extremely long! for even short expressions), but contains every last detail. Also, I’m going to gut the current error handling system, based on the rather inadequate system in Programming Python, and put in a more informative one; currently, half the time when an expression is ill-formed, the parser crashes instead of signaling the location of the error as it should, and it never makes any suggestions about what the error could be. Another caveat is that, for some reason, when the parser crashes on one expression, it sometime crashes on everything else after it, so you have to create a new parser for each expression to be safe. Oh, and the parser (this is minor, I’m just too tired to fix right now) expects each expression to be on separate lines.
Example of parsing f[{x_,y_}] = {0-1*y, x} / Norm[{x,y}] ( the weird 0-1*y is because of the incapability of dealing with unary minuses).
| BasicCAS |
Mathematica FullForm |
Type: factor
Value: None
Children:
Type: term
Value: None
Children:
Type: ImmediateAssign
Value: Type: factor
Value: None
Children:
Type: term
Value: None
Children:
Type: list
Value: None
Children:
Type: sum
Value: None
Children:
Type: factor
Value: None
Children:
Type: term
Value: None
Children:
Type: integer
Value: 0
Children:
*
+
Type: factor
Value: None
Children:
Type: term
Value: None
Children:
Type: integer
Value: 1
Children:
*
Type: term
Value: None
Children:
Type: identifier
Value: y
Children:
*
-
Type: factor
Value: None
Children:
Type: term
Value: None
Children:
Type: identifier
Value: x
Children:
*
*
Type: term
Value: None
Children:
Type: funccall
Value: None
Children:
Norm
Type: factor
Value: None
Children:
Type: term
Value: None
Children:
Type: list
Value: None
Children:
Type: factor
Value: None
Children:
Type: term
Value: None
Children:
Type: identifier
Value: x
Children:
*
Type: factor
Value: None
Children:
Type: term
Value: None
Children:
Type: identifier
Value: y
Children:
*
*
/
Children:
Type: funccall
Value: None
Children:
f
Type: factor
Value: None
Children:
Type: term
Value: None
Children:
Type: list
Value: None
Children:
Type: factor
Value: None
Children:
Type: term
Value: None
Children:
Type: identifier
Value: x_
Children:
*
Type: factor
Value: None
Children:
Type: term
Value: None
Children:
Type: identifier
Value: y_
Children:
*
*
*
|
List[Times[-1, y, Power[Norm[List[x,y]], -1]], Times[x, Power[Norm[List[x, y]], -1]]]
|
Note that Mathematica performs some pre-processing before displaying the output (the zero disappears, for one, and the list takes the division inside). I plan on implementing this also, because among other things, I’m going to use GMP or a similar library for the numbers, and I want to keep rationals, reals, and integers distinct.
The next step is to implement a pattern based function definition system, like the one Mathematica uses. I love being able to say f[{x_, y_}] := {-y, x} /Norm[{x, y}] in Mathematica, so I want to be able to define functions in the same way. Or specify the value for a particular class of inputs, e.g. f[x_Integer] := Factorial[x] . Before starting the parser, this seemed like a difficult task, but now it seems like cakewalk. To implement this system, all I have to do is store the expression tree used to define the parameters to the function, and then match that tree against the parameters used to call the function. In the first example, the tree would be something like List[x, y] , so I’d match up the passed in parameters to this tree, binding x and y to the value of the expressions in the corresponding space, and running any tests on them (like _Integer); if the tree fit and the parameters passed the test, then I’d know what definition of the function to use.
After that, I need to make a namespace system. Then it should be a simple task to code in the simple interpretive programming utilities like For, Do, While, etc. Then I can think about the real hard stuff– the symbolic stuff– at leisure.
Comments and suggestions are more than welcome.