I just finished my assembler in Python, using regular expressions for label and command recognition. One thing still bothers me, though: I can't find a single regex pattern that matches both C-command variants (with and without a jump) and returns only the 'comp' part of the command. I have to use a conditional statement to check which variant I found and then extract the data:
import re
compVal = re.search(r'=([^;]+);?', line)      # DEST=COMP or DEST=COMP;JUMP ([^;] stops before ';JUMP')
if not compVal:
    compVal = re.search(r'=?([^;]+);', line)  # COMP;JUMP
Combining both patterns into:
doesn't work either.
The following form:
works, but it returns two separate groups, one of which is always empty, so once again the program has to check which group is the valid one.
My suggestion is not to use regexps for this, as they'll probably make your code harder to understand. That's especially true for the HACK assembler, where you'll need to escape symbols like | and +.
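Without regexps, plain string splitting is enough. This is a minimal sketch that assumes comments and whitespace have already been stripped from the line; the function name `parse_c_command` is hypothetical, and the parts still need to be validated by looking them up in your mnemonic tables:

```python
def parse_c_command(line):
    """Split a Hack C-command into (dest, comp, jump) without regexps.

    A missing dest or jump is returned as None. Assumes comments and
    whitespace were already stripped from the line.
    """
    dest, comp = None, line
    if "=" in comp:
        dest, comp = comp.split("=", 1)   # "D=M;JGT" -> "D", "M;JGT"
    jump = None
    if ";" in comp:
        comp, jump = comp.split(";", 1)   # "M;JGT" -> "M", "JGT"
    return dest, comp, jump


print(parse_c_command("D=M;JGT"))  # ('D', 'M', 'JGT')
print(parse_c_command("0;JMP"))    # (None, '0', 'JMP')
```

Because the comp part is whatever is left after removing `dest=` and `;jump`, no escaping of | or + is needed.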
But if you really want to go with regexps, start by defining expressions for the DEST, COMP and JUMP parts. Next, there are 3 cases:
1. The full case DEST=COMP;JUMP
2. DEST=COMP (no jump)
3. COMP;JUMP (no destination)
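A minimal Python sketch of that setup, assuming the mnemonic tables from the Hack machine-language spec. The names `DEST_WORDS`, `alternation`, `FULL`, `NO_JUMP`, `NO_DEST` and `parse_c_command` are made up for illustration. `re.escape()` does the escaping of `|`, `+` and friends, and each of the three cases gets its own pattern, tried in turn:

```python
import re

# Mnemonic tables from the Hack machine-language specification.
DEST_WORDS = "M D MD A AM AD AMD".split()
COMP_WORDS = ("0 1 -1 D A !D !A -D -A D+1 A+1 D-1 A-1 D+A D-A A-D D&A D|A "
              "M !M -M M+1 M-1 D+M D-M M-D D&M D|M").split()
JUMP_WORDS = "JGT JEQ JGE JLT JNE JLE JMP".split()


def alternation(words):
    """Join mnemonics into one regexp alternation, escaping +, -, |, & etc."""
    # Longest first, so e.g. "AMD" is tried before "A".
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))


DEST = alternation(DEST_WORDS)
COMP = alternation(COMP_WORDS)
JUMP = alternation(JUMP_WORDS)

# One regexp per case, each with its own group layout.
FULL = re.compile(f"({DEST})=({COMP});({JUMP})")   # 1. DEST=COMP;JUMP
NO_JUMP = re.compile(f"({DEST})=({COMP})")         # 2. DEST=COMP
NO_DEST = re.compile(f"({COMP});({JUMP})")         # 3. COMP;JUMP


def parse_c_command(line):
    """Return (dest, comp, jump); a missing part is None."""
    m = FULL.fullmatch(line)
    if m:
        return m.group(1), m.group(2), m.group(3)
    m = NO_JUMP.fullmatch(line)
    if m:
        return m.group(1), m.group(2), None
    m = NO_DEST.fullmatch(line)
    if m:
        return None, m.group(1), m.group(2)
    raise ValueError(f"not a valid C-command: {line!r}")


print(parse_c_command("AM=M+1"))  # ('AM', 'M+1', None)
```

Using `fullmatch` instead of `search` means an invalid line such as `D=X` matches none of the cases instead of matching a fragment.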
It's hard (but not impossible) to combine them into a single regexp. An easier alternative is to build a regexp where both DEST and JUMP are optional. Then you can check in your Python code and raise an error (return value, exception or whatever) if both are missing.
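A sketch of the optional-DEST/optional-JUMP approach, assuming the same Hack mnemonic tables (the helper names are hypothetical). As a side effect this also answers the original question: since `DEST=` and `;JUMP` are both optional, COMP always ends up in group 2, whichever variant matched:

```python
import re

# Mnemonic tables from the Hack machine-language specification.
DEST_WORDS = "M D MD A AM AD AMD".split()
COMP_WORDS = ("0 1 -1 D A !D !A -D -A D+1 A+1 D-1 A-1 D+A D-A A-D D&A D|A "
              "M !M -M M+1 M-1 D+M D-M M-D D&M D|M").split()
JUMP_WORDS = "JGT JEQ JGE JLT JNE JLE JMP".split()


def alternation(words):
    """Join mnemonics into one escaped alternation, longest first."""
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))


DEST, COMP, JUMP = (alternation(w) for w in (DEST_WORDS, COMP_WORDS, JUMP_WORDS))

# DEST= and ;JUMP are both optional, so COMP is always group 2.
C_COMMAND = re.compile(f"(?:({DEST})=)?({COMP})(?:;({JUMP}))?")


def parse_c_command(line):
    """Return (dest, comp, jump); raise ValueError if the line doesn't parse."""
    m = C_COMMAND.fullmatch(line)
    if m is None:
        raise ValueError(f"not a valid C-command: {line!r}")
    dest, comp, jump = m.groups()  # optional groups that didn't match are None
    if dest is None and jump is None:
        # The regexp also accepts a bare COMP, so reject that case in code.
        raise ValueError(f"C-command needs a dest or a jump: {line!r}")
    return dest, comp, jump


print(C_COMMAND.fullmatch("D=M").group(2))    # M
print(C_COMMAND.fullmatch("0;JMP").group(2))  # 0
```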
Actually, it's not hard to combine these regexps into one. I'm using $XXX notation to mean that we substitute the regexp for XXX:
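In Python, the $XXX notation maps neatly onto `string.Template`. This sketch assumes the Hack mnemonic tables and simply joins the three cases with `|`; the variable names are illustrative. Note that every case gets its own capture groups:

```python
import re
from string import Template

# Mnemonic tables from the Hack machine-language specification.
DEST_WORDS = "M D MD A AM AD AMD".split()
COMP_WORDS = ("0 1 -1 D A !D !A -D -A D+1 A+1 D-1 A-1 D+A D-A A-D D&A D|A "
              "M !M -M M+1 M-1 D+M D-M M-D D&M D|M").split()
JUMP_WORDS = "JGT JEQ JGE JLT JNE JLE JMP".split()


def alternation(words):
    """Join mnemonics into one escaped alternation, longest first."""
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))


# The three cases joined with "|"; Template fills in each $DEST, $COMP
# and $JUMP placeholder with the matching sub-expression.
PATTERN = Template(
    "($DEST)=($COMP);($JUMP)"   # 1. DEST=COMP;JUMP -> groups 1, 2, 3
    "|($DEST)=($COMP)"          # 2. DEST=COMP      -> groups 4, 5
    "|($COMP);($JUMP)"          # 3. COMP;JUMP      -> groups 6, 7
).substitute(DEST=alternation(DEST_WORDS),
             COMP=alternation(COMP_WORDS),
             JUMP=alternation(JUMP_WORDS))
C_COMMAND = re.compile(PATTERN)

print(C_COMMAND.fullmatch("D=M").groups())
# (None, None, None, 'D', 'M', None, None)
```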
We can simplify this as follows:
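One plausible simplification (an assumption on my part) merges cases 1 and 2 into DEST=COMP with an optional ;JUMP, leaving COMP;JUMP as the second alternative. A sketch with the Hack mnemonic tables:

```python
import re
from string import Template

# Mnemonic tables from the Hack machine-language specification.
DEST_WORDS = "M D MD A AM AD AMD".split()
COMP_WORDS = ("0 1 -1 D A !D !A -D -A D+1 A+1 D-1 A-1 D+A D-A A-D D&A D|A "
              "M !M -M M+1 M-1 D+M D-M M-D D&M D|M").split()
JUMP_WORDS = "JGT JEQ JGE JLT JNE JLE JMP".split()


def alternation(words):
    """Join mnemonics into one escaped alternation, longest first."""
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))


PATTERN = Template(
    "($DEST)=($COMP)(?:;($JUMP))?"  # cases 1 and 2 -> groups 1, 2, 3
    "|($COMP);($JUMP)"              # case 3        -> groups 4, 5
).substitute(DEST=alternation(DEST_WORDS),
             COMP=alternation(COMP_WORDS),
             JUMP=alternation(JUMP_WORDS))
C_COMMAND = re.compile(PATTERN)

print(C_COMMAND.fullmatch("0;JMP").groups())
# (None, None, None, '0', 'JMP')
```

COMP and JUMP still appear in both alternatives, so their values can land in either of two groups.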
The problem is that it's hard to extract the actual values for DEST, COMP and JUMP, because COMP and JUMP can appear in two places. Again, an easy solution is to just check in the code (pseudo-code):
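That check might look like this in Python. This is a sketch that assumes the regexp has the shape DEST=COMP(;JUMP)?|COMP;JUMP, built from the Hack mnemonic tables; `parse_c_command` is a hypothetical name:

```python
import re
from string import Template

# Mnemonic tables from the Hack machine-language specification.
DEST_WORDS = "M D MD A AM AD AMD".split()
COMP_WORDS = ("0 1 -1 D A !D !A -D -A D+1 A+1 D-1 A-1 D+A D-A A-D D&A D|A "
              "M !M -M M+1 M-1 D+M D-M M-D D&M D|M").split()
JUMP_WORDS = "JGT JEQ JGE JLT JNE JLE JMP".split()


def alternation(words):
    """Join mnemonics into one escaped alternation, longest first."""
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))


C_COMMAND = re.compile(Template(
    "($DEST)=($COMP)(?:;($JUMP))?|($COMP);($JUMP)"
).substitute(DEST=alternation(DEST_WORDS),
             COMP=alternation(COMP_WORDS),
             JUMP=alternation(JUMP_WORDS)))


def parse_c_command(line):
    """Return (dest, comp, jump); a missing part is None."""
    m = C_COMMAND.fullmatch(line)
    if m is None:
        raise ValueError(f"not a valid C-command: {line!r}")
    dest, comp1, jump1, comp2, jump2 = m.groups()
    # Only one alternative matched, so the other one's groups are None.
    comp = comp1 if comp1 is not None else comp2
    jump = jump1 if jump1 is not None else jump2
    return dest, comp, jump


print(parse_c_command("0;JMP"))  # (None, '0', 'JMP')
```

Giving both COMP groups the same name isn't an option here: Python's `re` raises an error when a pattern reuses a group name, so picking the non-None group in code is the simplest route.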