I have finished the assembler in Python in 170 codelines using only the basic stringprocessing methods of Python.
I conformed strictly to the three module API's and i have made a nice main function definition to drive the 2 passes.
I would like to share some experiences on my endavour.
1. At start i thought it was a waste of time to conform to the API's and i made a quick prototype that passed the Maxl.asm test the first time.
But then i realized that when i would work in a team it is more natural that there are agreements between the team members in some form. And the API's look like agreements to me. So i gave it a try. One result was that i got more "overview" over the entire problem. The second was that, when i finished a certain module and checked it, i did not have to think about it anymore and could think "bigger".
2. In Python you can nearly literally copy , ad verbatim, the text of the algorithm explaining the 2 passes of the Tecs book.
3. I first thought i could define a grammar for the asm language. but after quite some time i realized that that was not practically possible. For instance "D" is a symbol for a destination but also for a command , so it is not straight forward to define the grammar rules for this situation.
So i decided to work with the basic Python string processing methods. All the right marker symbols, like ';' and '=' , are there in the Tecs asm language definition and make it very easy to use the Python string partitioning ,splitting and concatenation operations. When writing this code you can already feel the limitations of such a language definition approach which relies exclusively on the use of markers.
I enjoy working on the projects very much and i am curious about the grammar defining the Jack language.
Thank you for sharing this. Point 1 is most important, and it's great that you realized it.
The largest project I ever worked on involved about a dozen programmers. Without clear interface specifications between peoples' areas of responsibility it would have been utter chaos. It was bad enough when someone needed to change a documented interface to fix a bug or add a newly required feature. The proposed changes had to be circulated and agreed upon before they could be made, and occasionally there was heated debate when one programmer's required change was catastrophic to another programmer's part of the system.
I just got started with assembler implementation. I was thinking of using FLEX/BISON. I didnt get your point on impossible to define grammar for asm. Can you please be a little elaborate? It can save me a lot of time.
The definition of the Hack Assembler language looks deceptively simple because it uses
informal natural language English grammar rules that are easy to understand by us humans.
But these are not so easy to formalize.
Consider the Hack assembler instruction "D=D+A".
If you want to use Bison you must use context free grammar rules to parse this instruction.
There must be at least one rule dealing with the first occurrence of "D" as a destination for the result and
a second one for the second occurrence of "D" as part of the description "D+A" of the required computation.
I see no simple way to parse "D=D+A" with context free grammar rules.
Also if you want to use Flex instead you have to deal, among lots of other things, with allowing "D+A" as a valid computation but not "D+D".
That complicates the building of a suitable regular expression very much.
And this is only the parsing stage, after this you still have to translate the parsed instruction into binary.
You can sidestep all these translation problems by noticing that the sets of destinations, computation and jumps are finite sets. So finite translation mappings are possible.
Just associate the 3 character string "A+D" with the 7 character string "0000010" : interpretation by men and machines don't matter while translating.
And then organize the associations, in tables for instance.
(By the way, you could specify grammar rules in Bison that mimic the finite mappings, but that looks like shooting at a fly with a canon.)
Totally agree with this. I'm writing a C-compiler using Flex/Bison, currently, while also completing the Nand2Tetris course. Because there are only a handful of possible Hack assembly instructions, it is rather simple to go from assembly to binary without the robustness of Flex/Bison. Though, I'm using C++ and I'm at 320 lines, roughly. However, I would say that about 40 are from functions/Classes I could reduce in size, and 90 are from me just listing the possible binary patterns and stacking them 1-pattern per line (and actually some are duplicates I can do away with). The actual algorithm is only about 30 lines of code...