The Hack assembler written in Nand2Tetris is an absolute assembler.
The entire source code must be in one file and the .hack file that it produces is a ROM image that must be loaded starting at address 0. RAM symbols are allocated starting at address 16.
This sort of assembler works quite well for its intended purposes in this course: learning the Hack Assembly Language by writing small programs, learning the design of a simple assembler and writing one, and assembling large machine generated assembly language files as generated by the VM translator.
This does not work well it you are writing a large assembly language program that will consist of many source files, possibly written by different team members. This would require that all the ASM files be combined into a single file and run through the assembler.
Imagine the chaos—how many of those files have the label (LOOP) in them?
OverviewThe solution is to assemble all the ASM files separately and use a link editor (linker) that can combine multiple binary files into the final program binary.
For this to work, the ASM code needs to list what symbols are to be available for other ASM files to use, and what symbols it needs to use from other ASM files. This is called linkage information. Typical linkage info looks like this:
.public (TETRA) .extern (LSUM) .extern LSUM_n .extern LSUM_p
Another issue is that the binary files have memory addresses for both code labels and variables. These addresses will be incorrect if the binary is loaded into memory anywhere other than starting at address 0. Correcting these addresses when the binary is moved to a new starting address is called relocation.
The linkage and relocation information needs to be written into the binary files so that the linker will be able to use it. Binary files that include this information are relocatable object files.
Relocatable object filesEvery relocatable object file is generated with its ROM addresses starting at address 0, and their RAM variables will be allocated starting at address 0. (This is different than the normal Hack assembler which allocates RAM variables starting at address 16; the linker will ensure that RAM variables in the linked binary will start at address 16.)
The built in symbols R0–R15, SP, LCL, ARG, THIS, THAT, SCREEN and KBD are absolute (constant) RAM addresses, not relocatable variables.
Hack relocatable object files will be similar format to the existing .hack files, writing the data values in binary. The linkage and relocation information will use the following suffixes on the binary value.
The linkage information is written as a header at the beginning of the object file. This allows the linker to quickly read the info without reading the entire file. For the same reason, the header should include the required ROM and RAM sizes, although these sizes could be determined by reading and parsing the entire file. The order of the linkage records within the header is unimportant.
Address relocation will be discussed in the next section.
Link EditorThe link editor (or linker) combines the relocatable object files into a single memory image (.hack) file. This is a two pass operation like the assembler. The first pass reads the linkage and size information from the headers in the object files and computes the starting ROM and RAM addresses for each object file. Given this information, the object's code and data can be allocated in ROM and RAM, and the final locations for all the public symbols will be known.
The second pass copies the program code from the object files to the executable file, supplying all the external link addresses and relocating all the internal memory addresses using the information collected in the first pass.
There is a small issue of where the program entry point is. With the original Assembler, the first instruction in the ASM file is the first instruction executed. Now that multiple ASM files are involved, The entry point can be handled in a couple of ways.
Chaos ensues...I have been experimenting with a more complex project than the simple example, and had quite a lot of trouble debugging the linked executable. The Hack Assembler's automatic allocation of RAM variables interacts very badly with missing .extern declarations for code labels. I missed a few of the .externs in several files and ended up with calls that jumped to RAM addresses. After finding the second wild jump, I decided that automatic variable allocation is not compatible with multiple source file usage.
I added a ".data" command to explicitly allocate RAM variables. At the same time I allowed multiple symbols on all the "." commands. Tetra.asm now starts like this:
// File: Tetra.asm // Set RAM[0..9] to the first 10 tetrahedral numbers: // 1, 4, 10, 20, 35, 56, 84, 120, 165, 220. .public (TETRA) .extern (LSUM), LSUM_n, LSUM_p // from LSum.asm .extern (__HALT__) // from Main.asm .data p, i (TETRA) // Make list of 1's. @10 // p = 10 D=A @p M=D
Linking PongThe "more complex project" mentioned in the previous section is Pong, linked with a set of individually compiled OS object files.
I wrote some Python scripts to split the .asm files generated by my VM Translator into individual .asm files. At simpler optimization level, the only common code routines that my VM translator uses are call, return, and compare. These common routines and the bootstrap are split to their own file, VM.asm. This results in the linkable OS library sources:
Using the same splitter on Pong generates its sources:
Pong builds successfully, and it runs!!!
Here's a portion of the link map. The map shows the address for all the publics, and a list of files that reference that public. The number in () is the number of references to that public from the named file.
You can see that the initial jump in the screen shot is going to __MAIN__ in VM.asm which is the bootstrap code. The bootstrap calls (Sys.init). As expected, VM.asm is the only reference to (Sys.init).
File <ENTRY POINT>: code segment 2 L 0 data segment 16 L 0 File ball.hackr: code segment 2 L 1941 2 Ball.new ponggame.hackr(1) 148 Ball.dispose ponggame.hackr(1) 181 Ball.show 234 Ball.hide 287 Ball.draw 353 Ball.getLeft ponggame.hackr(1) 367 Ball.getRight ponggame.hackr(1) 386 Ball.setDestination ponggame.hackr(1) 789 Ball.move ponggame.hackr(1) 1201 Ball.bounce ponggame.hackr(1) data segment 16 L 0 File bat.hackr: code segment 1943 L 879 1943 Bat.new ponggame.hackr(1) 2033 Bat.dispose ponggame.hackr(1) 2066 Bat.show 2119 Bat.hide 2172 Bat.draw 2243 Bat.setDirection ponggame.hackr(2) 2263 Bat.getLeft ponggame.hackr(1) 2277 Bat.getRight ponggame.hackr(1) 2298 Bat.setWidth ponggame.hackr(1) 2360 Bat.move ponggame.hackr(2) data segment 16 L 0 . . . . . . . . . . . . . . . . . . . . . . . . File os\math.hackr: code segment 5189 L 1453 5189 Math.init os\sys.hackr(1) 5372 Math.abs ball.hackr(2) os\screen.hackr(2) 5404 Math.multiply ball.hackr(11) os\output.hackr(3) os\screen.hackr(13) os\string.hackr(2) 5743 Math.divide ball.hackr(6) os\output.hackr(1) os\screen.hackr(5) os\string.hackr(1) 6366 Math.sqrt 6566 Math.max os\screen.hackr(2) 6604 Math.min os\screen.hackr(2) data segment 17 L 2 . . . . . . . . . . . . . . . . . . . . . . . . File os\sys.hackr: code segment 22944 L 349 22944 Sys.init os\vm.hackr(1) 23062 Sys.halt 23077 Sys.wait ponggame.hackr(2) 23182 Sys.error os\array.hackr(1) os\math.hackr(2) os\memory.hackr(2) os\output.hackr(1) os\screen.hackr(5) os\string.hackr(7) data segment 30 L 0 File os\vm.hackr: code segment 23293 L 186 23293 __MAIN__ 23320 $CALL$ ball.hackr(30) bat.hackr(18) main.hackr(4) os\array.hackr(3) os\keyboard.hackr(16) os\math.hackr(9) os\memory.hackr(2) os\output.hackr(122) os\screen.hackr(49) os\string.hackr(16) os\sys.hackr(13) ponggame.hackr(52) 23365 $CMP_JLT$ ball.hackr(10) bat.hackr(1) os\math.hackr(13) os\memory.hackr(3) os\output.hackr(7) os\screen.hackr(22) os\string.hackr(8) os\sys.hackr(1) ponggame.hackr(2) 23404 $CMP_JGT$ ball.hackr(2) bat.hackr(1) os\array.hackr(1) os\keyboard.hackr(2) os\math.hackr(10) os\memory.hackr(3) os\output.hackr(4) os\screen.hackr(18) os\string.hackr(6) os\sys.hackr(2) ponggame.hackr(3) 23430 $CMP_JEQ$ ball.hackr(6) bat.hackr(1) os\keyboard.hackr(3) os\math.hackr(2) os\memory.hackr(7) os\output.hackr(7) os\screen.hackr(2) os\string.hackr(8) ponggame.hackr(7) 23440 $RETURN$ ball.hackr(10) bat.hackr(10) main.hackr(1) os\array.hackr(2) os\keyboard.hackr(5) os\math.hackr(7) os\memory.hackr(5) os\output.hackr(12) os\screen.hackr(11) os\string.hackr(13) os\sys.hackr(4) ponggame.hackr(6) data segment 30 L 0 Code size = 23479 (0x5BB7) Data size = 30 (0x001E)Using my VM translator to translate Pong generates 23471 instructions. The 8 extra instructions are due to the bootstrap jump and some jumps required by moving the common code to VM.asm.
Future workNamed constants: Add ".const symbol=number" command to define constants and non-relocatable addresses. Constants can be public or external like any other symbol. There will need to be two new linkage records: PA and EA (A for absolute). Syntax for external constant will likely be the two statement sequence .extern X_SIZE .const X_SIZE which mirrors declaring an external variable.
Include files: Add ".include" command to support a single set of externs across a project. This might have saved me from the above chaos. This will require that the parser can push its current context to read the (possibly nested) include files and then pop its context to continue with the current file.
Library search: If there are unresolved externs, search library directories (-L options, HLIB=path;...) for .hackr files that define them.
Support function level linking: The obvious scheme is to add a .function command to the ASM to mark the beginning of an independent block of code and generate linkage tables for each functions. The Assembler will need to be aware of references between functions and write public/extern linkages for them. Seems like a lot of work. Is it worth it? For Pong, it would reduce the size from 23K words to 19K — 17% of the OS code is uncalled.
|Free forum by Nabble||Edit this page|