Mult16 chip

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Mult16 chip

SteveGreen
Hi there!

Recently I completed part II of the NAND2Tetris course.  What a trip! As a computer professional for over 30 years this course has been a chance to reconnect with what I love about computers, the basic fundamentals. It has been a fun and challenging 12 weeks and I was kind of sad when it all ended. But here's hoping part III comes along so I can get into some FPGA.

In the meantime I have taken Shimon's excellent advice and revisited the hardware and am in the process of extending the functionality. Today, I completed the Mult16 Chip and tested it using the Hardware Emulator. I have uploaded the code if anybody is interested to grab it and have a play. You will need to also build the Add15 to Add2 chips, but these are just cut down versions of the Add16 chip we did in week 2. The reason for using narrowed Add chips was to lower the overall hardware cost of the Mult16 chip. I have uploaded the source for these chips to save you some typing.

Anyhow, I play to move onto Div16 next and n-way left and right shift chips following that. Ultimately I'd like to extend the ALU and the machine code of the HACK computer to utilize these new hardware units. Obviously I would been to modify the CPUEmulator to make proper use of them down the track. Lots of fun awaits.

The code and testing scripts are attached:

Mult16.hdl
Mult16.tst
Mult16.cmp
Add2.hdl
Add3.hdl
Add4.hdl
Add5.hdl
Add6.hdl
Add7.hdl
Add8.hdl
Add9.hdl
Add10.hdl
Add11.hdl
Add12.hdl
Add13.hdl
Add14.hdl
Add15.hdl
Reply | Threaded
Open this post in threaded view
|

Re: Mult16 chip

cadet1620
Administrator
This post was updated on .
[Edited to correct link.]

Modern hardware compilers are good at optimizing away unused and unnecessary logic.  For example Mux16(a=false, b=whatever, ...) will most likely be reduced to 16 And gates.  Unused bits in an adder will also be eliminated.  This means that you can use Add16(a[0..3]=..., b[0..3]=..., out[0..3]=...); when you need a 4 bit adder and assume that the logic for the remaining 12 bits will not be generated.  This will let you focus on the overall structure of your multiplier rather than the busy-work of writing a bunch of various width adders.

Also note that there is duplication in each stage of the multiplier.  You can combine the Ands and the Adder into a new part and use that new part in each stage.


There has been a lot of research on optimizing parallel multipliers.  These will get you started it you want to look into it
    https://en.wikipedia.org/wiki/Wallace_tree
    https://en.wikipedia.org/wiki/Dadda_multiplier


Note that most hardware multipliers generate the double length result.  For Hack, that would mean you would need to add a register to hold the high result and instructions to access it, or have 2 instructions: one that returned the low word and one that returned the high word.


Another application of parallel addition similar to that used in multiplication is bit counting, sometimes called POPCNT (population count).  This circuit/instruction computes the number of one bits in its input.

The obvious implementation is to build a tree of adders. For 16-bit count:
        8 1-bit adders generate 8 2-bit numbers that need to be summed.
        4 2-bit adders generate 3 3-bit numbers that need to be summed.
        2 3-bit adders generate 2 4-bit numbers that need to be summed.
        1 4-bit adder generates the final 5-bit population count.
That requires 11 full-adders and 15 half-adders
       
My size-optimized 16-bit popcnt uses 11 full-adders and 4 half-adders.


--Mark