Ideas for a Slightly More Powerful CPU

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Ideas for a Slightly More Powerful CPU

Warren Toomey
I sat down during lunch today and doodled up a design for a replacement CPU for the Hack machine with the same API, but with a few more registers in it, and which has a more traditional machine language. Here's the design, hopefully my flat text document will come out. Comments on the design are most welcome!

            Mack: Warren's Idea for a Slightly More Powerful CPU
            ===================================

Goal: to build a different CPU, using the existing builtin Tecs chips with some extra control logic. Keep the same ALU and the same branch logic, but have multiple registers, and some CPU state flags, as per more popular CPUs.

Basic architecture: ALU; four 16-bit registers: R0, R1, R2, R3; two flag bits (negative and zero) which are set by the last ALU operation. Each instruction is 16 bits. There is up to one RAM fetch and one RAM store in each instruction.

Instruction formats: there are 3 instruction formats: load R0 with a constant, perform an ALU operation, and branch or jump. Here they are:

Load R0 instruction
-----------------------

  0 <15-bit unsigned constant>          Load R0 <- constant

ALU operation instruction
------------------------------

  10 idd iss oooooo ss                  meaning dst = src1 op src2

        dst = src1 op src2              Same 6-bits to control existing ALU,
                                                and 2 bits to identify the dst, src1
                                                and src2 registers.

                                        When i bit is set in iss, we load
                                        from RAM[src1], not src1. When i bit
                                        is set in idd, we save result to
                                        RAM[dst], not the dst register.

                                        Result of the ALU operation clears or
                                        sets the two flag bits: zero, negative.

Branch/Jump instruction
-----------------------------
This instruction can encode a relative branch of up to 2048 instructions, or it can encode an absolute jump to the value store in R0.

  11 jjj <11-bit signed constant>       When any of the j bits are set,
                                                      they act like the Hack jump bits.
                                                      Examine the zero and negative flags.
                                                     If a branch, sign extend the constant
                                                     to 16 bits, and add it to the PC.

                                        However, when all the j bits are off,
                                        this is an unconditional jump. Load
                                        the contents of R0 into the PC.

Assembly Syntax
--------------------

label:                                identifier followed by colon
  R0 = constant                 Load R0 with constant instruction

  [*]Rn = [[*]Rn] op Rn         ALU instruction, * means indirect addressing,
                                          [ ] means that this is optional

  Bxx label                     Relative branch to label, +/- 2048 instructions
                                    Bxx is BEQ, BNE, BLT, BLE, BGT, BGE, or BR
                                    for an unconditional branch.

  JMP label                     This is an absolute jump "macro" instruction.
                                    The assembler translates this into: load R0
                                    with the label, followed by a jump instruction.

Also another macro instruction:

        Rn = constant           If Rn != R0, do R0 = constant, Rn = R0.

Thus, R0 becomes a sort of "scratch" register.

Sum the Numbers from 1 to 100
--------------------------------------

  R1 = 1                // i = 1
  R2 = 0                // sum = 0
loop:
  R2 = R2 + R1          // sum = sum + i
  R1 = R1 + 1           // i++
  R0 = 100
  R0 = R1 - R0         // calculate i - 100
  BLE loop                // loop while i <= 100
end:
  JMP end                 // end with infinite loop
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Ideas for a Slightly More Powerful CPU

Warren Toomey
I found a bug. The general Hack computer architecture has only one address bus from CPU to memory, used to both identify the location of the data to fetch from RAM and write back to RAM. Therefore, the CPU cannot read from one address and write back to a different address in an instruction.

This means my CPU cannot do, for example, *R3 = *R2 + 1, i.e. load from the address which R2 points to, and write back to the address in R3.

So, the solution is, when the i bit is set in both idd and iss, the src1 value (i.e the ss in iss) is replaced with the destination value. This allow the CPU to do *R3 = *R3 + 1.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Ideas for a Slightly More Powerful CPU

cadet1620
Administrator
For a lunch time back-of-the-envelope doodle, this is quite a clever design.

Implementing this architecture in LogiSim has been on my to do list for quite a while and I finally got around to it.

Mack computer in LogiSim

Mack computer CPU in LogiSim

Note: type in the keyboard while the program is running to change the message.

Files
Mack.circLogiSim circuit
MackHello.lsbin  Hello world program (if you need to reload the ROM)
MackHello.lstCommented source

Along the way I discovered some interesting details.

Easily extended Jump instruction

As I was wiring up the PC load address for the Jump instruction (Branch with jjj = 0) I had a rather snaky wire running its way to R0. It went past the ALU on it way and I got to thinking...

The Jump instruction, 11 000 xxxxxxxxxxx, has all the correct bits available in the undefined region to turn it into a PC= instruction with full ALU computation including indirect src1.
    11 000 iss oooooo ss

This may be quite useful for return IPs for assembly language subroutines, computed jumps and implementation of call and return stack functions. It appears useful if the PC= instruction does not set the status bits; there may be a way to use the zr flag as a subroutine pass/fail return value.

        R0 = RIP_1
        R3 = R0
        BR      Function
    RIP_1:
        BNE     Error
        ...

    Function:
        ....
        R0 = 0  // good result
        PC = R3

Status bits and overloaded R0=n instruction

There is a problem with the above code snippet. Depending on how the assembler encodes the R0=0 instruction, it may be an A-instruction that does not set the zr flag. I changed the data routing for the A-instruction—instead of muxing the data directly into R0, I am muxing it into the ALU x input, setting the ALU function to out=x, and loading the status register.

More macros

If R0 is considered a scratch register that the assembler my use at will, then additional useful macros are possible. For example:
    Rd=-6               R0=6; Rd=-R0
    *Rd=123,            R0=123; *Rd=R0
    Rd=Rs+16, s!=0      R0=16; Rd=Rs+R0

How much do the extra registers help?

Here is an assembly version of C's strcpy. The parameters are pointers to the string data.
On Hack, the parameters and RIP are passed in memory varaibles; on Mack they are passed in registers.
      Hack       Mack
// call strcpy (dest, src)
	@src
	D=M
	@strcpy_src
	M=D
	@dest
	D=M
	@strcpy_dest
	M=D
	@rip_123
	D=A
	@strcpy_rip
	M=D
	@strcpy
	0;JMP
	...
	
strcpy:	@strcpy_src
	M=M-1
	@strcpy_dest
	M=M-1
	
loop:	@strcpy_src
	AM=M+1
	D=M
	@p2
	AM=M+1
	M=D
	@loop
	D;JNE
	
	@strcpy_rip
	A=M
	0;JMP
// call strcpy (dest, src)
	r0=src
	r1=*r0
	r0=dest
	r2=*r0
	r0=rip_123
	r3=r0
	r0=strcpy  // jump strcpy
	PC=r0      // macro
rip123:	...
	
	
	
	
	
	
	
strcpy:	r1=r1-1
	r2=r2-1
	
	
	
loop:	r1=r1+1
	r2=r2+1
	r0=*r1
	*r2=r0
	bne loop
	
	
	
	
	PC=r3
	

The Hack call + function runs in 23 + 8n instructions, the Mack call + function in 11 + 5n, where n is the string length including the terminating 0.

 

--Mark

Loading...