# How do I do an If statement ?

13 messages
Open this post in threaded view
|

## How do I do an If statement ?

 An example would be if x is 0 then output is the input or if x is 1 then output is 0 thanks
Open this post in threaded view
|

## Re: How do I do an If statement ?

 Administrator In software, if statements make choices. What hardware part did you make in chapter 1 that chooses what to output? --Mark
Open this post in threaded view
|

## Re: How do I do an If statement ?

 Hey Mark, I love how much help you give on these fourms :) I know how to do the if statement with a chip, my question is It seems I need to compute both branches before going into the If statement, this seems inefficient as one branch is thrown away. Is there a way to to do if where it only computes the path chosen? Or is this the point of CPU branch prediction I've heard about, without that you have to compute both paths? Thank you for your help!
Open this post in threaded view
|

## Re: How do I do an If statement ?

 arjunyel, that's a great question. I tell myself, "hardware is not software," and in this case, there is no loss of "efficiency" by not using one of the computed values.
Open this post in threaded view
|

## Re: How do I do an If statement ?

 Administrator This post was updated on . In reply to this post by arjunyel The thing to realize is that the hardware is always there so its computation occurs in parallel (at the same time) as the other options. Power is used in modern ICs when signals change state, so there is a slight power cost if unused circuits are changing values. There is, for instance, a lot going on in the Add16 in the ALU during operations that use the And function, so there is some power wastage. The circuitry to make the ALU not compute the sum and carry would likely require more power than would be saved! Many microcomputers for embedded systems have lots of interfaces for various communications buses and most of the time the product using the microcomputer only needs 2 or 3 of the many interfaces that are provided. These microcomputers often have built in mechanisms to power down sections of the chip that are not used. This is something that the firmware can do during initialization if there is a need to reduce power usage. --Mark
Open this post in threaded view
|

## Re: How do I do an If statement ?

 I was also thinking of another option for implementing if in hardware, namely to use 3-state logic. You  put the "switch" first and it would forward the the signal to the correct branch. The other branch (or branches) would receive "floating". You can then just wire the outputs together. To make that work all the parts need to be 3-state aware or hidden behind a controlled buffer or inverter. I'm not sure if it has any advantages or drawbacks from the hardware stand point, but I think it should work. One advantage is, that it's closer to the way software developers think of if and switch statements. N.B. This cannot be used in the Nand2tetris Hardware Simulator, because it doesn't support three-state logic.
Open this post in threaded view
|

## Re: How do I do an If statement ?

 Administrator This post was updated on . CMOS has problems with floating inputs so you would need pull-up resistors or "bus hold" circuits on all the inputs that might be tri-stated. When the input voltage gets close to midway between "0" and "1" both the pull-up and pull-down transistors begin to turn on and large amounts of current flow.   This image comes from an interesting app note from Texas Instruments. Warning, it's rather technical. There is a related problem with slow transitions between "0" and "1", which can cause oscillations in the gate's output. In this case "slow" means 200 nanoseconds!       This image comes from another TI app note. --Mark
Open this post in threaded view
|

## Re: How do I do an If statement ?

 So this is my understanding of this post. We do not go for hardware implementation of the "if comparison" because it is more costly. Instead, we compute both results (in this case Add16 and And16) first and then discard one of them using Mux. Now is here is my first question: 1) How can you generalize this result without looking at the instructions to be executed? Suppose both branches are much more costly than the comparison (i.e "then" and "else" costlier than "if comparison"), wouldn't it be efficient to compute "if comparison" first and then compute either "then" or "else" part. Now you also said in one of the answers in the thread that the cost of not computing SUM and CARRY using ALU is higher than calculating both the parts and later discarding one of them. Here is my second question/thought: 2) Processors these days provide ILP. And Superscalars also do "out of order execution". So there is no question about the cost of not computing the SUM and CARRY. Instead, the processor can execute another instruction on the unused part. In other words, we first do the "if comparison" then compute either AND16 or ADD16 and processor can execute another instruction(in parallel) on the unused part (possible because of "out of order execution"). Please share your thoughts.
Open this post in threaded view
|

## Re: How do I do an If statement ?

 invin wrote 1) How can you generalize this result without looking at the instructions to be executed? Suppose both branches are much more costly than the comparison (i.e "then" and "else" costlier than "if comparison"), wouldn't it be efficient to compute "if comparison" first and then compute either "then" or "else" part. Let's say the if computation takes 1 time unit, the then and else computations need 10 time units (TU) each. If you first compute the if and then one of the other computations you'll end up with 1TU + 10TU = 11TU. You may need some more time units to implement the switching logic, which would choose the then or the else circuit. In case of the common implementation, you'll compute if, then and else in parallel, so you'll need MAX(1TU, 10TU, 10TU) = 10TU. You'll only need one selector afterwards to choose which one to propagate. So, if anything, this scenario is a bit faster the way it's normally implemented. It would be a bit different if the then and else computations tool significantly different times. Let's say 5TU and 15TU respectively. Now in your case, the computation if then would need 1TU + 5TU = 6TU time units, and the computation for else would need 1TU + 15TU = 16TU. Assuming that both branches happen with 50% probability, the average cost would be (6TU +  16TU)/2 = 11TU. In the common implementation you'd need MAX(1TU, 5TU, 15TU) = 15TU. So clearly your approach is better in this case. But there is a problem. How would the external circuitry (the one that uses this if-then-else implementation) know when the answer is ready? Normally, this happens by timing, and the way it's done is to take the worst case (slowest) time and make sure you're not using the answer provided by the circuit before that. So again, this means that the user will wait for 16TU even if the circuit is done in 6. That's why part of the hardware optimization is to identify this critical route (or routes) and try to shorten them. There's no much point in shortening non-critical routes. invin wrote Now you also said in one of the answers in the thread that the cost of not computing SUM and CARRY using ALU is higher than calculating both the parts and later discarding one of them. Here is my second question/thought: 2) Processors these days provide ILP. And Superscalars also do "out of order execution". So there is no question about the cost of not computing the SUM and CARRY. Instead, the processor can execute another instruction on the unused part. In other words, we first do the "if comparison" then compute either AND16 or ADD16 and processor can execute another instruction(in parallel) on the unused part (possible because of "out of order execution"). I think most of the time you'd want to reuse larger parts. For example, you can reuse the ALU but not the AND16 or ADD16 in it. Here is why: Suppose you know that the AND16 is in use, but ADD16 is free and you want to use it. How do you provide the two inputs? You can't just use the same wires, because they are in use by the AND16 at the moment. And where do you get the result? Again, you can't use the same wire as AND16 at the same time. So you'll need to have additional inputs and outputs (don't forget the zx and ng outputs as well). You'll need to make the ALU much more complicated to make it possible to compute two different calculations at the same time. It is not impossible, but it only pays off for larger parts.
Open this post in threaded view
|

## Re: How do I do an If statement ?

 Administrator In reply to this post by invin It is important to note the difference in scale. A top end processor these days can have more than 10 billion (10^10) transistors. A naive implementation of the Hack CPU can be built with less than 5000. (1186 Nands) For comparison, the Intel 8080, a much more complex CPU albeit only 8 bits wide had 3500 transistors. For nand2tetris, the hardware is on a very small scale. There are few opportunities for optimization at this level. invin wrote 1) How can you generalize this result without looking at the instructions to be executed? The hardware is always there; as long as it has power, it is always computing something, possibly garbage values like the ALU in the Hack CPU during A-instructions[1]. Turning power on and off is a very slow operation and is especially tricky for sequential circuits because they generally need a power-on-reset so that they come up in a determined state. Power control like this is generally only done for portions of a processor that are not tightly interconnected, for instance integrated peripherals can have simple enough interfaces to the processor's I/O system and memory to be so controlled. This makes the practical trade off choosing between duplicating sub-circuits or sharing them. As Ivan pointed out, wiring is a problem.  In software, a function can be called from 100 places in the code and there is no additional cost to the function's code; just the incremental cost for each call. In hardware, the inputs from each "caller" need a wire routed to the shared circuit, through some additional circuit that controls which of those wires is logically connected (mux, 3-state driver, etc.) The shared outputs also need to be larger to be able to supply more power to overcome additional load. More output wiring also means more capacitance which results in slower operation. These tradeoffs usually make it impractical to share circuitry as small as adders and Ands. 2) Processors these days provide ILP. And Superscalars also do "out of order execution". ILP and reordering aren't new concepts. Back in the '80s there were mainframes that had multiple functional units and could execute pipelined instructions without stalling if there were no register/computation conflicts. The compilers did the instruction reordering and could flip branch conditions to favor execution speed. (Earliest form of speculative execution?) Doing the reordering on-the-fly in hardware is impressive. Speculative execution even more so; I understand the high-level concepts but haven't read at all about the implementations. Both the hardware/software and hardware only versions are continuing attempts to have the hardware spend as little time as possible computing garbage. _______________ [1] You can re-purpose the ALU during A-instructions to eliminate the A register input mux.  Force the ALU to compute "y" during A-instructions and send the A-instruction value to the ALU y input through a 3-way mux selecting between A register, "inM" and "instruction".
Open this post in threaded view
|