# A more efficient square root algorithm

5 messages
Open this post in threaded view
|

## A more efficient square root algorithm

The square root algorithm presented in Chapter 12 is fairly easy to understand and implement, but it is a bit slow because it needs to compute 8 squares.

This post presents a more efficient algorithm that only requires addition, subtraction and shifting.

Algorithm

The algorithm is derived from the "long division" pencil and paper method of computing square roots. If you don't know this method, this page is the best presentation I found in a quick search.

Base-10Binary
```        5  6  0  4. 9  9
\/31 41 59 27.
5*5    25
6 41
106*6   6 36
5 59
1120*0        0
5 59 27
11204*4    4 48 16
1 11 11 00
112089*9   1 00 88 01
10 22 99 00
1120989*9    10 08 89 01
14 09 99
```
```       1  1  0  1. 0  1
\/10 11 00 01.
01     1
1 11
101    1 01
10 00
1101         0
10 00 01
11001     1 10 01
10 00 00
110101             0
10 00 00 00
1101001      1 10 10 01
1 01 11
```

The tricky part of this manual method can be finding the correct digit to append to the partial square root such that the partial difference isn't larger than the partial remainder. In the example above, my first guess for the 4th digit was 5, but 11205*5 =56025, which is too big. The algorithm can be converted to base-2 which makes the digit choice 0 or 1, and that's an easy decision; just a single comparison.

Notice that the base-10 "double and append a digit" is mathematically "double, multiply by the radix, add the digit." In binary, given the partial square root bbb, the partial difference will either be zero or bbb*2*2+1 = bbb01.

Implementation

To generated the square root of x:

1. Start with square root q and remainder r initialized to 0.
2. For each iteration:
1. Shift the two most significant bits of x into the least significant end of r. This can be thought of as the double width shift r:x = r:x << 2.
2. Shift q left 1 bit to make room for the next square root bit.
3. Generate subtrahend s.
4. If sr, subtract s from r and add 1 to q.
3. q contains the integer square root of x.
Notes:
1. Although this implementation actually computes the square root of 16-bit unsigned values, your Jack code should still check that x is ≥ 0 and raise System Error 4 if it is not.
2. Don't use r = r*4 to do shifting. That will result in a call to Math.multiply!
3. Step 2.1 is a bit tricky in Jack. Hints: What does the MSB of x mean? How do you test if the MSB of x is set?

Results

Running in the VM Emulator, my Jack implementation of this algorithm runs 5 times faster than the Math.sqrt in the supplied Math.vm file. Half the time in each iteration is spent on step 2.1. Jack is not good at bit manipulation!

Open this post in threaded view
|

## Re: A more efficient square root algorithm

 Can you give me step-by-step examples for each implementation step? I managed to understand the base-10 long division method but have no idea how binary version works. Like, when the iteration should be stopped? What does it mean to "Shift the two most significant bits of x into the least significant end of r."? And what does it mean to "Generate subtrahend."?
Open this post in threaded view
|

## Re: A more efficient square root algorithm

 Ju Se Hoon wrote Can you give me step-by-step examples for each implementation step? I managed to understand the base-10 long division method but have no idea how binary version works. Like, when the iteration should be stopped? What does it mean to "Shift the two most significant bits of x into the least significant end of r."? And what does it mean to "Generate subtrahend."? Found the the step-by-step explanation from FreeBSD git! https://github.com/freebsd/freebsd/blob/master/cddl/contrib/opensolaris/lib/libdtrace/common/dt_consume.c#L291
 Administrator In reply to this post by Ju Se Hoon Ju Se Hoon wrote What does it mean to "Shift the two most significant bits of x into the least significant end of r."? The notation r:x means to treat the r and x variables as a single 32-bit value, so if r=0000..0010 and x=1101..01, then r:x = r:x << 2 results in r=00..001011 and x=01..0100. what does it mean to "Generate subtrahend."? Subtrahend is the number being subtracted in a subtraction problem. In this case the xxx01 values in the column on the left side of the "division" problem. This value is q01 = q*4+1. Because step 3 just did q = q*2, step 3 is s = q*2+1. when the iteration should be stopped? The algorithm iterates once for each bit in the output value. The argument must be non-negative, so it can be treated as a 16-bit unsigned number. The square root of a 16-bit number fits in 8-bits, so 8 iterations are required. Note that r:x = r:x << 2 is a bit tricky in Jack. My code does r:x = r:x << 1 twice: ```var int r, x; ... // Move the next 2 argument bits into the remainder (r:x = r:x << 2). let r = (r+r)-(x<0); // Shift r left 1 bit and merge in new x bit. let x = x+x; // Shift x left 1 bit. let r = (r+r)-(x<0); let x = x+x; ```I leave it for you to puzzle out how the "-(x<0)" adds x's most-significant bit. --Mark