X-From-Line: nobody Thu Feb 1 12:17:33 2001 Sender: paul@saltationism.subnet.hedonism.cluefactory.org.uk Newsgroups: sci.crypt Subject: Re: Barrett modular reduction References: <94nlm2\$sm\$1@nnrp1.deja.com> <94orld\$vuu\$1@nnrp1.deja.com> <94plgo\$lso\$1@nnrp1.deja.com> <94q1o4\$2g2\$1@nnrp1.deja.com> <94s5at\$r18\$1@nnrp1.deja.com> From: Paul Crowley Date: 01 Feb 2001 12:17:32 +0000 Message-ID: <87k87aiomb.fsf@saltationism.subnet.hedonism.cluefactory.org.uk> User-Agent: Gnus/5.0803 (Gnus v5.8.3) XEmacs/21.1 (Capitol Reef) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Lines: 86 Xref: saltationism.subnet.hedonism.cluefactory.org.uk misc-news:992 X-Gnus-Article-Number: 992 Thu Feb 1 12:17:33 2001 advonet@my-deja.com writes: > 4x faster would be awsome! I'll try it and let you know how it comes > out. There are many tricks to speed up RSA operations; I enjoyed reading descriptions of some of them in Chapter 14 of the Handbook of Applied Cryptography recently. Download chap14.pdf free from http://www.cacr.math.uwaterloo.ca/hac/ In particular, you should get faster results with Montgomery multiplication than you do with Barrett reduction. Their description of it is a bit hard to follow, so here's some hints. Definitions: m is the modulus, assumed constant and implicit throughout. b is the base you're working in; on a normal PC, b = 2^32 normally. R is the smallest b^r greater than m; in other words, if r is the number of "digits" (ie words) used to represent m, then R = b^r. Since m is odd, gcd(m,R)=1, and thus there exists a "multiplicative inverse" R' such that RR' = 1 (mod m), which you can find using Euclid's algorithm; if you're not familiar with this, read Chapter 2 of the HAC. In the example in Table 14.7 of HAC, they use base 10 (b = 10) so you can see what's going on. m = 72635, five digits so r = 5 and R = 10^5 = 100 000. Follow the example alongside the text. Suppose we've got some number T that might be bigger than m; in fact 0 <= T < mR. Barrett reduction would give us T mod m very rapidly. Faster yet, Montgomery reduction will give us TR' mod m. Is this any use? Yes, as I'll explain later. Suppose the first r "digits" of T were all zero (say T = 3979600000 as in the example). That would mean that T was conveniently divisible by R. Then T/R (39796) would be the right answer; it's less than m, and it's equivalent to TR' mod m; in (mod m) world, multiplying by R' is just another way of dividing by R. And dividing by R would simply be a matter of shifting those zeroes out of the way. So the trick with Montgomery reduction is to work out an integer U such that (T + Um) is divisible by R. This is legit since you can add whatever multiples of m you like to T and it's still the same mod m. Better yet, you never explicitly have to work out all of U; you work it out a digit at a time, starting from the least significant digit, and then multiply that digit by m and add it to T. This turns each digit of T to zero in turn, starting with the least significant digit. Once you've zeroed r digits, you can divide by R. Look at the example and the algorithm 14.32 to see how this is done. Now, T < mR and U < R, so (T + Um) < 2mR and (T + Um)/R < 2m. So finally, check if the result is bigger than m and subtract m if not. Now you have TR' mod m. Montgomery multiplication is just multiplication any way you like [1] followed by Montgomery reduction, so Mont(x,y) = xyR' mod m. This works because if x,y < m, then xy < m^2 < mR. Is this any use? Won't that R' constantly get in the way? As it turns out, there is a way around it. We use multiplication by R to correct for the R' factor (since RR' = 1 mod m), but here's the cunning bit: you correct *right at the start* of your exponentiation, to minimise the work you have to do at the end. If you want to work out x^d, start by working out xR mod m. You can think of this as the "Montgomery representation" of x, because it has this handy property: Mont(xR mod m, yR mod m) = (xR)(yR)R' = xyR (mod m) - so multiply the "Montgomery representation" of x and y together using Montgomery multiplication, and get the Montgomery representation of xy! So, if you're doing exponentiation, convert x to Montgomery representation, work out whatever powers you like with Montgomery multiplication, and then convert back. This will be much faster. With the right precomputed constant (R^2 mod m) you can even use Montgomery multiplication to do the conversion - HAC describes this. There's lots more tricks in Chapter 14 (including the CRT trick - see 14.75) and you could have fun writing a super-optimised RSA implementation. Let us know how you get on. Footnote [1]: For this I recommend checking out another great speedup, Karatsuba multiplication: see http://www.math.niu.edu/~rusin/known-math/99/karatsuba http://www.users.csbsju.edu/~cburch/proj/karat/ -- __ \/ o\ news@paul.cluefactory.org.uk /\__/ http://www.cluefactory.org.uk/paul/