|A BRIEF HISTORY OF FACTORIZATION TECHNIQUES
Frederick J. Smith (0537348)
March 7, 2006
CSE590 - Practical Aspects of Modern Cryptography
University of Washington
“The problem of distinguishing prime numbers from composites, and of resolving composite numbers into their prime factors, is one of the most important and useful in all of arithmetic.
…The dignity of science seems to demand that every aid to the solution of such an elegant and celebrated problem be zealously cultivated.”
-- C. F. Gauss, Disquisitiones Arithmeticæ, Article 329 (1801)
Prime numbers can be considered the atoms of arithmetic. Just as chemists organized the physical elements into a chart, Eratosthenes in the third century B.C. discovered a relatively painless way of producing a list of all prime numbers up to a given value. And just as every physical molecule can be reduced to the set of atoms that make it up, so it is with composite numbers. For as long as there have been those interested in prime numbers, there have been those who want to decompose composite numbers into their prime constituents. This task of reducing a number to its prime constituents is known as factoring.
Although factoring a number into its prime factors is a problem of great importance in number theory, it is also very computationally intensive. This paper surveys the history of factoring numbers, and some of the various aids constructed along the way.
Some of the earliest factoring aids were tables of prime factors, often published in the back of mathematics texts. Rahn published a table of prime factors for the numbers up to 24,000 (excluding those divisible by 2 and 5) in 1659. In 1668 John Pell published an extension of this work covering numbers up to 100,000. By the 19th century factor tables existed for numbers up to 10,000,000, published in individual volumes covering one million numbers each.
For numbers outside the range of the published tables, however, other techniques were required. Several early factoring algorithms filled this need.
Early Factoring Algorithms
Trial division is perhaps the simplest and most obvious method for factoring a composite number into its prime constituents. This method actually works quite well, and is quite fast for most numbers. In fact it has been shown that about 88% of all integers have a factor less than 100, and 92% have a factor less than 1,000. For this reason most modern factorization algorithms start with trial division. Unfortunately, if the number to be factored has no small prime factor, this algorithm can be extremely slow. The smallest prime factor of any composite number n must be at most √n. To factor a number n using this technique, start at the first prime p1 = 2 and see if it divides n. If so, record it as a prime factor, divide it out, and try p1 again. As soon as p1 no longer divides the remaining cofactor, move on to the next prime and repeat. Once the remaining cofactor is 1 (or prime), all of the prime factors of n have been found. Note that we need not restrict our trial divisors to prime values; we can instead blindly proceed to the next integer. This is somewhat wasteful, however: a composite trial divisor can never yield a new factor, because each of the primes making it up will already have been divided out.
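As an illustration, the procedure just described can be sketched in a few lines of Python:

```python
def trial_division(n):
    """Factor n by repeatedly dividing out the smallest divisor found.

    Only divisors up to sqrt(n) need to be tried; whatever cofactor
    remains after that is itself prime."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:      # divide d out as many times as it appears
            factors.append(d)
            n //= d
        d += 1                 # a composite d can never divide the cofactor
    if n > 1:
        factors.append(n)      # the remaining cofactor is prime
    return factors
```

For example, trial_division(12317) returns [109, 113], though only after trying every divisor up to 109, which is exactly the slow case described above.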
Fermat (1601-1665) is considered by many to be the father of number theory. Although he published little himself, he wrote numerous letters to other mathematicians. His algorithm for factoring integers was found in an undated letter (about 1643), probably addressed to Mersenne (1588-1648), and is based on the following facts:
If a number can be written as the difference of two squares, then the factorization is obvious:
n = x^2 – y^2 = (x + y)(x – y)
If n = ab, where b ≥ a, then x and y can be written x = (a + b)/2 and y = (b – a)/2.
The idea is to find an x and y such that x^2 – n = y^2. We will start with x equal to the smallest integer at least √n, and increase x until x^2 – n is a perfect square.
Suppose n = 12317. We start with an initial x of 111, the smallest integer whose square exceeds n. Since 111^2 – 12317 = 12321 – 12317 = 4, and 4 is a square, we are already done: y = 2. Therefore the factors of n are x – 2 and x + 2, or 109 and 113.
Fermat’s algorithm works well if some factor of the number (or the product of some of its factors) is near the square root of the number.
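The search just described translates directly into a short Python sketch:

```python
from math import isqrt

def fermat_factor(n):
    """Fermat's difference-of-squares method for odd composite n."""
    x = isqrt(n)
    if x * x < n:
        x += 1                  # start at the smallest x with x^2 >= n
    while True:
        y2 = x * x - n
        y = isqrt(y2)
        if y * y == y2:         # x^2 - n is a perfect square: done
            return x - y, x + y
        x += 1
```

Running fermat_factor(12317) reproduces the worked example, returning (109, 113) on the very first value of x.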
Euler (1707-1783) was a remarkable scientist and one of the most productive mathematicians in history. Euler’s method was mentioned in a letter of February 16, 1745 to Christian Goldbach (1690-1764). The method applies to numbers which can be represented as the sum of two squares, N = a^2 + b^2. We will also assume N is odd.
Assuming N can also be represented as the sum of two other squares, N = c^2 + d^2, we have the following:
a^2 + b^2 = c^2 + d^2, and therefore a^2 – c^2 = d^2 – b^2, or (a – c)(a + c) = (d – b)(d + b).
Let k be the greatest common divisor of a – c and d – b, so that a – c = kl, d – b = km, and gcd(l, m) = 1.
Therefore kl(a + c) = km(d + b), so l(a + c) = m(d + b), and since l and m are relatively prime, a + c is divisible by m; i.e., a + c = mn and d + b = ln for some integer n. Combining these, (k^2 + n^2)(l^2 + m^2) = 4N, so N = ((k/2)^2 + (n/2)^2)(l^2 + m^2), which gives a nontrivial factorization of N.
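A sketch of this derivation in code (the function name euler_factor and its interface are illustrative; the two sum-of-squares representations are supplied as inputs, since finding them is itself a nontrivial problem):

```python
from math import gcd

def euler_factor(N, a, b, c, d):
    """Recover a factor of odd N given two genuinely different
    representations N = a^2 + b^2 = c^2 + d^2."""
    assert a * a + b * b == N == c * c + d * d
    if (a - c) % 2 != 0:        # pair terms so a - c and d - b are even
        c, d = d, c
    k = gcd(a - c, d - b)
    l = (a - c) // k
    m = (d - b) // k            # gcd(l, m) = 1 by construction
    factor = l * l + m * m      # divides N by the identity in the text
    if N % factor != 0:         # if l, m are both odd, only half of it does
        factor //= 2
    return factor, N // factor
```

For example, 1000009 = 1000^2 + 3^2 = 972^2 + 235^2, and euler_factor(1000009, 1000, 3, 972, 235) returns (3413, 293).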
Difficulty in Factoring
Despite the existence of these algorithms, factoring was a very difficult task when all the work was done by hand. As a demonstration of the widespread acknowledgement of the difficulty of factoring numbers, consider the following story handed down through history. In October 1903 F. N. Cole (1861-1927) presented a paper entitled “On the factorization of large numbers”. Without saying a word he walked to the chalkboard and wrote out the arithmetic for 2^67 – 1. Then he proceeded to multiply 193,707,721 × 761,838,257,287. The two results matched. Cole had shown that the 67th Mersenne number was composite. The audience at the American Mathematical Society that day vigorously applauded Cole. This is the only time on record that an author received such applause without uttering a single word.
Employing Machines to Aid in Factoring
Several developments in the 20th century sped up the process of factoring integers by mechanizing it. Around 1925 mechanical calculators in the US finally included support for multiplication and division. In 1927 D. H. Lehmer built a machine known as the bicycle-chain sieve. The machine consisted of a long cylinder on which several bicycle chains of various lengths were looped. Each chain had a device on one of its links which would close a circuit when the link was in a given position. If all of the chains were in the right position at the same time, a current could flow all the way through the device and notify the operator that an interesting number had been found. This machine could scan about 50 numbers per second, and it led to two new factorizations.
In 1932 D. H. Lehmer improved on this machine and built the Photoelectric Number Sieve. This device could perform 300,000 tests per minute, and it too produced impressive results, including several new factorizations.
This time, instead of bicycle chains, very precise gears were used, and holes in the gears allowed light to shine through. If all of the gears were aligned, light could pass all the way through to a detector, which would stop the device. The operator could then rewind the device slightly to find the gear positions that had lined up.
With the introduction of electronic computers, these machines too were enlisted in the task of factoring integers. D. H. Lehmer gave an account of how “during a holiday weekend” the ENIAC produced 85 new factors of 2^k ± 1 for k ≤ 500. By comparison, a human computer working by hand would find at most a few new factors in a few months. It became common practice to let early computers factor numbers in their idle time. There are reports of IBM engineers claiming that on a couple of occasions these factoring programs detected hardware problems that the standard tests did not find.
The Modern Era
With the advent of public key cryptography, research in factoring integers was invigorated. The security of public key cryptography depends on the difficulty of factoring integers: the ability to factor large integers would provide a mechanism for breaking public key ciphers. In the August 1977 issue of Scientific American, Martin Gardner’s column described a challenge to factor a 129-digit number. The belief at the time was that it would take millions of years to factor this number. In reality it was factored 17 years later, in 1994.
Carl Pomerance was one of the driving forces behind this factoring effort. As a student he was once asked to factor the number 8,051 in five minutes. He astutely assumed that there must be a trick to simplify the problem (there is: 8051 = 90^2 – 7^2 = 83 × 97) and spent a few minutes thinking about it, but, worried that time would run out, began attempting to factor the number by trial division. He did not factor the number in time. This event, however, had a long-term effect on him. When he heard of the RSA challenge, he thought about how to attack it, and came up with the Quadratic Sieve. Over eight months, ending in April 1994, several hundred desktop machines across 24 countries found the two primes that made up the challenge number.
Survey of modern Factoring Algorithms
Factoring algorithms can be broken into two main categories: special purpose and general purpose. A special-purpose algorithm’s efficiency depends on properties of the factors of the number being factored, whereas a general-purpose algorithm’s efficiency depends only on the size of the number being factored. Special-purpose algorithms work best when the factors are small, and are therefore not useful for factoring the modulus used in a public key cryptosystem (such as RSA). The special-purpose algorithms listed below are exponential algorithms, meaning their running time grows exponentially with the size of the factors. The general-purpose algorithms are sub-exponential; their running time does not grow as fast as that of the special-purpose algorithms, making them more attractive as the size of the number to factor increases.
Pollard’s rho method
This algorithm is also known as the Pollard Monte Carlo factorization method. Two main ideas make up this algorithm. The first, inspired by the birthday paradox, is the idea of iterating a formula until it falls into a cycle. Assuming n = pq, we can iteratively produce a sequence x0, x1, x2, …, where x(i+1) = (xi^2 + 1) mod n. Iterating this formula (or almost any polynomial formula) from any initial value x0 will eventually fall into a cycle. The second idea is that the same sequence taken modulo the unknown factor p falls into a cycle much sooner, and a collision modulo p can be detected with a gcd computation, revealing p.
The basic method is as follows:
1. Pick a random starting value x0 (mod N) and set x = y = x0
2. Iterate x = f(x) and y = f(f(y)), where f(t) = (t^2 + 1) mod N
3. Compute d = gcd(x – y, N); if 1 < d < N we have found a factor d; if d = 1 go to step 2; if d = N start over with a different polynomial f
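The steps above, with Floyd’s tortoise-and-hare cycle detection, can be sketched in Python:

```python
from math import gcd

def pollard_rho(n, c=1):
    """Pollard's rho with Floyd cycle-finding; assumes n is composite
    (on a prime n the search would never terminate)."""
    f = lambda t: (t * t + c) % n
    x = y = 2
    d = 1
    while d == 1:
        x = f(x)                # tortoise: one step per iteration
        y = f(f(y))             # hare: two steps per iteration
        d = gcd(x - y, n)
    if d == n:                  # collision found no factor: change f
        return pollard_rho(n, c + 1)
    return d
```

On Pomerance’s schoolroom number, pollard_rho(8051) finds a factor (8051 = 83 × 97) after only a few iterations.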
Pollard’s p-1 method
This algorithm is a relative of the Pollard rho method in which the birthday paradox idea is replaced by Fermat’s little theorem. Fermat’s theorem states that for any a and any prime p not dividing a, a^(p-1) ≡ 1 (mod p). This means that if (p – 1) divides k, then a^k ≡ 1 (mod p), so p is a factor of a^k – 1. Pollard’s p-1 method takes advantage of this by computing values a^k – 1 (mod N) for highly composite k and checking whether they have a factor in common with N.
The basic method is as follows:
1. Choose a between 1 and N
2. Choose a k divisible by many small primes (e.g., k = B! for some bound B)
3. If gcd(a, N) != 1 we have found a factor, else
4. Set t = a^k mod N
5. Set d = gcd(t – 1, N)
6. If 1 < d < N then we have found a factor d, else
7. Select new values for a and k and go to step 3
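The steps above can be sketched in Python, growing the exponent k incrementally as 2!, 3!, 4!, … rather than choosing it up front:

```python
from math import gcd

def pollard_p_minus_1(n, bound=100):
    """Pollard's p-1: succeeds when some prime p dividing n has p - 1
    built entirely from small primes (so that p - 1 divides bound!)."""
    a = 2
    for j in range(2, bound):
        a = pow(a, j, n)        # now a = 2^(j!) mod n
        d = gcd(a - 1, n)
        if 1 < d < n:
            return d
    return None                 # no luck; raise the bound or change a
```

For example, pollard_p_minus_1(299) returns 13, because 13 – 1 = 12 already divides 4!.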
Continued Fraction Method
This algorithm uses residues from the continued fraction expansion of √(mn), for a well-chosen multiplier m, to obtain square numbers. The numerators x of the convergents of the expansion have the property that x^2 (mod n), reduced to the symmetric range, has the smallest upper bound (at most 2√(mn)) of the simple methods for generating quadratic residues, and such small residues are easy to combine into squares.
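A small illustration (a sketch of the residue generation only, not the full algorithm, and using multiplier m = 1): expanding √n as a continued fraction and reducing each convergent numerator p to the residue r = p^2 mod n shows that |r| indeed stays below 2√n.

```python
from math import isqrt

def cf_residues(n, steps):
    """Continued fraction expansion of sqrt(n): returns (p, r) pairs,
    where p is a convergent numerator mod n and r = p^2 mod n reduced
    to the symmetric range (-n/2, n/2]."""
    a0 = isqrt(n)
    u, v, a = 0, 1, a0          # state of the standard sqrt CF recurrence
    p_prev, p = 1, a0 % n       # convergent numerators, kept mod n
    pairs = []
    for _ in range(steps):
        r = p * p % n
        pairs.append((p, r if r <= n // 2 else r - n))
        u = v * a - u
        v = (n - u * u) // v
        a = (a0 + u) // v
        p_prev, p = p, (a * p + p_prev) % n
    return pairs
```

For n = 9073 the first few pairs are (95, -48), (286, 139), (381, -7); every residue is below 2√9073 ≈ 190 in absolute value.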
Sub-Exponential Factoring Algorithms
The algorithms mentioned in this section are too complicated to be described in this brief paper, but some basic facts about them are provided. It should be noted that since the run-time of these algorithms is sub-exponential, the increase in time required to factor larger and larger integers is lower than with the exponential algorithms, and thus these are the only real candidates for factoring extremely large integers that have no special form and no small factors.
The basis of most modern factoring algorithms is an enhancement of Fermat’s difference-of-squares technique, introduced by Maurice Kraitchik in the 1920s. He suggested that instead of trying to find x and y such that x^2 – y^2 = n, it might suffice to find x and y where x^2 ≡ y^2 (mod n). Given such a pair with x ≢ ±y (mod n), gcd(x – y, n) is a nontrivial factor of n.
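A toy search for such a congruence, in Python (real algorithms assemble x and y by multiplying together many small residues rather than waiting for a single residue to be a perfect square):

```python
from math import gcd, isqrt

def congruence_of_squares(n):
    """Find x, y with x^2 ≡ y^2 (mod n) but x not ≡ ±y (mod n);
    then gcd(x - y, n) is a nontrivial factor of composite n."""
    x = isqrt(n) + 1
    while True:
        r = x * x % n
        y = isqrt(r)
        if y * y == r and x % n != y and (x + y) % n != 0:
            return gcd(x - y, n)
        x += 1
```

For n = 1649 the search stops at x = 57, since 57^2 ≡ 1600 = 40^2 (mod 1649), and gcd(57 – 40, 1649) = 17 is a factor.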
Quadratic Sieve
Until the advent of the Number Field Sieve, the Quadratic Sieve was considered the best general purpose factoring algorithm, and it is still widely regarded as the second-best general purpose algorithm for factoring large integers. Its running time is roughly exponential in the square root of the length of the integer to be factored.
Number Field Sieve
The number field sieve is currently considered the fastest general purpose factoring algorithm. It was used to factor the RSA-130 number.
Future of Factoring
There are currently at least three factors that contribute to the time it takes to factor a number of a given size:
The algorithms used
The speed of the machines used
The number of machines used
While it is impossible to predict what algorithmic improvements will appear in the future, it is reasonable to assume that steady progress in factorization algorithms will continue.
Despite repeated warnings that it will eventually come to an end, Moore’s law has continued to hold true.
The number of machines which could be brought to bear on a factorization problem is very large. The RSA-129 factorization project used idle time on machines from around the world. The organizers of that project estimated that they were able to obtain about 0.03% of the internet’s resources, and that they could have organized a project using as much as 3% of the internet’s resources without extraordinary effort. Since most modern factorization algorithms lend themselves to distributed processing, it seems likely that an organization could bring an enormous amount of processing power to bear on a factoring project.
These improvements in factorization technology mean that those using encryption to protect resources will need to take into account when in the future their keys may be broken. Fortunately, they can make relatively educated guesses, barring a massive advance in technology.
One potential quantum leap in factorization is that offered by a quantum computer. It remains to be seen whether quantum machines large enough to perform useful calculations can actually be built. However, if they can, algorithms have already been designed for these new machines that would factor integers exponentially faster than is possible today on a traditional machine. Peter Shor developed one such algorithm, and in 2001 IBM demonstrated a 7-qubit quantum computer which factored 15 into 3 and 5.
Should quantum computers ever become a practical reality, then public key cryptosystems of the form we use today would be rendered obsolete.
References
Eric Temple Bell, Mathematics - Queen and Servant of Science, Tempus Books, 1951
Richard P. Brent, “Recent Progress and Prospects for Integer Factorisation Algorithms”
John Brillhart, D. H. Lehmer, J. L. Selfridge, Bryant Tuckerman, S. S. Wagstaff, Jr., Factorizations of bn±1, b = 2, 3, 5, 6, 7, 10, 11, 12 Up to High Powers, Third Edition, American Mathematical Society
S. C. Coutinho, The Mathematics of Ciphers, A K Peters, Ltd, 1999
Richard Crandall, Carl Pomerance, Prime Numbers - A Computational Perspective, Springer, 2001
D. E. Knuth, The Art of Computer Programming, Vol. 2, Addison Wesley, third edition, 1997
A. Lenstra, “Integer Factoring”, Designs, Codes and Cryptography, 2000
Peter L. Montgomery, “A Survey of Modern Integer Factorization Algorithms”, 1994
Andrew M. Odlyzko, “The future of integer factorization”, 1995
Oystein Ore, Number Theory and Its History, Dover Publications, 1976
Carl Pomerance, “A Tale of Two Sieves”, 1996
Richard Rubinstein, “D. H. Lehmer’s Number Sieves”, The Computer Museum Report, Volume 4, Highlights, Spring 1983, http://ed-thelen.org/comp-hist/TheCompMusRep/TCMR-V04.html
Marcus du Sautoy, The Music of the Primes, HarperCollins Publishers, 2003
Eric W. Weisstein. "Pollard rho Factorization Method." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PollardRhoFactorizationMethod.html