The Prime Internet Eisenstein Search
The work for such prime number finding projects falls into two main areas. First, the head-scratching is performed by the mathematicians to determine how to find prime numbers and prove they are prime. If necessary, this step involves writing custom programs that are optimal for the task at hand. The second part is the computational work involving network communications and systems management. It makes for a productive partnership, with little of the overhead that accompanies larger projects.
Most large primes are found by distributed computing projects, as can be seen from the top finders' tables on the Prime Pages. Therefore, a real but friendly sense of competition exists among projects and also among the individuals involved. Both are scored and ranked on finding the largest primes, the most primes and primes with particular special forms. For most of 2004, PIES was the single largest project by count of prime numbers, as it was working on a hugely fruitful band of small prime numbers, of about 50,000 digits. Alas, all those primes have dropped off the Prime Pages' Top 5,000 list, and the project now is only the third largest producer by count of primes. Phil considers the ranking by count to be not particularly important, because large quantities of small primes are not particularly challenging. They also are a bad investment, as they don't stay on the list long.
Probably the best known search project is the Great Internet Mersenne Prime Search (GIMPS). This project is seeking the largest Mersenne prime number, which is, at the moment, also the largest prime number of any form. In February 2005, the largest known prime number, with 7,816,230 digits, was discovered. The calculations took 50 days on one 2.4GHz machine, and independent verification required an additional five days. A second verification took 15 days. The discoverer, Dr Martin Nowak from Germany, joined GIMPS six years ago. In essence, he has been calculating for six years to find this one number, only the 42nd Mersenne prime found. The 41st was discovered in May 2004; the project has found only eight since 1996. GIMPS lists about 41,000 people involved in the calculations, many of whom allow their personal machines' idle CPU cycles to be used to crunch numbers. Other participants have large academic or commercial facilities at their disposal, helping the GIMPS global network sustain over 17 teraflops.
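To make the scale of these numbers concrete, here is a minimal Python sketch of two standard facts about Mersenne numbers, which have the form 2^p − 1. It is an illustration only, not GIMPS code. The exponent 25,964,951 is not stated in this article; it is the published exponent of the 42nd Mersenne prime and is supplied here as an outside fact. The function names are made up for the example. The primality check is the classic Lucas-Lehmer test, written naively so that it is practical only for small exponents.

```python
import math


def mersenne_digits(p):
    """Number of decimal digits in 2**p - 1, without building the number."""
    # 2**p is never a power of 10, so 2**p - 1 has as many digits as 2**p.
    return math.floor(p * math.log10(2)) + 1


def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for an odd prime exponent p."""
    m = (1 << p) - 1          # the Mersenne number itself
    s = 4
    for _ in range(p - 2):    # p - 2 squarings modulo m
        s = (s * s - 2) % m
    return s == 0


# Exponent of the 42nd Mersenne prime (outside fact, not from the article).
print(mersenne_digits(25964951))

# Which small odd prime exponents give Mersenne primes?
small_exponents = (3, 5, 7, 11, 13, 17, 19, 23, 29, 31)
print([p for p in small_exponents if lucas_lehmer(p)])
```

The digit count matches the 7,816,230 digits quoted above. The same loop at that exponent needs roughly 26 million squarings of a multi-million-digit number, which gives a feel for why a single test occupies a fast machine for weeks.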
According to Professor Caldwell, Phil has implemented an important advance by looking at numbers that often are quicker to test for primality than are the usual numbers. In a decade or so, such a project may be able to compete seriously with GIMPS for the primes of record size. This would happen not because such projects are somehow better but because the Mersenne numbers steadily thin out, and many other forms don't thin out so quickly. Even when the Mersennes once again lose their lock on the largest known prime, they may stay the most important primes because of their long history.
Professor Caldwell, who happens to run his Prime Pages on Linux, said, “PIES quickly has established itself as a major player by finding a large number of primes in the 100,000 digit range. I myself am a PIES member, and I really appreciate the thought and effort Phil has put into his system.” There are a number of other projects, some even in the teraflop category. In contrast to these larger projects, PIES is working on a smaller scale and within smaller ranges of digit size. More numbers are being discovered more quickly, in a sense backfilling from the high Mersenne numbers. In the list of the largest known prime numbers, PIES has reached 191,362 digits as of April 4, 2005, and the project expects to find a new, larger prime roughly once a week.
For simply attracting project members, the ideal distributed computing setup is client-architecture neutral. All of the largest distributed projects work equally well with Linux, *BSD and other OSes running on the clients. Phil decided to use Perl to write both his client and his server, as it provided all of the networking primitives that he needed and is secure by design. Exchanging assignments and results is such a small part of the whole project that a p-coded language such as Perl, one that is compiled into an intermediate form rather than to native code, is easily fast enough.
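The shape of such an assignment-and-result exchange can be sketched in a few lines. The following is a hypothetical illustration, not Phil's protocol or code: the article says his client and server are written in Perl, while this sketch uses Python, keeps everything in memory instead of on a network and invents every name in it (WorkServer, get_assignment, submit_result, the JSON message fields). Trial division stands in for the real, far more expensive primality test.

```python
import json
from collections import deque


class WorkServer:
    """Hypothetical in-memory stand-in for a project server (not PIES code)."""

    def __init__(self, candidates, batch_size=4):
        self.pending = deque(candidates)   # numbers not yet handed out
        self.batch_size = batch_size
        self.outstanding = {}              # assignment id -> candidates
        self.primes = []                   # results reported so far
        self.next_id = 1

    def get_assignment(self):
        """Return a JSON message with the next batch, or an empty batch."""
        batch = []
        while self.pending and len(batch) < self.batch_size:
            batch.append(self.pending.popleft())
        assignment_id = self.next_id
        self.next_id += 1
        if batch:
            self.outstanding[assignment_id] = batch
        return json.dumps({"id": assignment_id, "candidates": batch})

    def submit_result(self, message):
        """Record a client's JSON result; ignore unknown assignment ids."""
        result = json.loads(message)
        if self.outstanding.pop(result["id"], None) is not None:
            self.primes.extend(result["primes"])


def is_prime(n):
    """Trial division; a placeholder for the real, expensive primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def run_client(server):
    """Fetch assignments until the server has none left."""
    while True:
        work = json.loads(server.get_assignment())
        if not work["candidates"]:
            break
        found = [n for n in work["candidates"] if is_prime(n)]
        server.submit_result(json.dumps({"id": work["id"], "primes": found}))


server = WorkServer(range(100, 130))
run_client(server)
print(sorted(server.primes))
```

Each message is a few dozen bytes, and the server's part is simple bookkeeping, while the client may spend hours on the test itself. That imbalance is why an interpreted language is adequate for the communication layer.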
Phil intends to adapt his server so that it can be used for arbitrary distributed prime-hunting projects. He then plans to release it as free software.
PIES runs almost exclusively on Linux machines. Linux has made it possible to install a reliable OS across many machines, where an individual license for each machine would have been a significant cost. Linux administration is easy, making it possible for one part-time person to administer a cluster, and a lot of admin tools are available. Linux works well on almost any hardware, which means a 200MHz machine can be used as a subnet gateway or a DNS server, further saving money. Inexpensive hardware and a free OS give the hobbyist the ability to run advanced facilities that produce first-class results.
Phil runs the project's central server from an Alpha machine at his home. He first used Linux as his OS of choice in late 1993 and turned his back entirely on what one might call high-street operating systems about five years ago. He has several other Linux machines, which he uses as clients. He typically develops for Linux only, but he doesn't discriminate against architectures—for example, he has enrolled several Alpha testers. He happily builds for BSDs and UNIX-alikes and begrudgingly for whatever else may be out there.
One of the US computational support sites is located in Vermont, in Bob Bruen's barn. The barn was built for horses, so there are stables and two open areas on the first floor and a large open space on the second floor. One of the two first-floor spaces now is the PIES computational facility. The facility was under construction before PIES to support work in Linux, networks and security. Rather than let the facility waste CPU cycles, Bob offered PIES access to the machines while they aren't working on other tasks. For a while now, there have been no such other tasks.
The Vermont facility is a dedicated cluster of more than a dozen machines ranging from 450MHz to 3GHz, with several SMP machines on a separate subnet in Bob's barn. The facility is linked by a wireless bridge to the house, which in turn has a cable modem connection to the Internet. Most of the machines run Red Hat 9.0 or Fedora Core 2, but SUSE, Mandrake and Debian have been installed as well.
The hardware was purchased at auction or on sale, and it is server-class hardware: rackmount, heavy-duty case, some SMP, SCSI and a lot of memory. The same auctions yielded racks, switches and other hardware for pennies on the dollar.
Even with Linux as a foundation, there still are some problems. Although the auction-bought servers did not mind, one Dell SMP server failed in the extreme cold in the unheated barn. There were several days of temperatures 20–25 degrees below zero, Fahrenheit. Occasionally, an individual server has required attention, but by and large, as one would expect from Linux machines, they keep on running. The wireless bridge required a reboot once during the same cold snap. The most serious problem, however, has been the primitive electrical power in that part of Vermont.
One additional, small US facility is located in Arizona, where Steven Harvey, a criminal defense lawyer, runs PIES on Mandrake 10.1. He uses several machines for other prime number searches. Phil, a permanent resident of Espoo (near Helsinki), which is also home to the server and his few client machines, is working several hundred kilometers away, in Turku. Within a few days of moving there, he already had investigated buying a handful of refurbished PCs. In order to keep costs minimal, he intends to buy systems with no hard or floppy drives, simply booting off a CD instead. Although Debian is his favorite distribution for desktop and server use, he plans to boot diskless machines with Knoppix.