Linux and Open-Source Applications
Can you trust your computer? This question is becoming increasingly important as both consumers and businesses adopt the Internet as a medium for financial transactions. Even more worrying is the number of computers with Internet access that are also used to store sensitive technical or corporate information. We feel that by using 128-bit encryption and similar techniques, many users are simply deluding themselves into thinking their data is secure.
The proliferation of Internet connections in the last few years is the source of a major security concern. The real threat is not to data traveling over secure connections, but rather comes from an inability to safeguard the data on the machine itself. By using open-source software as well as multiple software distribution channels, we believe potential network-related security problems can be largely eliminated.
Credit-card information is often transmitted between users and commerce servers using 128-bit encryption provided through the “secure socket layer” in many web browsers. Information within financial organizations is usually transmitted through private links, relying on the integrity of the communications carrier to help ensure privacy, or through virtual private network links where network encryption devices protect the data.
While these methods of ensuring the secrecy of transmitted data have proven to be quite practical, they overlook what we consider to be a potentially far more serious security leak: direct Internet connection of computers at either end of the secure link, combined with the presence of untrustworthy application or operating-system software on those machines.
The world's first major publicized experience of the damaging potential of a worm—a virus-like program that propagates through a network by taking advantage of security loopholes—was on November 2, 1988 (see Resources 1). A graduate student whose worm accidentally went wild managed to bog down thousands of Sun 3 and VAX computers that made up the backbone of the early Internet.
What is not so well-known is that the worm included a high-speed password-cracking algorithm, designed to allow it to gain access to more privileged operating-system functions. Had the worm been designed for espionage and operated at a much lower CPU priority level, it could well have scanned thousands of machines for “interesting” information and quietly sent data back to its author without anyone even noticing.
Computer users today are all too aware of the risks that viruses pose. Virus checkers are now commonly used to search out and destroy viruses that could do harm to a computer system. However, viruses are usually detected because of some action which they initiate to call attention to themselves. A virus developed for espionage would likely be explicitly designed to avoid doing anything that might call attention to its existence. As with the Internet worm, when armed with the appropriate technology to scan a system for interesting information, a virus could take advantage of an Internet-connected computer to “call home” with its findings and await further instructions.
What if the snooping software was intentionally loaded by the user and thus could never be detected by a virus checker? Easter eggs are sections of code within most common application software packages that can be activated by a series of undocumented commands. Eggs generally do something completely unrelated to the host application and unintended by the manufacturer. That isn't to say that eggs occur accidentally—they owe their existence to programmers who secretly add sections of code without any authorization from their employer to do so.
Most known Easter eggs are small and harmless, but as applications and operating systems have gotten larger over the last few years, it has become possible for rogue employees to include much more elaborate eggs and have them released with the finished product. An Easter egg that provides a good example of a significant amount of code being included in a popular product is the flight simulator in Microsoft Excel 97 (see Resources 2).
Word 97 has a surprising dictionary entry: if a sentence includes the word “zzzz”, the auto-spell-checker underlines it and offers “sex” as the corrected spelling. It is clear that Microsoft did not know about that entry when it released Word 97, which demonstrates the inability of the manufacturer to adequately monitor its programmers.
Virtually all versions of Microsoft Windows and application programs from other vendors also contain a plethora of Easter eggs (see Resources 3).
Back doors can easily be embedded in large programs. Occasionally they serve the legitimate function of allowing a manufacturer to perform remote maintenance. But what if a manufacturer embedded a secret door to be used for devious purposes? Would the user even notice?
Programs have certainly become too big for anyone to inspect their executable code for security loopholes. More often than not, we don't actually know what a program such as a word processor is doing at any one time: perhaps it is saving a backup copy of the document being typed, or perhaps it is scanning the hard disk for credit-card numbers.
Are any of the programs you are running doing tasks you aren't aware of? While writing this article, a prompt appeared on one of the author's screens, informing him that MS Explorer had committed an illegal operation and would be shut down—only he had never explicitly launched Explorer. When starting Visual C++ at home, Windows 95 tries to connect to an ISP. Visual C++ on a machine at work starts without incident, presumably having made the connection through its permanent Ethernet connection to the Internet. If it were not for the first machine, we would never have suspected any network connections were being made.
Early versions of Windows 98 also had an interesting feature. As a result of a programming error, the network registration section passed on system and personal-identification information to the operating system's manufacturer, even if the user explicitly elected not to do so (see Resources 4).
Some versions of Netscape Navigator had an unintended, back-door-style bug first discovered by Cabocomm, a software company located in Aarhus, Denmark (see Resources 5). Web site operators could exploit this error to upload the contents of any file on the Netscape user's hard disk, making anything on a machine running Netscape 2.x, 3.x or 4.x world-readable to even inexperienced web page creators.
Given the various options discussed so far, what would be the best way to infiltrate as many computers as possible with a data-gathering agent? An ideal vehicle for such a program would be a large application, like Microsoft Office. The overwhelming success of this product has led to its installation in a very high percentage of computer systems. The trick, of course, would be inserting the rogue code into the host program in the first place. Like most corporations, Microsoft would never approve of something like this. However, from the Easter egg examples, we know there are sections of common software packages that definitely did not get corporate approval and can contain substantial functionality.
The unauthorized code could be further hidden by encrypting large portions of it and having some small code fragment decrypt and activate it on demand. An even more flexible technique would be to have a small Easter egg determine if the computer is connected to the Internet, and if so, open a connection to some foreign host. By downloading code from a remote site at runtime, the Easter egg could be tailored to do something to a specific computer or group of computers that wasn't even thought of at the time the original code was created. Perhaps the code would look for computer schematics if the egg was running on a machine inside the ibm.com domain, or automotive sales figures inside gm.com. With browser functionality becoming more and more embedded in operating systems and applications, one more web connection would appear as harmless as the thousands of other web connections continually being made from the victim computer during the course of the day.
The amount of code used by modern programs prevents any manual scrutiny by a few programmers from providing meaningful verification that a program is “safe”. Expert systems, such as those used to track down Y2K problems and conventional viruses, could be used to try to uncover rogue code, but encrypting the implant could render this approach ineffective. Our conclusion is that there is no way a user could effectively scrutinize the object code of an application to determine that it is “safe”. Neither can any software manufacturer.
One possible solution is to make our operating systems more secure. Microsoft Windows NT is a substantial improvement in security over its Windows cousins. NT provides good password security and the ability to regulate access to system resources by different categories of users, and it has generally acceptable network-security features.
What if the OS itself is not safe? We have already suggested that large programs cannot be screened for security violations by programmers or expert systems. The latest operating systems certainly fall into that category, with the result that we cannot be sure the OS itself is not the source of a major security risk. Indeed, most operating systems also contain Easter eggs of one form or another. Thus, there is little point in being concerned about the security risks of application programs if the operating system is suspect.
Open-source software offers a way out of this dilemma. If the source code is open, it can be inspected, and security holes can be found and fixed. Intentional security violations become much harder to hide and will almost certainly be discovered by the thousands of amateur and professional programmers on the Internet. If the source is truly open and widely distributed, flaws of all kinds will be discovered and announced on web sites and newsgroups. This form of interaction has proven remarkably effective in making Linux one of the most (if not the most) stable operating systems available.
With this type of public code review, can users be reasonably sure that Linux is trustworthy? Can users be sure that, if sufficient safeguards are incorporated into the OS, their data is secure? Linux had to be compiled using a compiler—what if the compiler was corrupted? It appears we now have to insist that even the compiler used to compile the OS should be open source. For truly concerned users, even that will not be enough, and a procedure involving multiple compilations on different platforms using different initial compilers would be required to produce the object code of the open-source compiler used to compile Linux.
In order to create a Linux build you can trust with sensitive information, you first need a compiler known not to insert hidden code as it compiles the operating system. How do you create a trusted compiler when starting off with compilers and operating systems that are not trustworthy? We propose the following as a possible sequence of steps.
Have thousands of programmers on the Internet inspect the source code of the compiler/linker, GNU C++.
Create an executable of the compiler/linker by compiling the source on a number of different platforms using different compilers and linkers.
Use the newly compiled compiler/linker executables on each of the different platforms to cross-compile the compiler/linker source itself, as well as a number of different test programs, to a single platform such as x86.
The cross-compiled compiler/linker and sample program executables on each of the different platforms are then compared. If they are not identical under a byte-by-byte comparison, one or more of the newly generated compilers/linkers is probably subject to a security problem, and the system(s) and compiler source code should be investigated.
Assuming all the newly generated compiler and sample executables are identical, it can be asserted with a large degree of confidence that both the intermediary compiler/linker executables and the re-compiled compiler/linker executables are trustworthy.
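The comparison step above can be sketched in a few lines of Python. This is only an illustration, assuming the cross-compiled executables from each platform have already been copied to one machine; the function name builds_identical and the file names in the usage note are hypothetical and do not come from any existing tool.

```python
import filecmp
import sys
from pathlib import Path


def builds_identical(paths):
    """Return True only if every file in `paths` matches the first one byte for byte."""
    paths = [Path(p) for p in paths]
    if len(paths) < 2:
        raise ValueError("need at least two builds to compare")
    reference = paths[0]
    # shallow=False makes filecmp compare file contents instead of
    # trusting matching size and modification time.
    return all(filecmp.cmp(reference, other, shallow=False) for other in paths[1:])


if __name__ == "__main__":
    # Usage (hypothetical file names):
    #   python compare_builds.py gcc-from-sparc gcc-from-alpha gcc-from-x86
    if builds_identical(sys.argv[1:]):
        print("all builds are byte-for-byte identical")
    else:
        print("MISMATCH: investigate the build hosts and the compiler source")
        sys.exit(1)
```

A single differing byte in any one build is enough to fail the check, which matches the rule that a mismatch calls for investigation rather than a majority decision.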
Now that a safe version of GNU C++ has been created, the next step is to repeat the process to create a secure Linux build:
Have thousands of programmers on the Internet inspect the source code of the operating system, key system libraries, utilities and scripts.
Cross-compile executables of the operating system and its libraries/utilities to a common architecture (x86) using the trusted compilers.
Perform a byte-by-byte comparison of the executables and proclaim them trustworthy if they match.
Now, build a minimal Linux system installation using the trusted components, and gradually expand on the system's functionality by certifying all additional components.
As soon as a trusted Linux platform has been created, a similar process could be used to create a trusted browser. Because the browser is the application used to download other applications as well as communicate securely for e-commerce, it deserves special attention. To create a trusted browser:
Have thousands of programmers on the Internet inspect the source code.
Compile executables using a trusted platform and compiler.
Using a trusted compare utility (on CD-ROM), periodically compare CD-ROM versions of the executables of the operating system, compiler, utilities and applications (including the browser) with what is currently on the hard disk, just to ensure that an application hasn't used some hidden code to corrupt the platform.
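As a rough sketch of the periodic check in the last step, the following Python walks a reference tree (for example, a mounted certified CD-ROM) and compares each file with its installed counterpart. It compares SHA-256 digests instead of performing a literal byte-by-byte comparison, which is a substitution made here for brevity. The function names and the mount points in the usage note are assumptions for illustration only, and the script itself would have to run from trusted media for the result to mean anything.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large binaries need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_install(reference_root, installed_root):
    """Compare every file under the reference tree with its installed copy.

    Returns a list of (relative_path, problem) tuples, where problem is
    "missing" or "modified". An empty list means no discrepancy was found.
    Files that exist only in the installed tree are not reported.
    """
    reference_root = Path(reference_root)
    installed_root = Path(installed_root)
    problems = []
    for ref_file in sorted(reference_root.rglob("*")):
        if not ref_file.is_file():
            continue  # skip directories
        relative = ref_file.relative_to(reference_root)
        installed = installed_root / relative
        if not installed.is_file():
            problems.append((relative.as_posix(), "missing"))
        elif file_digest(ref_file) != file_digest(installed):
            problems.append((relative.as_posix(), "modified"))
    return problems
```

A call such as audit_install("/mnt/cdrom", "/") (both paths hypothetical) could be run from a scheduled job, with any non-empty result treated as a reason to reinstall from the certified media.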
Since Netscape Corp. has taken the initiative to open their browser code, Netscape would be the logical choice as a trusted browser. Ideally, all other applications to be used should be made trustworthy too, so the set of steps listed above should be carried out to create each new trusted application.
Of course, getting users to carry out this certification process would be impossible. What is really needed is a system of software repositories, or “banks”, from which users can obtain certified versions of Linux and associated applications.
A national organization, such as the U.S. National Security Agency, could verify open-source programs and place both source and binaries on the Web for immediate download. However, this approach would be subject to the same concerns that make closed-source software insecure. A disgruntled employee could add some extras to the certified code, or perhaps a government organization might decide that having a back door would be useful for national security reasons.
Clearly, no single testing organization can be trusted. A better approach would be to have three or more certification organizations, each with its own download site. The National Security Agency in the U.S., the Communications-Electronics Security Group in Britain and the Communications Security Establishment in Canada could each independently verify and make certified binaries available. A user could then download the same binaries from all three sites and be sure they are trustworthy if, and only if, no differences are found. While there is a potential security problem in downloading over the Internet (after all, a devious ISP could intercept the FTP request and divert it to a rigged server), the likelihood of that is small and the chances of it being discovered are high.
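The "if, and only if, no differences are found" rule might look like the following in Python. This is a minimal sketch that assumes the user has already downloaded one copy of the binary from each certifying site. The function names, the site labels and the default minimum of three sites are illustrative choices, not part of any real certification service.

```python
import hashlib
from pathlib import Path


def sha256_hex(path):
    """Hex SHA-256 digest of one downloaded file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def certified_by_all(downloads, minimum_sites=3):
    """Accept a binary only if every certifying site served the same bytes.

    `downloads` maps a site label (hypothetical, chosen by the user) to the
    local path of the copy fetched from that site.
    Returns (accepted, digests) so that a mismatch can be investigated.
    """
    if len(downloads) < minimum_sites:
        raise ValueError(f"need copies from at least {minimum_sites} sites")
    digests = {site: sha256_hex(path) for site, path in downloads.items()}
    # Unanimity is required: one differing copy rejects the whole set.
    accepted = len(set(digests.values())) == 1
    return accepted, digests
```

The check deliberately has no majority vote. If two sites agree and a third differs, the sketch cannot tell which side was tampered with, so it rejects the download and returns the per-site digests for follow-up.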
For even greater security, each of the major certifying sites would also make certified CD-ROMs available, preferably each with a simple file-comparison program directly bootable from the unalterable CD. That way, one could order certified CDs from two or more certifying agencies and do a quick file comparison between them as a final verification. The read-only nature of CDs would also prevent any corruption on one from contaminating the other CDs.
Of course, trusting the U.S., Britain and Canada's electronic espionage agencies might leave something to be desired. By requiring each certifying agency to make not only its certified binaries available but also the original source code, it would be possible for other countries, companies or individuals to set up their own complementary certifying sites. Presumably, millions of Internet users would be continuously watching the various sites offering certified applications and operating systems, and a sudden discrepancy at one of them would be noticed, investigated and exposed. By having each certification organization keep its own set of confidential source-code examples for testing the output of compilers being certified, one could dramatically reduce the already small chance of a clever compiler recognizing test code and producing sanitized executables during certification.
At this point, it is also worth emphasizing that a proliferation of independent certifying sites for open-source software located around the world would not only be an excellent safeguard against any sort of Easter egg or back door, but would also ensure that bugs—particularly the security-sensitive ones—are exposed and quickly corrected.
There are potentially severe security problems arising from the inherent nature of closed-source software and its use on Internet-connected computers. While the chances of someone planting a globally or even nationally destructive section of code in a popular operating system or application program are low, the consequences of such an event are potentially too disastrous to ignore. Indeed, a well-orchestrated Easter-egg attack could make the Y2K problem look minuscule in comparison. To safeguard against these problems, closed-source applications and operating systems should be replaced with certified open-source programs. Organizations providing banks of certified trusted applications and operating systems could provide a vital public service.
Peter F. Jones is a research engineer at Neptec Communications in Ottawa, Canada. He received a B.Sc. (1986) and a Ph.D. (1993) from the Department of Electrical Engineering at Queen's University, Kingston, Ontario, Canada and is also a licensed engineer (P.Eng). Peter has worked on a variety of software projects including writing SVGA card graphics drivers, creating a Java web search engine, and developing a Linux-based multiple-sound card interface library for an adaptive antenna phased-array HF modem. He is currently working on two projects: developing a miniature single-board Linux computer for home and office applications and studying the characteristics of the Space Shuttle's TV cameras for the purposes of developing algorithms to reduce image distortions. Peter can be reached via e-mail at firstname.lastname@example.org.
Mark B. Jorgenson is at Neptec Communications in Ottawa, Canada. His B.Sc. (1984) and M.Sc. (1989) are both in Electrical Engineering from the University of Calgary and he is also a licensed engineer (P.Eng). Mark's main research focus is in wireless communications, with emphasis on link-layer aspects. Mark has recently led the development of a software radio prototype and is currently leading a team designing an advanced HF radio modem. He can be reached via e-mail at email@example.com.