The Compiler as Attack Vector
Media exposure of serious security threats has skyrocketed in the last five years, and this has caused a strange parallel to develop. As software developers have become more aware of security problems and have taken steps to mitigate them during the development phase, attackers have been forced to become more insidious in their choice of exploit vectors. One vector that often goes unexplored is attacking the program as it is built.
I first encountered this idea while reading the September 1995 ACM classic of the month article, “Reflections on Trusting Trust”, by Ken Thompson. The article originally appeared in the August 1984 issue of Communications of the ACM, and it argues that ultimate security is impossible to achieve because, in the chain of building an application, there is no way to trust every link fully. The particular focus was on the C compiler for UNIX and how, within the build process, the programmer can be blind to the compiler's actions.
The same problem exists today. Because so many things in the Linux world are downloaded and compiled, an avenue of attack opens. Binary distributions like RPMs and Debian packages are becoming increasingly popular; thus, attacking the build machines for the distributions would yield many unsuspecting victims.
Before engaging in a discussion of how such attacks could take place, it is important to become familiar with the target, and how someone would evaluate it for places to attack. GCC, written and distributed by the GNU Project, supports many languages and architectures. For the sake of brevity, we focus on ANSI C and the x86 architecture in this article.
The first task is to become more familiar with GCC: what it does to code and where. The best way to start is to build a simple Hello World program, passing GCC the -v option at compile time. The output should look similar to that shown in Listing 1. Examining it yields several important details, chief among them that GCC is not a single program. It invokes several programs to translate the C source file into an ELF binary. It also links in numerous system libraries with virtually no verification that they are what they appear to be.
Further information can be gained by repeating the same build with the -save-temps option. This saves the intermediate files created by GCC during the build. In addition to the binary and source file, you now have filename.i, filename.s and filename.o. The .i file contains your source after preprocessing, the .s file contains the translated assembly and the .o file is the assembled object before any linking happens. Using the file command on these files provides some information as to what they are.
Listing 1. gcc -v
$ gcc -v tst.c
<snipped for length>
as -V -Qy -o /tmp/ccAkwBG3.o /tmp/cczFkUQ2.s
GNU assembler version 18.104.22.168.18 (i586-mandrake-linux-gnu) using BFD version 22.214.171.124.18 20030121
/usr/lib/gcc-lib/i586-mandrake-linux-gnu/3.2.2/collect2 --eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 /usr/lib/gcc-lib/i586-mandrake-linux-gnu/3.2.2/../../../crt1.o /usr/lib/gcc-lib/i586-mandrake-linux-gnu/3.2.2/../../../crti.o /usr/lib/gcc-lib/i586-mandrake-linux-gnu/3.2.2/crtbegin.o -L/usr/lib/gcc-lib/i586-mandrake-linux-gnu/3.2.2 -L/usr/lib/gcc-lib/i586-mandrake-linux-gnu/3.2.2/../../.. /tmp/ccAkwBG3.o -lgcc -lgcc_eh -lc -lgcc -lgcc_eh /usr/lib/gcc-lib/i586-mandrake-linux-gnu/3.2.2/crtend.o /usr/lib/gcc-lib/i586-mandrake-linux-gnu/3.2.2/../../../crtn.o
$
The thing to focus on while looking through the temp files is the type and amount of code added at each step, as well as where the code comes from. Attackers look for places where they can add code, often called payloads, without being noticed. Attackers also must add statements somewhere in the flow of a program to execute the payload. For attackers, ideally this would be done with the least amount of effort, changing only one or two files. The phase that covers both these requirements is called the linking phase.
The linking phase, which generates the final ELF binary, is the best place for attackers to exploit to ensure that their changes are not detected. The linking phase also gives attackers a chance to modify the flow of the program by changing the files that are linked in by the compiler. Examining the verbose output of the Hello World build, you can see several files, such as crt1.o, crti.o and the C library (-lc), linked in, with ld-linux.so.2 named as the dynamic linker. These are the files an attacker will pay the most attention to because they contain the standard functions the program needs to work. These collections are often the easiest in which to add a malicious payload and the code to call it, often by replacing only a single file.
Let's take a small aside here and discuss some parts of ELF binaries, how they work and how attackers can use this to their advantage. Ask many people who write C code where their programs begin executing and they will say “main”, of course. This is true only to a point; main is where the code they wrote begins execution, but in actuality, the program started executing long before main. You can examine this with tools like nm, readelf and gdb. Executing the command readelf -l hello shows the entry point for the program. This is where the program begins executing. You then can look at what this does by setting a breakpoint at the entry point and running the program. You will find the program actually starts executing at a function called _start, line 47 of file <glibc-base-directory>/sysdeps/i386/elf/start.S. This is actually part of glibc.
Attackers can modify the assembly directly, or they can trace the execution to a point where they are working with C for easier modifications. In start.S, __libc_start_main is called with the comment “Call the user's main function”. Looking through the glibc source tree brings you to <glibc-base-directory>/sysdeps/generic/libc-start.c. Examining this file, you see that not only does it call the user's main function, it also is responsible for setting up command-line and environment options, like argc, argv and envp, to pass to main. It is also in C, which makes modifications easier than in assembly. At this point, making an effective attack is as simple as adding code to execute before main is called. This is effective for several reasons. First, in order for the attack to succeed, only one file needs to be changed. Second, because it is before main(), typical debugging does not discover it. Finally, because main is about to be called, all the built-ins that C coders expect already have been set up.