Source Code Scanners for Better Code

by Jose Nazario

Coding is tough enough, and coding right can sometimes seem an almost impossible task. Between design constraints, deadlines and making it work in the first place, it's difficult to get your code secure. Security concerns, however, can be helped by scanners.

Source code scanners are nothing new. Tools like lint have been around for many, many years to help you find errors in your C code. There are even lint checkers for DNS and HTML files, nslint and weblint, respectively. The lint tool doesn't explicitly talk about insecure code, though, only basic inconsistencies in your style.

Source code audits, such as those carried out by the Linux Auditing Project, are also valuable. However, audits can take a lot of time, and money if you outsource them, though the payback is well worth the investment. Source code scanners used to be available only as commercial software, prohibiting their use by many Linux developers and researchers, who didn't always place user friendliness high on their lists of goals.

A couple of years ago, pscan was released under the GPL. It's a simple format string scanner but one of the first such tools commonly available on the hobbyist developer market. Since then, three new source code scanners have become available for source-available software, and all three are worth looking at. Not all of them are open source, but they do work well for most users coding in C or C++.

The resources listed at the end of this article are worth checking out. The Open Source Quality Project has links to several sites, including commercial products and academic research groups, involved in performing software quality testing. Some of these tools can do more rigorous testing but can be far less user-friendly.


Flawfinder

Developed by the noted author and coder David Wheeler, Flawfinder is a Python program that can be used to assist in auditing C and C++ code. It's in the early stages, currently at version 0.19, but already it's strong competition for the other tools listed here, which are in their 1.x stages. Flawfinder is also quite fast, covering thousands of lines of C code on a typical desktop machine in a matter of seconds. Flawfinder is released under GPL version 2, meaning it is free software.

Flawfinder also shows some intelligence when it comes to scanning for vulnerabilities. For example, in tests using intentionally insecure code, Flawfinder was able to distinguish between strcpy() from a constant sized string and variable length strings and tell the difference between vulnerabilities and false hits. Furthermore, Flawfinder understands the gettext libraries and their use in internationalization.
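To make the distinction concrete, here is a minimal sketch of the two kinds of strcpy() call. The function names and the buffer size are hypothetical and are not taken from Flawfinder's test code.

```c
#include <string.h>

/* Copying a string literal: the source length is known at compile time,
 * so the call is safe as long as the destination is large enough. */
void set_greeting(char dst[16])
{
    strcpy(dst, "hello");   /* 6 bytes, including the terminating NUL */
}

/* Copying caller-supplied data: the source length is unknown, so this
 * overflows dst whenever strlen(src) >= 16. This is the call that
 * deserves a scanner's attention. */
void set_name_unsafe(char dst[16], const char *src)
{
    strcpy(dst, src);
}
```

A scanner that treats both calls alike buries the second one in noise; one that recognizes the constant source in the first call can rank it lower.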


RATS

RATS, the Rough Auditing Tool for Security, is a source code scanner under active development that is capable of scanning C, C++, Perl, PHP and Python source code. RATS is also released under the GPL, and some of its developers also worked on a similar tool, ITS4 (discussed below). The current version, 1.3, is noticeably more stable than version 1.2, but it still can fail on large amounts of input.

RATS works through lexical analysis, and its behavior can be changed by modifying its source code. Error messages are controlled by XML reporting filters, which requires the XML parser expat to be installed as well. At runtime, you can configure the level of output you wish to see (defaulting to medium), select alternative vulnerability databases and even have RATS report functions that accept input from the user, which facilitates the tracking of user-supplied data.

Some of the specific limitations of RATS include the use of greedy pattern matching, meaning that searching for "printf" will match not only "printf()" calls but also "vsnprintf()" and the like. This can make it difficult to filter hits from noise when specific functions are being sought with the -a flag.
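The difference between greedy and whole-name matching can be sketched in a few lines of C. This is only an illustration of the idea and makes no claim about how RATS is implemented; both function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int is_ident_char(int c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Greedy: counts every occurrence of name, even inside longer names. */
size_t count_substring(const char *src, const char *name)
{
    size_t n = 0;
    if (*name == '\0')
        return 0;
    for (const char *p = strstr(src, name); p != NULL; p = strstr(p + 1, name))
        n++;
    return n;
}

/* Stricter: counts name only when it stands alone as an identifier. */
size_t count_identifier(const char *src, const char *name)
{
    size_t n = 0, len = strlen(name);
    if (len == 0)
        return 0;
    for (const char *p = strstr(src, name); p != NULL; p = strstr(p + 1, name)) {
        int starts = (p == src) || !is_ident_char(p[-1]);
        int ends = !is_ident_char(p[len]);
        if (starts && ends)
            n++;
    }
    return n;
}
```

Run over a line containing printf(), vsnprintf() and fprintf() calls, the greedy version reports three hits for "printf" while the stricter version reports one.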

The authors of RATS and Flawfinder, by the way, plan to coordinate their development efforts to produce a high quality, open-source development tool. This should be good to watch, as each development team is well respected in the field.


ITS4

One of the original source-available scanners for Linux and UN*X, the It's The Software, Stupid! Security Scanner (ITS4) scans C and C++ source for common security-related flaws. Developed by Cigital (then known as RST Technologies) and some of the same people who went on to develop RATS, ITS4 has set the pace. Many of the niceties in ITS4 are also available in RATS or Flawfinder, such as ignorable lines, tracking user input (the -m flag) and alternative databases.

Modification for a particular project or site is relatively straightforward. A simple text file of vulnerabilities is used, which can be added to or modified to suit any specifics of a system. It should be noted that ITS4 comes with a non-free license, even though the source is available. Users should read and understand the license if they are doing anything beyond hobbyist programming with it.

Using the Tools

Use of these source code scanners is relatively straightforward, as they take for their input one or more source code files and then produce output. Our examples will focus on the file openldap-2.0.11/libraries/libldap/print.c, from the OpenLDAP 2.0.11 source distribution. This has been chosen because it highlights some of the subtle differences in these scanners. The code section which produces hits looks like this:

35-int ldap_log_printf( LDAP *ld, int loglvl, const char *fmt, ... )
37-     char buf[ 1024 ];
38-     va_list ap;
40-     if ( !ldap_log_check( ld, loglvl )) {
41-             return 0;
42-     }
44-     va_start( ap, fmt );
47-     buf[sizeof(buf) - 1] = '\0';
48-     vsnprintf( buf, sizeof(buf)-1, fmt, ap );
50-     vsprintf( buf, fmt, ap ); /* hope it's not too long */
52-     /* use doprnt() */
53-     chokeme = "choke me! I don't have a doprnt manual handy!";

Now, let's process this code piece using each of the three scanners to demonstrate their differences. First, we will look at the output from the RATS tool:

$ rats print.c
print.c:37: High: fixed size local buffer
Extra care should be taken to ensure that character arrays that are allocated
on the stack are used safely.  They are prime targets for buffer overflow
print.c:50: High: vsprintf
Check to be sure that the non-constant format string passed as argument 2 to
this function call does not come from an untrusted source that could have added
formatting characters that the code is not prepared to handle.
print.c:50: High: vsprintf
Check to be sure that the format string passed as argument 2 to this function
call does not come from an untrusted source that could have added formatting
characters that the code is not prepared to handle.  Additionally, the format
string could contain `%s' without precision that could result in a buffer

RATS found three errors with high severity that are worth noting. Notice that it reported the vsprintf() twice, however, with the second report being nearly identical to the first, with an added warning.

Now let's try Flawfinder on the same input file:

$ flawfinder print.c                                                           
Flawfinder version 0.15, (C) 2001 David A. Wheeler.
Number of dangerous functions in C ruleset: 40
Processing print.c
print.c:48 [4] (format) vsnprintf: if format strings can be influenced by an attacker, 
they can be exploited. Use a constant for the format specification. 
print.c:50 [4] (format) vsprintf: Potential format string problem. Make Format 
string constant. 
There are probably other security vulnerabilities as well; review your code!

Flawfinder found two unique problems worth reporting, but it doesn't note the fixed size declaration of "char buf[ 1024 ]" at line 37, which could become a problem (and it does on some platforms).

Lastly, let's use ITS4 on the same input and examine its output:

$ its4 print.c                                                                 
print.c:48:(Urgent) vsnprintf
Non-constant format strings can often be attacked.
Use a constant format string.
print.c:50:(Urgent) vsprintf
Non-constant format strings can often be attacked.
Use a constant format string.

Again, these are the same results as Flawfinder's, tagging format string caveats. Notice that in each case the output is formatted differently, that the three tools tag different functions (this limited example doesn't highlight all of the differences) and that none of them handles the conditional inclusion of safe or unsafe functions.
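Conditional inclusion typically looks something like the following sketch. It is not the OpenLDAP source; the function name and the HAVE_VSNPRINTF macro are assumptions chosen for illustration, in the style of a configure-generated header.

```c
#include <stdarg.h>
#include <stdio.h>

/* HAVE_VSNPRINTF is a hypothetical build-time macro, defined only on
 * platforms that provide vsnprintf(). */
int format_message(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
#ifdef HAVE_VSNPRINTF
    n = vsnprintf(buf, size, fmt, ap);  /* bounded by size */
#else
    (void)size;
    n = vsprintf(buf, fmt, ap);         /* unbounded: can overflow buf */
#endif
    va_end(ap);
    return n;
}
```

Only one branch is ever compiled on a given platform, but a scanner that does no preprocessing sees both and cannot tell which one applies.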

Interpreting the results is not always as easy as it may appear. For example, input sanitization can be tricky, as is safely opening a file to avoid a symlink attack. Even copying strings can be tricky, as subtle issues like NUL termination, length and memory allocation all play a part. Astute programmers will still be required (see Resources).
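As one illustration of the NUL termination issue, strncpy() does not terminate the destination when the source is too long, so a careful copy has to do it by hand. The helper below is a sketch; its name and return convention are hypothetical.

```c
#include <string.h>

/* Copy src into dst, which holds size bytes, always NUL-terminating.
 * Returns 0 if src fit, 1 if it was truncated, -1 on bad arguments. */
int copy_string(char *dst, size_t size, const char *src)
{
    if (dst == NULL || src == NULL || size == 0)
        return -1;
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';   /* strncpy() leaves this out on truncation */
    return strlen(src) >= size ? 1 : 0;
}
```

Returning a truncation indicator lets the caller decide whether a shortened string is acceptable, which is a decision a scanner cannot make for you.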

Typically, the scanners are designed to be run over the source code several times during development, each time fixing or investigating the major problems. When coding, it's easy, initially, to forget to use a more secure function like strncpy(), and these tools help reinforce improved habits. When examining outside sources of code, scanners help highlight areas of code that may be problematic. In either case, these scanners help show you where to focus your attention, and they cover many of the basic, common coding errors that lead to faulty code or, worse, security issues.

All of these tools are quite fast, operating on about 100,000 lines of input in under one second. Simply put, you'll spend more time trying to figure out the meaning of the output than you will running the checker itself.

Caveats of Source Scanning Tools

Source code scanners specifically designed to look for security flaws are obviously a help. However, there are a number of limitations that their users will have to keep in mind.

First, they will never replace a good manual audit of the code. There are simply too many variables that have not yet been abstracted into an automated scan.

Secondly, it is vital for code authors to understand the functions and libraries they are using and the nuances inherent in them. There will never be a replacement for understanding the source of errors, as these tools only list some of the possible security holes in the code. For example, a scanner can't dig into a library to find unsafe functions buried beneath other functions, unless the tool has been explicitly told that the function is unsafe. A similar constraint exists for types of data, such as int, char or long. If you are using types you define in your code, the scanner most likely doesn't know how to handle them natively.

Lastly, scanners are limited, so far, in the languages they understand. This definitely limits their utility, wide though it may be for most of us. RATS is, so far, the leader in this arena, understanding five programming languages, while the other two are focused on C and C++ parsing.

Some of these limitations come from poor documentation of functions and their obstacles, such as the printf() family. Other errors come from a lack of standardization of how to perform actions securely, such as opening a file. And still others come from the lack of portable secure replacement functions. In this last case, it's probably best to implement the safe functions in your code and call them as needed on platforms which lack them. For example, several versions of snprintf() exist for platforms which lack it, covered under a variety of licenses.
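For the file-opening case, one commonly used approach on POSIX systems is to create the file with O_CREAT and O_EXCL, which makes open() fail if anything, including a symbolic link, already exists at that path. The sketch below assumes a POSIX platform; the function name is hypothetical, and this is one option among several rather than a standard recipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes of data to a newly created file at path.
 * Fails with -1 if the path already exists, even as a symlink. */
int write_new_file(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;
    ssize_t n = write(fd, data, len);
    if (close(fd) == -1 || n < 0 || (size_t)n != len)
        return -1;
    return 0;
}
```

The explicit mode argument keeps the new file readable and writable by its owner only, before the process umask is applied.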

Having the output of these scanners is only the start of securing the code. It's vital to correctly use more secure replacements, such as strncpy(), or the right format strings with scrubbed user-supplied data. This requires a good understanding of the functions and the code in which they're used. Off-by-one errors, for example, are easy to make when you forget to count NUL termination in your storage allocation.
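The off-by-one case is easy to show. In this sketch (the function name is hypothetical), the allocation must reserve one byte more than strlen() reports, because strlen() does not count the terminator.

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of src, or NULL if allocation fails.
 * The caller is responsible for calling free() on the result. */
char *duplicate_string(const char *src)
{
    size_t len = strlen(src);        /* excludes the terminating NUL */
    char *copy = malloc(len + 1);    /* + 1 makes room for it */
    if (copy == NULL)
        return NULL;
    memcpy(copy, src, len + 1);      /* copies the NUL as well */
    return copy;
}
```

Writing malloc(len) here instead would compile cleanly and often appear to work, while writing one byte past the end of the allocation.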

One of the major problems evident with these scanners is the lack of any preprocessing, so no macros or definitions are expanded, and no external functions available in source form are examined. Therefore, code such as this:

   #define p(x) printf x        /* x carries its own parentheses */
   char *string1, *string2;     /* user supplied */
   /* stuff happens ... */
   p((string1));                /* insecure! */
   p((string2));                /* again! how horrible! */
   p(("%s", string1));          /* finally, it's correct ... */

may only produce one error in the definition but not in the use of the macro. However, an insecure call is made twice, which will go unnoticed by the scanner. While in this example a macro is used, the same issue applies to unsafe user-defined functions or wrapper functions. This has been the source of several major security holes found over the years, where internally defined functions, which are insecure, are used throughout the code. This additional layer hides the problems in the code. However, in this case, the tools flag the insecure function first, which can then be followed up to fix.
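The wrapper-function variant of this problem can be sketched as follows. Both function names are hypothetical; the point is that call sites of the first function look harmless to a scanner that only knows the standard library names.

```c
#include <stdio.h>

/* Unsafe wrapper: msg is handed to snprintf() as the format string, so
 * any '%' directives in user-supplied text will be interpreted. */
int format_line_unsafe(char *out, size_t size, const char *msg)
{
    return snprintf(out, size, msg);
}

/* Safer wrapper: the format string is constant, and msg is only data. */
int format_line(char *out, size_t size, const char *msg)
{
    return snprintf(out, size, "%s", msg);
}
```

A scanner would be expected to flag the snprintf() call inside the unsafe wrapper once, but not the many places that call the wrapper, unless the wrapper's name is added to its vulnerability database.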

Preprocessing itself is an area of debate for static analysis. Sometimes, flagged code may not be in use on the platform on which development is being tested, in which case it can be ignored for that platform. However, it should still be noted that it may affect some platforms. In the OpenLDAP code example above, older systems without snprintf() would be affected by a potential buffer overflow. Issues surrounding coding adjustments are best addressed by developing a strong understanding of the language and environment for which you are coding and having some secure programming references handy. Several are listed in Resources that are worth investigating.

Some of these pitfalls are traditional problems inherent to static analysis. The most significant of these issues can be overcome by preprocessing the input to show the scanner what the compiler would see. This function, however, is still not available in the code examination tools listed here.


Despite some of the mentioned warnings, source code scanners can help improve the state of your code in development or afterwards. It is important to keep these limitations in mind and not presume that everything has been found. The use of two or even all three of these tools is recommended for development teams and basic security audits. Keep in mind that these tools assist you in the auditing process; they do not automate it.


As always, newer versions of the software may be available.




Resources

Open Source Quality Project: Project at UC Berkeley to assist in software reliability, excellent links and resources.

Secure Programming for Linux and Unix HOWTO: David Wheeler also provided helpful discussion during the preparation of this article.

Building Secure Software: How to Avoid Security Problems the Right Way by John Viega and Gary McGraw

Security Engineering: A Guide to Building Dependable Distributed Systems by Ross J. Anderson

The Practice of Programming by Brian W. Kernighan and Rob Pike

Jose Nazario is a biochemistry graduate student nearing the completion of his PhD. Side projects include Linux and other UNIX variants, software and security-related matters, and hobbies outside his office, like fly-fishing and photography.
