
Source Code Scanners for Better Code

They aren't a replacement for manual checks and edits, but tools like Flawfinder, RATS and ITS4 can point you in the right direction.

Coding is tough enough, and coding right can sometimes seem an
almost impossible task. Between design constraints, deadlines and
making it work in the first place, it's difficult to get your code
secure. Scanners, however, can help with security concerns.

Source code scanners are nothing new. Tools like lint have been
around for many, many years to help you find errors in your C code.
There are even lint checkers for DNS and HTML files: nslint and
weblint, respectively. The lint tool doesn't explicitly talk about
insecure code, though, only about basic inconsistencies in your
style.

Source code audits, such as those done by the Linux Auditing
Project, are also valuable. However, audits can take a lot of time,
and a lot of money if you outsource them, though the payback is well
worth the investment. Source code scanners used to be available only
as commercial software, which put them out of reach of many Linux
developers, or as research tools whose authors didn't always place
user-friendliness high on their list of goals.

A couple of years ago, pscan was released under the GPL. It's a
simple format string scanner, but it was one of the first such tools
widely available to hobbyist developers. Since then, three new
source code scanners have been released as source-available
software, and all three are worth looking at. Not all of them are
open source, but they work well for most users coding in C or C++.

The resources listed at the end of this article are worth checking
out. The Open Source Quality Project, for example, links to several
sites involved in software quality testing, including commercial
products and academic research groups. Some of these tools can do
more rigorous testing but can be far less user-friendly.

Flawfinder

Developed by the noted author and coder David Wheeler,
Flawfinder is a Python program that helps audit C and C++ code. It
is still in its early stages (currently version 0.19), but it is
already strong competition for the other tools listed here, which
are at 1.x versions. Flawfinder is also quite fast, scanning
thousands of lines of C code in a matter of seconds on a typical
desktop machine. It is released under version 2 of the GPL, meaning
it is free software.

Flawfinder also shows some intelligence when scanning for
vulnerabilities. For example, in tests using intentionally insecure
code, Flawfinder was able to distinguish a strcpy() from a
constant-sized string from one with a variable-length source, and so
tell real vulnerabilities apart from false hits. Furthermore,
Flawfinder understands the gettext libraries and their use in
internationalization.

RATS

RATS, the Rough Auditing Tool for Security, is a source code
scanner under active development that can scan C, C++, Perl, PHP
and Python source code. RATS is also released under the GPL, and
some of its developers also worked on a similar tool, ITS4
(discussed below). The current version, 1.3, is noticeably more
stable than version 1.2, but it can still fail on large amounts of
input.

RATS can be configured by modifying its source code (its lexical
analysis), and its error messages are controlled by XML reporting
filters, which requires the expat XML parser to be installed as
well. At runtime, you can choose the level of output you want to see
(the default is medium), specify alternative vulnerability databases
and even have RATS report functions that accept input from the user,
which makes it easier to track user-supplied data.

One specific limitation of RATS is its use of greedy pattern
matching: a search for "printf" matches not only printf() calls but
also vsnprintf() and the like. This can make it difficult to
separate hits from noise when you are looking for specific functions
with the -a flag.

The authors of RATS and Flawfinder, by the way, plan to coordinate
their development efforts to produce a high-quality, open-source
development tool. This will be worth watching, as both development
teams are well respected in the field.

ITS4

One of the original source-available scanners for Linux and
UN*X, It's the Software Stupid Source Scanner (ITS4) scans C and C++
source code for common security-related flaws. Developed by Cigital
(then known as RST Technologies), including some of the same people
who went on to develop RATS, ITS4 has set the pace. Many of its
niceties are also available in RATS or Flawfinder, such as ignorable
lines, tracking of user input (the -m flag) and alternative
databases.

Customizing ITS4 for a particular project or site is relatively
straightforward. It uses a simple text file of vulnerabilities,
which can be extended or modified to suit the specifics of a system.
Note that ITS4 comes with a non-free license, even though the source
is available. Users should read and understand the license if they
are doing anything beyond hobbyist programming with it.

Using the Tools

Using these source code scanners is relatively straightforward:
they take one or more source code files as input and produce a
report. Our examples focus on the file
openldap-2.0.11/libraries/libldap/print.c from the OpenLDAP 2.0.11
source distribution, chosen because it highlights some of the subtle
differences among these scanners. The section of code that produces
hits looks like this:

35-int ldap_log_printf( LDAP *ld, int loglvl, const char *fmt, ... )
36-{
37-     char buf[ 1024 ];
38-     va_list ap;
39-
40-     if ( !ldap_log_check( ld, loglvl )) {
41-             return 0;
42-     }
43-
44-     va_start( ap, fmt );
45-
46-#ifdef HAVE_VSNPRINTF
47-     buf[sizeof(buf) - 1] = '\0';
48-     vsnprintf( buf, sizeof(buf)-1, fmt, ap );
49-#elif HAVE_VSPRINTF
50-     vsprintf( buf, fmt, ap ); /* hope it's not too long */
51-#else
52-     /* use doprnt() */
53-     chokeme = "choke me! I don't have a doprnt manual handy!";
54-#endif
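In the excerpt, only the vsnprintf() branch is bounded; the vsprintf() fallback relies on the hope that the message is not too long. As a point of comparison, here is a minimal sketch of a logging helper that uses vsnprintf() unconditionally, so an over-long message is truncated rather than written past the end of the buffer. The name log_format() is hypothetical and this is not OpenLDAP's code; the sketch also assumes a C99-conforming vsnprintf() (some older C libraries returned -1 on truncation instead of the full length).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of OpenLDAP: format a log message into a
 * caller-supplied buffer using only the bounded vsnprintf(), so a long
 * message is truncated instead of overflowing the buffer.
 * Assumes C99 vsnprintf() semantics.
 */
static int log_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    /* A zero-sized buffer would leave no room for the terminating NUL. */
    assert(buf != NULL && size > 0);

    va_start(ap, fmt);
    /* Writes at most size - 1 characters plus a terminating NUL. */
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);

    /*
     * n is the length the full message would have had:
     * n >= (int)size means the output was truncated, n < 0 means an error.
     */
    return n;
}
```

Callers should keep the format string a constant literal and pass untrusted data only as an argument, for example log_format(buf, sizeof buf, "%s", user_input), which is what Flawfinder and ITS4 advise. A scanner will still flag the vsnprintf() call inside the helper, since fmt is not a constant there; the format string question simply moves to the helper's callers.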

Now, let's run each of the three scanners over print.c to
demonstrate their differences, starting with the output from RATS:

$ rats print.c
print.c:37: High: fixed size local buffer
Extra care should be taken to ensure that character arrays that are allocated
on the stack are used safely.  They are prime targets for buffer overflow
attacks.
print.c:50: High: vsprintf
Check to be sure that the non-constant format string passed as argument 2 to
this function call does not come from an untrusted source that could have added
formatting characters that the code is not prepared to handle.
print.c:50: High: vsprintf
Check to be sure that the format string passed as argument 2 to this function
call does not come from an untrusted source that could have added formatting
characters that the code is not prepared to handle.  Additionally, the format
string could contain `%s' without precision that could result in a buffer
overflow.

RATS found three high-severity problems worth noting. Notice,
however, that it reported the vsprintf() call twice, the second
report being nearly identical to the first but with an added
warning.

Now let's try Flawfinder on the same input file:

$ flawfinder print.c                                                           
Flawfinder version 0.15, (C) 2001 David A. Wheeler.
Number of dangerous functions in C ruleset: 40
Processing print.c
print.c:48 [4] (format) vsnprintf: if format strings can be influenced by an attacker, 
they can be exploited. Use a constant for the format specification. 
print.c:50 [4] (format) vsprintf: Potential format string problem. Make Format 
string constant. 
There are probably other security vulnerabilities as well; review your code!

Flawfinder found two distinct problems worth reporting, but it
doesn't note the fixed-size declaration of "char buf[ 1024 ]" at
line 37, which could become a problem (and does on some platforms).

Lastly, let's run ITS4 on the same input and examine its output:

$ its4 print.c                                                                 
print.c:48:(Urgent) vsnprintf
Non-constant format strings can often be attacked.
Use a constant format string.
----------------
print.c:50:(Urgent) vsprintf
Non-constant format strings can often be attacked.
Use a constant format string.
----------------

ITS4 gives the same results as Flawfinder, tagging the format string
caveats. Notice that each tool formats its output differently, that
the three tools tag different functions (this limited example
doesn't highlight all of the differences) and that none of them
takes into account which of the conditionally compiled branches,
the safe one or the unsafe one, will actually be built.

Interpreting the results is not always as easy as it may appear.
For example, sanitizing input can be tricky, as can safely opening a
file to avoid a symlink attack. Even copying strings is tricky,
because subtle issues like NUL termination, length and memory
allocation all play a part. Astute programmers will still be
required (see Resources).

Typically, the scanners are meant to be run over the source code
several times during development, each time fixing or investigating
the major problems. When coding, it's easy at first to forget to use
a more secure function like strncpy(), and these tools help
reinforce better habits. When examining code from outside sources,
scanners help highlight areas that may be problematic. In either
case, they show you where to focus your attention, and they cover
many of the basic, common coding errors that lead to faulty code or,
worse, security issues.

All of these tools are quite fast, processing about 100,000 lines of
input in under one second. Simply put, you'll spend more time
figuring out what the output means than running the checker itself.

Caveats of Source Scanning Tools

Source code scanners specifically designed to look for
security flaws are obviously a help. However, users need to keep a
number of limitations in mind.

First, scanners will never replace a good manual audit of the code.
There are simply too many variables that have not yet been captured
in an automated scan.

Second, it is vital for code authors to understand the functions and
libraries they are using and the nuances inherent in them. Nothing
replaces understanding the source of errors, as these tools list
only some of the possible security holes in the code. For example, a
scanner can't dig into a library to find unsafe functions buried
beneath other functions unless it has been explicitly told that the
function is unsafe. A similar constraint applies to data types such
as int, char or long: if you use types you define in your code, the
scanner most likely doesn't know how to handle them.

Lastly, scanners are so far limited in the languages they
understand, which restricts their utility, broad though it may be
for most of us. RATS currently leads in this area, understanding
five programming languages, while the other two focus on C and C++.

Some of these limitations come from poor documentation of
functions and their pitfalls, such as the printf() family. Others
come from a lack of standardization on how to perform actions
securely, such as opening a file. Still others come from the lack of
portable, secure replacement functions. In this last case, it's
probably best to ship implementations of the safe functions with
your code and use them on platforms that lack them. For example,
several versions of snprintf() exist for platforms that lack it,
available under a variety of licenses.

Having the output of these scanners is only the start of securing
the code. It's vital to use the more secure replacements correctly,
such as strncpy(), or the right format strings with scrubbed
user-supplied data. This requires a good understanding of the
functions and the code in which they're used. strncpy(), for
instance, does not NUL-terminate the destination when the source is
too long. Off-by-one errors, likewise, are easy to make when you
forget to count the terminating NUL byte in your storage allocation.

One of the major problems evident with these scanners is the lack of
any preprocessing: no macros or definitions are expanded, and no
external functions available in source form are examined.
Therefore, code such as this:

   #define p(x) printf x            /* wrapper macro hides the printf() call */
   char *string1, *string2;         /* user supplied */
   /* stuff happens ... */
   p((string1));                    /* insecure! */
   p((string2));                    /* again! how horrible! */
   p(("%s", string1));              /* finally, it's correct ... */

may produce only one warning, for the printf() in the macro
definition, but none for the uses of the macro. Yet an insecure call
is made twice, and both will go unnoticed by the scanner. Although
this example uses a macro, the same issue applies to unsafe
user-defined or wrapper functions. This has been the source of
several major security holes found over the years, where insecure
internally defined functions are used throughout the code; the
additional layer hides the problems. In such cases, however, the
tools do flag the insecure function itself, and its callers can then
be tracked down and fixed.

Preprocessing itself is a subject of debate in static analysis.
Sometimes flagged code is not used on the platform where development
and testing take place, in which case it can be ignored for that
platform. It should still be noted, however, because it may affect
other platforms. In the OpenLDAP example above, older systems
without vsnprintf() would be exposed to a potential buffer overflow.
Such platform-specific issues are best addressed by developing a
strong understanding of the language and environment for which you
are coding and by keeping some secure programming references handy.
Several worth investigating are listed in Resources.

Some of these pitfalls are traditional problems inherent in static
analysis. The most serious of them can be overcome by preprocessing
the input to show the scanner what the compiler would see. This
capability, however, is not yet available in the code examination
tools discussed here.

Conclusions

Despite the limitations mentioned above, source code scanners
can help improve the state of your code, both during development and
afterward. It is important to keep their limitations in mind and not
presume that everything has been found. Using two, or even all
three, of these tools is recommended for development teams and basic
security audits. Keep in mind that these tools assist you in the
auditing process; they do not automate it.

Resources

As always, newer versions of the software may be available.

Flawfinder

RATS

ITS4

Open Source Quality Project: a project at UC Berkeley to assist in
software reliability, with excellent links and resources.

Secure Programming for Linux and Unix HOWTO by David A. Wheeler.
David Wheeler also provided helpful discussion during the
preparation of this article.

Building Secure Software: How to Avoid Security Problems the Right
Way by John Viega and Gary McGraw

Security Engineering: A Guide to Building Dependable Distributed
Systems by Ross J. Anderson

The Practice of Programming by Brian W. Kernighan and Rob Pike

Jose Nazario is a biochemistry graduate student nearing the
completion of his PhD. His side projects include Linux and other
UNIX variants, software and security-related matters; his hobbies
outside the office include fly-fishing and photography.

email: jose@biocserver.bioc.cwru.edu

______________________

Comments

Need more intelligent tools

The problem with scanner type tools is they provide very little intelligent filtering and flood the user with many false positives; invariably users look at the first 10 results and give up.

Re: Source Code Scanners for Better Code

You have a good overview of the 3 source code scanners. Are these the commonly used ones, or are there any others?
I had a quick question on source code scanners: can these scanners be used to scan code written for different platforms? (i.e., if I run a source code scanner on Linux, can I scan some piece of code written to run on Windows or Unix?)

-
Thanks,
Prasad

Re: Source Code Scanners for Better Code

Sorry about the bad grammar in some places; I need to be a bit more careful with that. :) Anyhow, hope you enjoy the piece.