The Linux /proc Filesystem as a Programmers' Tool

Software

by Joshua Birnbaum

on June 17, 2005

One of the best investments in time that a person with a keen interest in operating systems can make is to explore them from multiple angles. Operating system installation and configuration can go a long way toward solidifying the concepts that one reads about in system administration and networking texts. Electronic references in the form of system manual (man) pages and the Linux info(1) facility are available for instant consultation. groups.google.com, with its archive of 20-plus years of Usenet postings, can provide expert guidance on issues ranging from DNS configuration to memory interleaving. However, I feel another approach is singularly unique when it comes to exploring operating systems--programming them.

My entry into systems programming was guided by my desire to understand further the operating systems I was working with daily as a contract UNIX and, later, Linux system administrator. The result of this was ifchk, a packet sniffer detector I wrote in C and released in June of 2003. ifchk initially was written under IRIX and then ported to Linux, mostly under the 2.4 kernel. The current ifchk revision, beta 4, recently was released and beta 5 is on the way.

My work on ifchk has allowed me to examine programmatically several areas of operating system functionality. Examples include the Linux netlink(7) and rtnetlink(7) facilities, device control--that is, network interfaces--via ioctl(2), signals and proc, the process filesystem. Proc and its ability to display a wide array of data concerning the runtime state of a system are the focus of our discussion here.

What Is /proc?

Before we begin to talk about the proc filesystem as a programming facility, we need need to establish what it actually is. The proc filesystem is a pseudo-filesystem rooted at /proc that contains user-accessible objects that pertain to the runtime state of the kernel and, by extension, the executing processes that run on top of it. "Pseudo" is used because the proc filesystem exists only as a reflection of the in-memory kernel data structures it displays. This is why most files and directories within /proc are 0 bytes in size.

Broadly speaking, a directory listing of /proc reveals two main file groups. Each numerically named directory within /proc corresponds to the process ID (PID) of a process currently executing on the system. The following line of ls output illustrates this:


dr-xr-xr-x    3 noorg    noorg           0 Apr 16 23:24 19636

Directory 19636 corresponds to PID 19636, a current bash shell session. These per-process directories contain both subdirectories and regular files that further elaborate on the runtime attributes of a given process. The proc(5) manual page discusses these process attributes at length.

The second file group within /proc is the non-numerically named directories and regular files that describe some aspect of kernel operation. As an example, the file /proc/version contains revision information relevant to the running kernel image.

Proc files are either read-only or read-write. The /proc/version file above is an example of a read-only file. Its contents are viewable by way of cat(1), and they remain static while the system is powered up and accessible to users. Read-write files, however, allow for both the display and modification of the runtime state of the kernel. /proc/sys/net/ipv4/ip_forwarding is one such file. Using cat(1) on this file reveals if the system is forwarding IP datagrams between network interfaces--the file contains a 1--or not--the file contains a 0. In echo(1)ing 1 or 0 to this file, that is, writing to the file, we can enable or disable the kernels ability to forward packets without having to build and boot a new kernel image. This works for many other proc files with read-write permissions.

Readers are invited to explore the /proc directory of an available system and consult the proc(5) manual page to solidify their understanding of the above discussion.

Programming with /proc

With the above foundation in place, we now turn our attention to using the contents of /proc to examine programmatically an aspect of the running kernel. A word of note: although the sample file ifmetric.txt discussed below contains all of the code we are referencing, additional understanding can be gleaned by downloading and building ifchk and by walking through its code, via gdb(1).

One area of ifchk functionality deals with the display of counters that describe transmitted and received packets across all network interfaces attached to the system. ifchk does this by parsing and processing /proc/net/dev. Although the file contains all manner of statistics concerning network interfaces, we are interested mainly in the extraction of packet count data.

ifMetric() is the function that implements all of the above functionality. It can be found in the file, ifmetric.txt, that is excerpted below. ifMetric() also can be found within the ifchk distribution in ~/ifchk-0.95b4/linux.c.

The ifMetric() function is defined as follows:


int ifMetric( struct ifList *list );

*list is a pointer to the head of a linked list of structs of type ifList. Each node describes characteristics of an interface present on the system. ifList is defined in ~/ifchk-0.95b4/linux.h as such:


struct ifList
{
    int flags;			/* Interface flags.                          */
    char data[DATASZ];		/* Interface name/unit number, e.g., "eth0". */
    struct ifList *next;	/* Pointer to next struct ifList.            */
 };

In getting started, our first inclination might be to open the /proc/net/dev file and begin searching for the relevant packet count data. There is, however, some file reconnaissance work to be done in between these two steps. Program failure due to maliciously crafted input is a staple of security advisories these days. We cannot make any assumptions about external program input, regardless of its source. As we will see, there is another security-related reason why we sequence the above file access operations as we do. With all of the above in mind, let's get to work.


540  int ifMetric( struct ifList *list )
541  {
542      struct stat fileAttrs;                 /* /proc/net/dev file
attributes. */
543      struct stat linkAttrs;                 /* /proc/net/dev link
attributes. */
544      char *filePath = NULL;                 /* Path to /proc/net/dev
file. */
545      FILE *file = NULL;                     /* File pointer for
fopen(). */
546      int status = 0;                        /* /proc/net/dev file
close() status. */
547      char buf[1024] = "";                   /* Character buffer. */
548      struct ifList *cur = NULL;             /* Current node in
linked list. */
549      int conv = 0;                          /* sscanf() string match
count. */
550      long long rxPackets = 0;               /* Received packet
count. */
551      long long txPackets = 0;               /* Transmitted packet
count. */
552      char *delim = NULL;                    /* Ptr to matched
character in string. */
553      unsigned int ifIndex = 0;              /* Interface index. */
554      mode_t mode = 0;                       /* /proc/net/dev file
permissions. */

558      cur = list;

559      if( cur -> data == NULL )
560      {
561          fprintf( stderr, "ifchk: ERROR: interface list is empty\n"
);
562          return (-1);
563       }

564      filePath = "/proc/net/dev";

565      if( ( lstat( filePath, &linkAttrs ) ) != 0 )
566      {
567          perror( "lstat" );
568          return (-1);
569       }

570      if( S_ISLNK( linkAttrs.st_mode ) )
571      {
572          fprintf( stderr, "ifchk: ERROR: /proc/net/dev is a symbolic
link\n" );
573          return (-1);
574       }

Before we open /proc/net/dev, we must be sure that the file is not actually a symbolic link (symlink). If passed a symlink, which our proc file shouldn't be, fopen(3) would follow that link to what it points to. Our subsequent fstat(2) call, using the file descriptor returned by fopen(3), would return data on the wrong file. To protect against this, we lstat(2) /proc/net/dev and then check if it is a symlink, by using the S_ISLNK POSIX macro. If S_ISLNK finds a symlink, we print an error message with fprintf(3) and return -1, signifying failure. Otherwise, we continue on.


578      if( ( file = fopen( "/proc/net/dev", "r" ) ) == NULL )
579      {
580          perror( "fopen" );
581          return (-1);
582       }

Presuming that S_ISLNK didn't find a symlink, we next open the /proc/net/dev file by calling fopen(3). If our call succeeds, fopen(3) returns a FILE pointer for use with future operations on /proc/net/dev. If our call fails, we call perror(3) and return -1, signifying failure. Notice that we check the return value of the fopen(3) call. Checking function return values is critical, to say the least.


586      if( ( fstat( fileno( file ), &fileAttrs ) ) != 0 )
587      {
588          perror( "fstat" );
589          return (-1);
590       }

One of the first things I think about when writing code is how it might fail. Additionally, I also think about how I can minimize the impact of failure by failing gracefully through programming defensively. Doing this is crucial, I feel, especially in light of the heightened state of concern regarding system security and the potential impact of security compromises.

Given this, we must continue to probe /proc/net/dev, via fstat(2), to see if the file exhibits what constitutes plausible attributes for a read-only proc file. These attributes include its file type outside of being a symlink, size in bytes and disk blocks, user and group ownership status and permissions. The following attribute criteria is based on what I have seen for /proc/net/dev on the bulk of Linux systems I have worked on.


file type:        regular
size in bytes:    0
size in blocks:   0
file ownership:   root:root
file permissions: 0444 (-r--r--r--)

From the perspective of a default ifchk build, our /proc/net/dev file must conform to the above attribute criteria. However, a simple modification to the ifchk code, if required, can accommodate site policies and so on that differ from the above.

If successful, fstat(2) fills in fileAttrs with /proc/net/dev attributes. In the event of failure, we relay a descriptive error message to the user via a call to perror(3). We then return -1, signifying failure. Examining fileAttrs in gdb(1), we see that fstat(2) has the following to say about /proc/net/dev (some struct members were removed from GDB output, as they did not add to the discussion):


(gdb) print fileAttrs 
$1 = {..., st_mode = 33060,
           st_uid = 0,
           st_gid = 0,
           st_size = 0,
           st_blocks = 0, ...}

In the discussion above, I alluded to an additional security-related reason behind the sequencing of our /proc/net/dev access procedures. To recap, our handling of /proc/net/dev--outside of lstat(2) and our not wanting to follow symlinks--currently consists of calling fopen(3) followed by fstat(2). We just as easily could have achieved the same result via a call to stat(2), not fstat(2), and then fopen(3), as stat(2) returns the same file attribute data as fstat(2) does. So, why use the former sequence--fopen(3), fstat(2)--instead of the latter one--stat(2), fopen(3)? Because we wish to avoid a race condition. The stat(2), fopen(3) sequence creates the possibility that the file we referenced during the stat(2) call could have been substituted with another one with possibly different attributes or even different contents once we got to the fopen(3) call. In this situation, we'd think we were calling fopen(3) on the same file we just called stat(2) on but, alas, not. The danger of this, I think, is obvious.


591      if( ( ( linkAttrs.st_ino ) != ( fileAttrs.st_ino ) ) ||
592          ( ( linkAttrs.st_dev ) != ( fileAttrs.st_dev ) ) )
593      {
594          fprintf( stderr, "ifchk: ERROR: /proc/net/dev file
attribute inconsistency\n" );
595          return (-1);
596       }

As an added measure in checking that we are working with the proc file we think we are, we compare the inode numbers, st_ino, and resident filesystem, st_dev, that both lstat(2) and fstat(2) reported in their /proc/net/dev attribute checks. If all is well, the value of linkAttrs.st_ino should equal fileAttrs.st_ino and the value of linkAttrs.st_dev should equal the the value of fileAttrs.st_dev. If we're okay here, we continue on. If not, we report an attribute consistency error via fprintf(3) and return -1, signifying failure.


600      if( ! ( S_ISREG( fileAttrs.st_mode ) ) )
601      {
602          fprintf( stderr, "ifchk: ERROR: /proc/net/dev is not a
regular file\n" );
603          return (-1);
604       }

S_ISREG is a POSIX macro that checks to see if its argument is a regular file and not, say, a directory. If it is, we continue on. If not, we print an error message via fprintf(3), and return -1, signifying failure. At this point, you might be asking why we need the lstat(2)/S_ISLNK symlink test above if we're testing for file type here. Referring back to the symlink test above should help to answer that question.


608      if( ( ( fileAttrs.st_size ) || ( fileAttrs.st_blocks ) ) != 0 )
609      {
610          fprintf( stderr, "ifchk: ERROR: /proc/net/dev file size is
greater than 0\n" );
611          return (-1);
612       }

Is /proc/net/dev 0 bytes in length and does it occupy 0 disk blocks? If we see zero for both byte count and disk blocks, we continue on. If not, we print an error message via fprintf(3) and return -1, signifying failure. Notice that only one of the two file tests has to fail for the program to fail.


616      if( ( ( fileAttrs.st_uid ) || ( fileAttrs.st_gid ) ) != 0 )
617      {
618          fprintf( stderr, "ifchk: ERROR: /proc/net/dev is not owned
by UID 0, GID 0\n" );
619          return (-1);
620       }

Is /proc/net/dev owned by user root and group root? Here again, only one of the two tests has to fail for the program to fail and return -1, signifying failure.


624      if( ( mode = fileAttrs.st_mode & ALLPERMS ) != MODEMASK )
625      {
626          fprintf( stderr, "ifchk: ERROR: /proc/net/dev permissions
are not mode 0444\n" );
627          return (-1);
628       }

Is /proc/net/dev mode 0444--read-only user, group and other? ALLPERMS, defined in the /usr/include/sys/stat.h system header file, is a mask that defines all file permissions, or mode 07777. MODEMASK, defined within ~/ifchk-0.95b4/linux.h, is a mask that defines user, group and other read permissions, or mode 0444.

By bitwise ANDing fileAttrs.st_mode with ALLPERMS and then comparing that result to MODEMASK, we can see if /proc/net/dev is mode 0444. If it is, we continue execution. If not, we print an error message via fprintf(3) and return -1, signifying failure. This concludes our /proc/net/dev file attribute tests. However, before ifchk can work with the file, we must examine its internal contents.

Testing the internal content structure or format of /proc/net/dev required that I define a criteria under which ifchk would accept or reject the proc file. As Linux has progressed, the number of fields--bytes, packets, errs--in /proc/net/dev have changed. All of the /proc/net/dev files I have seen share an identical content structure to the file below; output to far right was truncated, due to space limitations:


Inter-|   Receive                                                |  Transmit ...
 face |bytes    packets errs drop fifo frame compressed multicast|bytes      ...
    lo:   34230     586    0    0    0     0          0         0    34230   ...
  eth0:22476180  208548    0    0    0     0          0         0 52718375   ...

Two lines of headers are followed by lines of per-interface statistics. Older versions of the file do not contain the compressed field. In the name of simplicity, I decided that /proc/net/dev files that did not contain this field would be rejected by ifchk. With that foundation in place, we now begin the process of examining /proc/net/dev internally.


632      if( ! fgets(buf, sizeof( buf ), file) )
633      {
634          perror( "fgets" );
635          return (-1);
636       }

637      if( ! fgets(buf, sizeof( buf ), file) )
638      {
639          perror( "fgets" );
640          return (-1);
641       }

645      if( ( strstr( buf, "compressed" ) ) == NULL )
646      {
647          fprintf( stderr, "ifchk: ERROR: /proc/net/dev header format
is not supported\n" );
648          return (-1);
649       }

We make two identical fgets(3) calls in a row to read the first and second line of headers from /proc/net/dev. Each fgets(3) call results in an overwrite of what previously was in buf. As a result, buf now contains the second header line. We then check to see if the second line of headers contains the compressed field.

If compressed is located in buf, our strstr(3) call succeeds and we have what looks like a usable /proc/net/dev file. If compressed cannot be located in buf, we print an error message via fprintf(3) and return -1, signifying failure. With this, all of our testing, which began by looking at file attributes, is done.

The remainder of our counter output code takes the form of a while loop that handles the processing and output of data for each interface. It iterates as many times as there are interfaces on the system.


653      printf( "***   Network Interface Metrics   ***\n" );
654      printf( "Name    Index   RX-OK           TX-OK\n" );

659      while( fgets( buf, sizeof( buf ), file ) )
660      {
664          if( ( strstr( buf, cur -> data ) ) != NULL )
665          {
666              delim = strchr( buf, ':' );

670              if( *( delim + 1 ) == ' ' )
671              {
672                  conv = sscanf( buf,
673                  "%*s %*Lu %Lu %*lu %*lu %*lu %*lu %*lu %*lu %*Lu
%Lu %*lu %*lu %*lu %*lu %*lu %*lu",
674                  &rxPackets, &txPackets );
675               }

676              else
677              {
678                  conv = sscanf( buf,
679                  "%*s %Lu %*lu %*lu %*lu %*lu %*lu %*lu %*Lu %Lu
%*lu %*lu %*lu %*lu %*lu %*lu",
680                  &rxPackets, &txPackets );
681               }
682           }

We call fgets(3) to read the next line of the /proc/net/dev file into buf. We then call strstr(3) to check that the interface name in cur -> data, ifmetric.txt:558:, matches the interface name in the /proc/net/dev line we just read in. Interface statistics lines in /proc/net/dev begin with an interface name, followed by a colon followed by a count of bytes received on that interface. In some cases, there is whitespace between the colon and the received bytes count, for example, eth0: 6571407, and in other cases, not, eth0:12795779).

In order to maintain uniform column output in either of these cases, we use pointer arithmetic to test for the existence of this white space. If whitespace does exist, we enter the if() statement on line 670, whose sscanf(3) format specifiers deal with it. If there is no whitespace, we enter the else block on line 676. These format specifiers deal with the lack of whitespace.

In either case, the counters for both received and transmitted packets are copied by sscanf(3) from /proc/net/dev to the variables rxPackets and txPackets for later output.


683          else
684          {
685              fprintf( stderr, "ifchk: ERROR: current metrics do not
describe current interface %s\n",
686
cur -> data );
687              return (-1);
688           }

If we compare the interface name in cur -> data to the name in /proc/net/dev and find a mismatch, we print an error message via fprintf(3) and return -1, signifying failure.


692          if( conv != 2 )
693          {
694              fprintf( stderr, "ifchk: ERROR: /proc/net/dev parse
error\n" );
695              return (-1);
696           }

If successful, our above sscanf(3) call returns the number of matched items. As a result, the variable conv should equal 2 for rxPackets and txPackets. If not, we print an error message via fprintf(3) and return -1, signifying failure.


697          if( ( ifIndex = if_nametoindex( cur -> data ) ) == 0 )
698          {
699              perror( "if_nametoindex" );
700              return (-1);
701           }

Next, we call if_nametoindex(), passing it the interface name in cur -> data and, if successful, store the integer interface index in ifIndex. If the call fails, we handle it as usual. An interface index is a positive integer that the kernel assigns to each interface present on the system.


702          printf( "%-7s %-7d %-13Lu   %-13Lu\n", cur -> data,
ifIndex, rxPackets, txPackets );
703
704          conv = 0;

705          if( cur -> next != NULL )
706          {
707              cur = cur -> next;
708           }
709       }

Having built a line of counter output for the current interface, we print it. We then return to the beginning of the while loop and continue or, having reached our loop termination condition, exit the loop.


713      if( ( status = fclose( file ) != 0 ) )
714      {
715          perror( "fclose" );
716          return (-1);
717       }

721      if( ( writeLog( LOGINFO, pw -> pw_name, NULLSTATE ) ) != 0 )
722      {
723          fprintf( stderr, "ifchk: ERROR: could not pass logging
message to syslogd\n" );
724          return (-1);
725       }

726      return (0);
727  }

Having exited the while loop, we call fclose(3), which corresponds to our fopen(3) call on line 578. We then call our logging function to log that an interface counter dump was performed.

With all processing done, ifchk produces the following packet count output on a system with two interfaces:


***   Network Interface Metrics   ***
Name    Index   RX-OK           TX-OK
lo      1       104             104          
eth0    3       1280903         1162571<--.
^       ^       ^                         |
|       |       |[from /proc/net/dev]-----'
|       |
|       |[from if_nametoindex()]
|
|[from cur -> data]

Conclusion

The process filesystem provides all who make use of it with a wealth of system-level information. The ability to manipulate all manners of runtime state information by using file-level system calls and commands, such as cat(1) and echo(1), make proc a high priority candidate for inclusion in anyone's Linux toolkit.

Joshua Birnbaum began his system administration career in 1994. An addiction to SGI led to Sun and then to Linux. From there, he broadened his horizons by branching out into contract sysadmin, public speaking, UNIX/Linux systems programming and now writing for magazines. He can be reached at engineer@noorg.org.

Load Disqus comments