String Manipulation with the Standard Template Library

by William F. Simpson

Have you ever found yourself in a situation where you had to ensure that only one copy of a program could be running at any given time? If so, you probably thought it would be nice if the program could detect the presence of another copy of itself and exit gracefully. Read on to see how I solved this problem.

Listing 1. Output of ps xa

PID TTY      STAT   TIME COMMAND

1103 ?        S      0:00 /usr/sbin/nscd
1104 ?        S      0:00 /usr/sbin/nscd
1105 ?        S      0:00 /usr/sbin/nscd
1106 ?        S      0:00 /usr/sbin/nscd
1107 ?        S      0:00 /usr/sbin/nscd
1108 ?        S      0:00 /usr/sbin/nscd
1109 ?        S      0:00 /usr/sbin/nscd
1119 ?        S      0:00 /opt/kde3/bin/kdm
1141 tty1     S      0:00 /sbin/mingetty --noclear tty1
1142 tty2     S      0:00 /sbin/mingetty tty2
1143 tty3     S      0:00 /sbin/mingetty tty3
1144 tty4     S      0:00 /sbin/mingetty tty4
1831 pts/1    S      0:00 man grep
1927 pts/3    R      0:00 grep X

Listing 1 is part of what ps xa returned when I entered it at the command prompt to show all processes on the system. Take a look at the last two lines. Process ID 1831 is the man program, and the argument to man is grep. In the final line, process ID 1927 is the grep program itself, running with an argument of X. If these were the only processes running, simply counting the lines that contain grep, which is what grep -c grep does, would return 2, even though only one grep process is running. So we need to parse the output ourselves and count only the lines where grep is the program being run, not merely a command-line argument.
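To make the miscount concrete, here is a small sketch that applies both tests to the last two lines of Listing 1: a plain substring search, which is what grep -c does, and a comparison against the COMMAND column only. The function names are hypothetical, and the sketch splits fields with std::istringstream rather than the strtok approach used in Listing 2.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Count lines that contain `name` anywhere, as grep -c would.
int count_substring_matches(const std::vector<std::string>& lines,
                            const std::string& name) {
    int n = 0;
    for (const std::string& line : lines)
        if (line.find(name) != std::string::npos) ++n;
    return n;
}

// Count lines whose fifth whitespace-separated field (COMMAND) equals `name`.
int count_command_matches(const std::vector<std::string>& lines,
                          const std::string& name) {
    int n = 0;
    for (const std::string& line : lines) {
        std::istringstream fields(line);
        std::string pid, tty, stat, time, command;
        // Read PID, TTY, STAT, TIME, then COMMAND; lines that are too short fail.
        if (fields >> pid >> tty >> stat >> time >> command && command == name)
            ++n;
    }
    return n;
}
```

With the two lines for processes 1831 and 1927, the substring test reports 2 for grep while the COMMAND test reports 1, which is the behavior pc needs.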

Listing 2 is the code for the function pc, which gives an accurate count of the number of times a program is running. It makes heavy use of the Standard Template Library (STL). If the STL is new to you, stop reading and get a book about it right now. My students swear by it and swear at me for not telling them about it sooner. The simplification the STL brings to strings and streams alone is worth the time it takes to learn.

Listing 2. The pc function counts the number of instances of a program in the output of ps.



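1  #define  BUFFERSIZE 255
2  #include <stdio.h>      // popen(), pclose()
3  #include <fstream>      // ifstream
4  #include <string>       // string
5  #include <unistd.h>     // unlink()
6  #include <cstring>      // strtok()
7  #include <limits.h>     // INT_MAX
8  using namespace std;
9  int pc(char *processname)
10 {
11 int count=0;
12 FILE *f1;
13 char buffer[BUFFERSIZE];
14 if(processname==NULL) return -1;
15 string programstring;
16 string progname(processname);     // C string -> STL string
17 string cmdstr="/bin/ps xa | /bin/grep " +
        progname + " > /tmp/zzqxy.txt";
18 f1=popen(cmdstr.c_str(),"w");     // run the pipeline through /bin/sh
   if(f1==NULL) return -1;           // the shell could not be started
19 pclose(f1);                       // wait until the pipeline has finished
20 ifstream f2("/tmp/zzqxy.txt");    // an ifstream opens for reading
   if(!f2) return -1;
21 while(f2.getline(buffer,BUFFERSIZE) || !f2.eof())
22   {                               // the loop ends cleanly at end of file
23   if(f2.fail())                   // line longer than the buffer: keep its
       { f2.clear(); f2.ignore(INT_MAX,'\n'); }   // start, skip the rest
24   char *tok=strtok(buffer," ");   // first field: PID
25   for(int i=0;i<4 && tok!=NULL;i++)
        tok=strtok(NULL," ");        // then TTY, STAT, TIME, COMMAND
     if(tok==NULL) continue;         // blank or short line: nothing to count
     programstring=tok;
26   if(programstring.rfind("/")!=string::npos)
27      programstring=programstring.substr(programstring.rfind("/") + 1);
28   if(programstring==progname) count++;
29   }
30 f2.close();
31 unlink("/tmp/zzqxy.txt");         // remove the temporary file
32 return count;
33 }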


Let's take a quick code walk. Lines 3 and 4 are not missing the .h extension; the standard C++ headers omit it. Now look at line 8. If you omit this line, every name you use from the standard library, such as string and ifstream, has to be prefixed with std::. Line 14 is an idiot check to ensure that the function cannot be called with a NULL input. Now, for the STL: line 16 creates the string equivalent of the char * input argument. Line 17 builds the command string needed for the first step in generating a correct program count. The STL really shines here. String concatenation is done with the + operator, and the string class manages its own memory, so there are no string overruns or strcpy calls to worry about. If you were to enter the string part of line 17 at the prompt, it would look like this:

/bin/ps xa | /bin/grep progname > /tmp/zzqxy.txt

where progname is the name of the process for which you are searching. The command ps xa lists all processes. The output from ps is piped to the standard input of grep, which outputs every line that contains progname. Finally, all output from grep is redirected to the file /tmp/zzqxy.txt. Passing this string to popen on line 18 hands it to /bin/sh, so the program can run that nasty command line without setting up the pipe and the redirection itself. The pclose call on line 19 waits for the command to finish, which guarantees the file is complete before we start reading it.
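Listing 2 sends the pipeline's output to a temporary file and reads it back. As an alternative that the article does not use, popen can be opened in "r" mode so the program reads the command's standard output straight from the pipe, with no temporary file to create and unlink. The sketch below assumes a POSIX system (popen and pclose are POSIX, not standard C++), and its function names are hypothetical. It builds the pipeline with + just as line 17 does, then captures the output.

```cpp
#include <stdio.h>   // popen(), pclose(), fgets(); popen/pclose are POSIX
#include <string>

// Build the ps | grep pipeline for a program name. Because one operand of
// each + is a std::string, no fixed-size buffer or strcpy is involved.
std::string ps_grep_command(const std::string& progname) {
    return "/bin/ps xa | /bin/grep " + progname;
}

// Run cmd through /bin/sh and return everything it writes to standard output.
// Returns an empty string if the shell could not be started.
std::string run_command(const std::string& cmd) {
    std::string output;
    FILE* pipe = popen(cmd.c_str(), "r");   // "r": we read the command's stdout
    if (pipe == NULL)
        return output;
    char chunk[256];
    // fgets stops at a newline or when chunk is full; appending every piece
    // reassembles long lines correctly.
    while (fgets(chunk, sizeof chunk, pipe) != NULL)
        output += chunk;
    pclose(pipe);                           // waits for the command to exit
    return output;
}
```

For example, run_command(ps_grep_command("nscd")) would return the matching lines of ps output as one string. Because the name is pasted into the shell command unquoted, it should be a plain program name; characters such as ; or | would be interpreted by the shell.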

For the second part of the program, go back to Listing 1 for a moment. What we really want to look at is the fifth field of each line (after PID, TTY, STAT and TIME), because this is the program name. For our purposes, the rest of the information on that line can be ignored. The only question is how to extract that fifth field. Time for a little text processing and another burst of STL. Line 20 opens the file created by the command in lines 17–19. No fopen() or mode string here; all you need to do is supply the filename, because an ifstream is opened for reading by default. Lines 24 and 25 use the wonderful strtok function from the C library to extract the fifth field from each line. strtok is a little different in that the first time you call it, you must supply a pointer to the character string to break up (tokenize) and the delimiters, the characters that separate the tokens. In this case, the delimiter is a space. In all additional calls to strtok, replace the character string with NULL; strtok remembers where it left off.
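The sketch below isolates the strtok step from lines 24 and 25: one call with the buffer, then four calls with NULL, to reach the COMMAND column. strtok treats a run of delimiter characters as a single separator and skips leading ones, which is why the padded, right-aligned columns of ps output tokenize cleanly. It also writes '\0' into the buffer it is given, so it needs a writable copy, never a string literal. The function name and the convention of returning NULL for a short line are assumptions of this sketch.

```cpp
#include <cstring>

// Return a pointer to the fifth space-separated field (COMMAND) of a ps xa
// line, or NULL if the line has fewer than five fields.
// NOTE: strtok modifies `line` in place, so it must be a writable buffer.
char* command_field(char* line) {
    char* tok = std::strtok(line, " ");     // first field: PID
    for (int i = 0; i < 4 && tok != NULL; i++)
        tok = std::strtok(NULL, " ");       // then TTY, STAT, TIME, COMMAND
    return tok;
}
```

Called on a copy of the line for process 1103 in Listing 1, it returns a pointer to "/usr/sbin/nscd"; the leading blank and the runs of spaces between columns do not matter.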

One last problem to solve. We now have extracted the name of the program, but it may be the fully qualified name (/bin/grep as opposed to grep). Comparing the whole string would miss /bin/grep, and merely searching for the name anywhere in the path could count a false case where a directory has the same name as the program we want. STL to the rescue. Line 26 uses the string member function rfind to search the name from the right for the last / character. If the program name is fully qualified, lines 26 and 27 keep only the part after that last /. If the program name is not fully qualified, rfind finds no / and returns string::npos, the largest value a size_t can hold. On Linux that is the same value as ULONG_MAX (4,294,967,295 on a 32-bit system), but comparing against string::npos is the portable way to write the test. Either way, we now have the bare program name. Line 28 is a simple test to check whether this program is the one we want to count. If it is, the counter goes up by one. Finally, the loop gets a new line from the file and starts all over again.
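Here is the same clipping step as a standalone sketch (program_name is a hypothetical name). Comparing against std::string::npos, rather than a platform constant such as ULONG_MAX, keeps the test correct whatever the width of size_t, because npos is defined as the largest value of the string's size_type. Storing the result of rfind also avoids searching the string twice.

```cpp
#include <string>

// Return the part of `path` after the last '/', or `path` itself if it has none.
std::string program_name(const std::string& path) {
    std::string::size_type slash = path.rfind('/');
    if (slash == std::string::npos)   // no '/': already a bare program name
        return path;
    return path.substr(slash + 1);    // the count defaults to "rest of string"
}
```

So /opt/kde3/bin/kdm from Listing 1 becomes kdm, while a bare name such as grep passes through unchanged.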

Now that the code is done, it is time to put it into a form that other programs can use easily. This is a multistep process, so creating a Makefile simplifies things in the long run. Most newcomers to Linux look at one of those infinitely long Makefiles created for big programs and run screaming from the room. Stay calm; this little project has a much kinder Makefile.

Listing 3. Makefile

1  CC=g++
2  PC=pc
3  INSTALL_PROGRAM=/usr/local/lib
4  INSTALL_INCLUDE=/usr/local/include
5  pc      : pc.o
6       ar crv libpc.a pc.o
7       ranlib libpc.a
8  pc.o    : pc.cpp
9       ${CC} -c pc.cpp
10 install :
11      cp libpc.a ${INSTALL_PROGRAM}
12      cp pc ${INSTALL_INCLUDE}
13      ldconfig
14 clean   :
15      rm -f *.o libpc.a
16      rm -f ${INSTALL_PROGRAM}/libpc.a
17      rm -f ${INSTALL_INCLUDE}/pc
18      rm -f test
19 test    : test.cpp
20      ${CC} -o test test.cpp -L${INSTALL_PROGRAM} -l${PC}

The first four lines in the Makefile shown in Listing 3 are variables. Line 1 specifies which compiler to use. Line 2 holds the library's short name, pc. Library file names conventionally start with lib, and the -l flag relies on that: -lpc tells the linker to look for libpc.a (or libpc.so), so you type neither the prefix nor the extension, which saves keystrokes. Lines 3 and 4 specify where the compiled library and the header are installed.

When you type make, the make program builds the first target in the Makefile, which in Listing 3 is pc, on line 5. Typing make pc would do the same thing. The names to the right of the colon on a rule line are prerequisites, and make brings each of them up to date before it runs the rule's commands. Here, pc depends on pc.o, so make turns to the pc.o rule on line 8. That rule depends only on the source file pc.cpp, which already exists, so make runs line 9 at once and creates pc.o. Now the pc rule can be completed. (If you type the Makefile in yourself, note that each command line under a rule must begin with a tab character, not spaces.)
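How does make decide that a rule needs to run at all? For a file target it compares modification times: the commands run if the target does not exist or is older than any of its prerequisites. The following C++17 sketch models only that check. needs_rebuild is a hypothetical name, and real make also handles missing prerequisites, phony targets and much more.

```cpp
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Simplified make rule: rebuild `target` if it is missing or older than any
// of its prerequisites. Prerequisites are assumed to exist; last_write_time
// throws fs::filesystem_error for a missing file.
bool needs_rebuild(const fs::path& target, const std::vector<fs::path>& prereqs) {
    if (!fs::exists(target))
        return true;
    fs::file_time_type built = fs::last_write_time(target);
    for (const fs::path& p : prereqs)
        if (fs::last_write_time(p) > built)
            return true;
    return false;
}
```

For Listing 3, needs_rebuild("pc.o", {"pc.cpp"}) would become true right after you edit pc.cpp, which is why make recompiles only when the source has changed. Older GCC releases may need -lstdc++fs to link std::filesystem.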

Remember, pc.cpp is not a complete program; it has no main function. We need to put pc.o in a form that can be linked easily into other programs. Line 6 invokes the archiver, ar, which packs the compiled object file into the static library libpc.a (the c, r and v flags create the archive, insert or replace pc.o, and report each step). In line 7, ranlib adds an index of the symbols in the archive so the linker can find them quickly. The library is now complete. All you have to do is type make install to copy its parts into their proper places. In this example, the header goes into /usr/local/include, and the library itself goes into /usr/local/lib.

The install rule finishes by running ldconfig, which updates the links and cache used by the dynamic linker. Strictly speaking, ldconfig handles only shared libraries; the contents of a static archive such as libpc.a are copied into each program at link time, so running ldconfig here is harmless but not required. The install rule, line 10, also puts the variables from lines 3 and 4 to work. To use a variable, wrap its name in ${ and }. Why use them, you ask? A variable could be referenced in many places throughout the Makefile. If you change its value at the top, every place that uses it further down changes too. It beats having to make all the changes with find and replace.

The clean rule, line 14, deletes the .o files, removes the installed archive and header, and deletes the test program. This is useful if you wish to uninstall the library. The test rule, line 19, builds a small program that illustrates how to use the pc function in a C++ program. When you run it, make sure to type ./test instead of only test; test is also a shell built-in command, and typing test alone runs the built-in instead of the program. The program appears in Listing 4. Notice that the included header, line 2, is pc and not pc.h, following the same convention as standard headers such as <string> and <fstream>.
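The article does not show the header pc that make install copies into /usr/local/include. Since Listing 4 only calls pc, a minimal version presumably needs nothing more than the function's declaration; the sketch below is an assumption, including the include-guard name. The declaration must match the definition in Listing 2 exactly (char *, not const char *), because a different parameter type declares a different C++ function and the linker would not find it in libpc.a.

```cpp
// pc: hypothetical minimal header for the library (not shown in the article).
// It only needs to declare the function that libpc.a defines.
#ifndef PC_LIBRARY_HEADER
#define PC_LIBRARY_HEADER

// Returns how many running processes have `processname` as their program
// name, or -1 on error (for example, a NULL argument).
int pc(char *processname);

#endif
```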

Listing 4. test.cpp, a Small Test Program



1  #include <iostream>
2  #include <pc>            // header installed by make install
3  using namespace std;
4  int main(int argc, char *argv[])
5       {
6       cout << "count=" << pc(argv[1]) << endl;   // -1 if no name is given
7       return 0;
8       }


I'm finished, but I hope you're not. Get interested in the STL.

William Simpson lives in Lawrence, Kansas, with his wife and daughter. When not coding Linux, he teaches Computer Science at Emporia State University in Emporia, Kansas. He can be reached at simpsonb@emporia.edu.
