Book Excerpt: Linux Programming by Example, Part 2

Reviews

by Arnold Robbins

on May 20, 2005

In Part 1 of this article, we began a tour of the V7 Unix ls program. We looked at how it parses options and at how it maintained the flist array as a two-level stack of files to print information about. We then saw how ls found and printed out the long format information about each file. This article continues the tour through the code.


238  long                           _long nblock(long size)_
239  nblock(size)
240  long size;
241  {
242      return((size+511)>>9);
243  }

The nblock() function reports how many disk blocks the file uses. This calculation is based on the file's size as returned by stat(). The V7 block size was 512 bytes--the size of a physical disk sector.

The calculation on line 242 looks a bit scary. The >>9 is a right-shift by nine bits. This divides by 512, to give the number of blocks. (On early hardware, a right-shift was much faster than division.) So far, so good. Now, a file of even one byte still takes up a whole disk block. However, 1 / 512 comes out as zero (integer division truncates), which is incorrect. This explains the size+511. By adding 511, the code ensures that the sum produces the correct number of blocks when it is divided by 512.

This calculation is only approximate, however. Very large files also have indirect blocks. Despite the claim in the V7 _ls_(1) manpage, this calculation does not account for indirect blocks.

Furthermore, consider the case of a file with large holes (created by seeking way past the end of the file with lseek()). Holes don't occupy disk blocks; however, this is not reflected in the size value. Thus, the calculation produced by nblock(), while usually correct, could produce results that are either smaller or larger than the real case.

For these reasons, the st_blocks member was added into the struct stat at 4.2 BSD, and then picked up for System V and POSIX.


245  int m1[] = { 1, S_IREAD>>0, 'r', '-' };
246  int m2[] = { 1, S_IWRITE>>0, 'w', '-' };
247  int m3[] = { 2, S_ISUID, 's', S_IEXEC>>0, 'x', '-' };
248  int m4[] = { 1, S_IREAD>>3, 'r', '-' };
249  int m5[] = { 1, S_IWRITE>>3, 'w', '-' };
250  int m6[] = { 2, S_ISGID, 's', S_IEXEC>>3, 'x', '-' };
251  int m7[] = { 1, S_IREAD>>6, 'r', '-' };
252  int m8[] = { 1, S_IWRITE>>6, 'w', '-' };
253  int m9[] = { 2, S_ISVTX, 't', S_IEXEC>>6, 'x', '-' };
254
255  int *m[] = { m1, m2, m3, m4, m5, m6, m7, m8, m9};
256
257  pmode(aflag)                         void pmode(int aflag)
258  {
259      register int **mp;
260
261      flags = aflag;
262      for (mp = &m[0]; mp < &m[sizeof(m)/sizeof(m[0])];)
263          select(*mp++);
264  }
265
266  select(pairp)                        void select(register int
*pairp)
267  register int *pairp;
268  {
269      register int n;
270
271      n = *pairp++;
272      while (--n>=0 && (flags&*pairp++)==0)
273          pairp++;
274      putchar(*pairp);
275  }

Lines 245-275 print the file's permissions. The code is compact and rather elegant; it requires careful study.

Lines 245-253: The arrays m1 through m9 encode the permission bits to check for along with the corresponding characters to print. There is one array per character to print in the file mode. The first element of each array is the number of (permission, character) pairs encoded in that particular array. The final element is the character to print in the event that none of the given permission bits are found.
Note also how the permissions are specified as I_READ<<0, I_READ>>3, I_READ<<6 and so on. The individual constants for each bit (S_IRUSR, S_IRGRP, etc.) had not been invented yet. (See Table 4.5 in my book.)
Line 255: The m array points to each of the m1 through m9 arrays.
Lines 257-264: The pmode() function first sets the global variable flags to the passed-in parameter aflag. It then loops through the m array, passing each element to the select() function. The passed-in element represents one of the m1 to m9 arrays.
Lines 266-275: The select() function understands the layout of each m1 through m9 array. n is the number of pairs in the array (the first element); line 271 sets it. Lines 272-273 look for permission bits, checking the global variable flags set previously on line 261.

Note the use of the ++ operator, both in the loop test and in the loop body. The effect is to skip over pairs in the array as long as the permission bit in the first element of the pair is not found in flags.

When the loop ends, either the permission bit has been found, in which case pairp points at the second element of the pair, which is the correct character to print, or it has not been found, in which case pairp points at the default character. In either case, line 274 prints the character that pairp points to.

A final point worth noting is that in C, character constants (such as x) have type int, not char. This is different in C++: there, character constants do have type char. This difference does not affect this particular code. So there's no problem putting such constants into an integer array; everything works correctly.


277  char *                    _char *makename(char *dir, char *file)_
278  makename(dir, file)
279  char *dir, *file;
280  {
281      static char dfile[100];
282      register char *dp, *fp;
283      register int i;
284
285      dp = dfile;
286      fp = dir;
287      while (*fp)
288          *dp++ = *fp++;
289      *dp++ = '/';
290      fp = file;
291      for (i=0; i<DIRSIZ; i++)
292          *dp++ = *fp++;
293      *dp = 0;
294      return(dfile);
295  }

Lines 277-295 define the makename() function. Its job is to concatenate a directory name and a filename, separated by a slash character, and produce a string. It does this in the static buffer dfile. Note that dfile is only 100 characters long and that no error checking is done.

The code itself is straightforward, copying characters one at a time. makename() is used by the readdir() function.


297  readdir(dir)                               _void readdir(char*dir)_
298  char *dir;
299  {
300      static struct direct dentry;
301      register int j;
302      register struct lbuf *ep;
303
304      if ((dirf = fopen(dir, "r")) == NULL) {
305          printf("%s unreadable\n", dir);
306          return;
307      }
308      tblocks = 0;
309      for(;;) {
310          if (fread((char *)&dentry, sizeof(dentry), 1, dirf) != 1)
311              break;
312          if (dentry.d_ino==0
313           || aflg==0 && dentry.d_name[0]=='.' &&
(dentry.d_name[1]=='\0'
314              || dentry.d_name[1]=='.' && dentry.d_name[2]=='\0'))
315              continue;
316          ep = gstat(makename(dir, dentry.d_name), 0);
317          if (ep==NULL)
318              continue;
319          if (ep->lnum != -1)
320              ep->lnum = dentry.d_ino;
321          for (j=0; j<DIRSIZ; j++)
322              ep->ln.lname[j] = dentry.d_name[j];
323      }
324      fclose(dirf);
325  }

Lines 297-325 define the readdir() function, whose job is to read the contents of directories named on the command line.

Lines 304-307 open the directory for reading, returning if fopen() fails. Line 308 initializes the global variable tblocks to 0. This was used earlier (lines 153-154) to print the total number of blocks used by files in a directory.

Lines 309-323 are a loop that reads directory entries and adds them to the flist array. Lines 310-311 read one entry, exiting the loop upon end-of-file.

Lines 312-315 skip uninteresting entries. If the inode number is zero, this slot isn't used. Otherwise, if -a was not given and the filename is either . or .., skip it.

Lines 316-318 call gstat() with the full name of the file, and a second argument of false, indicating that it's not from the command line. gstat() updates the global lastp pointer and the flist array. A NULL return value indicates some sort of failure.

Lines 319-322 save the inode number and name in the struct lbuf. If ep->lnum comes back from gstat() set to -1, it means that the stat() operation on the file failed. Finally, line 324 closes the directory.

The following function, gstat() (lines 327-398), is the core function for the operation of retrieving and storing file information.


327  struct lbuf *                _struct lbuf *gstat(char *file, int argfl)_
328  gstat(file, argfl)
329  char *file;
330  {
331      extern char *malloc();
332      struct stat statb;
333      register struct lbuf *rep;
334      static int nomocore;
335
336      if (nomocore)            _Ran out of memory earlier_
337          return(NULL);
338      rep = (struct lbuf *)malloc(sizeof(struct lbuf));
339      if (rep==NULL) {
340          fprintf(stderr, "ls: out of memory\n");
341          nomocore = 1;
342          return(NULL);
343      }
344      if (lastp >= &flist[NFILES]) {
345          static int msg;      _Check whether too many files given_
346          lastp--;
347          if (msg==0) {
348              fprintf(stderr, "ls: too many files\n");
349              msg++;
350          }
351      }
352      *lastp++ = rep;          _Fill in information_
353      rep->lflags = 0;
354      rep->lnum = 0;
355      rep->ltype = '-';        _Default file type_

The static variable nomocore [sic] indicates that malloc() failed upon an earlier call. Since it's static, it's automatically initialized to 0 (that is, false). If it's true upon entry, gstat() just returns NULL. Otherwise, if malloc() fails, ls prints an error message, sets nomocore to true, and returns NULL (lines 334-343).

Lines 344-351 make sure that there is still room left in the flist array. If not, ls prints a message (but only once; note the use of the static variable msg), and then reuses the last slot in flist.

Line 352 makes the slot lastp points to point to the new struct lbuf (rep). This also updates lastp, which is used for sorting in main() (lines 142 and 152). Lines 353-355 set default values for the flags, inode number, and type fields in the struct lbuf.


356      if (argfl || statreq) {
357          if (stat(file, &statb)<0) {           _stat() failed_
358              printf("%s not found\n", file);
359              statb.st_ino = -1;
360              statb.st_size = 0;
361              statb.st_mode = 0;
362              if (argfl) {
363                  lastp--;
364                  return(0);
365              }
366          }
367          rep->lnum = statb.st_ino;             _stat() OK, copy info_
368          rep->lsize = statb.st_size;
369          switch(statb.st_mode&S_IFMT) {
370
371          case S_IFDIR:
372              rep->ltype = 'd';
373              break;
374
375          case S_IFBLK:
376              rep->ltype = 'b';
377              rep->lsize = statb.st_rdev;
378              break;
379
380          case S_IFCHR:
381              rep->ltype = 'c';
382              rep->lsize = statb.st_rdev;
383              break;
384          }
385          rep->lflags = statb.st_mode & ~S_IFMT;
386          rep->luid = statb.st_uid;
387          rep->lgid = statb.st_gid;
388          rep->lnl = statb.st_nlink;
389          if(uflg)
390              rep->lmtime = statb.st_atime;
391          else if (cflg)
392              rep->lmtime = statb.st_ctime;
393          else
394              rep->lmtime = statb.st_mtime;
395          tblocks += nblock(statb.st_size);
396      }
397      return(rep);
398  }

Lines 356-396 handle the call to stat(). If this is a command-line argument or if statreq is true because of an option, the code fills in the struct lbuf as follows:

Lines 357-366: Call stat() and if it fails, print an error message and set values as appropriate, then return NULL (expressed as 0).
Lines 367-368: Set the inode number and size fields from the struct stat if the stat() succeeded.
Lines 369-384: Handle the special cases of directory, block device, and character device. In all cases the code updates the ltype field. For devices, the lsize value is replaced with the ltype field. For devices, the lsize value is replaced with the st_rdev value.
Lines 385-388: Fill in the lflags, luid, lgid and lnl fields from the corresponding fields in the struct stat. Line 385 removes the file-type bits, leaving the 12 permissions bits (read/write/execute for user/group/other, and setuid, setgid, and save-text).
Lines 389-394: Based on command-line options, use one of the three time fields from the struct stat for the lmtime field in the struct lbuf.
Line 395: Update the global variable tblocks with the number of blocks in the file.


400  compar(pp1, pp2)                     _int compar(struct lbuf_
**pp1,
401  struct lbuf **pp1, **pp2;            _struct lbuf_
**pp2)
402  {
403      register struct lbuf *p1, *p2;
404
405      p1 = *pp1;
406      p2 = *pp2;
407      if (dflg==0) {
408          if (p1->lflags&ISARG && p1->ltype=='d') {
409              if (!(p2->lflags&ISARG && p2->ltype=='d'))
410                  return(1);
411          } else {
412              if (p2->lflags&ISARG && p2->ltype=='d')
413                  return(-1);
414          }
415      }
416      if (tflg) {
417          if(p2->lmtime == p1->lmtime)
418              return(0);
419          if(p2->lmtime > p1->lmtime)
420              return(rflg);
421          return(-rflg);
422      }
423      return(rflg * strcmp(p1->lflags&ISARG? p1->ln.namep:
p1->ln.lname,
424                  p2->lflags&ISARG? p2->ln.namep: p2->ln.lname));
425  }

The compar() function is dense: there's a lot happening in little space. The first thing to remember is the meaning of the return value: A negative value means that the first file should sort to an earlier spot in the array than the second, zero means the files are equal, and a positive value means that the second file should sort to an earlier spot than the first.

The next thing to understand is that ls prints the contents of directories after it prints information about files. Thus the result of sorting should be that all directories named on the command line follow all files named on the command line.

Finally, the rflg variable helps implement the -r option, which reverses the sorting order. It is initialized to 1 (line 30). If -r is used, rflg is set to -1 (lines 89-91).

The following pseudocode describes the logic of compar(); the line numbers in the left margin correspond to those of ls.c:


     407  if LS HAS TO READ DIRECTORIES  # dflg == 0
     408      if P1 IS A COMMAND-LINE ARG AND P1 IS A DIRECTORY
     409          if P2 IS NOT A COMMAND-LINE ARG AND IS NOT A DIRECTORY
     410              return 1    # first comes after second
                  else
                      FALL THROUGH TO TIME TEST
     411      else
                  # p1 is not a command-line directory
     412          if P2 IS A COMMAND-LINE ARG AND IS A DIRECTORY
     413              return -1   # first comes before second
                  else
                      FALL THROUGH TO TIME TEST

     416  if SORTING IS BASED ON TIME  # tflg is true
              # compare times:
     417      if P2'S TIME IS EQUAL TO P1'S TIME
     418          return 0
     419      if P2'S TIME > P1'S TIME
     420          RETURN THE VALUE OF RFLG (POSITIVE OR NEGATIVE)
              # p2's time < p1's time
     421      RETURN OPPOSITE OF RFLG (NEGATIVE OR POSITIVE)

     423  Multiply rflg by the result of strcmp()
     424  on the two names and return the result

The arguments to strcmp() on lines 423-424 look messy. What's going on is that different members of the ln union in the struct lbuf must be used, depending on whether the filename is a command-line argument or was read from a directory.

Summary

The V7 ls is a relatively small program, yet it touches on many of the fundamental aspects of Unix programming: file I/O, file metadata, directory contents, users and groups, time and date values, sorting and dynamic memory management.

The most notable external difference between V7 ls and modern ls is the treatment of the -a and -l options. The V7 version has many fewer options than do modern versions; a noticeable lack is the -R recursive option.

The management of flist is a clean way to use the limited memory of the PDP-11 architecture yet still provide as much information as possible. The struct lbuf nicely abstracts the information of interest from the struct stat; this simplifies the code considerably. The code for printing the nine permission bits is compact and elegant.

Some parts of ls use surprisingly small limits, such as the upper bound of 1024 on the number of files or the buffer size of 100 in makename().

Arnold Robbins is a professional programmer and technical author. He is the long-time maintainer of gawk, the GNU Project's version of the Awk programming language. He is the author of numerous well-known technical books, such as Linux Programming by Example: The Fundamentals, recently published by Prentice Hall, as well as Unix In A Nutshell, Learning the vi Editor and Effective awk Programming, published by O'Reilly Media Inc. He is happily married with four wonderful children. Arnold is an amateur Talmudist and particularly enjoys studying the Jerusalem Talmud. He can be reached at arnold@skeeve.com.

Load Disqus comments