Book Excerpt: A Practical Guide to Linux Commands, Editors, and Shell Programming

The next example produces another report based on the cars file. This report uses nested if...else control structures to substitute values based on the contents of the price field. The program has no pattern part; it processes every record.

$ cat price_range
    {
    if             ($5 <= 5000)               $5 = "inexpensive"
        else if    (5000 < $5 && $5 < 10000)  $5 = "please ask"
        else if    (10000 <= $5)              $5 = "expensive"
    #
    printf "%-10s %-8s    %2d    %5d    %-12s\n",\
    $1, $2, $3, $4, $5
    }

$ gawk -f price_range cars
plym       fury        1970       73    inexpensive
chevy      malibu      1999       60    inexpensive
ford       mustang     1965       45    expensive
volvo      s80         1998      102    please ask
ford       thundbd     2003       15    expensive
chevy      malibu      2000       50    inexpensive
bmw        325i        1985      115    inexpensive
honda      accord      2001       30    please ask
ford       taurus      2004       10    expensive
toyota     rav4        2002      180    inexpensive
chevy      impala      1985       85    inexpensive
ford       explor      2003       25    please ask

Associative arrays

Next, the manuf script uses an associative array, indexed by the contents of the first field of each record in the cars file. The array consists of the elements manuf[plym], manuf[chevy], manuf[ford], and so on. Each new element is initialized to 0 (zero) as it is created; the ++ operator increments the variable it follows.

for structure

The action following the END pattern is a for structure, which loops through the elements of an associative array. A pipe sends the output through sort to produce an alphabetical list of cars and the quantities in stock. Because manuf is a shell script and not a gawk program file, you must have both read and execute permission for the manuf file to run it as a command.

$ cat manuf
gawk '      {manuf[$1]++}
END     {for (name in manuf) print name, manuf[name]}
' cars |
sort

$ ./manuf
bmw 1
chevy 3
ford 4
honda 1
plym 1
toyota 1
volvo 1

The next program, manuf.sh, is a more general shell script that includes error checking. This script lists and counts the contents of a column in a file, with both the column number and the name of the file specified on the command line.

The first action (the one that starts with {count) uses the shell variable $1 in the middle of the gawk program to specify an array index. Because of the way the single quotation marks are paired, the $1 that appears to be within single quotation marks is actually not quoted: The two quoted strings in the gawk program surround, but do not include, the $1. Because the $1 is not quoted, and because this is a shell script, the shell substitutes the value of the first command-line argument in place of $1 (page 441). As a result, the $1 is interpreted before the gawk command is invoked. The leading dollar sign (the one before the first single quotation mark on that line) causes gawk to interpret what the shell substitutes as a field number.

$ cat manuf.sh
if [ $# != 2 ]
    then
        echo "Usage: manuf.sh field file"
        exit 1
fi
gawk < $2 '
        {count[$'$1']++}
END     {for (item in count) printf "%-20s%-20s\n",\
            item, count[item]}' |
sort
$ ./manuf.sh
Usage: manuf.sh field file

$ ./manuf.sh 1 cars
bmw                 1
chevy               3
ford                4
honda               1
plym                1
toyota              1
volvo               1

$ ./manuf.sh 3 cars
1965                1
1970                1
1985                2
1998                1
1999                1
2000                1
2001                1
2002                1
2003                2
2004                1

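The quotation-mark pairing described above is easy to verify with echo. In this sketch (the set -- 3 line simply simulates a first command-line argument of 3), the shell joins the two quoted fragments around the expanded value of $1 before gawk ever sees the program:

```shell
# Simulate a first command-line argument of 3, then show what the
# shell would pass to gawk: two quoted fragments surrounding an
# unquoted, expanded $1.
set -- 3
echo '{count[$'$1']++}'
```

The output, {count[$3]++}, is the program text gawk actually receives.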
A way around this tricky pairing of quotation marks, which allows parameter expansion within the gawk program, is to use the -v option on the command line to pass the field number to gawk as a variable. This change makes the script easier for someone else to read and debug. You call the manuf2.sh script the same way you call manuf.sh:

$ cat manuf2.sh
if [ $# != 2 ]
        then
                echo "Usage: manuf2.sh field file"
                exit 1
fi
gawk -v "field=$1" < $2 '
                {count[$field]++}
END             {for (item in count) printf "%-20s%-20s\n",\
                        item, count[item]}' |
sort

The word_usage script displays a word usage list for a file you specify on the command line. The tr utility (page 864) translates characters that are not letters into NEWLINE characters, putting each word from standard input on a line by itself. The sort utility orders the list, putting the most frequently used words first; words used the same number of times appear in alphabetical order.

$ cat word_usage
tr -cs 'a-zA-Z' '[\n]' < $1 |
gawk        '
        {count[$1]++}
END     {for (item in count) printf "%-15s%3s\n", item, count[item]}' |
sort +1nr +0f -1
$ ./word_usage textfile
the             42
file            29
fsck            27
system          22
you             22
to              21
it              17
SIZE            14
and             13
MODE            13
...

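Two portability notes on word_usage: the sort +1nr +0f -1 line uses the historic +POS key syntax, which current versions of GNU sort no longer accept by default (they treat +1nr as a file name), and the bracketed '[\n]' replacement set is historic System V tr notation, where GNU tr wants a plain '\n'. A sketch of the same script using the POSIX -k key syntax, which for this two-column output is equivalent:

```shell
# word_usage with POSIX sort keys: sort +1nr +0f -1 means "field 2
# numeric descending, then field 1 case-folded", which in -k syntax
# is -k 2,2nr -k 1,1f.
tr -cs 'a-zA-Z' '\n' < "$1" |
gawk        '
        {count[$1]++}
END     {for (item in count) printf "%-15s%3s\n", item, count[item]}' |
sort -k 2,2nr -k 1,1f
```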
Following is a similar program in a different format. The style mimics that of a C program and may be easier to read and work with for more complex gawk programs.

$ cat word_count
tr -cs 'a-zA-Z' '[\n]' < $1 |
gawk '      {
        count[$1]++
}
END     {
        for (item in count)
            {
            if (count[item] > 4)
                {
                printf "%-15s%3s\n", item, count[item]
                }
        }
} ' |
sort +1nr +0f -1

The tail utility displays the last ten lines of output, illustrating that words occurring fewer than five times are not listed:

$ ./word_count textfile | tail
directories      5
if               5
information      5
INODE            5
more             5
no               5
on               5
response         5
this             5
will             5

The next example shows one way to put a date on a report. The first line of input to the gawk program comes from date. The program reads this line as record number 1 (NR == 1), processes it accordingly, and processes all subsequent lines with the action associated with the next pattern (NR > 1).

$ cat report
if (test $# = 0) then
    echo "You must supply a filename."
    exit 1
fi
(date; cat $1) |
gawk '
NR == 1         {print "Report for", $1, $2, $3 ", " $6}
NR >  1          {print $5 "\t" $1}'

$ ./report cars
Report for Mon Jan 31, 2010
2500    plym
3000    chevy
10000   ford
9850    volvo
10500   ford
3500    chevy
450     bmw
6000    honda
17000   ford
750     toyota
1550    chevy
9500    ford

The next example sums each of the columns in a file you specify on the command line; it takes its input from the numbers file. The program performs error checking, reporting on and discarding rows that contain nonnumeric entries. It uses the next command (with the comment skip bad records) to skip the rest of the commands for the current record if the record contains a nonnumeric entry. At the end of the program, gawk displays a grand total for the file.

$ cat numbers
10      20      30.3    40.5
20      30      45.7    66.1
30      xyz     50      70
40      75      107.2   55.6
50      20      30.3    40.5
60      30      45.O    66.1
70      1134.7  50      70
80      75      107.2   55.6
90      176     30.3    40.5
100     1027.45 45.7    66.1
110     123     50      57a.5
120     75      107.2   55.6

$ cat tally
gawk '      BEGIN       {
                ORS = ""
                }

NR == 1     {                                   # first record only
    nfields = NF                                # set nfields to number of
    }                                           # fields in the record (NF)
    {
    if ($0 ~ /[^0-9. \t]/)                      # check each record to see if it contains
        {                                       # any characters that are not numbers,
        print "\nRecord " NR " skipped:\n\t"    # periods, spaces, or TABs
        print $0 "\n"
        next                                    # skip bad records
        }
    else
        {
        for (count = 1; count <= nfields; count++)       # for good records loop through fields
            {
            printf "%10.2f", $count > "tally.out"
            sum[count] += $count
            gtotal += $count
            }
        print "\n" > "tally.out"
        }
    }

END     {                                            # after processing last record
    for (count = 1; count <= nfields; count++)       # print summary
        {
        print "   -------" > "tally.out"
        }
    print "\n" > "tally.out"
    for (count = 1; count <= nfields; count++)
        {
        printf "%10.2f", sum[count] > "tally.out"
        }
    print "\n\n        Grand Total " gtotal "\n" > "tally.out"
} ' < numbers
$ ./tally

Record 3 skipped:
        30      xyz     50      70

Record 6 skipped:
        60      30      45.O    66.1

Record 11 skipped:
        110     123     50      57a.5

$ cat tally.out
     10.00     20.00     30.30     40.50
     20.00     30.00     45.70     66.10
     40.00     75.00    107.20     55.60
     50.00     20.00     30.30     40.50
     70.00   1134.70     50.00     70.00
     80.00     75.00    107.20     55.60
     90.00    176.00     30.30     40.50
    100.00   1027.45     45.70     66.10
    120.00     75.00    107.20     55.60
   -------   -------   -------   -------
    580.00   2633.15    553.90    490.50

        Grand Total 4257.55

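The character-class test in tally accepts entries that merely contain legal characters, such as 1.2.3 or a lone period. A stricter sketch (not the book's version) compares each field with its numeric value: in awk, $i != $i + 0 is true for typical malformed entries, because converting "1.2.3" to a number yields 1.2, which no longer compares equal to the original string.

```shell
# tally's validation, tightened: instead of scanning for forbidden
# characters, compare each field with its numeric value.  "1.2.3"
# converts to 1.2, so the comparison fails and the record is skipped.
gawk '
NR == 1 {nfields = NF}                      # remember the field count
        {
        for (i = 1; i <= nfields; i++)
            if ($i != $i + 0)               # not a clean number
                {
                print "Record " NR " skipped: " $0
                next                        # skip bad records
                }
        for (i = 1; i <= nfields; i++)
            sum[i] += $i
        }
END     {
        for (i = 1; i <= nfields; i++)
            printf "%10.2f", sum[i]
        print ""
        }' < numbers
```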
The next example reads the passwd file, listing users who do not have passwords and users who have duplicate user ID numbers. (The pwck utility [Linux only] performs similar checks.) Because Mac OS X uses Open Directory and not the passwd file, this example will not work under OS X.

$ cat /etc/passwd
bill::102:100:ext 123:/home/bill:/bin/bash
roy:x:104:100:ext 475:/home/roy:/bin/bash
tom:x:105:100:ext 476:/home/tom:/bin/bash
lynn:x:166:100:ext 500:/home/lynn:/bin/bash
mark:x:107:100:ext 112:/home/mark:/bin/bash
sales:x:108:100:ext 102:/m/market:/bin/bash
anne:x:109:100:ext 355:/home/anne:/bin/bash
toni::164:100:ext 357:/home/toni:/bin/bash
ginny:x:115:100:ext 109:/home/ginny:/bin/bash
chuck:x:116:100:ext 146:/home/chuck:/bin/bash
neil:x:164:100:ext 159:/home/neil:/bin/bash
rmi:x:118:100:ext 178:/home/rmi:/bin/bash
vern:x:119:100:ext 201:/home/vern:/bin/bash
bob:x:120:100:ext 227:/home/bob:/bin/bash
janet:x:122:100:ext 229:/home/janet:/bin/bash
maggie:x:124:100:ext 244:/home/maggie:/bin/bash
dan::126:100::/home/dan:/bin/bash
dave:x:108:100:ext 427:/home/dave:/bin/bash
mary:x:129:100:ext 303:/home/mary:/bin/bash
$ cat passwd_check
gawk < /etc/passwd '     BEGIN   {
    uid[void] = ""                          # tell gawk that uid is an array
    }
    {                                       # no pattern indicates process all records
    dup = 0                                 # initialize duplicate flag
    split($0, field, ":")                   # split into fields delimited by ":"
    if (field[2] == "")                     # check for null password field
        {
        if (field[5] == "")                 # check for null info field
            {
            print field[1] " has no password."
            }
        else
            {
            print field[1] " ("field[5]") has no password."
            }
        }
    for (name in uid)                       # loop through uid array
        {
        if (uid[name] == field[3])          # check for second use of UID
            {
            print field[1] " has the same UID as " name " : UID = " uid[name]
            dup = 1                         # set duplicate flag
            }
        }
    if (!dup)                               # same as if (dup == 0)
                                            # assign UID and login name to uid array
        {
        uid[field[1]] = field[3]
        }
    }'
$ ./passwd_check
bill (ext 123) has no password.
toni (ext 357) has no password.
neil has the same UID as toni : UID = 164
dan has no password.
dave has the same UID as sales : UID = 108

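Because the field separator in the passwd file is a single colon, gawk's -F option can do the splitting that passwd_check does by hand with split(), making $1 the login name, $2 the password field, and $3 the UID. A condensed sketch of the same two checks (not the book's version; it omits the office-extension detail and the duplicate flag):

```shell
# passwd_check condensed with -F: gawk splits on colons itself,
# so no call to split() is needed.
gawk -F: '
$2 == ""        {print $1 " has no password."}
$3 in seen      {print $1 " has the same UID as " seen[$3] " : UID = " $3}
!($3 in seen)   {seen[$3] = $1}         # remember first login per UID
' < /etc/passwd
```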
The next example shows a complete interactive shell script that uses gawk to generate a report on the cars file based on price ranges:

$ cat list_cars
trap 'rm -f $$.tem > /dev/null;echo $0 aborted.;exit 1' 1 2 15
echo -n "Price range (for example, 5000 7500):"
read lowrange hirange

echo '
                               Miles
Make       Model       Year    (000)         Price
--------------------------------------------------' > $$.tem
gawk < cars '
$5 >= '$lowrange' && $5 <= '$hirange' {
        if ($1 ~ /ply/)  $1 = "plymouth"
        if ($1 ~ /chev/) $1 = "chevrolet"
        printf "%-10s %-8s    %2d    %5d    $ %8.2f\n", $1, $2, $3, $4, $5
        }' | sort -n +5 >> $$.tem
cat $$.tem
rm $$.tem

$ ./list_cars
Price range (for example, 5000 7500):3000 8000

                               Miles
Make       Model       Year    (000)         Price
--------------------------------------------------
chevrolet  malibu      1999       60    $  3000.00
chevrolet  malibu      2000       50    $  3500.00
honda      accord      2001       30    $  6000.00

$ ./list_cars
Price range (for example, 5000 7500):0 2000

                               Miles
Make       Model       Year    (000)         Price
--------------------------------------------------
bmw        325i        1985      115    $   450.00
toyota     rav4        2002      180    $   750.00
chevrolet  impala      1985       85    $  1550.00

$ ./list_cars
Price range (for example, 5000 7500):15000 100000

                               Miles
Make       Model       Year    (000)         Price
--------------------------------------------------
ford       taurus      2004       10    $ 17000.00

optional

Advanced gawk Programming

This section discusses some of the advanced features of gawk. It covers how to control input using the getline statement, how to use a coprocess to exchange information between gawk and a program running in the background, and how to use a coprocess to exchange data over a network. Coprocesses are available under gawk only; they are not available under awk and mawk.

getline: Controlling Input

Using the getline statement gives you more control over the data gawk reads than other methods of input do. When you provide a variable name as an argument to getline, getline reads data into that variable. The BEGIN block of the g1 program uses getline to read one line into the variable aa from standard input:

$ cat g1
BEGIN   {
        getline aa
        print aa
        }
$ echo aaaa | gawk -f g1
aaaa

The next few examples use the alpha file:

$ cat alpha
aaaaaaaaa
bbbbbbbbb
ccccccccc
ddddddddd

Even when g1 is given more than one line of input, it processes only the first line:

$ gawk -f g1 < alpha
aaaaaaaaa

When getline is not given an argument, it reads input into $0 and modifies the field variables ($1, $2, ...):

$ gawk 'BEGIN {getline;print $1}' < alpha
aaaaaaaaa

The g2 program uses a while loop in the BEGIN block to loop over the lines in standard input. The getline statement reads each line into holdme and print outputs each value of holdme.

$ cat g2
BEGIN       {
        while (getline holdme)
            print holdme
        }
$ gawk -f g2 < alpha
aaaaaaaaa
bbbbbbbbb
ccccccccc
ddddddddd

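A note on the while condition in g2: getline returns 1 when it reads a record, 0 at end of file, and -1 on an error such as an unreadable file. Because -1 is also true in a Boolean context, the gawk documentation suggests testing the return value explicitly; a hedged variant of g2:

```shell
# g2 with an explicit test of getline's return value: a return of
# -1 (error) would otherwise count as true and loop forever.
gawk 'BEGIN {
        while ((getline holdme) > 0)
            print holdme
}' < alpha
```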
The g3 program demonstrates that gawk automatically reads each line of input into $0 when it has statements in its body (and not just a BEGIN block). This program outputs the record number (NR), the string $0:, and the value of $0 (the current record) for each line of input.

$ cat g3
        {print NR, "$0:", $0}

$ gawk -f g3 < alpha
1 $0: aaaaaaaaa
2 $0: bbbbbbbbb
3 $0: ccccccccc
4 $0: ddddddddd

Next, g4 demonstrates that getline works independently of gawk’s automatic reads and $0. When getline reads data into a variable, it does not modify either $0 or any of the fields in the current record ($1, $2, ...). The first statement in g4, which is the same as the statement in g3, outputs the line that gawk has automatically read. The getline statement reads the next line of input into the variable named aa. The third statement outputs the record number, the string aa:, and the value of aa. The output from g4 shows that getline processes records independently of gawk’s automatic reads.

$ cat g4
        {
        print NR, "$0:", $0
        getline aa
        print NR, "aa:", aa
        }

$ gawk -f g4 < alpha
1 $0: aaaaaaaaa
2 aa: bbbbbbbbb
3 $0: ccccccccc
4 aa: ddddddddd

The g5 program outputs each line of input except for those lines that begin with the letter b. The first print statement outputs each line that gawk reads automatically. Next the /^b/ pattern selects all lines that begin with b for special processing. The action uses getline to read the next line of input into the variable hold, outputs the string skip this line: followed by the value of hold, and outputs the value of $1. The $1 holds the value of the first field of the record that gawk read automatically, not the record read by getline. The final statement displays a string and the value of NR, the current record number. Even though getline does not change $0 when it reads data into a variable, gawk increments NR.

$ cat g5
        # print all lines except those read with getline
        {print "line #", NR, $0}

# if line begins with "b" process it specially
/^b/    {
        # use getline to read the next line into variable named hold
        getline hold

        # print value of hold
        print "skip this line:", hold

        # $0 is not affected when getline reads data into a variable
        # $1 still holds previous value
        print "previous line began with:", $1
        }

        {
        print ">>>> finished processing line #", NR
        print ""
        }
$ gawk -f g5 < alpha
line # 1 aaaaaaaaa
>>>> finished processing line # 1

line # 2 bbbbbbbbb
skip this line: ccccccccc
previous line began with: bbbbbbbbb
>>>> finished processing line # 3

line # 4 ddddddddd
>>>> finished processing line # 4

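The getline examples above all read from standard input. getline can also read from a command by preceding it with a pipe inside the gawk program: cmd | getline var runs cmd and reads one line of its output into var (this form works under awk as well; the two-way |& coprocess mentioned at the start of this section is gawk only). A minimal sketch:

```shell
# Read the output of a command with getline: here the command is a
# simple echo; close() lets the same command be run again later.
gawk 'BEGIN {
        cmd = "echo hello from a pipe"
        cmd | getline line
        close(cmd)
        print line
}' < /dev/null
```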

© Copyright 2010 Mark G. Sobell. All rights reserved.
