Book Excerpt: A Practical Guide to Linux Commands, Editors, and Shell Programming
The next example produces another report based on the cars file. This report uses nested if...else control structures to substitute values based on the contents of the price field. The program has no pattern part; it processes every record.
$ cat price_range
{
if ($5 <= 5000) $5 = "inexpensive"
else if (5000 < $5 && $5 < 10000) $5 = "please ask"
else if (10000 <= $5) $5 = "expensive"
#
printf "%-10s %-8s %2d %5d %-12s\n",\
$1, $2, $3, $4, $5
}
$ gawk -f price_range cars
plym fury 1970 73 inexpensive
chevy malibu 1999 60 inexpensive
ford mustang 1965 45 expensive
volvo s80 1998 102 please ask
ford thundbd 2003 15 expensive
chevy malibu 2000 50 inexpensive
bmw 325i 1985 115 inexpensive
honda accord 2001 30 please ask
ford taurus 2004 10 expensive
toyota rav4 2002 180 inexpensive
chevy impala 1985 85 inexpensive
ford explor 2003 25 please ask
Next the manuf associative array uses the contents of the first field of each record in the cars file as an index. The array consists of the elements manuf[plym], manuf[chevy], manuf[ford], and so on. Each new element is initialized to 0 (zero) as it is created. The ++ operator increments the variable it follows.
for structure
The action following the END pattern is a for structure, which loops through the elements of an associative array. A pipe sends the output through sort to produce an alphabetical list of cars and the quantities in stock. Because it is a shell script and not a gawk program file, you must have both read and execute permission to the manuf file to execute it as a command.
$ cat manuf
gawk ' {manuf[$1]++}
END {for (name in manuf) print name, manuf[name]}
' cars |
sort
$ ./manuf
bmw 1
chevy 3
ford 4
honda 1
plym 1
toyota 1
volvo 1
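The way array elements come into existence can be checked in isolation. The following is a minimal sketch, not one of the book's scripts: the function name demo_autoinit is hypothetical, and it calls awk rather than gawk so it runs where gawk is not installed (gawk behaves the same way here). It shows that the in operator tests for an element without creating it, while any other reference creates the element with an empty value that counts as 0.

```shell
# Hypothetical demo: how awk creates and initializes array elements.
demo_autoinit() {
    awk 'BEGIN {
        # "in" tests membership without creating the element
        if (!("ford" in manuf)) print "ford: not an element yet"

        manuf["ford"]++      # first reference creates manuf["ford"] as 0, then adds 1
        manuf["ford"]++
        print "ford:", manuf["ford"]

        # merely referencing manuf["chevy"] creates it with a value equal to 0
        if (manuf["chevy"] == 0) print "chevy: compares equal to 0"

        n = 0
        for (name in manuf) n++
        print "elements:", n
    }'
}
```

Calling demo_autoinit reports two elements at the end, because the comparison against manuf["chevy"] was enough to create that element.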
The next program, manuf.sh, is a more general shell script that includes error checking. This script lists and counts the contents of a column in a file, with both the column number and the name of the file specified on the command line.
The first action (the one that starts with {count) uses the shell variable $1 in the middle of the gawk program to specify an array index. Because of the way the single quotation marks are paired, the $1 that appears to be within single quotation marks is actually not quoted: The two quoted strings in the gawk program surround, but do not include, the $1. Because the $1 is not quoted, and because this is a shell script, the shell substitutes the value of the first command-line argument in place of $1 (page 441). As a result, the $1 is interpreted before the gawk command is invoked. The leading dollar sign (the one before the first single quotation mark on that line) causes gawk to interpret what the shell substitutes as a field number.
$ cat manuf.sh
if [ $# != 2 ]
then
echo "Usage: manuf.sh field file"
exit 1
fi
gawk < $2 '
{count[$'$1']++}
END {for (item in count) printf "%-20s%-20s\n",\
item, count[item]}' |
sort
$ ./manuf.sh
Usage: manuf.sh field file
$ ./manuf.sh 1 cars
bmw 1
chevy 3
ford 4
honda 1
plym 1
toyota 1
volvo 1
$ ./manuf.sh 3 cars
1965 1
1970 1
1985 2
1998 1
1999 1
2000 1
2001 1
2002 1
2003 2
2004 1
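To see what gawk receives once the shell has finished with the quotation marks, you can print the program text instead of running it. The following is a minimal sketch; the function show_spliced_program is hypothetical, and its first argument plays the role of the first command-line argument of manuf.sh.

```shell
# Hypothetical demo: print the program text the shell would hand to gawk.
show_spliced_program() {
    # Two single-quoted strings surround an unquoted $1, exactly as in manuf.sh.
    # The shell replaces $1 and joins the three pieces into one word.
    printf '%s\n' '{count[$'$1']++}'
}
```

Running show_spliced_program 3 displays {count[$3]++}, which is the action gawk executes when manuf.sh is called with 3 as its first argument.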
A way around the tricky use of quotation marks that allow parameter expansion within the gawk program is to use the -v option on the command line to pass the field number to gawk as a variable. This change makes it easier for someone else to read and debug the script. You call the manuf2.sh script the same way you call manuf.sh:
$ cat manuf2.sh
if [ $# != 2 ]
then
echo "Usage: manuf2.sh field file"
exit 1
fi
gawk -v "field=$1" < $2 '
{count[$field]++}
END {for (item in count) printf "%-20s%-20s\n",\
item, count[item]}' |
sort
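Because the field number now arrives as an ordinary shell argument, it is also easy to check before gawk sees it. The following is a sketch under stated assumptions: the function count_column and its error message are hypothetical, the check accepts digits only, and awk stands in for gawk so the sketch runs where gawk is not installed.

```shell
# Hypothetical helper: count the values in one column of a file.
# usage: count_column field file
count_column() {
    # Reject an empty field argument or one containing a nondigit.
    case $1 in
        ''|*[!0-9]*)
            echo "count_column: field must be a number" >&2
            return 1
            ;;
    esac

    # Pass the field number with -v; the file name is quoted so that
    # names containing spaces survive.
    awk -v "field=$1" '
        { count[$field]++ }
        END { for (item in count) printf "%-20s%-20s\n", item, count[item] }
    ' "$2" | sort
}
```

For example, count_column 1 cars produces the same list as manuf2.sh 1 cars, while count_column x cars stops with an error message before any gawk code runs.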
The word_usage script displays a word usage list for a file you specify on the command line. The tr utility (page 864) lists the words from standard input, one to a line. The sort utility orders the file, putting the most frequently used words first. The script sorts groups of words that are used the same number of times in alphabetical order.
$ cat word_usage
tr -cs 'a-zA-Z' '[\n*]' < $1 |
gawk '
{count[$1]++}
END {for (item in count) printf "%-15s%3s\n", item, count[item]}' |
sort -k2nr -k1,1f
$ ./word_usage textfile
the 42
file 29
fsck 27
system 22
you 22
to 21
it 17
SIZE 14
and 13
MODE 13
...
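The ordering described above (highest count first, then alphabetical within equal counts) can be tried on a few lines of sample data. The following sketch uses the POSIX -k key syntax of sort; the function name rank_words and the sample input are illustrative only.

```shell
# Hypothetical helper: order "word count" lines the way word_usage does.
rank_words() {
    # -k2,2nr  sort on the second field, numerically, largest first
    # -k1,1f   break ties on the first field, ignoring case
    sort -k2,2nr -k1,1f
}
```

Feeding it the four lines you 22, the 42, system 22, and SIZE 14 yields the 42 first, then system 22 and you 22 in alphabetical order, and SIZE 14 last.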
Following is a similar program in a different format. The style mimics that of a C program and may be easier to read and work with for more complex gawk programs.
$ cat word_count
tr -cs 'a-zA-Z' '[\n*]' < $1 |
gawk ' {
count[$1]++
}
END {
for (item in count)
{
if (count[item] > 4)
{
printf "%-15s%3s\n", item, count[item]
}
}
} ' |
sort -k2nr -k1,1f
The tail utility displays the last ten lines of output, illustrating that words occurring fewer than five times are not listed:
$ ./word_count textfile | tail
directories 5
if 5
information 5
INODE 5
more 5
no 5
on 5
response 5
this 5
will 5
The next example shows one way to put a date on a report. The first line of input to the gawk program comes from date. The program reads this line as record number 1 (NR == 1), processes it accordingly, and processes all subsequent lines with the action associated with the next pattern (NR > 1).
$ cat report
if (test $# = 0) then
echo "You must supply a filename."
exit 1
fi
(date; cat $1) |
gawk '
NR == 1 {print "Report for", $1, $2, $3 ", " $6}
NR > 1 {print $5 "\t" $1}'
$ ./report cars
Report for Mon Jan 31, 2010
2500 plym
3000 chevy
10000 ford
9850 volvo
10500 ford
3500 chevy
450 bmw
6000 honda
17000 ford
750 toyota
1550 chevy
9500 ford
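Another way to date a report, shown here only as an alternative sketch and not as the book's method, is to pass the date to the program with -v and print it in a BEGIN block, so that every input record comes from the data file. The function name dated_report, its optional second argument, and the date format are assumptions made for this illustration; awk stands in for gawk.

```shell
# Hypothetical helper: print a dated report of price and make.
# usage: dated_report file [date-string]
dated_report() {
    # Use the second argument when given; otherwise ask date for today.
    today=${2:-$(date '+%a %b %d, %Y')}

    # Note: awk processes backslash escapes in -v values; a date string has none.
    awk -v "today=$today" '
        BEGIN { print "Report for", today }   # heading, printed before any input is read
        { print $5 "\t" $1 }                   # price, a TAB, and the make
    ' "$1"
}
```

With this arrangement no NR tests are needed, because the heading no longer arrives as record number 1.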
The next example sums each of the columns in a file. As written, the script takes its input from the numbers file, which is redirected to gawk on the last line of the script. The program performs error checking, reporting on and discarding rows that contain nonnumeric entries. It uses the next command (with the comment skip bad records) to skip the rest of the commands for the current record if the record contains a nonnumeric entry. At the end of the program, gawk writes the column totals and a grand total to the tally.out file.
$ cat numbers
10 20 30.3 40.5
20 30 45.7 66.1
30 xyz 50 70
40 75 107.2 55.6
50 20 30.3 40.5
60 30 45.O 66.1
70 1134.7 50 70
80 75 107.2 55.6
90 176 30.3 40.5
100 1027.45 45.7 66.1
110 123 50 57a.5
120 75 107.2 55.6
$ cat tally
gawk ' BEGIN {
ORS = ""
}
NR == 1 { # first record only
nfields = NF # set nfields to number of
} # fields in the record (NF)
{
if ($0 ~ /[^0-9. \t]/) # check each record to see if it contains
{ # any characters that are not numbers,
print "\nRecord " NR " skipped:\n\t" # periods, spaces, or TABs
print $0 "\n"
next # skip bad records
}
else
{
for (count = 1; count <= nfields; count++) # for good records loop through fields
{
printf "%10.2f", $count > "tally.out"
sum[count] += $count
gtotal += $count
}
print "\n" > "tally.out"
}
}
END { # after processing last record
for (count = 1; count <= nfields; count++) # print summary
{
print " -------" > "tally.out"
}
print "\n" > "tally.out"
for (count = 1; count <= nfields; count++)
{
printf "%10.2f", sum[count] > "tally.out"
}
print "\n\n Grand Total " gtotal "\n" > "tally.out"
} ' < numbers
$ ./tally
Record 3 skipped:
30 xyz 50 70
Record 6 skipped:
60 30 45.O 66.1
Record 11 skipped:
110 123 50 57a.5
$ cat tally.out
10.00 20.00 30.30 40.50
20.00 30.00 45.70 66.10
40.00 75.00 107.20 55.60
50.00 20.00 30.30 40.50
70.00 1134.70 50.00 70.00
80.00 75.00 107.20 55.60
90.00 176.00 30.30 40.50
100.00 1027.45 45.70 66.10
120.00 75.00 107.20 55.60
------- ------- ------- -------
580.00 2633.15 553.90 490.50
Grand Total 4257.55
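The test in tally looks only for characters that cannot appear in a number, so an entry such as 2.3.4 would pass and be summed as 2.3. A stricter check can examine each field separately. The following is a sketch under stated assumptions: the function check_numeric and its messages are hypothetical, it accepts only digits with an optional fractional part (so .5 and 5. are rejected), and awk stands in for gawk.

```shell
# Hypothetical helper: report whether every field of each record is a number.
# Reads records from standard input.
check_numeric() {
    awk '
    {
        bad = 0
        for (i = 1; i <= NF; i++)
            if ($i !~ /^[0-9]+(\.[0-9]+)?$/) {   # digits, optionally .digits
                bad = i                           # remember the first bad field
                break
            }
        if (bad)
            print "Record " NR " skipped: field " bad " (" $bad ") is not a number"
        else
            print "Record " NR " ok"
    }'
}
```

Run as check_numeric < numbers, it flags the same records tally skips (3, 6, and 11) and would also flag a malformed entry such as 2.3.4.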
The next example reads the passwd file, listing users who do not have passwords and users who have duplicate user ID numbers. (The pwck utility [Linux only] performs similar checks.) Because Mac OS X uses Open Directory and not the passwd file, this example will not work under OS X.
$ cat /etc/passwd
bill::102:100:ext 123:/home/bill:/bin/bash
roy:x:104:100:ext 475:/home/roy:/bin/bash
tom:x:105:100:ext 476:/home/tom:/bin/bash
lynn:x:166:100:ext 500:/home/lynn:/bin/bash
mark:x:107:100:ext 112:/home/mark:/bin/bash
sales:x:108:100:ext 102:/m/market:/bin/bash
anne:x:109:100:ext 355:/home/anne:/bin/bash
toni::164:100:ext 357:/home/toni:/bin/bash
ginny:x:115:100:ext 109:/home/ginny:/bin/bash
chuck:x:116:100:ext 146:/home/chuck:/bin/bash
neil:x:164:100:ext 159:/home/neil:/bin/bash
rmi:x:118:100:ext 178:/home/rmi:/bin/bash
vern:x:119:100:ext 201:/home/vern:/bin/bash
bob:x:120:100:ext 227:/home/bob:/bin/bash
janet:x:122:100:ext 229:/home/janet:/bin/bash
maggie:x:124:100:ext 244:/home/maggie:/bin/bash
dan::126:100::/home/dan:/bin/bash
dave:x:108:100:ext 427:/home/dave:/bin/bash
mary:x:129:100:ext 303:/home/mary:/bin/bash
$ cat passwd_check
gawk < /etc/passwd ' BEGIN {
uid[void] = "" # tell gawk that uid is an array
}
{ # no pattern indicates process all records
dup = 0 # initialize duplicate flag
split($0, field, ":") # split into fields delimited by ":"
if (field[2] == "") # check for null password field
{
if (field[5] == "") # check for null info field
{
print field[1] " has no password."
}
else
{
print field[1] " ("field[5]") has no password."
}
}
for (name in uid) # loop through uid array
{
if (uid[name] == field[3]) # check for second use of UID
{
print field[1] " has the same UID as " name " : UID = " uid[name]
dup = 1 # set duplicate flag
}
}
if (!dup) # same as if (dup == 0)
# assign UID and login name to uid array
{
uid[field[1]] = field[3]
}
}'
$ ./passwd_check
bill (ext 123) has no password.
toni (ext 357) has no password.
neil has the same UID as toni : UID = 164
dan has no password.
dave has the same UID as sales : UID = 108
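The same checks can be written more compactly by letting gawk split the fields itself and by indexing the array on the UID instead of the login name, which removes the inner loop. The following is an alternative sketch, not the book's program: the function check_passwd and the array name owner are hypothetical, the messages omit the information field, and awk stands in for gawk.

```shell
# Hypothetical helper: list accounts without passwords and duplicate UIDs.
# usage: check_passwd [file]    (defaults to /etc/passwd)
check_passwd() {
    awk -F: '
        # -F: makes the colon the field separator, so $1 is the login
        # name, $2 the password field, and $3 the UID
        $2 == ""         { print $1 " has no password." }

        # owner[uid] holds the first login name seen with that UID
        ($3 in owner)    { print $1 " has the same UID as " owner[$3] " : UID = " $3 }
        !($3 in owner)   { owner[$3] = $1 }
    ' "${1:-/etc/passwd}"
}
```

Because owner is indexed by UID, a single in test replaces the loop that compares the current UID with every stored one.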
The next example shows a complete interactive shell script that uses gawk to generate a report on the cars file based on price ranges:
$ cat list_cars
trap 'rm -f $$.tem > /dev/null;echo $0 aborted.;exit 1' 1 2 15
echo -n "Price range (for example, 5000 7500):"
read lowrange hirange
echo '
Miles
Make Model Year (000) Price
--------------------------------------------------' > $$.tem
gawk < cars '
$5 >= '$lowrange' && $5 <= '$hirange' {
if ($1 ~ /ply/) $1 = "plymouth"
if ($1 ~ /chev/) $1 = "chevrolet"
printf "%-10s %-8s %2d %5d $ %8.2f\n", $1, $2, $3, $4,
$5
}' | sort -n -k6 >> $$.tem
cat $$.tem
rm $$.tem
$ ./list_cars
Price range (for example, 5000 7500):3000 8000
Miles
Make Model Year (000) Price
--------------------------------------------------
chevrolet malibu 1999 60 $ 3000.00
chevrolet malibu 2000 50 $ 3500.00
honda accord 2001 30 $ 6000.00
$ ./list_cars
Price range (for example, 5000 7500):0 2000
Miles
Make Model Year (000) Price
--------------------------------------------------
bmw 325i 1985 115 $ 450.00
toyota rav4 2002 180 $ 750.00
chevrolet impala 1985 85 $ 1550.00
$ ./list_cars
Price range (for example, 5000 7500):15000 100000
Miles
Make Model Year (000) Price
--------------------------------------------------
ford taurus 2004 10 $ 17000.00
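The list_cars script places the values the user types directly into the text of the gawk program. Passing them with -v keeps the user's input as data, so gawk never interprets it as program text. The following is a noninteractive sketch of the selection step only; the function cars_in_range and its argument order are hypothetical, and awk stands in for gawk.

```shell
# Hypothetical helper: list cars whose price lies in a range, cheapest first.
# usage: cars_in_range low high file
cars_in_range() {
    awk -v "low=$1" -v "high=$2" '
        $5 >= low && $5 <= high {
            printf "%-10s %-8s %2d %5d $ %8.2f\n", $1, $2, $3, $4, $5
        }' "$3" | sort -k6,6n     # the price is the sixth field of the output
}
```

For example, cars_in_range 3000 8000 cars selects the same three cars as the first sample run of list_cars.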
optional
Advanced gawk Programming
This section discusses some of the advanced features of AWK. It covers how to control input using the getline statement, how to use a coprocess to exchange information between gawk and a program running in the background, and how to use a coprocess to exchange data over a network. Coprocesses are available under gawk only; they are not available under awk and mawk.
getline: Controlling Input
Using the getline statement gives you more control over the data gawk reads than other methods of input do. When you provide a variable name as an argument to getline, getline reads data into that variable. The BEGIN block of the g1 program uses getline to read one line into the variable aa from standard input:
$ cat g1
BEGIN {
    getline aa
    print aa
    }
$ echo aaaa | gawk -f g1
aaaa
The next few examples use the alpha file:
$ cat alpha
aaaaaaaaa
bbbbbbbbb
ccccccccc
ddddddddd
Even when g1 is given more than one line of input, it processes only the first line:
$ gawk -f g1 < alpha
aaaaaaaaa
When getline is not given an argument, it reads input into $0 and modifies the field variables ($1, $2, . . .):
$ gawk 'BEGIN {getline;print $1}' < alpha
aaaaaaaaa
The g2 program uses a while loop in the BEGIN block to loop over the lines in standard input. The getline statement reads each line into holdme and print outputs each value of holdme.
$ cat g2
BEGIN {
    while (getline holdme)
        print holdme
    }
$ gawk -f g2 < alpha
aaaaaaaaa
bbbbbbbbb
ccccccccc
ddddddddd
The g3 program demonstrates that gawk automatically reads each line of input into $0 when it has statements in its body (and not just a BEGIN block). This program outputs the record number (NR), the string $0:, and the value of $0 (the current record) for each line of input.
$ cat g3
{print NR, "$0:", $0}
$ gawk -f g3 < alpha
1 $0: aaaaaaaaa
2 $0: bbbbbbbbb
3 $0: ccccccccc
4 $0: ddddddddd
Next g4 demonstrates that getline works independently of gawk’s automatic reads and $0. When getline reads data into a variable, it does not modify either $0 or any of the fields in the current record ($1, $2, . . .). The first statement in g4, which is the same as the statement in g3, outputs the line that gawk has automatically read. The getline statement reads the next line of input into the variable named aa. The third statement outputs the record number, the string aa:, and the value of aa. The output from g4 shows that getline processes records independently of gawk’s automatic reads.
$ cat g4
{
    print NR, "$0:", $0
    getline aa
    print NR, "aa:", aa
    }
$ gawk -f g4 < alpha
1 $0: aaaaaaaaa
2 aa: bbbbbbbbb
3 $0: ccccccccc
4 aa: ddddddddd
The g5 program outputs each line of input except for the lines that getline reads, that is, the line that follows each line beginning with the letter b. The first print statement outputs each line that gawk reads automatically. Next the /^b/ pattern selects all lines that begin with b for special processing. The action uses getline to read the next line of input into the variable hold, outputs the string skip this line: followed by the value of hold, and outputs the value of $1. The $1 holds the value of the first field of the record that gawk read automatically, not the record read by getline. The final statement displays a string and the value of NR, the current record number. Even though getline does not change $0 when it reads data into a variable, gawk increments NR.
$ cat g5
# print all lines except those read with getline
{print "line #", NR, $0}
# if line begins with "b" process it specially
/^b/ {
    # use getline to read the next line into variable named hold
    getline hold
    # print value of hold
    print "skip this line:", hold
    # $0 is not affected when getline reads data into a variable
    # $1 still holds previous value
    print "previous line began with:", $1
    }
{
    print ">>>> finished processing line #", NR
    print ""
    }
$ gawk -f g5 < alpha
line # 1 aaaaaaaaa
>>>> finished processing line # 1

line # 2 bbbbbbbbb
skip this line: ccccccccc
previous line began with: bbbbbbbbb
>>>> finished processing line # 3

line # 4 ddddddddd
>>>> finished processing line # 4

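The getline statement also returns a value: 1 when it reads a record, 0 at the end of the input, and -1 on an error such as an unreadable file. A loop that tests only for a nonzero value never ends when getline keeps returning -1, so loops that read from a named file usually compare the result with 0. The following is a minimal sketch; the function cat_with_getline and the variable names are hypothetical, awk stands in for gawk, and the special file name /dev/stderr is assumed to be available, as it is in gawk.

```shell
# Hypothetical helper: copy a file to standard output using getline,
# stopping cleanly at end of file and reporting read errors.
# usage: cat_with_getline file
cat_with_getline() {
    awk -v "file=$1" '
    BEGIN {
        # getline returns 1 for a record, 0 at end of file, -1 on error
        while ((status = (getline line < file)) > 0)
            print line

        if (status < 0) {
            print "cannot read " file > "/dev/stderr"
            exit 1
        }
        close(file)
    }'
}
```

Called with the alpha file, the function displays its four lines. Called with a file that does not exist, it displays an error message and returns a nonzero exit status instead of looping.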
© Copyright 2010 Mark G. Sobell. All rights reserved.