Introduction to Gawk
The control statements in the gawk language closely resemble those found in C, which makes gawk easier for C programmers to write and understand. gawk contains the pre- and post-increment and decrement operators ++ and --, as well as an if-else statement that looks very much like the one found in C. Also, multi-line blocks of code are grouped within { and }. Even the for loop seems to have been taken right out of a C programming book.
This allows you to “mix and match” code which takes advantage of gawk's pattern matching with code that uses more traditional control structures, so if patterns are not sufficient for your task (or you are not sure how to use them to accomplish your task) you can use standard programming techniques as well. Conventional programming with gawk is not covered here; the gawk info page (run info gawk) documents this well, and the goal of this article is to demonstrate gawk's distinguishing features.
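As a brief taste of the C-like syntax described above, here is a minimal sketch run from the shell. The variable names (even, odd, control_out) are made up for illustration, and the command is invoked as awk on the assumption that it behaves like gawk for these basic features; substitute gawk if that is the name on your system.

```sh
# Count the even and odd numbers from 1 to 10 using C-style control flow.
control_out=$(awk 'BEGIN {
    for (i = 1; i <= 10; i++) {      # C-style for loop with post-increment
        if (i % 2 == 0) {
            even++                   # even starts out as 0 automatically
        } else {
            odd++
        }
    }
    print "even=" even ", odd=" odd
}')
echo "$control_out"
```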
Another timesaving feature of gawk is that there is no need to declare a variable before using it. A variable can hold a string, an integer, or a floating point number, depending on the value assigned to it, and gawk handles conversions for you automatically. As a result, an expression such as total = 2 + "3" is valid and gives the expected result, 5. To make your job even easier, gawk initializes each variable when it is used for the first time, setting it to 0 if it is used as a number or "" if it is used as a string. This takes away any worries about uninitialized variables.
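The conversion and initialization rules can be checked with a short sketch like the following. The names count, name, and coerce_out are hypothetical and chosen only for this example.

```sh
# Show automatic string-to-number conversion and default initial values.
coerce_out=$(awk 'BEGIN {
    total = 2 + "3"          # the string "3" is converted to the number 3
    print total
    print count + 0          # count was never assigned: numeric value 0
    print "[" name "]"       # name was never assigned: string value ""
}')
echo "$coerce_out"
```

The three lines printed should be 5, 0, and [] (an empty string between the brackets).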
gawk also carries this ease of use of variables to arrays. There is no need to declare an array before using it, or even to specify a maximum size for that array. To create an array, simply use it and gawk will allocate the required space for you. As you add more data to the array, its size will automatically expand to accommodate it.
However, the array indices in gawk differ from those in languages such as C, in that gawk indices are associative, rather than numeric.
In an associative array, the array index is associated with the value assigned to it. This means that you can write expressions such as theArray["text"]="this is a line". If you wish, you can still use an integer as the index, as in theArray[50] = "some value". It is also possible to use a mixture of strings, integers, and even floating point numbers as indices in the same array, since gawk treats all indices as strings. So the expression theArray[50] = "some value" is equivalent to theArray["50"] = "some value".
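A small sketch of mixed index types, assuming nothing beyond the behavior described above. The array is used without any declaration, and the float index 3.5 is an extra illustration not taken from the text.

```sh
# Mix string, integer, and floating point indices in one array.
index_out=$(awk 'BEGIN {
    theArray["text"] = "this is a line"   # string index
    theArray[50] = "some value"           # integer index, stored as "50"
    theArray[3.5] = "a float index"       # float index, stored as "3.5"
    print theArray["50"]                  # same element as theArray[50]
    print theArray["3.5"]
    print theArray["text"]
}')
echo "$index_out"
```

Because every index is converted to a string, looking up "50" finds the element that was stored under 50.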
To make working with arrays as easy as possible, gawk provides the programmer with several powerful array operators. For example, to test whether an index is present in an array, you can use the in operator:
if (someValue in theArray) {
    # action to take if someValue is an index of theArray
}
else {
    # an alternate action if it is not present
}
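Here is a runnable sketch of the same test. Note that in checks the array's indices, not the values stored in it. The indices "word" and "other" are made up for the example.

```sh
# Test for the presence of two indices, only one of which exists.
in_out=$(awk 'BEGIN {
    theArray["word"] = "some value"
    if ("word" in theArray) {
        print "word: found"
    } else {
        print "word: missing"
    }
    if ("other" in theArray) {
        print "other: found"
    } else {
        print "other: missing"
    }
}')
echo "$in_out"
```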
To perform an action on all values in an array, such as printing each value contained in it, you can use a variation of the for loop, for example:
for (i in theArray) print theArray[i]
gawk sets the variable i to the next index in theArray on each pass through the loop and then prints the value stored at that index. The order in which the indices are visited is not specified.
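The following sketch walks a small array and prints both halves of each element. In this form of the for loop, the loop variable takes each index in turn, and the value is retrieved with theArray[i]. The sample data is invented, and the output is piped through sort only because the traversal order is not guaranteed.

```sh
# Print every index together with its value, sorted for stable output.
loop_out=$(awk 'BEGIN {
    theArray["a"] = "alpha"
    theArray["b"] = "beta"
    theArray["c"] = "gamma"
    for (i in theArray)
        print i " -> " theArray[i]   # i is the index, theArray[i] the value
}' | sort)
echo "$loop_out"
```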
To remove an element from an array, simply use the delete statement. For example, delete theArray["word"] will remove the element whose index is "word" from theArray.
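A quick way to confirm the effect of delete is to test for the index afterward and count what remains. This is only a sketch; the second index "other" and the counter n are made up for the example.

```sh
# Delete one element, then verify it is gone and count the rest.
delete_out=$(awk 'BEGIN {
    theArray["word"] = 1
    theArray["other"] = 2
    delete theArray["word"]
    if (!("word" in theArray))
        print "word is gone"
    n = 0
    for (i in theArray)      # count the remaining elements
        n++
    print n " element left"
}')
echo "$delete_out"
```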
With associative arrays, you can quickly build powerful applications without concern for the traditional overhead of declaring the array, allocating the memory, or searching for an item in the array. And size is not a factor—the following gawk program easily read and stored all 45,101 words from the file /usr/dict/words into an associative array (in this case, using the number of the current line as the array index):
{ words[NR] = $1 }
END { print NR " words read" }
Such a task would be much more involved in C, as you would need to determine how you want to store all the words (an array declared with a size sufficient for all 45,101 character strings? a linked list? a binary tree?). You may argue that with C you are free to choose a data structure which will provide much more efficient memory allocation and faster access speed than is possible with an associative array. While this may be true, it does not tell the whole story: it will certainly take you some time to write and test this C program (and very likely, more time to debug it). The power of the associative arrays and the simple, transparent memory management built into gawk mean that you are free from dealing with such concerns. Just tell gawk what you want and it handles much of the hard work behind the scenes.
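To try the word-reading program without a dictionary file, you can feed it a few lines from the shell. The three-word input below is a stand-in for /usr/dict/words, and the extra print of words[2] is an addition for illustration, showing that any stored word can be retrieved by its line number.

```sh
# Store each input line in words[] keyed by line number, then report.
words_out=$(printf 'apple\nbanana\ncherry\n' | awk '
    { words[NR] = $1 }
    END {
        print NR " words read"
        print "word 2 is " words[2]
    }')
echo "$words_out"
```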
Comments
How Slow??
Hi,
You say gawk is slower than Perl. Do you know how much slower? Are there any benchmarks? I've heard that there is an AWK compiler. Do you know anything about it?