GNU Awk 4.1: Teaching an Old Bird Some New Tricks, Part II
Why Do You Need Extensions?
Consider this: an awk program cannot even change its working directory
with the chdir() system call! awk is thus a closed language: one that
provides you with only the facilities that the implementors chose to
provide and no more. That's not much fun. (Well, awk is fun, but it's
still limited.)
By contrast, modern scripting languages are all open and extensible;
Perl, Tcl, Python and Ruby all have thousands of available modules that can
be loaded at runtime. It's past time that gawk
do that too.
What You Can Do from an Extension
It is best to think of extension functions as user-defined functions
written in another language. They cannot do everything a user-defined
function can (such as call an
awk function, manipulate the fields, read records with
getline and so on), but what they can do is enough to make
gawk more open,
and let it interface with the underlying operating system and with
other C (or C++) libraries. In particular, you can:
Pass scalars by value and arrays by reference.
Create and modify new global variables and arrays.
Access the built-in variables (read-only, although you can update
Register a function to be called when
Print warning and/or fatal error messages.
Update the built-in variable ERRNO for when something goes wrong.
Hook into the I/O redirection mechanisms, providing your own "special" filenames and/or two-way communicators.
And of course, register new functions that can be called from
The API provides a number of data types to make it easier to communicate
with gawk. For example,
gawk strings can contain embedded NUL characters
(all bits zero), so strings have a pointer and a length. gawk uses
reference-counted strings internally, so there are ways to tell
gawk to reuse a value it already knows about.
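To see why a pointer plus a length matters, here is a minimal C sketch. The counted_str type and the count_nuls() helper are hypothetical names invented for this illustration; they are not the types the gawk API defines (the manual documents those).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counted string: illustrative only, not a gawk API type. */
struct counted_str {
    const char *ptr;  /* the bytes; may contain embedded NULs */
    size_t len;       /* number of valid bytes at ptr */
};

/* Count NUL bytes by walking all len bytes, not stopping at the first NUL. */
static size_t count_nuls(struct counted_str s)
{
    size_t n = 0;

    assert(s.ptr != NULL || s.len == 0);
    for (size_t i = 0; i < s.len; i++) {
        if (s.ptr[i] == '\0')
            n++;
    }
    return n;
}
```

For the five bytes "ab\0cd", count_nuls() reports one embedded NUL, whereas strlen() on the same buffer would stop after "ab" and never see the rest.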
In addition, the API lets you "flatten"
awk's associative arrays into
an array of structs for easy iteration in C code, without having to call
gawk each time you want to move to the next element in an array.
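The idea of flattening can be sketched in plain C. Everything here is an assumption made for illustration: the linked list stands in for an associative array, and the node, flat_elem and flatten() names are hypothetical, not part of the gawk API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an associative array: a list of pairs. */
struct node {
    const char *index;
    const char *value;
    struct node *next;
};

/* One element of the flattened snapshot. */
struct flat_elem {
    const char *index;
    const char *value;
};

/*
 * Copy every (index, value) pair into one newly allocated array.
 * Stores the element count in *count. Returns NULL if the list is
 * empty or allocation fails. The caller frees the result.
 */
static struct flat_elem *flatten(const struct node *head, size_t *count)
{
    size_t n = 0;

    assert(count != NULL);
    for (const struct node *p = head; p != NULL; p = p->next)
        n++;
    *count = n;
    if (n == 0)
        return NULL;

    struct flat_elem *out = malloc(n * sizeof *out);
    if (out == NULL) {
        *count = 0;
        return NULL;
    }

    size_t i = 0;
    for (const struct node *p = head; p != NULL; p = p->next, i++) {
        out[i].index = p->index;
        out[i].value = p->value;
    }
    return out;
}
```

Once the snapshot exists, the C code can walk it with an ordinary for loop over out[0] through out[count - 1], with no callback into the owner of the array for each step.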
A full description of the API is beyond the scope of this article; however, the manual includes a full chapter, with examples, describing the API and showing how to use it.
The extension mechanism has been designed to work on multiple operating
systems. At the time of this writing, it works on any *nix system that supports
the dlopen() API. This includes Mac OS X. The basic mechanism also
works on Microsoft Windows using MinGW. However, support to build
the sample extensions was not included in the 4.1 release since it was
not ready. This support will be included in the first patch release,
whenever that will be, although not all of the sample extensions can work on
Windows.
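On systems that provide it, dynamic loading with dlopen() generally follows the pattern sketched below. This is a generic illustration, not gawk's own loader code. The load_and_init() helper, the plugin_init_fn type and the notion of a no-argument entry point are assumptions made for the example. Older C libraries may also need -ldl at link time.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical entry-point signature; a real host program defines its own. */
typedef int (*plugin_init_fn)(void);

/*
 * Load the shared library at path and call the function named symbol.
 * Returns that function's result, or -1 if loading or lookup fails.
 */
static int load_and_init(const char *path, const char *symbol)
{
    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL) {
        const char *msg = dlerror();
        fprintf(stderr, "dlopen: %s\n", msg != NULL ? msg : "unknown error");
        return -1;
    }

    dlerror();  /* clear any stale error state before dlsym() */
    plugin_init_fn init = NULL;
    *(void **)(&init) = dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != NULL || init == NULL) {
        fprintf(stderr, "dlsym: %s\n", err != NULL ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    /* The library is left loaded, since its code is now in use. */
    return init();
}
```

A host program would call something like load_and_init("./myext.so", "my_init") once at startup; both names in that call are placeholders.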
The gawk distribution provides a number of small, sample extensions.
Their main purpose is to serve as examples of how to use the API, but
nonetheless they should be usable for real work also. The full list is
documented in the manual. Some of the more interesting ones are:
The "filefuncs" extension, which provides chdir() and stat() functions, and also an interface to the fts(3) suite of routines for walking a file hierarchy.
The "fnmatch" extension, which provides an awk version of the fnmatch(3) suite.
The "readdir" extension, which returns records for the contents of directories named on the gawk command line or read with getline. (Normally, it's a nonfatal error to try to read a directory. With other awks, it's fatal.)
The "inplace" extension, which simulates the GNU sed -i feature for in-place editing of command-line data files.
Additional, more specialized extensions illustrate the use of parts of the API not covered by the extensions just listed.
The gawkextlib Project
Now that gawk supports the major xgawk features, the xgawk developers
have reoriented their project around their specific extensions. It no
longer includes the forked
gawk code base. To emphasize this change in
orientation, they renamed their project "gawkextlib".
It is their (and my) hope that this project can serve as a central
clearinghouse for new
gawk extensions that may be written by the gawk
community over time.
The gawkextlib project currently has four extensions:
The XML extension, which adds several new variables and an input parser, letting gawk parse XML files in a natural fashion. This extension is built on top of the Expat XML parser. This is a powerful extension; instead of having to try to parse XML files with regular expressions manually, the Expat parser does it for you, including all the icky validation stuff that would be really hard to do in straight awk.
The PostgreSQL extension, which provides functions for talking to PostgreSQL databases.
The GD graphics library extension, for use with the GD graphics library (see Resources).
The MPFR library extension. This extension gives you access to a number of MPFR functions that are not accessible from
gawk's built-in MPFR support.
I feel that
gawk as a language has largely reached maturity, and do
not wish to add too many more features. That said, there are a few
items still open for exploration:
Additional numeric facilities, such as possible integration with a decimal arithmetic library.
A way to map gawk arrays onto external storage, such as DBM arrays or SQL databases.
A "namespace" facility for extension functions and variables, and possibly regular
gawk-level variables and functions as well. This would be a major design activity.
Of course, describing the above items does not constitute a commitment to do any of them.
The new API and extension facility open new horizons for
awk programmers. I am very excited about it, and I hope to see
gawk used for many new things where it simply was not applicable before.
Thanks to Scott Deifik, Dr Brian W. Kernighan, Dr Nelson Beebe and Eli Zaretskii for comments on the initial draft of this article.
The gawk development team deserves kudos for their work on
this release. It was very much a team effort.
"GNU Awk 4.0: Teaching an Old Bird Some New Tricks", LJ, September 2011: http://www.linuxjournaldigital.com/linuxjournal/201109#pg94
gawk distribution: http://ftp.gnu.org/gnu/gawk/gawk-4.1.0.tar.gz
Documentation On-line: http://www.gnu.org/software/gawk/manual
Arbitrary Precision Arithmetic with
gawkextlib Home Page: http://gawkextlib.sourceforge.net
gawkextlib Download: http://sourceforge.net/projects/gawkextlib
The GD Graphics Library: http://www.boutell.com/gd/manual2.0.33.html
The Expat XML Parser: http://expat.sourceforge.net