GNU Awk 4.1: Teaching an Old Bird Some New Tricks, Part II

Why Do You Need Extensions?

Consider this: an awk program cannot even change its working directory with the chdir system call! awk is thus a closed language—one that provides you with only the facilities that the implementors chose to provide and no more. That's not much fun. (Well, awk is fun, but it's still limited.)

By contrast, modern scripting languages are all open and extensible; Perl, Tcl, Python and Ruby all have thousands of available modules that can be loaded at runtime. It's past time that gawk could do that too.

What You Can Do from an Extension

It is best to think of extension functions as user-defined functions written in another language. They cannot do everything a user-defined function can (such as call an awk function, manipulate the fields, read records with getline and so on), but what they can do is enough to make gawk more open, and let it interface with the underlying operating system and with other C (or C++) libraries. In particular, you can:

  • Pass scalars by value and arrays by reference.

  • Create and modify new global variables and arrays.

  • Access the built-in variables (read-only, although you can update PROCINFO).

  • Register a function to be called when gawk exits.

  • Print warning and/or fatal error messages.

  • Update the built-in variable ERRNO when something goes wrong.

  • Hook into the I/O redirection mechanisms, providing your own "special" filenames and/or two-way communicators.

  • And of course, register new functions that can be called from gawk.

The API provides a number of data types to make it easier to communicate with gawk. For example, gawk strings can contain embedded NUL characters (all bits zero), so the API represents a string as a pointer plus a length instead of relying on a terminating NUL. gawk maintains reference-counted strings internally, so there are ways to tell gawk to reuse a value it already knows about.
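To see why a pointer plus a length matters, here is a minimal sketch in plain C. It is not taken from the gawk API: the type counted_string and the helpers count_nuls() and counted_equal() are hypothetical names made up for illustration. The point is only that an explicit length lets C code treat an embedded NUL as ordinary data, where strlen() would stop early.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string (NOT the gawk API type): a pointer plus
 * an explicit length, so embedded NUL bytes are ordinary data. */
typedef struct counted_string {
    const char *str;   /* bytes; may contain '\0' anywhere */
    size_t len;        /* number of valid bytes in str */
} counted_string;

/* Count how many NUL bytes the string contains. */
size_t count_nuls(counted_string s)
{
    size_t n = 0;

    assert(s.str != NULL || s.len == 0);
    for (size_t i = 0; i < s.len; i++)
        if (s.str[i] == '\0')
            n++;
    return n;
}

/* Byte-wise equality that does not stop at the first NUL. */
int counted_equal(counted_string a, counted_string b)
{
    return a.len == b.len
        && (a.len == 0 || memcmp(a.str, b.str, a.len) == 0);
}
```

Given the five bytes "ab\0cd", strlen() sees only the first two, while a counted_string with len set to 5 keeps all of them visible to count_nuls() and counted_equal().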

In addition, the API lets you "flatten" awk's associative arrays into an array of structs for easy iteration in C code, without having to call into gawk each time you want to move to the next element in an array.
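The idea behind flattening can be sketched in plain C as well. Again, this is an illustration under stated assumptions, not the real interface: the linked list standing in for an awk array, and the names node, flat_elem, assoc_put(), assoc_flatten(), flat_sum() and assoc_free(), are all hypothetical. The manual documents the actual data types and calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an awk associative array: an unordered,
 * singly linked list of key/value pairs. */
typedef struct node {
    const char *key;
    double value;
    struct node *next;
} node;

/* One element of the flattened view. */
typedef struct flat_elem {
    const char *key;
    double value;
} flat_elem;

/* Prepend a pair; returns the new head, or NULL if allocation fails
 * (the old list is left untouched in that case). */
node *assoc_put(node *head, const char *key, double value)
{
    node *n = malloc(sizeof *n);

    if (n == NULL)
        return NULL;
    n->key = key;
    n->value = value;
    n->next = head;
    return n;
}

/* Copy every pair into one contiguous array. Stores the element count
 * in *count and returns the array (caller frees it), or NULL if the
 * list is empty or allocation fails. */
flat_elem *assoc_flatten(const node *head, size_t *count)
{
    size_t n = 0, i = 0;
    flat_elem *out;

    assert(count != NULL);
    for (const node *p = head; p != NULL; p = p->next)
        n++;
    *count = 0;
    if (n == 0)
        return NULL;
    out = malloc(n * sizeof *out);
    if (out == NULL)
        return NULL;
    for (const node *p = head; p != NULL; p = p->next) {
        out[i].key = p->key;
        out[i].value = p->value;
        i++;
    }
    *count = n;
    return out;
}

/* Walk the flattened view with a plain indexed loop. */
double flat_sum(const flat_elem *elems, size_t count)
{
    double total = 0.0;

    for (size_t i = 0; i < count; i++)
        total += elems[i].value;
    return total;
}

/* Release every node in the list. */
void assoc_free(node *head)
{
    while (head != NULL) {
        node *next = head->next;
        free(head);
        head = next;
    }
}
```

Once the elements sit in one array, the loop in flat_sum() needs no further calls into the container, which is the convenience the flattening facility is meant to provide.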

A full description of the API is beyond the scope of this article; however, the manual includes a full chapter, with examples, describing the API and showing how to use it.

OS Independence

The extension mechanism has been designed to work on multiple operating systems. At the time of this writing, it works on any *nix system that supports the POSIX dlopen() API. This includes Mac OS X. The basic mechanism also works on Microsoft Windows using MinGW. However, support for building the sample extensions on Windows was not ready in time for the 4.1 release. It will be included in the first patch release, whenever that will be, although not all of the sample extensions can work on Windows.

Sample Extensions

The gawk distribution provides a number of small sample extensions. Their main purpose is to serve as examples of how to use the API, but they should also be usable for real work. The full list is documented in the manual. Some of the more interesting ones are:

  • The "filefuncs" extension, which provides chdir() and stat() functions, and also an interface to the fts(3) suite of routines for walking a file hierarchy.

  • The "fnmatch" extension, which provides an awk version of the fnmatch(3) suite.

  • The "readdir" extension, which returns records for the contents of directories named on the gawk command line or read with getline. (Normally, gawk treats an attempt to read a directory as a nonfatal error. Other awks treat it as fatal.)

  • The "inplace" extension, which simulates the GNU sed -i feature for in-place editing of command-line data files.

Additional, more specialized extensions illustrate the use of parts of the API not covered by the extensions just listed.

The gawkextlib Project

Now that gawk supports the major xgawk features, the xgawk developers have reoriented their project around their specific extensions. It no longer includes the forked gawk code base. To emphasize this change in orientation, they renamed their project "gawkextlib".

It is their (and my) hope that this project can serve as a central clearinghouse for new gawk extensions that may be written by the awk community over time.

The gawkextlib project currently has four extensions:

  • The XML extension, which adds several new variables and an input parser, letting gawk parse XML files in a natural fashion. This extension is built on top of the Expat XML parser. This is a powerful extension; instead of having to try to parse XML files with regular expressions manually, the Expat parser does it for you, including all the icky validation stuff that would be really hard to do in straight awk code.

  • The PostgreSQL extension, which provides functions for talking to PostgreSQL databases.

  • The GD extension, which provides an interface to the GD graphics library (see Resources).

  • The MPFR library extension. This extension gives you access to a number of MPFR functions that are not accessible from gawk's built-in MPFR support.

The Future

I feel that gawk as a language has largely reached maturity, and do not wish to add too many more features. That said, there are a few items still open for exploration:

  • Additional numeric facilities, such as possible integration with a decimal arithmetic library.

  • A way to map gawk arrays onto external storage, such as DBM arrays or SQL databases.

  • A "namespace" facility for extension functions and variables, and possibly regular gawk-level variables and functions as well. This would be a major design activity.

Of course, describing the above items does not constitute a commitment to do any of them.

Conclusion

The new API and extension facility open new horizons for gawk and for awk programmers. I am very excited about them, and I hope to see gawk used for many new things where it simply was not applicable before.

Acknowledgements

Thanks to Scott Deifik, Dr Brian W. Kernighan, Dr Nelson Beebe and Eli Zaretskii for comments on the initial draft of this article.

The entire gawk development team deserves kudos for their work on this release. It was very much a team effort.

Resources

"GNU Awk 4.0: Teaching an Old Bird Some New Tricks", LJ, September 2011: http://www.linuxjournaldigital.com/linuxjournal/201109#pg94

The gawk distribution: http://ftp.gnu.org/gnu/gawk/gawk-4.1.0.tar.gz

Documentation On-line: http://www.gnu.org/software/gawk/manual

Arbitrary Precision Arithmetic with gawk: http://www.gnu.org/software/gawk/manual/html_node/Arbitrary-Precision-Arithmetic.html#Arbitrary-Precision-Arithmetic

Dynamic Extensions: http://www.gnu.org/software/gawk/manual/html_node/Dynamic-Extensions.html#Dynamic-Extensions

gawkextlib Home Page: http://gawkextlib.sourceforge.net

gawkextlib Download: http://sourceforge.net/projects/gawkextlib

The GD Graphics Library: http://www.boutell.com/gd/manual2.0.33.html

The Expat XML Parser: http://expat.sourceforge.net
