GNU Awk 4.1: Teaching an Old Bird Some New Tricks, Part II

In an earlier article ("GNU Awk 4.0: Teaching an Old Bird Some New Tricks", published in the September 2011 issue of Linux Journal), I gave a brief history of awk and gawk and provided a high-level overview of the many new features in gawk 4.0. I recommend reading that article first, although you can read this one without doing so, if you wish.

gawk 4.0 itself was released in June 2011. Since then, the gawk development team has not been resting on its laurels! gawk 4.1, released in May 2013, contains a number of new features, and that's what I cover here.

Compared with gawk 4.0, this release has considerably fewer changes at the language level (although there are some). The changes this time around are more concerned with internals, and with the ability to interface to the outside world. So let's get started.

Reduced Footprint

For many years, when you built gawk, you got two executables: the regular interpreter, gawk, and pgawk, its profiling twin brother, which ran awk programs (more slowly) and produced a statement count execution profile showing how many times each line of code was executed.

With gawk 4.0, you got an additional executable, dgawk, the gawk debugger. Although the three versions shared most of the same code, the core parts that actually executed your awk program were compiled differently in each one.

For gawk 4.1, all three executables have been merged into a single program, named just gawk. Although the combined executable is larger, it is still smaller than having three separate executables, and in addition, the documentation is simpler and easier to understand (and maintain!).

To accommodate this change, the options had to change slightly. You now use -D to run the debugger, -p to do profiling and -o for pretty-printing without profiling.

Arbitrary Precision Arithmetic with MPFR and GMP

An important new feature that is visible to the awk programmer is arbitrary precision floating-point arithmetic with the GNU MPFR and GMP libraries.

This is an optional feature: if you have the MPFR and GMP libraries installed when you configure and build gawk, gawk automatically will be able to use them.

Note that I said "be able to use them". You still have to choose to do so by using the -M option (or --bignum, if you prefer long options). With that option in effect, you can set the special variable PREC to the desired floating-point precision.

The precision is the number of bits kept in the floating-point mantissa. The default is 53, which is the same as that used by hardware double-precision floating point. From the gawk manual:

$ gawk -M -v PREC=100 'BEGIN { x = 1.0e-400; print x + 0
> PREC = "double"; print x + 0 }'
1e-400
0

You see that regular hardware can't handle an exponent of -400, whereas MPFR can.

An additional new variable, ROUNDMODE, sets the rounding mode for calculations and printing arbitrary precision values.

In the past several years, for reasons I don't quite understand, I've gotten bug reports from people who expect gawk's arithmetic to work exactly like "real" arithmetic done with pencil and paper. In other words, they want what is known in computer science as decimal arithmetic. I'm not sure why they expect this, but as we all should know, computers don't quite work that way.

MPFR does not give you decimal arithmetic. However, if you understand what you're doing and how to use it, you can get results that are likely to be good enough for your purposes.

The manual has a full chapter that describes the issues involved with floating-point arithmetic, what it means when you increase the precision, and how to use the various rounding modes supported by MPFR.

New Arrays Provide Indirect Variable Access

There are three new arrays:

  • SYMTAB: provides access to awk-level variables.

  • FUNCTAB: lists the names of all user-defined and extension functions.

  • PROCINFO["identifiers"]: lists all known identifiers and what gawk knows about their types after it has parsed the program.

Of these, SYMTAB is the most interesting, since it provides indirect access to any variable. For example:

$ gawk 'BEGIN { a = 5 ; print "a =", a
> SYMTAB["a"] += 37
> print "a is now", a }'
a = 5
a is now 42

With the isarray() built-in function, you can "walk" the entire symbol table and print out all variable and array values, if you choose to do so.

Dynamic Extensions

The most exciting change in gawk 4.1 is its ability to interface to the outside world. For many years, gawk had an "extension" or "plug in" mechanism that let a programmer write a new "built-in" function in C, and load it into the running gawk interpreter at runtime.

This mechanism required understanding something of the gawk internals and making use of gawk's internal data structures and functions. Although it was documented minimally, and it worked, it had several drawbacks. The most notable one was that there was no backward compatibility across releases.

Nonetheless, a group of developers forked gawk to create xgawk (XML gawk) and developed a number of dynamic extensions and new facilities for the core executable.

For many years, I had been wanting to provide a defined C API for writing extensions that would not be dependent upon the gawk internals and that possibly could provide binary compatibility across releases.

For gawk 4.1, the xgawk developers and I finally made this happen.


