Linux and the Alpha

This is the first of a 2 part series, an introduction to the Alpha family of computers in preparation for giving us the techniques for optimizing code on this high-performance platform in Part 2.
Performance Counters

The Alpha chips, like most other modern CPUs, provide a variety of performance counters. These allow measuring various event counts or rates, such as the number of cache misses, instruction issue rate, branch mispredicts or instruction frequency. Unfortunately, I am not aware of any Linux API that would provide access to these counters. This is particularly unfortunate since both the Pentium and the Pentium Pro chips provide similar counters. Digital UNIX gives access to these counters via the uprofile and kprofile programs, and an ioctl-based interface documented in the pfm(7) man page. Hopefully, something similar (but more general) will eventually become available for Linux. With the proper tools, these counters can provide a wealth of information.

GNU gprof

Most readers are probably familiar with the original gprof (see Reference 3). It's a handy tool to determine the primary performance bottlenecks at the function level. However, with the help of gcc, GNU gprof can also look inside a function. We illustrate this with a truly trivial function that computes the factorial. Assume we've typed up the factorial function and a simple test program in file fact.c. We can then compile that program like this (assuming GNU libc version 2.0 or later is installed):

gcc -g -O -a fact.c -lc

Invoking the resulting a.out binary once produces a gmon.out file that contains the execution counts for each basic block in the program. We can look at these counts by invoking gprof, specifying the -l and --annotate options. This command generates a source-code listing that shows how many times a basic block in each line of source code has been executed.

Listing 1. Basic-block Execution Counts

Our factorial example results in the listing shown in Listing 1. The basic block starting at the printf line in function main() was executed once, so it has been annotated with a 1. For the factorial function, the function prologue and epilogue were executed 20 times each, so the first and last line of function fact are annotated with 20. Of these 20 calls, 19 resulted in a recursive call to fact, and the remaining call simply returned 1. Correspondingly, the then branch of the if statement has been annotated with 19, whereas the else statement has an annotation of 1. It's that simple.

There certainly are no surprises in the behavior of function fact(), but in realistic, more complicated functions or in code that was written by somebody else, this knowledge can be very helpful to avoid wasting time optimizing rarely executed code.

Optimization Techniques: Making Your Applications Fly

Next month, we'll look into the techniques that can greatly improve the performance of a given piece of code. Most of them are not novel. Some of them have been around for so long that it would be difficult, if not impossible, to give proper credit. Others are “obvious” (once you know them). The key point is that it is the characteristics of modern, particularly Alpha-based, systems which make these techniques so important and worthwhile.

Acknowledgments

The author would like to thank Richard Henderson of Texas A&M University and Erik Troan of Red Hat Software for reviewing this paper on short notice. Their feedback greatly improved its quality. Errors and omissions are the sole responsibility of the author.

David is a graduate student in the Ph.D. program of the Computer Science department at the University of Arizona. He plans on graduating in August 1997 and to finally get a “Real Job”. After a short intermezzo with Linux involving Reed-Solomon codes and the floppy-tape driver, he forgot about Linux until the need arose for a low-cost, high-performance system based on Digital's Alpha processor. As a result, he got involved in the Linux/Alpha port and has been sticking around in the free software community ever since. When not playing with computers, he enjoys the outdoors with his lovely wife. He can be reached via e-mail at David.Mosberger@acm.org.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

measuring cpu cycle counts

amrutansu's picture

is dere any way in wich i can measure the cpu cycle counts of a pentium processor??
i mean i want 2 measure the cpu cycle counts elapsed before and after a bulk of data has been transformed.

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix