Linux and the Alpha
The Alpha chips, like most other modern CPUs, provide a variety of performance counters. These allow measuring various event counts or rates, such as the number of cache misses, instruction issue rate, branch mispredicts or instruction frequency. Unfortunately, I am not aware of any Linux API that would provide access to these counters. This is particularly unfortunate since both the Pentium and the Pentium Pro chips provide similar counters. Digital UNIX gives access to these counters via the uprofile and kprofile programs, and an ioctl-based interface documented in the pfm(7) man page. Hopefully, something similar (but more general) will eventually become available for Linux. With the proper tools, these counters can provide a wealth of information.
Most readers are probably familiar with the original gprof (see Reference 3). It's a handy tool to determine the primary performance bottlenecks at the function level. However, with the help of gcc, GNU gprof can also look inside a function. We illustrate this with a truly trivial function that computes the factorial. Assume we've typed up the factorial function and a simple test program in file fact.c. We can then compile that program like this (assuming GNU libc version 2.0 or later is installed):
gcc -g -O -a fact.c -lc
Invoking the resulting a.out binary once produces a gmon.out file that contains the execution counts for each basic block in the program. We can look at these counts by invoking gprof, specifying the -l and --annotate options. This command generates a source-code listing that shows how many times a basic block in each line of source code has been executed.
Our factorial example results in the listing shown in Listing 1. The basic block starting at the printf line in function main() was executed once, so it has been annotated with a 1. For the factorial function, the function prologue and epilogue were executed 20 times each, so the first and last line of function fact are annotated with 20. Of these 20 calls, 19 resulted in a recursive call to fact, and the remaining call simply returned 1. Correspondingly, the then branch of the if statement has been annotated with 19, whereas the else statement has an annotation of 1. It's that simple.
There certainly are no surprises in the behavior of function fact(), but in realistic, more complicated functions or in code that was written by somebody else, this knowledge can be very helpful to avoid wasting time optimizing rarely executed code.
Next month, we'll look into the techniques that can greatly improve the performance of a given piece of code. Most of them are not novel. Some of them have been around for so long that it would be difficult, if not impossible, to give proper credit. Others are “obvious” (once you know them). The key point is that it is the characteristics of modern, particularly Alpha-based, systems which make these techniques so important and worthwhile.
The author would like to thank Richard Henderson of Texas A&M University and Erik Troan of Red Hat Software for reviewing this paper on short notice. Their feedback greatly improved its quality. Errors and omissions are the sole responsibility of the author.
David is a graduate student in the Ph.D. program of the Computer Science department at the University of Arizona. He plans on graduating in August 1997 and to finally get a “Real Job”. After a short intermezzo with Linux involving Reed-Solomon codes and the floppy-tape driver, he forgot about Linux until the need arose for a low-cost, high-performance system based on Digital's Alpha processor. As a result, he got involved in the Linux/Alpha port and has been sticking around in the free software community ever since. When not playing with computers, he enjoys the outdoors with his lovely wife. He can be reached via e-mail at David.Mosberger@acm.org.
|Dynamic DNS—an Object Lesson in Problem Solving||May 21, 2013|
|Using Salt Stack and Vagrant for Drupal Development||May 20, 2013|
|Making Linux and Android Get Along (It's Not as Hard as It Sounds)||May 16, 2013|
|Drupal Is a Framework: Why Everyone Needs to Understand This||May 15, 2013|
|Home, My Backup Data Center||May 13, 2013|
|Non-Linux FOSS: Seashore||May 10, 2013|
- Dynamic DNS—an Object Lesson in Problem Solving
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Using Salt Stack and Vagrant for Drupal Development
- New Products
- Validate an E-Mail Address with PHP, the Right Way
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- The Secret Password Is...
- Download the Free Red Hat White Paper "Using an Open Source Framework to Catch the Bad Guy"
- New Products
4 hours 17 min ago
- Keeping track of IP address
6 hours 8 min ago
- Roll your own dynamic dns
11 hours 22 min ago
- Please correct the URL for Salt Stack's web site
14 hours 33 min ago
- Android is Linux -- why no better inter-operation
16 hours 48 min ago
- Connecting Android device to desktop Linux via USB
17 hours 17 min ago
- Find new cell phone and tablet pc
18 hours 15 min ago
19 hours 44 min ago
- Automatically updating Guest Additions
20 hours 52 min ago
- I like your topic on android
21 hours 39 min ago
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi
It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?