Improving Perl Application Performance
A fellow developer and I have been working on a data collection application primarily written in Perl. The application retrieves measurement files from a directory, parses the files, performs some statistical calculations and writes the results to a database. We needed to improve the application's performance so that it could handle the considerable load it would see in production.
This paper introduces four performance-tuning steps: identification, benchmarking, refactoring and verification. These steps are applied to an existing application to improve its performance. A function is identified as being a possible performance problem, and a baseline benchmark of that function is established. Several optimizations are applied iteratively to the function, and the performance improvements are compared against the baseline.
The first task at hand in improving the performance of an application is to determine which parts of the application are not performing as well as they should. In this case I used two techniques to identify potential performance problems: code review and profiling.
A performance code review is the process of reading through the code looking for suspicious operations. The advantage of a code review is that the reviewer can observe the flow of data through the application. Understanding this data flow helps identify any control loops that can be eliminated, and it highlights sections of code that should be scrutinized further with application profiling. I do not advise combining a performance code review with other types of code review, such as a review for standards compliance.
Application profiling is the process of monitoring the execution of an application to determine where the most time is spent and how frequently operations are performed. In this case, I used a Perl package called Benchmark::Timer. This package provides functions that mark the beginning and end of interesting sections of code. Each marked section is identified by a label, and every time the program enters a marked section, the time spent in that section is recorded.
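As a minimal sketch of this technique, the fragment below wraps the parsing and statistics phases of a collection loop in labeled timers. The helpers next_measurement_file and parse_file are hypothetical stand-ins for the application's real routines:

    use Benchmark::Timer;

    my $timer = Benchmark::Timer->new;

    while ( my $file = next_measurement_file() ) {    # hypothetical helper
        $timer->start('parse');
        my @samples = parse_file($file);              # hypothetical helper
        $timer->stop('parse');

        $timer->start('stats');
        my $std_dev = std_dev(@samples);
        $timer->stop('stats');
    }

    $timer->report;    # emit the accumulated timing statistics per label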
Adding profiling sections to an application is an intrusive technique; it changes the behavior of the code. In other words, it is possible for the profiling code itself to overshadow or obscure a performance problem. In the early stages of performance tuning, this may not matter, because the magnitude of the performance problem will be significantly larger than the overhead of the profiling code. However, as performance issues are eliminated, the remaining issues become harder to distinguish from that overhead. Like many things, performance improvement is an iterative process.
In our case, profiling some sections of the code indicated that a considerable amount of time was being spent calculating statistics on the data collected from the machine. I reviewed the code related to these statistics calculations and noticed that a function to calculate standard deviation, std_dev, was used frequently. The std_dev calculation caught my eye for two reasons. First, because calculating the standard deviation requires both the mean and the mean of the sum of squares over the entire measurement set, the naïve calculation of std_dev uses two loops when it could be done with one. Second, I noticed that the entire data array was being passed into the std_dev function on the stack rather than as a reference. Together, these two items suggested a performance issue worth examining.
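For illustration only (this is not the application's actual Listing 1), a naive two-loop std_dev along these lines copies the entire data set onto the stack and walks it twice, once for the mean and once for the mean of the squares:

    # Illustrative sketch of a naive std_dev, not the original listing.
    sub mean {
        my $sum = 0;
        $sum += $_ for @_;                 # pass one: sum the data
        return $sum / @_;
    }

    sub std_dev {
        my $mean = mean(@_);               # first loop over the data
        my $sq_sum = 0;
        $sq_sum += $_ ** 2 for @_;         # second loop: sum of squares
        return sqrt( $sq_sum / @_ - $mean ** 2 );   # sqrt(E[x**2] - E[x]**2)
    }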
After identifying a function that could be improved, I proceeded to the next step: benchmarking the function. Benchmarking is the process of establishing a baseline measurement for comparison. Creating a benchmark is the only way to know whether a modification actually improves performance. All the benchmarks presented here are time-based. Fortunately, a Perl package called Benchmark was developed specifically for generating time-based benchmarks.
I copied the std_dev function (Listing 1) out of the application and into a test script. By moving the function to a test script, I could benchmark it without affecting the data collection application. In order to get a representative benchmark, I needed to duplicate the load that existed in the data collection application. After examining the data processed by the data collection application, I determined that a shuffled set of all the numbers between 0 and 999,999 would be adequate.
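A minimal test-script skeleton along these lines would generate that load and time the function with the Benchmark module; the iteration count of 10 is an arbitrary choice for this sketch:

    use Benchmark qw(timethese);
    use List::Util qw(shuffle);

    # A shuffled copy of every integer from 0 to 999,999, approximating
    # the load seen by the data collection application.
    my @data = shuffle( 0 .. 999_999 );

    # Establish the baseline measurement for the unmodified function.
    timethese( 10, {
        'std_dev_baseline' => sub { std_dev(@data) },
    } );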