Improving Perl Application Performance

Four basic performance-tuning steps for speeding up an existing application.
Conclusions

During this process I identified a function that probably wasn't performing as well as it could. I was able to achieve several modest performance gains by refining the logic of the calculation in Perl. I also tried using an open-source package, only to find that it was 48% worse than my original function. Finally, I implemented the standard deviation function in C and exposed it to Perl through an XS layer. The C version showed a 1,175% speedup compared to the original Perl version. Improvements are summarized in Figure 1.

Figure 1. Comparison of All Implementations

In most cases, I have seen Perl performance that rivals C; however, this obviously isn't one of those cases. Perl is a good general-purpose language, and one of its benefits is the ability to step out of the language and implement code in a lower-level language. Don't be afraid of language mix-ins when you really need to improve performance, as long as you understand that there is a maintenance cost. Introducing an additional language increases the burden on those who must maintain the application in the future: they will need to know C and understand XS functions. However, in our case, the improved performance significantly outweighed the impact of supporting XS.

Bruce W. Lowther (blowther@micron.com) is a software engineer for Micron Technology, Inc., in Boise, Idaho. He has worked at Micron for nine years and has spent the past five years there working on tools to help integrate semiconductor equipment into the Micron manufacturing process. He received his undergraduate and Master's degrees in Computer Science from the University of Idaho.

______________________

Comments

standard deviations..

Nagilum

use Statistics::Descriptive;
my $stat = Statistics::Descriptive::Sparse->new();
$stat->add_data(331806,331766,328056);
print $stat->standard_deviation() . "\n";

-> 2153.60937343181

my @scratch = (331806, 331766, 328056);
sub std_dev_ref_sum {
    my $ar = shift;
    my $elements = scalar @$ar;
    my $sum = 0;
    my $sumsq = 0;
    foreach (@$ar) {
        $sum += $_;
        $sumsq += ($_ ** 2);
    }
    return sqrt( $sumsq/$elements -
                 (($sum/$elements) ** 2));
}
print std_dev_ref_sum(\@scratch) . "\n";

-> 1758.41469005422

Someone is making a mistake here...

Difference between standard deviation, knowing full population

anonymous

The difference between the two calculations:

The calculation in the Statistics::Descriptive package assumes that the available data is a sample drawn from the population and does not contain the full population. See: http://en.wikipedia.org/wiki/Standard_deviation#Estimating_population_SD
In the Statistics::Descriptive documentation, this is referenced by the note: "Returns the standard deviation of the data. Division by n-1 is used."

The calculation used in the article assumes that the data represents the full population.
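
To make the distinction concrete, here is a minimal sketch that computes both versions from the same sum of squared deviations. The helper name `std_devs` is made up for this illustration and comes from neither the article nor Statistics::Descriptive. The only difference between the two results is the divisor: n for a full population, n-1 for a sample.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (population_sd, sample_sd) for an array ref.
# Assumes at least two data points, so that n-1 is never zero.
sub std_devs {
    my $ar = shift;
    my $n  = scalar @$ar;

    my $sum = 0;
    $sum += $_ for @$ar;
    my $mean = $sum / $n;

    # Sum of squared deviations from the mean, shared by both formulas.
    my $ss = 0;
    $ss += ($_ - $mean) ** 2 for @$ar;

    return ( sqrt($ss / $n), sqrt($ss / ($n - 1)) );
}

my ($pop, $samp) = std_devs([331806, 331766, 328056]);
printf "population (divide by n):   %.2f\n", $pop;
printf "sample     (divide by n-1): %.2f\n", $samp;
```

For the three values posted above, this prints 1758.41 and 2153.61, which are the two figures from the previous comment rounded to two decimals. Neither calculation is mistaken; they answer different questions.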

Err... No.

Gordan Bobic

In most cases, I have seen Perl performance that rivals C;

I would love to see you demonstrate even just one example where this is the case. The gain of _only_ 11.75x of your "C" over Perl in the case you describe is because you used XS for the implementation and not pure C with XS to just glue the two together. For big arrays you'll find it's faster to transcribe the Perl array into a C array of floats, and to do the work in pure C. Perl is usually about two orders of magnitude (100x) slower than C or decently coded C++.

What you say about object oriented interfaces slowing things down is also completely untrue. The only thing you'll save by using procedural rather than OO implementation is a pointer dereference when you call the std_dev method on the object - which is negligible compared to the calculations inside the function.
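
A claim like this is easy to measure rather than argue about. The sketch below uses the core Benchmark module to compare a plain function call with a method call that delegates to the same function. The `StdDevCalc` class is a made-up wrapper for this illustration, not the interface of Statistics::Descriptive or of the article's code, and the array size is an arbitrary assumption.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Procedural version: the sum / sum-of-squares formula posted earlier
# in this thread.
sub std_dev_ref_sum {
    my $ar       = shift;
    my $elements = scalar @$ar;
    my ($sum, $sumsq) = (0, 0);
    foreach (@$ar) {
        $sum   += $_;
        $sumsq += $_ ** 2;
    }
    return sqrt( $sumsq / $elements - ($sum / $elements) ** 2 );
}

# Hypothetical OO wrapper: stores the data and delegates to the same code,
# so any timing difference comes from method dispatch alone.
{
    package StdDevCalc;
    sub new     { my ($class, $data) = @_; return bless { data => $data }, $class }
    sub std_dev { my $self = shift; return main::std_dev_ref_sum($self->{data}) }
}

# Assumed test data: 10,000 random values.
my @data = map { rand(1_000_000) } 1 .. 10_000;
my $calc = StdDevCalc->new(\@data);

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    procedural => sub { std_dev_ref_sum(\@data) },
    oo_method  => sub { $calc->std_dev() },
});
```

The numbers will vary by machine and Perl version, so none are quoted here. Shrinking the array makes the per-call dispatch cost a larger share of the total, which is a useful way to see where it starts to matter.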

Re: Improving Perl Application Performance

Anonymous

Hopefully, in the future, there will be less of a need for this sort of thing... With any luck, Perl6 and Parrot will prove to be faster, and far easier to integrate with C. In fact, the equivalent Parrot routines are already only about 3x slower than the equivalent C program, and both are far faster than Perl5 is today. (code follows)
-- pb

time N0 # time
mul N0, 1048576.0
mod N0, N0, 2000000000.0
set I0, N0 # seed
new P0, .Random # rng
set P0, I0 # seed the rng
set I0, 1000000 # array size
set I1, I0
set I2, 100 # loops
new P1, .SArray
set P1, I1
SETRND:
set N0, P0 # random numbers
mul N0, N0, I0
dec I1
set P1[I1], N0
if I1, SETRND
time N4
SDLOOP:
set I1, P1 # array size
set N3, I1
div N3, 1, N3 # 1 / array size
set N1, 0
set N2, 0
STDDEV:
dec I1
set N0, P1[I1]
add N1, N1, N0 # sum
mul N0, N0, N0
add N2, N2, N0 # sumsq
if I1, STDDEV
mul N1, N1, N3 # sum / array size
mul N1, N1, N1 # (squared)
mul N2, N2, N3 # sumsq / array size
sub N2, N2, N1 # -
pow N2, N2, 0.5 # sqrt
dec I2
if I2, SDLOOP
time N5
sub N4, N5, N4
print N4 # time elapsed in bench loop
print "\n"
end

That is parrot? That looks

Anonymous

That is Parrot? That looks like shit. I love Perl, but it's as good as dead with this Perl6 garbage.
