The Gemcutter's Workshop: Canada on Rails

Recapping another busy couple of weeks in Ruby land as well as the first international Rails conference.


The past two weeks have been busy ones in terms of Ruby releases
and community activity. I'd like to start with a couple of big release
announcements and a mailing list posting, then move on to two big events.
News from the Community
Eric and Ryan have kept up the pace with new releases of ParseTree
and ZenTest, along with a teaser about an upcoming addition to ZenTest.

Zed Shaw has been hard at work on
Mongrel,
punching out a couple of new releases. He's shooting for a
0.4 release quite soon now.

The Rails team also has been busy, whipping out both 1.1.1 and 1.1.2
releases.

James Gray announced that he's hit the bottom of his Ruby Quiz
submission stack and asked for new submissions. A number of
responses came in, and he's well stocked now for quite a while. The
first quiz that appeared after the call for submissions was quite popular.
I'm responsible for the next one. Hopefully,
it will draw as much attention.

Finally, it's worth noting that the excellent Ruby for Rails book, by
David Alan Black, now is available in PDF and should be hitting the
bookstores at the beginning of May. It may well claim the top spot in my
personal list of the best Ruby books available.
Coverity
Coverity has developed a suite of static code
analysis tools for C and C++. They're currently working under a contract
with the Department of Homeland Security to analyze the code bases of a
number of important open-source tools. Members of the projects Coverity
is working with have had good things to say about the process. And many
projects are showing substantial improvement.

Ruby is a recent addition to Coverity's list. Although it's nice to see Ruby
accorded that kind of respect, the addition is good in two other ways.
First, it allows us to compare the Perl, Python and Ruby code bases.
This point isn't really important, but it is interesting. Second, it
gives the Ruby core team some targets to watch as new releases approach.

Perl and Python have been on the list longer than Ruby has, and both are
showing improvement. Their original measurements are shown below:


Lang	LoC		orig defects	defect rate

Perl	485,001		89		0.185
Python	273,980		96		0.350

The next table shows the current measurements for Perl and Python, with
Ruby's first (and current) measurements added.


Lang	LoC		cur defects	defect rate

Perl	485,001		67		0.138
Python	273,980		14		0.051
Ruby	258,908		30		0.116

It's pretty cool to see that the Perl and Python communities have done a
good job of correcting the errors that Coverity found in the code bases.
It's also interesting to see that Ruby compares well with the original
Perl and Python defect rates. And, Ruby doesn't look too bad against their
current defect rates either. In fact, it compares well with a lot of
other projects out there, such as emacs, 0.133; gcc, 0.253; FreeBSD,
0.396; or Linux 2.6, 0.220.
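The defect rate in these tables is defects per 1,000 lines of code, so Ruby's number can be double-checked with a quick calculation (a sketch using the figures from the table above):

```ruby
# Defect rate = (defects / lines of code) * 1000,
# that is, defects per 1,000 lines of code.
defects = 30
loc     = 258_908

rate = defects.to_f / loc * 1000
puts rate.round(3)   # 0.116
```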

Hopefully, we'll see a decrease in our defect rate over time, like most
of the other projects on Coverity's report. To this end, we have a great
example to follow--AMANDA. AMANDA started out with a defect rate
of approximately 1.0. It currently looks like this:


Project		LoC	cur defects	defect rate

AMANDA		88,414	0		0.000

The difference is so great that a company involved in AMANDA development
wrote an article about it that said, among other things:

What happened next is truly remarkable. The Amanda development community
... quickly responded to address this situation. Within one week,
Amanda developers fixed the entire list of identified bugs. As it
currently stands, there are 0 outstanding bugs detected by the Coverity
scan.

Canada on Rails
Canada on Rails has been a big event in the Ruby community. Billed as the first international event focused
on Rails, Canada on Rails has drawn a lot of attention and a lot of
people. I've tried to gather up some of the coverage here.

Some notable non-Ruby names attended Canada on Rails, including
Tim Bray, who wrote:

I was far from the only Rails interested-but-inexperienced poseur,
there were a lot of people there to find out what it's all about. I talked
to a mostly-PHP developer from Calgary and tried to convince her that
Rails ought to be able to do most of what she does, only cleaner and better.
On the other hand, I spent one session sitting next to a guy who has a
Rails shop in New York, and was hip to the very latest YARV gossip.
Mostly young, unsurprising; mostly male, sigh.

Ryan Davis kept collective
notes using SubEthaEdit. Day 1 notes can be read
here.
My favorite comment was: "Eclipse: . . . Gateway drug for Java users."

Amy Hoy teased us with an initial post. Hopefully, more is coming soon.
Alex Combas also provided excellent coverage on his blog.

Several of the speakers have posts up as well:

  • Robby Russell talked about his new acts_as_legacy project. He also
    blogged about Day 1 of the conference here and here.
  • Jason Voorhis talked about internationalization and posted his
    slides in PDF.
  • David Astels blogged about being interviewed and casually
    mentioned that a DVD of the conference will be available--I wonder if
    it will be available to non-attendees. He also discussed his talk on
    Behavior Driven Design.
  • Thomas Fuchs let us know that he was en route to the conference.
    Hopefully, he'll have a retrospective post up soon.

Optimizing Ruby Code
One of the rules I try harder and harder to follow is "Make it right,
then make it fast". The more I work with dynamic languages such as Ruby,
the easier it becomes to follow this rule and the bigger the payoff
becomes for doing so. With that in mind, I'd like to discuss some
fundamentals for optimizing Ruby code.

Any time you optimize, you need to follow some simple steps:

  • Get the code working. You don't want to optimize broken code.
  • Profile your code. Know where the bottlenecks are so you can
    optimize the right parts.
  • Benchmark your code and the alternatives. Don't replace something
    unless it's worth it.
  • If you need to, go to another language for speed. This is your
    last resort.

I'm not going to spend any time here talking about the first step. Hopefully,
you've already got a handle on it. If not, refer to my last two
articles, found here and
here. They both talk about Test
First programming and related topics.

Moving on to the second step, the Ruby profiler is easy to use, but
profiled code runs much more slowly than normal. To profile a program, simply do:


$ ruby -rprofile yourprog

This command produces a report that looks something like this trimmed
version:


  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 15.00     0.15      0.15       45     3.33    65.33  Kernel.require
 14.00     0.29      0.14      532     0.26     0.45  Gem::Specification#copy_
  6.00     0.35      0.06      438     0.14     0.16  Kernel.dup
  6.00     0.41      0.06       74     0.81    15.27  Array#each
  4.00     0.45      0.04      226     0.18     0.27  String#gsub!
  3.00     0.48      0.03       82     0.37     0.37  String#gsub

The meaning of each column is as follows:

  • % time: the percentage of total time spent in this
    method.
  • cumulative seconds: the total number of running seconds in
    this and all previous methods.
  • self seconds: the number of seconds spent in this
    method.
  • calls: the number of times this method was
    called.
  • self ms/call: the time spent in this method per
    call.
  • total ms/call: the time spent, per call, in this method and the
    methods it calls.
  • name: the name of the method.

As you profile code, you will see a lot of methods that you can't do
much about, such as Kernel.dup. You'll also see some that are more
fruitful for you to pursue.
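For example (a contrived sketch, not taken from the profile run above), a String#gsub call like the ones near the top of the report often can be replaced with cheaper string interpolation once the profiler has pointed it out:

```ruby
# A contrived hot spot: gsub re-runs a substitution on every call.
def header_slow(title)
  "== TITLE ==".gsub("TITLE", title)
end

# The same result via interpolation, with no substitution machinery.
def header_fast(title)
  "== #{title} =="
end

puts header_slow("Ruby")   # == Ruby ==
puts header_fast("Ruby")   # == Ruby ==
```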

Benchmarking different options is at the heart of optimizing.
Fortunately, it's easy to do and the output is easy to read. Here's a
quick example that benchmarks different kinds of iterators and looping in
Ruby:


require 'benchmark'

n = 10_000_000

Benchmark.bm(15) do |x|
  x.report("for loop:") { for i in 1..n; a = "1"; end }
  x.report("times:") { n.times do ; a = "1"; end }
  x.report("upto:") { 1.upto(n) do ; a = "1"; end }
end

Running this code generates a report like this one:


                     user     system      total        real
for loop:        3.060000   0.000000   3.060000 (  3.137070)
times:           3.290000   0.000000   3.290000 (  3.308736)
upto:            3.370000   0.000000   3.370000 (  3.372559)

This report shows that if speed matters, you probably want to use a for
loop, although it won't make a huge difference. Choosing the right algorithm for
the right method usually is where you get your biggest win, so spend
your time on profiling and benchmarking.
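To illustrate that point about algorithms (a sketch; the method names here are mine, not from any library), compare a naive recursive Fibonacci against a linear iterative one with the same Benchmark module. No amount of loop tweaking will close a gap like this:

```ruby
require 'benchmark'

# Naive recursion: an exponential number of calls.
def fib_naive(n)
  n < 2 ? n : fib_naive(n - 1) + fib_naive(n - 2)
end

# Iteration: a single linear pass.
def fib_iter(n)
  a, b = 0, 1
  n.times { a, b = b, a + b }
  a
end

Benchmark.bm(12) do |x|
  x.report("naive:")     { fib_naive(25) }
  x.report("iterative:") { fib_iter(25) }
end
```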

If you absolutely have to go to another language, Ruby has a very clean
interface for writing and using C extensions. But even that probably is
too much work when you could use RubyInline
instead. RubyInline allows you to write C code within your Ruby program.
This code is compiled and linked to your program, potentially representing a huge
speed increase. Ryan's documentation shows a 4x speedup between:


  def factorial(n)
    f = 1
    n.downto(2) { |x| f *= x }
    f
  end

and


  inline do |builder|
    builder.c "
    long factorial_c(int max) {
      int i=max, result=1;
      while (i >= 2) { result *= i--; }
      return result;
    }"
  end

If you've gotten all you can out of choosing your algorithms well, this
might be your last, best hope.

______________________

--
-pate
http://on-ruby.blogspot.com

Comments

Performance Unit Tests

Sean Carley

You could make performance into a unit test using bm. Then as you optimize, you could tell when you had done enough. Also, if you added something later that "broke" performance, you would know immediately.

Too subjective and too prone to error.

zenspider

Too subjective and too prone to error.

What I do is I actually run my profile runs against my unit tests. Assuming I have good coverage then the results aren't a total farce and any optimizations I do directly affect my feedback loop.

Good Idea

pate

You'd want to keep performance tests separate from your regular unit tests though (or build a tricky way to track the base performance on a given system). There are so many things that can affect the numbers outside the script. Hardware, OS, potentially even the version of Ruby (what if I run it on a YARV enabled build?).

That sounds like a pretty interesting tool to build though.

Optimising helped by, (unit), tests

paddy3118

Good tests developed with the working program pay dividends when optimising as they can quickly show when an optimisation change makes the program fail.
The other four bulleted points were great (and apply to other languages too).
- Pad.

Good Point

pate

This is true. Any time you want to make changes to a program without changing its functional behavior, Unit Tests are the way to go.

I've talked about unit testing before (both in this column and elsewhere). I'll be talking about it again in the context of refactoring soon.

-pate

Coverity false positives

Jack Diederich

Coverity errs on the side of caution. If it isn't sure that a pointer can't be NULL before use it will mark it as problematic. A developer who _is_ sure it can't be NULL will then mark the report as NOT_A_BUG. False positives are one reason why the perl & python counts dropped dramatically after the initial results. Some real bugs were fixed too, of course.
