What's new in the Ruby World: rocaml

by Pat Eyler

Last week, Mauricio Fernandez announced a new Ruby-to-OCaml bridge that he’s working on, called rocaml. With the growing interest in functional languages in the Ruby world, this seemed like the sort of thing I needed to talk to him about, so I sent off a quick set of questions, and this is what I heard back [1].

The strict typing and academic rigor of a language like OCaml and the free-wheeling nature of Ruby seem rather antithetical. What made you want to combine them?

I wouldn’t approach the relationship between Ruby and OCaml from that point of view :) OCaml and Ruby are no more opposed than C and Ruby, and I don’t think anybody questions the utility of writing extensions in C. Coding in Ruby is generally much more enjoyable than in C, but Rubyists, of all people, still do it for two reasons:

  1. for speed
  2. to access functionality provided by external libs with a C interface

rocaml is only concerned with (1), and must thus be judged by how easily you can make your Ruby application perform better. In this regard, OCaml can be a better choice than C because:

  • there’s a huge difference in the level at which you can approach the problem to be solved
  • OCaml provides many guarantees that C just cannot. You’ll often see your C extension segfault while you’re developing it (and even worse, when it’s in production too :|), and this just doesn’t happen with OCaml.
  • rocaml allows you to pass rich native data types between Ruby and OCaml very conveniently. RubyInline, AFAIK, performs only a few basic conversions (strings and integer types) by default; it can be extended to recognize more complex types, but you have to write the conversion routines by hand. rocaml, by contrast, lets you pass e.g. native lists of arrays of arrays of Structs containing floats from Ruby to OCaml and vice versa, and will verify that you’re passing values of the correct type to the OCaml functions, ensuring type safety. Operating with native types is important: it means that your OCaml code need not be sprinkled with functions to convert from an opaque Ruby VALUE type to concrete types, i.e. your OCaml code need not be developed with Ruby in mind, and can be used standalone or come from a third-party library that doesn’t know anything about Ruby.
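To make the type verification in the last point concrete, here is a minimal sketch in plain Ruby of the kind of shape check a bridge could perform before handing a value to native code. The helper names `float_matrix?` and `checked_float_matrix` are hypothetical and are not part of rocaml; how rocaml actually performs its checks is not described here.

```ruby
# Hypothetical helpers for illustration only; not rocaml's API.
# Checks that a Ruby value has the shape "array of arrays of floats".
def float_matrix?(value)
  value.is_a?(Array) &&
    value.all? { |row| row.is_a?(Array) && row.all? { |x| x.is_a?(Float) } }
end

# Raises TypeError at the boundary instead of letting a badly shaped
# value reach the typed side.
def checked_float_matrix(value)
  unless float_matrix?(value)
    raise TypeError, "expected an array of arrays of floats"
  end
  value
end
```

The point of checking at the boundary is that the typed code on the other side never has to inspect or convert values itself.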

Going back to your question, it turns out that Ruby and OCaml have several things in common: they are both type-safe, strongly typed, and you could even say that OCaml is duck-typed to some extent. But most importantly, they are both pleasant to work with. One thing separating them is their relative performance at some tasks; this difference is what makes rocaml worthwhile, very much like temperature differences can be used to perform work (in the physical sense).

What kinds of things are you using rocaml for? Have you heard of other people using it for other things?

I don’t know if anybody is using rocaml yet; after all, the public has only been aware of its existence for a week, and all I did was post an announcement to the ruby-talk mailing list. I’ve seen in my httpd logs that a few people have made a local copy of the darcs repository, though.

As for what I am using rocaml for, I have an application where DB queries can be a bottleneck. I’m talking about everything from DB query execution to ORM to processing of the results within Ruby: queries that take several seconds (before the ORM even kicks in) and which you just cannot get PostgreSQL to perform fast enough by adding indexes or denormalizing some tables. I made a moderately sized system in Objective Caml that can yield the desired performance (I’m talking about speed gains of up to 2-3 orders of magnitude). While I was working on it, I didn’t think about how it would interface to Ruby; I only concerned myself with how clean the design was, and how well it could perform.

When the OCaml part was close to being “ready”, I created rocaml to generate the necessary interface. Having native Ruby to/from OCaml conversions was an important aspect of it, as I didn’t want to pollute the OCaml code with things like (int_of_VALUE num) or (string_of_VALUE s).

What kind of performance are you seeing with it?

The short answer is: rocaml extensions will be as fast as the underlying OCaml code. This means you often get a speedup of a couple of orders of magnitude. rocaml’s source tree includes a couple of interesting examples:

  • a ~30-line implementation of sets based on RB trees, with 3X faster lookup than RBTree (which is written in C)
  • a family of 2-line specialized marshallers (built upon OCaml’s Marshal module) that often operate ~5X faster than Ruby’s (again, we’re talking about C code here)

The performance you get out of extensions written with rocaml is pretty much the speed you can get out of your OCaml code [2], except for the FFI and data conversion overheads. Since rocaml converts Ruby VALUEs to native OCaml types, there’s a small overhead; for instance, if you have a function taking a string and you call it on a Ruby String object, the underlying string will have to be copied. The cost of argument passing/return value copying is thus proportional to the size of the objects being passed from/to Ruby. In practice, it will normally be irrelevant; e.g. passing an array of floats takes half as much time as iterating over the elements in Ruby without doing anything (array.each{ }).

There is one case where copying the whole value would be too expensive: passing large data structures between Ruby and OCaml. Say you have a sorted array of strings and need to find the position (index) of a given string using binary search. Using an OCaml function of type string array -> string -> int (equivalent to def binary_search(array, string) ....) would make no sense, since we’d end up copying the whole array in order to turn it into an OCaml array before performing the search!
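For reference, a plain-Ruby sketch of the binary_search mentioned above might look like the following. It is only an illustration of the algorithm, not code from rocaml, and it returns nil for a missing string, a choice the OCaml type above does not specify. The search inspects only about log2(n) elements, while converting the array would touch all n of them, which is why copying would dominate the cost.

```ruby
# Returns the index of target in the sorted array, or nil if it is absent.
def binary_search(array, target)
  low = 0
  high = array.length - 1
  while low <= high
    mid = (low + high) / 2
    case array[mid] <=> target
    when 0  then return mid      # found it
    when -1 then low = mid + 1   # array[mid] sorts before target: go right
    else         high = mid - 1  # array[mid] sorts after target: go left
    end
  end
  nil
end
```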

rocaml can create opaque Ruby objects that wrap OCaml ones, in order to keep the data structures on OCaml’s side and avoid large argument passing costs. This is what the sets based on RB trees shipped with rocaml use.
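The design idea can be sketched in plain Ruby. The class below, SortedStringIndex, is a hypothetical stand-in and not rocaml's API: in a real rocaml extension the data would live on the OCaml side, whereas here an ordinary frozen Ruby array plays that role. What it illustrates is the cost profile: the size-proportional work happens once at construction, and each later call passes only a small key.

```ruby
# Hypothetical stand-in for an opaque wrapper object; not rocaml's API.
class SortedStringIndex
  def initialize(strings)
    # One-time cost proportional to the size of the data.
    @sorted = strings.sort.freeze
  end

  # Only the key crosses the boundary on each call.
  # Returns the position of key in sorted order, or nil if absent.
  def index_of(key)
    @sorted.bsearch_index { |s| key <=> s }
  end

  def size
    @sorted.size
  end
end
```

Callers hold on to the wrapper and never see the underlying storage, so the representation behind it can change without affecting them.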

The dual of that is to wrap Ruby VALUEs in order to use them in OCaml. This is doable, but the other approach is preferable in general because it allows you to write/reuse OCaml code that is oblivious to Ruby, and it is faster. That said, I might implement it anyway :)

[1] Well, this is most of what I got back. The rest is over at my On Ruby blog.

[2] Which, as you know, is often close to 50% of the best you can get out of C, where the C version needs many more lines of code to match the algorithm; otherwise OCaml will be faster, since algorithms often matter more than implementations (when talking about languages in the same performance category).
