At the Forge - Shoehorning Data into a Database

 in
Databases make the world go round, but sometimes fitting a round data peg into a square database hole is less than ideal. What's a programmer to do?
Polymorphic Associations

The way that PostgreSQL has integrated this type of object hierarchy into its relational system is impressive, flexible and useful. And yet, because it is unique to PostgreSQL, it means that no higher-level, database-agnostic application framework can support it. This especially is true in Ruby on Rails, which tries to treat all databases as similar or identical, going so far as to encourage programmers to use a Ruby-based domain-specific language (migrations) to create and modify database definitions. Using PostgreSQL's inheritance features might work, but it will take a fair amount of twisting to make it compatible with Rails.

Besides, Rails already has a feature, known as polymorphic associations, that lets us work with distinct types of items as if they were part of a single class. This isn't the same as an object hierarchy—we can't say that secretaries and programmers are both types of employees. But, we can say that secretaries and programmers are both employable and treat them as similar via that description.

To begin, you might remember that Rails has something known as associations, which allow us to connect one model to another. For example, let's say that each company has one or more employees. Thus, we can create some simple models. We can generate migrations with:

./script/generate model company name:string
./script/generate model employee first_name:string
    last_name:string email_address:string company_id:integer

Then, we can turn the automatically generated migration files into actual database tables with the following:

rake db:migrate

Now, we can indicate that each company can have one or more employees by modifying the model files. For example, we add the following to employee.rb:


class Company < ActiveRecord::Base
  has_many :employees
end

Similarly, we can say:


class Employee < ActiveRecord::Base
  belongs_to :company
end

With has_many and belongs_to in place, we now have created an “association” between these two models. This might not seem too exciting, but it means we can treat the two tables as object classes and each row in the table as an instance:


xyz = Company.create(:name => 'XYZ Corporation')

george = Employee.create(:first_name => 'George',
    :last_name => 'Washington',
    :email_address => 'georgie@whitehouse.gov',
    :company_id => xyz.id)

Now, we can say:

p xyz.employees.first

and we get back our george user. Similarly, we can say:

p george.company

and get back our xyz company. This is all standard stuff for Rails programmers, and it is part of the ActiveRecord feature known as associations. You can create all sorts of associations, giving them arbitrary names. For example, we could say:


class Company < ActiveRecord::Base
  has_many :employees
  has_many :employees_with_a, :class_name => 'Employee',
            :conditions => "first_name ilike '%a%'"
end

With this in place, and after restarting the console (or typing reload!), we now can say:

xyz = Company.find_by_name('XYZ Corporation')

xyz.employees_with_a

This prints the empty list—not surprising, given that we have defined only a single employee so far, and his name didn't contain an a. But, now we can create a second employee:


jane = Employee.create(:first_name => 'Jane',
                       :last_name => 'Austin',
                       :email_address => 'jane@bookauthor.com',
                       :company_id => xyz.id)

If we run our association again:

xyz.employees_with_a

now we get our jane employee.

This is all well and good, but what happens if we want to represent different types of employees, each of whom is employed by a company, but with different associated data? This is where polymorphic associations become useful. In order for this to work, we need to change the definitions of our models, as well as the relationships among them (if you're playing along at home, blow away the existing Employee and Company models before continuing):

./script/generate model company name:string
./script/generate model contract employable_id:integer
 employable_type:string company_id:integer
./script/generate model programmer main_language:string
 first_name:string last_name:string email_address:string
./script/generate model secretary words_per_minute:integer
 first_name:string last_name:string email_address:string

The above invocations of script/generate create four different models: one for a company, another for a programmer, another for a secretary and a fourth for a contract. Our PostgreSQL model allowed us to have a single Employee table and to have programmers and secretaries inherit from that table. Rails doesn't let us specify that one model inherits from another. Rather, we use Rails to describe the relationships among the models. Companies are connected to programmers and secretaries via employment contracts.

Because we are looking at the relationships among standalone models, rather than an inheritance hierarchy, there's no obviously good place in which to stick attributes that are common to programmers and secretaries. In the end, I decided to put the attributes in the programmer and secretary models, respectively, despite the repetition.

Now, let's define the associations:


class Company < ActiveRecord::Base
  has_many :contracts
end

class Contract < ActiveRecord::Base
  belongs_to :company
  belongs_to :employable, :polymorphic => true
end

class Programmer < ActiveRecord::Base
  has_many :contracts, :as => :employable
  has_many :companies, :through => :contracts
end

class Secretary < ActiveRecord::Base
  has_many :contracts, :as => :employable
  has_many :companies, :through => :contracts
end

In other words, each company has many contracts. Each contract joins together a company and someone who is employable. Who is employable? Right now, only programmers and secretaries fit the bill, connecting to the employable interface with contracts, and then to a company via a contract.

Behind the scenes, Rails is pulling a nasty trick, one that should make any good database programmer feel sick. The contract model includes two fields (employable_id and employable_type), which point to a single row in a particular table. In some ways, this is sort of a poor man's foreign key. But the difference is that the foreign key can point to any of several tables. And, of course, there is no error checking; only the application can stop me from entering a random text string in the employable_type column.

So, now we can create some relationships:


xyz = Company.create(:name => 'XYZ Corporation')

p1 = Programmer.create(:first_name => 'Linus',
                       :last_name => 'Torvalds',
                       :email_address => 'torvalds@osdl.org',
                       :main_language => 'C')

Contract.create(:employable => p1, :company => xyz)

s1 = Secretary.create(:first_name => 'Condoleezza',
                      :last_name => 'Rice',
                      :email_address => 'rice@state.gov',
                      :words_per_minute => 90)

Contract.create(:employable => s1, :company => xyz)

That's already pretty remarkable. Because both programmers and secretaries are employable (as they both expose the employable interface to the contracts model, using has_many :as), we can join each of them to an instance of the contract model.

But, it gets better, if we add a few more associations:


class Contract < ActiveRecord::Base
  belongs_to :company
  belongs_to :employable, :polymorphic => true

  belongs_to :programmer,
    :class_name => 'Programmer', :foreign_key => 'employable_id'
  belongs_to :secretary,
    :class_name => 'Secretary', :foreign_key => 'employable_id'
end

class Company < ActiveRecord::Base
  has_many :contracts

  has_many :programmers, :through => :contracts,
           :source => :programmer,
           :conditions => "contracts.employable_type = 'Programmer' "

  has_many :secretaries, :through => :contracts,
           :source => :secretary,
           :conditions => "contracts.employable_type = 'Secretary' "

end

With this in place, we now have a complete bidirectional association between programmers and secretaries on one side and companies on the other. Thus, we can say:


>> xyz.programmers
=> [#<Programmer id: 1, main_language: "C", first_name: "Linus",
last_name: "Torvalds", email_address: "torvalds@osdl.org", created_at:
"2008-06-12 00:47:58", updated_at: "2008-06-12 00:47:58">]

>> xyz.secretaries
=> [#<Secretary id: 1, words_per_minute: 90, first_name:
"Condoleezza", last_name: "Rice", email_address: "rice@state.gov",
created_at: "2008-06-12 00:54:34", updated_at: "2008-06-12
00:54:34">]

But, we also can say:


>> Programmer.find(1).companies
=> [#<Company id: 1, name: "XYZ Corporation", created_at: "2008-06-12
    00:47:18", updated_at: "2008-06-12 00:47:18">]

Moreover, we can iterate over xyz.contracts, bringing together the secretaries and programmers models into one package:

>> xyz.contracts.each {|c| puts c.employable.first_name}
Linus
Condoleezza

Although Rails does not provide inheritance within the models, polymorphic associations make it possible to come close to such functionality. You also get a bunch of convenience functions that make it more natural to work with these additional attributes.

______________________

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState