At the Forge - Shoehorning Data into a Database

 in
Databases make the world go round, but sometimes fitting a round data peg into a square database hole is less than ideal. What's a programmer to do?
Polymorphic Associations

The way that PostgreSQL has integrated this type of object hierarchy into its relational system is impressive, flexible and useful. And yet, because it is unique to PostgreSQL, it means that no higher-level, database-agnostic application framework can support it. This especially is true in Ruby on Rails, which tries to treat all databases as similar or identical, going so far as to encourage programmers to use a Ruby-based domain-specific language (migrations) to create and modify database definitions. Using PostgreSQL's inheritance features might work, but it will take a fair amount of twisting to make it compatible with Rails.

Besides, Rails already has a feature, known as polymorphic associations, that lets us work with distinct types of items as if they were part of a single class. This isn't the same as an object hierarchy—we can't say that secretaries and programmers are both types of employees. But, we can say that secretaries and programmers are both employable and treat them as similar via that description.

To begin, you might remember that Rails has something known as associations, which allow us to connect one model to another. For example, let's say that each company has one or more employees. Thus, we can create some simple models. We can generate migrations with:

./script/generate model company name:string
./script/generate model employee first_name:string
    last_name:string email_address:string company_id:integer

Then, we can turn the automatically generated migration files into actual database tables with the following:

rake db:migrate

Now, we can indicate that each company can have one or more employees by modifying the model files. For example, we add the following to employee.rb:


class Company < ActiveRecord::Base
  has_many :employees
end

Similarly, we can say:


class Employee < ActiveRecord::Base
  belongs_to :company
end

With has_many and belongs_to in place, we now have created an “association” between these two models. This might not seem too exciting, but it means we can treat the two tables as object classes and each row in the table as an instance:


xyz = Company.create(:name => 'XYZ Corporation')

george = Employee.create(:first_name => 'George',
    :last_name => 'Washington',
    :email_address => 'georgie@whitehouse.gov',
    :company_id => xyz.id)

Now, we can say:

p xyz.employees.first

and we get back our george user. Similarly, we can say:

p george.company

and get back our xyz company. This is all standard stuff for Rails programmers, and it is part of the ActiveRecord feature known as associations. You can create all sorts of associations, giving them arbitrary names. For example, we could say:


class Company < ActiveRecord::Base
  has_many :employees
  has_many :employees_with_a, :class_name => 'Employee',
            :conditions => "first_name ilike '%a%'"
end

With this in place, and after restarting the console (or typing reload!), we now can say:

xyz = Company.find_by_name('XYZ Corporation')

xyz.employees_with_a

This prints the empty list—not surprising, given that we have defined only a single employee so far, and his name didn't contain an a. But, now we can create a second employee:


jane = Employee.create(:first_name => 'Jane',
                       :last_name => 'Austin',
                       :email_address => 'jane@bookauthor.com',
                       :company_id => xyz.id)

If we run our association again:

xyz.employees_with_a

now we get our jane employee.

This is all well and good, but what happens if we want to represent different types of employees, each of whom is employed by a company, but with different associated data? This is where polymorphic associations become useful. In order for this to work, we need to change the definitions of our models, as well as the relationships among them (if you're playing along at home, blow away the existing Employee and Company models before continuing):

./script/generate model company name:string
./script/generate model contract employable_id:integer
 employable_type:string company_id:integer
./script/generate model programmer main_language:string
 first_name:string last_name:string email_address:string
./script/generate model secretary words_per_minute:integer
 first_name:string last_name:string email_address:string

The above invocations of script/generate create four different models: one for a company, another for a programmer, another for a secretary and a fourth for a contract. Our PostgreSQL model allowed us to have a single Employee table and to have programmers and secretaries inherit from that table. Rails doesn't let us specify that one model inherits from another. Rather, we use Rails to describe the relationships among the models. Companies are connected to programmers and secretaries via employment contracts.

Because we are looking at the relationships among standalone models, rather than an inheritance hierarchy, there's no obviously good place in which to stick attributes that are common to programmers and secretaries. In the end, I decided to put the attributes in the programmer and secretary models, respectively, despite the repetition.

Now, let's define the associations:


class Company < ActiveRecord::Base
  has_many :contracts
end

class Contract < ActiveRecord::Base
  belongs_to :company
  belongs_to :employable, :polymorphic => true
end

class Programmer < ActiveRecord::Base
  has_many :contracts, :as => :employable
  has_many :companies, :through => :contracts
end

class Secretary < ActiveRecord::Base
  has_many :contracts, :as => :employable
  has_many :companies, :through => :contracts
end

In other words, each company has many contracts. Each contract joins together a company and someone who is employable. Who is employable? Right now, only programmers and secretaries fit the bill, connecting to the employable interface with contracts, and then to a company via a contract.

Behind the scenes, Rails is pulling a nasty trick, one that should make any good database programmer feel sick. The contract model includes two fields (employable_id and employable_type), which point to a single row in a particular table. In some ways, this is sort of a poor man's foreign key. But the difference is that the foreign key can point to any of several tables. And, of course, there is no error checking; only the application can stop me from entering a random text string in the employable_type column.

So, now we can create some relationships:


xyz = Company.create(:name => 'XYZ Corporation')

p1 = Programmer.create(:first_name => 'Linus',
                       :last_name => 'Torvalds',
                       :email_address => 'torvalds@osdl.org',
                       :main_language => 'C')

Contract.create(:employable => p1, :company => xyz)

s1 = Secretary.create(:first_name => 'Condoleezza',
                      :last_name => 'Rice',
                      :email_address => 'rice@state.gov',
                      :words_per_minute => 90)

Contract.create(:employable => s1, :company => xyz)

That's already pretty remarkable. Because both programmers and secretaries are employable (as they both expose the employable interface to the contracts model, using has_many :as), we can join each of them to an instance of the contract model.

But, it gets better, if we add a few more associations:


class Contract < ActiveRecord::Base
  belongs_to :company
  belongs_to :employable, :polymorphic => true

  belongs_to :programmer,
    :class_name => 'Programmer', :foreign_key => 'employable_id'
  belongs_to :secretary,
    :class_name => 'Secretary', :foreign_key => 'employable_id'
end

class Company < ActiveRecord::Base
  has_many :contracts

  has_many :programmers, :through => :contracts,
           :source => :programmer,
           :conditions => "contracts.employable_type = 'Programmer' "

  has_many :secretaries, :through => :contracts,
           :source => :secretary,
           :conditions => "contracts.employable_type = 'Secretary' "

end

With this in place, we now have a complete bidirectional association between programmers and secretaries on one side and companies on the other. Thus, we can say:


>> xyz.programmers
=> [#<Programmer id: 1, main_language: "C", first_name: "Linus",
last_name: "Torvalds", email_address: "torvalds@osdl.org", created_at:
"2008-06-12 00:47:58", updated_at: "2008-06-12 00:47:58">]

>> xyz.secretaries
=> [#<Secretary id: 1, words_per_minute: 90, first_name:
"Condoleezza", last_name: "Rice", email_address: "rice@state.gov",
created_at: "2008-06-12 00:54:34", updated_at: "2008-06-12
00:54:34">]

But, we also can say:


>> Programmer.find(1).companies
=> [#<Company id: 1, name: "XYZ Corporation", created_at: "2008-06-12
    00:47:18", updated_at: "2008-06-12 00:47:18">]

Moreover, we can iterate over xyz.contracts, bringing together the secretaries and programmers models into one package:

>> xyz.contracts.each {|c| puts c.employable.first_name}
Linus
Condoleezza

Although Rails does not provide inheritance within the models, polymorphic associations make it possible to come close to such functionality. You also get a bunch of convenience functions that make it more natural to work with these additional attributes.

______________________

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix