#239 ActiveRecord::Relation Walkthrough

Nov 07, 2010 | 11 minutes | Active Record, Rails 3.0, Code Walkthrough

The new Active Record query methods in Rails 3 might seem like magic at first. But in this episode I unravel how it works by browsing the Rails 3 source code.

One of Rails 3’s best new features is the new Active Record query syntax. Episode 202 covered this new syntax in some detail, so if you’re not yet familiar with it, it’s well worth taking a look at that episode before reading this one. When you first use the new syntax it might appear that some magic is going on behind the scenes, but here we’ll take you on a tour of the relevant parts of the Rails source code to show you exactly how it works.

Getting The Source

If you don’t have a copy of the Rails source code to hand then it’s worth getting one so that you can refer to it as you read this episode. All you need to do is clone the git repository from GitHub with the following command.

          terminal
        
$ git clone git://github.com/rails/rails.git

Once the repository has finished downloading you can switch to the version we’re using here by checking out the appropriate tag from inside the new rails directory.

          terminal
        
$ git checkout v3.0.1

We’re mainly interested in the ActiveRecord code so we’ll move into the relevant directory.

          terminal
        
$ cd activerecord/lib/active_record

The code for ActiveRecord is pretty large and is spread across a number of files. We’ll only be looking at a few of these in this episode.

          terminal
        
$ ls -F
aggregations.rb           nested_attributes.rb
association_preload.rb    observer.rb
associations/             persistence.rb
associations.rb           query_cache.rb
attribute_methods/        railtie.rb
attribute_methods.rb      railties/
autosave_association.rb   reflection.rb
base.rb                   relation/
callbacks.rb              relation.rb
connection_adapters/      schema.rb
counter_cache.rb          schema_dumper.rb
dynamic_finder_match.rb   serialization.rb
dynamic_scope_match.rb    serializers/
errors.rb                 session_store.rb
fixtures.rb               test_case.rb
locale/                   timestamp.rb
locking/                  transactions.rb
log_subscriber.rb         validations/
migration.rb              validations.rb
named_scope.rb            version.rb

Experimenting In the Console

Before we dive into the code let’s get a better idea of what we’re searching for by experimenting in the console of a Rails 3 application. The application we’re using here is a simple todo list app with a Task model and several task records. We can get all of the tasks by using Task.all:

          terminal
        
> Task.all
 => [#<Task id: 1, project_id: 1, name: "paint fence", completed_at: nil, created_at: "2010-11-08 21:25:05", updated_at: "2010-11-08 21:32:21", priority: 2>, 
 #<Task id: 2, project_id: 1, name: "weed garden", completed_at: nil, created_at: "2010-11-08 21:25:29", updated_at: "2010-11-08 21:27:04", priority: 3>, 
 #<Task id: 3, project_id: 1, name: "mow lawn", completed_at: nil, created_at: "2010-11-08 21:25:37", updated_at: "2010-11-08 21:26:42", priority: 3>]

The new Active Record query syntax makes it simple to, say, get all of the tasks with a priority of 3.

          terminal
        
> Task.where(:priority => 3)
 => [#<Task id: 2, project_id: 1, name: "weed garden", completed_at: nil, created_at: "2010-11-08 21:25:29", updated_at: "2010-11-08 21:27:04", priority: 3>, 
 #<Task id: 3, project_id: 1, name: "mow lawn", completed_at: nil, created_at: "2010-11-08 21:25:37", updated_at: "2010-11-08 21:26:42", priority: 3>]

What’s returned by this query looks like an array of records but if we call class on it we’ll see that it’s actually an instance of ActiveRecord::Relation.

          terminal
        
> Task.where(:priority => 3).class
 => ActiveRecord::Relation

If we add another option to the query and call class on that we’ll get an object of the same type returned.

          terminal
        
> Task.where(:priority => 3).limit(2).class
 => ActiveRecord::Relation

The Relation Class

Having queries return an ActiveRecord::Relation object allows us to chain queries together and this Relation class is at the heart of the new query syntax. Let’s take a look at this class by searching through the ActiveRecord source code for a file called relation.rb.

At the top of the class a number of constants are defined, one of which is a Struct. If you’re not familiar with structs these are a way of quickly defining a class dynamically by passing in a list of attributes in the constructor.

          rails/activerecord/lib/active_record/relation.rb
        
require 'active_support/core_ext/object/blank'

module ActiveRecord
  # = Active Record Relation
  class Relation
    JoinOperation = Struct.new(:relation, :join_class, :on)
    ASSOCIATION_METHODS = [:includes, :eager_load, :preload]
    MULTI_VALUE_METHODS = [:select, :group, :order, :joins, :where, :having]
    SINGLE_VALUE_METHODS = [:limit, :offset, :lock, :readonly, :create_with, :from]

    include FinderMethods, Calculations, SpawnMethods, QueryMethods, Batches
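
For readers who have not used Struct before, here is a minimal sketch of what a definition like JoinOperation provides. The attribute values below are made up purely for illustration; they are not what ActiveRecord actually stores in these fields.

```ruby
# Struct.new builds a new class whose attributes become accessors.
JoinOperation = Struct.new(:relation, :join_class, :on)

# Placeholder values, chosen only to show how the generated class behaves.
join = JoinOperation.new("projects", "InnerJoin", "tasks.project_id = projects.id")

p join.relation           # getter generated by Struct
join.join_class = "OuterJoin" # setters are generated as well
p join.join_class
p JoinOperation.members   # the attribute names, in declaration order
p join.to_a               # the attribute values as an array
```

A Struct is handy when all that is needed is a small value object with named fields and no behaviour of its own.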

Next the class includes a number of modules and these modules contain most of the class’s features. The modules’ files are contained in a relation directory within the active_record directory. We’ll take a look at one of these now: query_methods.rb.

This module contains the methods that we use in the new query syntax: includes, select, group, order, joins and so on. All of these methods behave very similarly, starting with a call to clone. This copies the Relation object so that a new Relation is returned rather than the existing one being altered. They then call tap on the cloned object, which yields the object to a block and returns it after the block has executed. In each block we add the arguments that are passed into the method to the appropriate set of values in the Relation object.

          rails/activerecord/lib/active_record/relation/query_methods.rb
        
def group(*args)
  clone.tap {|r| r.group_values += args.flatten if args.present? }
end

def order(*args)
  clone.tap {|r| r.order_values += args if args.present? }
end

def reorder(*args)
  clone.tap {|r| r.order_values = args if args.present? }
end
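
The clone-and-tap pattern can be tried outside Rails. What follows is a minimal sketch, not the Rails implementation: TinyRelation is a hypothetical class invented for this example, and it uses plain Ruby’s empty? where Rails uses ActiveSupport’s present?.

```ruby
# A stripped-down stand-in for ActiveRecord::Relation (hypothetical class).
class TinyRelation
  attr_accessor :where_values, :order_values, :limit_value

  def initialize
    @where_values = []
    @order_values = []
    @limit_value  = nil
  end

  # Multi-value methods append to the stored array. `+=` builds a new
  # array, so the relation that was cloned is left untouched.
  def where(*args)
    clone.tap { |r| r.where_values += args unless args.empty? }
  end

  def order(*args)
    clone.tap { |r| r.order_values += args unless args.empty? }
  end

  # reorder replaces the stored ordering instead of appending to it.
  def reorder(*args)
    clone.tap { |r| r.order_values = args unless args.empty? }
  end

  # Single-value methods overwrite the stored value on the copy.
  def limit(value)
    clone.tap { |r| r.limit_value = value }
  end
end

base     = TinyRelation.new
filtered = base.where("priority = 3")
limited  = filtered.order("name").limit(2)

p base.where_values                   # => []
p filtered.where_values               # => ["priority = 3"]
p limited.order_values                # => ["name"]
p limited.limit_value                 # => 2
p limited.reorder("id").order_values  # => ["id"]
```

Because every call returns a fresh copy, an intermediate relation such as filtered can be reused as the starting point for several different queries without one affecting another.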

So earlier when we called Task.where(:priority => 3) in the console it returned an instance of Relation, and when we called limit(2) on that Relation the limit method in the QueryMethods module was called and returned a cloned Relation object. But what about the initial call to where? This is called directly on the Task model, and therefore on ActiveRecord::Base rather than on a Relation, so where is the initial Relation object created?

To answer this we’ll search through the ActiveRecord source code. If we search for “def where” we’ll find a match, but only in the QueryMethods module we were just looking in. A search for “def self.where” returns nothing. Methods can also be defined with ActiveSupport’s delegate method, and if we search the code with the regular expression “delegate.+ :where” we’ll get some interesting results.

The second match delegates a lot of query methods and it looks like this is what we’re after.

          rails/activerecord/lib/active_record/base.rb
        
delegate :select, :group, :order, :reorder, :limit, :joins, :where, :preload, :eager_load, :includes, :from, :lock, :readonly, :having, :create_with, :to => :scoped

This line lists all of the query methods and delegates them all to scoped. So, what does scoped do? If we search across the project again we’ll find this method in the named_scope.rb file.

The NamedScope module is included in ActiveRecord::Base so we have access to all of its methods in there. The scoped method is fairly simple: it calls relation and then merges in any currently scoped methods.

          rails/activerecord/lib/active_record/named_scope.rb
        
def scoped(options = nil)
  if options
    scoped.apply_finder_options(options)
  else
    current_scoped_methods ? relation.merge(current_scoped_methods) : relation.clone
  end
end

Let’s look next at the relation method which is defined in ActiveRecord::Base.

          rails/activerecord/lib/active_record/base.rb
        
private
def relation #:nodoc:
  @relation ||= Relation.new(self, arel_table)
  finder_needs_type_condition? ? @relation.where(type_condition) : @relation
end

Here is where the Relation object is instantiated. We pass it self, which is an ActiveRecord model class, and arel_table, which is an Arel::Table object. The method then returns that Relation. (The condition that adds some where conditions first is related to single-table inheritance.) The arel_table method is defined in the same class and just creates a new Arel::Table object.

          rails/activerecord/lib/active_record/base.rb
        
def arel_table
  @arel_table ||= Arel::Table.new(table_name, arel_engine)
end
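
The whole chain from the model class to the first Relation can be sketched in plain Ruby. This is only an illustration under stated assumptions: SketchRelation and SketchModel are hypothetical names, the standard library’s Forwardable stands in for ActiveSupport’s delegate, and the scoped methods and single-table inheritance handling are left out.

```ruby
require "forwardable"

# Hypothetical stand-in for ActiveRecord::Relation.
class SketchRelation
  attr_reader :klass
  attr_accessor :where_values

  def initialize(klass)
    @klass = klass
    @where_values = []
  end

  def where(condition)
    clone.tap { |r| r.where_values += [condition] }
  end
end

# Hypothetical stand-in for ActiveRecord::Base.
class SketchModel
  class << self
    extend Forwardable

    # Class-level query methods are forwarded to whatever scoped returns.
    def_delegators :scoped, :where

    # Hand out a copy so the memoized relation is never modified.
    def scoped
      relation.clone
    end

    private

    # Built once per model class and then reused.
    def relation
      @relation ||= SketchRelation.new(self)
    end
  end
end

class Task < SketchModel; end

rel = Task.where("priority = 3")
p rel.class         # => SketchRelation
p rel.klass         # => Task
p rel.where_values  # => ["priority = 3"]
```

Calling where on the model class therefore never runs a query itself; it only hands back a relation object that remembers which class it belongs to.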

Arel

The question now is “what is Arel”? Arel is an external dependency so we won’t find it in the Rails source code, but it’s worth taking a look at the source, which can be found on GitHub. Arel is a framework that simplifies the generation of complex SQL queries and ActiveRecord uses it for exactly that. A query built with Arel, where users is an Arel table object, looks like this:

          ruby
        
users.where(users[:name].eq('amy'))
# => SELECT * FROM users WHERE users.name = 'amy'

Now that we know what an Arel::Table is we can go back to the relation method. This returns a Relation object so let’s take a look at the Relation class. The initializer for this class just takes the class and table that are passed to it and stores them in instance variables.

Back in the Rails console we now know what happens when we call

          terminal
        
Task.where(:priority => 3).limit(2).class

A new Relation object is created when we call where and when we call limit on that the relation is cloned and the additional arguments are added and stored in the cloned object. When we call class on this the query isn’t performed, but if we remove .class from the end of the command the query will be run and we’ll see a list of objects returned.

          terminal
        
> Task.where(:priority => 3).limit(2)
 => [#<Task id: 2, project_id: 1, name: "weed garden", completed_at: nil, created_at: "2010-11-08 21:25:29", updated_at: "2010-11-08 21:27:04", priority: 3>, 
 #<Task id: 3, project_id: 1, name: "mow lawn", completed_at: nil, created_at: "2010-11-08 21:25:37", updated_at: "2010-11-08 21:26:42", priority: 3>]

The query must be performed somewhere, and what’s happening behind the scenes is that the console calls inspect on the result of the command that is run. Relation overrides the default inspect method. Let’s take a look at what the overridden method does.

          rails/activerecord/lib/active_record/relation.rb
        
def inspect
  to_a.inspect
end

All that inspect does here is call to_a.inspect on the relation. Following the code in Relation the to_a method looks like this:

          rails/activerecord/lib/active_record/relation.rb
        
def to_a
  return @records if loaded?

  @records = eager_loading? ? find_with_associations : @klass.find_by_sql(arel.to_sql)

  preload = @preload_values
  preload +=  @includes_values unless eager_loading?
  preload.each {|associations| @klass.send(:preload_associations, @records, associations) }

  # @readonly_value is true only if set explicitly. @implicit_readonly is true if there
  # are JOINS and no explicit SELECT.
  readonly = @readonly_value.nil? ? @implicit_readonly : @readonly_value
  @records.each { |record| record.readonly! } if readonly

  @loaded = true
  @records
end
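
The load-once behaviour of to_a and inspect can be reproduced in a few lines. This is a sketch under assumptions: LazyRelation is a hypothetical class, the rows array stands in for the database table, and query_count exists only so that the example can show how many times the fetch ran.

```ruby
# Hypothetical class showing lazy loading with memoized results.
class LazyRelation
  attr_reader :query_count

  def initialize(rows)
    @rows = rows       # stands in for the database table
    @loaded = false
    @query_count = 0   # illustration only: counts the pretend queries
  end

  def loaded?
    @loaded
  end

  def to_a
    return @records if loaded?

    @query_count += 1  # this is where the SQL round trip would happen
    @records = @rows.dup
    @loaded = true
    @records
  end

  # The console calls inspect on each result, which triggers the load.
  def inspect
    to_a.inspect
  end
end

relation = LazyRelation.new(["paint fence", "weed garden"])
p relation.loaded?      # => false
puts relation.inspect   # first call runs the pretend query
puts relation.inspect   # second call reuses @records
p relation.query_count  # => 1
```

This is why calling class on a relation in the console costs nothing, while letting the console print the relation itself runs the query.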

This method returns the records if they have already been loaded; otherwise it fetches them and then returns them. The interesting part of this method is the part that fetches the records, specifically this: @klass.find_by_sql(arel.to_sql). This code calls find_by_sql on a model, in this case our Task model, and passes in arel.to_sql. The arel method that is used here is defined in the QueryMethods module that we saw earlier. All this method does is call another method called build_arel and cache the result in an instance variable, and it’s in the build_arel method where all of the work takes place.

          rails/activerecord/lib/active_record/relation/query_methods.rb
        
def build_arel
  arel = table
  
  arel = build_joins(arel, @joins_values) unless @joins_values.empty?

  (@where_values - ['']).uniq.each do |where|
    case where
    when Arel::SqlLiteral
      arel = arel.where(where)
    else
      sql = where.is_a?(String) ? where : where.to_sql
      arel = arel.where(Arel::SqlLiteral.new("(#{sql})"))
    end
  end

  arel = arel.having(*@having_values.uniq.select{|h| h.present?}) unless @having_values.empty?

  arel = arel.take(@limit_value) if @limit_value
  arel = arel.skip(@offset_value) if @offset_value

  arel = arel.group(*@group_values.uniq.select{|g| g.present?}) unless @group_values.empty?

  arel = arel.order(*@order_values.uniq.select{|o| o.present?}) unless @order_values.empty?

  arel = build_select(arel, @select_values.uniq)

  arel = arel.from(@from_value) if @from_value
  arel = arel.lock(@lock_value) if @lock_value

  arel
end

This method fetches the Arel::Table that we saw earlier and then builds up a query, converting all of the data that we’ve been storing inside the Relation object into an Arel query which it then returns. Back in the Relation class the to_a method calls to_sql on this Arel query to convert it to SQL and then calls find_by_sql on the model so that an array of the appropriate records is returned.
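
To make the idea of turning stored values into a query concrete, here is a deliberately naive sketch. It is not how Arel works: Arel builds a structured query object and generates SQL from that, whereas this hypothetical build_sql helper simply joins strings and performs no escaping, so it should not be used with untrusted input.

```ruby
# Hypothetical helper: assembles a SQL string from relation-style values.
def build_sql(table, where_values: [], order_values: [], limit_value: nil, offset_value: nil)
  parts = ["SELECT * FROM #{table}"]

  # Like build_arel: drop blank and duplicate conditions, then wrap each
  # remaining condition in parentheses.
  conditions = (where_values - [""]).uniq.map { |condition| "(#{condition})" }
  parts << "WHERE #{conditions.join(' AND ')}" unless conditions.empty?

  orders = (order_values - [""]).uniq
  parts << "ORDER BY #{orders.join(', ')}" unless orders.empty?

  # Integer() raises if the value is not a whole number.
  parts << "LIMIT #{Integer(limit_value)}" if limit_value
  parts << "OFFSET #{Integer(offset_value)}" if offset_value

  parts.join(" ")
end

puts build_sql("tasks",
               where_values: ["priority = 3", "", "priority = 3"],
               order_values: ["name"],
               limit_value: 2)
# prints: SELECT * FROM tasks WHERE (priority = 3) ORDER BY name LIMIT 2
```

The point to take away is that nothing is sent to the database while the values are being collected; the SQL is produced in one step at the end.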

Now that we have a basic understanding of how this class works there are a lot of other methods that we can explore by browsing the code in Relation. For example, the create method calls another method called scoping and calls create on the @klass. This creates a new instance of the model while the scoping method adds the relation to @klass’s scoped methods. What this means is that anything executed inside a scoping block will be scoped as if it were called directly on that relation object. The modules are worth exploring too, especially QueryMethods. There are a number of methods in there that you may not be aware of, for example reorder, which will reset the order arguments rather than appending to them as order does.

          rails/activerecord/lib/active_record/relation/query_methods.rb
        
def order(*args)
  clone.tap {|r| r.order_values += args if args.present? }
end

def reorder(*args)
  clone.tap {|r| r.order_values = args if args.present? }
end

There is also a reverse_order method that will reverse the order of the order clause.

The Calculations module contains methods for performing calculations on fields, such as average, minimum and maximum. The SpawnMethods module is interesting because it allows you to interact with separate Relation objects, for example merging two relations. There are also except and only methods which we’ve not had time to experiment with yet. The best way to find out what these methods do is to open up the console of a Rails 3 application and try them out. You can learn a lot of interesting techniques by browsing the code and experimenting with the methods you find in it.

That’s it for this episode on the internals of ActiveRecord::Relation. We encourage you to browse the Rails source code and experiment with any methods that you find that look interesting.