Rails: Performance Tuning Workflow

Note: Even if you're familiar with benchmarking and profiling, take a look at the profiling section of this post. There's a nice helper method and an interesting real-world example.

Improving the runtime for a page from 0.75 to 0.25 seconds may not make much of a difference to the end user's experience. But it triples the number of requests that your app servers can handle. This post covers improving performance at this level. Before you get here, you need to have your database well tuned with proper indexes, etc. Going through this workflow will make your website a little snappier and increase your overall capacity. I recently went through this process with CarePages, and we were able to make significant performance improvements in one afternoon.

CPU Time is Precious

The first performance bottleneck for a web app is always external resources. Most web apps use a database, so that's the most common resource that can slow things down. But let's say your database is well indexed and handling the load fine, and that you don't have any other slow external resources, like web services that you call. The next bottleneck is usually the CPU if you have your own servers. If your website is running on a small slice or on shared hosting, you might hit memory limits before you max out the CPU.

Consider a box with 8 CPU cores and 4 GB of memory. 8 Rails processes with no slow external resources could likely max out all 8 cores. But for this example, let's say it takes 16 Rails processes under full utilization to max out the CPU. Even if each Rails instance uses 150 MB of memory, that's only 2.4 GB.

So if CPU time is your precious resource, you need to know how you're spending it. It helps if you have some custom logging in place, but essentially the information you need is which page is taking up the most CPU time. It might not be your slowest page. You need to multiply the mean runtime for a page by the number of times it's hit. So if one page averages 0.2 seconds of runtime, but it gets hit 20 times more frequently than a page that takes 1 second, you're spending four times as much time processing the 0.2 second page.

Note that when I'm referring to a page, I mean the controller and action. For example, I'm not considering /people/1 and /people/3 separate pages - they're both the people/show page.

At CarePages we have an Apache log that logs the controller, action, and Rails runtime. We run a script that groups the runtimes by controller and action, and then calculates the total runtime for a controller/action divided by the total runtime for all requests. What we found the first time we did this was that one of our pages was taking nearly 40% of the overall CPU time. It wasn't the slowest page, but it was the most frequently hit page. We had two other pages that were taking about 19% and 13% of the overall CPU time. This gave us a good place to start - if we could make the 40% page twice as fast, we would lower our overall CPU usage by 20%.
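
The script itself is straightforward. Here's a rough sketch of the idea in Ruby, assuming a hypothetical log format where each line ends with the controller, action, and Rails runtime separated by spaces; your log format and field positions will differ, so treat this as a starting point rather than the exact script we use.

# Group Rails runtimes by controller/action and print each page's share of
# the total runtime. Assumes a hypothetical log line ending in
# "controller action runtime", e.g. "... forums show 0.331".
totals = Hash.new(0.0)

ARGF.each_line do |line|
  fields = line.split
  next if fields.size < 3
  controller, action, runtime = fields[-3], fields[-2], fields[-1]
  totals["#{controller}/#{action}"] += runtime.to_f
end

grand_total = totals.values.inject(0.0) { |sum, t| sum + t }

# Print pages sorted by total runtime, biggest consumer first.
totals.sort_by { |page, t| -t }.each do |page, t|
  printf("%6.2f%%  %s\n", 100.0 * t / grand_total, page)
end

Run it against your access log (something like ruby runtime_report.rb production_access.log) and look for pages whose share is far larger than the rest.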

If you go through this process and find a relatively even distribution of CPU time across pages, you should look for ways to improve performance across the board rather than focusing on improving the performance of a few individual pages.

Benchmarking

So now that you have a page to work on, it's time to benchmark. There are tools for benchmarking in the form of Rails tests, and Rails 2.1 and 2.2 will have more of these tools, but I find it's easiest to benchmark by hitting a web server. A QA environment is usually perfect for this, as it will have hardware comparable to what's used in production.

The goal of benchmarking is to produce a baseline. We need to be able to make changes and know whether those changes make things better, make things worse, or have no effect.

I usually use curl or Apache Bench (ab) for benchmarking. With either tool, you need to log in to your web app, navigate to the page you want to benchmark, and then grab the session cookie. Firefox's Web Developer toolbar provides a nice view of the cookies.

Once you have the cookie, hit the page using curl a few times and grep for the X-Runtime response header. I'm going to go through these steps myself right now so I can provide a real-world example and get some work done while blogging. Replace the cookie and URLs that I'm using with the ones for your app.

curl --silent --head \
  --cookie "_carepages_session=eb52948f037bda387dfa9d83b6bad62986a624cf" \
  http://undisclosed.carepages.com/forums/cancer | grep X-Runtime

Throw it in a bash while loop to hit the page a few times.

while true; do curl --silent --head \
  --cookie "_carepages_session=eb52948f037bda387dfa9d83b6bad62986a624cf" \
  http://undisclosed.carepages.com/forums/cancer | grep X-Runtime; done

You should get a list of runtimes.

  X-Runtime: 0.32921
  X-Runtime: 0.33260
  X-Runtime: 0.33032
  X-Runtime: 0.33234
  X-Runtime: 0.33143
  X-Runtime: 0.33070
  X-Runtime: 0.33100
  X-Runtime: 0.32865
  X-Runtime: 0.33608

You can also use ab, which will calculate the mean response time per request.

ab -c 1 -n 20 \
  -C "_carepages_session=eb52948f037bda387dfa9d83b6bad62986a624cf" \
  http://undisclosed.carepages.com/forums/cancer

That command will make 20 requests, 1 at a time. Look for the time per request in the output. It should be fairly close to the X-Runtime values.

...
Time per request: 336.413 [ms] (mean)
...

Hopefully you get fairly consistent response times. If not, you'll need to keep the inconsistency in mind while making changes.

Now you have a baseline. In my output, I'm hitting the /forums/cancer page for CarePages. It consistently takes around 0.33 seconds to process.

Profiling

The next question is: why is this page taking X seconds to process? You need to profile the code to find out where you're spending those CPU cycles. To do this you'll need to install the ruby-prof gem.

It's easiest to start by profiling the entire request, and then narrowing the focus only if necessary. I use an around_filter in my ApplicationController to make profiling easier.

require "ruby-prof"
class ApplicationController < ActionController::Base
  around_filter :profile
  
  def profile
    return yield if params[:profile].nil?
    result = RubyProf.profile { yield }
    printer = RubyProf::GraphPrinter.new(result)
    out = StringIO.new
    printer.print(out, 0)
    response.body.replace out.string
    response.content_type = "text/plain"
  end
end

This allows you to profile a request by tacking ?profile=true onto the end of a URL. The response will then contain the profiling results instead of the normal response body. You may want to add a line of code so that this is only available in non-production environments, as shown below. You could also update this code to embed the profiling in the response body rather than replacing the entire body with it.
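
For example, the environment check could be a single guard clause at the top of the filter. This is just a sketch, and it assumes the classic RAILS_ENV constant; adjust it to however your app detects its environment.

def profile
  # Hypothetical guard: skip profiling entirely in production. This assumes
  # the RAILS_ENV constant; swap in whatever environment check your app
  # already uses.
  return yield if RAILS_ENV == "production"
  return yield if params[:profile].nil?
  result = RubyProf.profile { yield }
  printer = RubyProf::GraphPrinter.new(result)
  out = StringIO.new
  printer.print(out, 0)
  response.body.replace out.string
  response.content_type = "text/plain"
end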

I'm going to use this to profile the cancer forum page that I benchmarked by opening http://undisclosed.carepages.com/forums/cancer?profile=true in my browser. If you do the same for your web app, you should get the profiling results back in plain text call graph form.

Here's a snippet of the profiling output I just generated that looks quite suspicious. (I removed a few columns to make the output narrower.)

--------------------------------------------------------------------------------
                      0.55      96/96     Rsm::SeoUrls::InstanceMethods#to_param
  65.48%   0.00%      0.55         96     Rsm::SeoUrls::InstanceMethods#seo_urls_permalink
                      0.00     96/281     String#tr
                      0.00     96/408     String#downcase
                      0.55    192/192     ActiveRecord::Base#attributes
                      0.00     96/283     String#+
                      0.00    96/1914     String#strip
                      0.00  192/12138     Hash#[]
                      0.00    96/2035     String#gsub
                      0.00   96/10887     Kernel#nil?
                      0.00    96/7006     String#to_s
                      0.00      96/96     String#squeeze
                      0.00    192/192     Topic#attribute_for_url
--------------------------------------------------------------------------------

The percentage in the left column is the percent of the total processing time that a certain method took. You want to look for anything that jumps out as taking longer than you think it should.

This chunk of profiling means that 65.48% of the time spent processing /forums/cancer was inside the Rsm::SeoUrls::InstanceMethods#seo_urls_permalink method. Spending more than half of the processing time generating permalinks is way too much.

Let's take a look at the seo_urls_permalink method from the seo_urls plugin.

def seo_urls_permalink
  if !self.attributes[attribute_for_url]
    begin
      string=self.title
    rescue
      begin
        string=self.name
      rescue
        raise("No attribute was specified for seo_urls to use and 'name' and 'title' weren't available")
      end
    end
  else
    string=self.attributes[attribute_for_url]
  end
  if string.nil?
    return ""
  else
    t = "-"+(string.to_s.downcase.strip.gsub(/[^-_\s[:alnum:]]/, '').squeeze(' ').tr(' ', '-'))
  end
end

There are definitely a few inefficiencies here. The first is the multiple rescues. They're pointless - Ruby's respond_to? method should be used instead. The other inefficiency is the call to attributes. We only need to access one attribute, but we're accessing the entire hash. This wouldn't be a problem, except that Rails does typecasting when accessing the attributes hash. In particular, the time zone conversion on time columns is a little slow. Let's rewrite this method.

def seo_urls_permalink
  attr = attribute_for_url
  attr = :title if attr.nil? && respond_to?(:title)
  attr = :name if attr.nil? && respond_to?(:name)
  unless attr
    raise("No attribute was specified for seo_urls to use and 'name' and 'title' weren't available")
  end
  string = send(attr)
  return "" if string.nil?
  "-"+(string.to_s.downcase.strip.gsub(/[^-_\s[:alnum:]]/, '').squeeze(' ').tr(' ', '-'))
end

I ran the tests for the seo_urls plugin, and they passed. It's time to find out if this change makes a difference by running the benchmarks again.

while true; do curl --silent --head \
  --cookie "_carepages_session=eb52948f037bda387dfa9d83b6bad62986a624cf" \
  http://undisclosed.carepages.com/forums/cancer | grep X-Runtime; done

  X-Runtime: 0.07943
  X-Runtime: 0.16852
  X-Runtime: 0.07982
  X-Runtime: 0.16367
  X-Runtime: 0.07924
  X-Runtime: 0.16349
  X-Runtime: 0.07905
  X-Runtime: 0.16404
  X-Runtime: 0.07916
  X-Runtime: 0.16441
  X-Runtime: 0.07875

Awesome. The processing time is now half of what it was before. I'll submit a patch for the seo_urls plugin. Interestingly, some response times are quite a bit faster than others. I may be hitting the point where garbage collection kicks in more during some requests than during others.

Here's the new profiling for my /forums/cancer page, looking at the seo_urls_permalink method.

--------------------------------------------------------------------------------
                      0.01        96/96     Rsm::SeoUrls::InstanceMethods#to_param
   4.76%   0.00%      0.01           96     Rsm::SeoUrls::InstanceMethods#seo_urls_permalink
                      0.00       58/333     Kernel#send-3
                      0.00      96/2207     String#to_s
                      0.00        96/96     String#squeeze
                      0.01        10/13     Kernel#send-5
                      0.00       96/282     String#tr
                      0.00       96/217     String#downcase
                      0.00       96/283     String#+
                      0.00       96/186     String#strip
                      0.00      96/2041     String#gsub
                      0.00      96/1766     Kernel#nil?
                      0.00      192/192     ActiveRecord::AttributeMethods#respond_to?
                      0.00       28/578     Kernel#send-2
--------------------------------------------------------------------------------

I'm now spending 4.76% of the total processing time in seo_urls_permalink, compared to 65.48% before.

Repeat

Continue repeating this process until the processing time distribution per controller and action is mostly even. At that point, it's time to look at improving the performance of the pages with the slowest mean processing times. Or you could look for ways to improve the performance of every page.

Don't get too caught up in the distribution, though. If you have a page that takes 20% of the overall processing for your site, but you don't see any easy wins, take a look at the next page. Doubling the speed of a page that takes 10% of the overall time will still increase capacity by 5%.