Note: Even if you're familiar with benchmarking and profiling, take a look at the profiling section of this post. There's a nice helper method and an interesting real-world example.
Improving the runtime for a page from 0.75 to 0.25 seconds may not make much of a difference to the end user's experience, but it triples the number of requests that your app servers can handle. This post covers improving performance at that level. Before you get here, you need to have your database well tuned, with proper indexes and so on. Going through this workflow will make your website a little snappier and increase your overall capacity. I recently went through this process with CarePages, and we were able to make significant performance improvements in one afternoon.
The first performance bottleneck for a web app is always external resources. Most web apps use a database, so that's the most common resource to slow things down. But let's say your database is well indexed and handling the load of the web app just fine, and that you don't have any other slow external resources like web services that you call. The next bottleneck is usually the CPU if you have your own servers. If your website is running on a small slice or on shared hosting, you might hit memory limits before you max out the CPU.
Consider a box with 8 CPU cores and 4 GB of memory. 8 Rails processes with no slow external resources could likely max out those 8 cores, but for this example let's say it takes 16 Rails processes under full utilization to max out the CPU. Even if each Rails instance uses 150 MB of memory, that's only 2.4 GB.
So if CPU time is your precious resource, you need to know how you're spending it. It helps if you have some custom logging in place, but essentially the information that you need is which page is taking up the most CPU time. It might not be your slowest page. You need to multiply the mean runtime for a page by the number of times it's hit. So if one page averages 0.2 seconds runtime, but it gets hit 20 times more frequently than a page that takes 1 second, you're spending four times as much time processing the 0.2 second page.
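That arithmetic is easy to sketch. Using the hypothetical numbers above (a 0.2 second page hit 20 times as often as a 1 second page), with made-up page names:

```ruby
# Hypothetical numbers: a fast page hit 20x as often as a slow page
# ends up costing four times as much total CPU time.
pages = {
  "fast#page" => { mean_runtime: 0.2, hits: 20_000 },
  "slow#page" => { mean_runtime: 1.0, hits: 1_000 },
}

total_time = pages.each_value.sum { |p| p[:mean_runtime] * p[:hits] }

pages.each do |name, p|
  time = p[:mean_runtime] * p[:hits]
  puts "%-10s %7.1f s  %5.1f%%" % [name, time, 100.0 * time / total_time]
end
```

The fast page accounts for 4000 seconds of the 5000-second total, so that's where the optimization effort pays off.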
Note that when I'm referring to a page, I mean the controller and action. For example, I'm not considering /people/1 and /people/3 separate pages - they're both the people/show page.
At CarePages we have an Apache log that logs the controller, action, and Rails runtime. We run a script that groups the runtimes by controller and action, and then calculates the total runtime for a controller/action divided by the total runtime for all requests. What we found the first time we did this was that one of our pages was taking nearly 40% of the overall CPU time. It wasn't the slowest page, but it was the most frequently hit page. We had two other pages that were taking about 19% and 13% of the overall CPU time. This gave us a good place to start - if we could make the 40% page twice as fast, we would lower our overall CPU usage by 20%.
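A minimal sketch of that kind of log-crunching script. The log format here is hypothetical (one "controller#action runtime" pair per line); our actual Apache log has more fields, but the grouping logic is the same:

```ruby
# Groups runtimes by page and reports each page's share of the total
# runtime across all requests.
def runtime_share(lines)
  totals = Hash.new(0.0)
  lines.each do |line|
    page, runtime = line.split
    totals[page] += runtime.to_f
  end
  grand_total = totals.values.sum
  totals.transform_values { |t| t / grand_total }
end

# A few hypothetical log lines for illustration.
log = <<~LOG
  forums#show 0.33
  forums#show 0.33
  topics#show 0.20
LOG

runtime_share(log.each_line).sort_by { |_, s| -s }.each do |page, s|
  puts "%-20s %6.2f%%" % [page, 100 * s]
end
```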
If you go through this process and find a relatively even distribution of CPU time across pages, you should look for ways to improve performance across the board rather than focusing on improving the performance of a few individual pages.
So now that you have a page to work on, it's time to benchmark. There are tools for benchmarking in the form of Rails tests, and Rails 2.1 and 2.2 will have more of these tools, but I find it's easiest to benchmark by hitting a web server. A QA environment is usually perfect for this, as it will have hardware comparable to the hardware used in production.
The goal of benchmarking is to produce a baseline. We need to be able to make changes and know whether those changes make things better, make them worse, or have no effect.
I usually use curl or apache bench for benchmarking. With either tool, you need to log in to your web app, navigate to the page you want to benchmark, and then grab the session cookie. Firefox's web developer toolbar provides a nice view of the cookies.
Once you have the cookie, hit the page using curl a few times and grep for the X-Runtime response header. I'm going to go through these steps myself right now, so I can provide a real-world example and get some work done while blogging. Replace the cookie and URLs I'm using with the ones for your app.
```shell
curl --silent --head \
  --cookie "_carepages_session=eb52948f037bda387dfa9d83b6bad62986a624cf" \
  http://undisclosed.carepages.com/forums/cancer | grep X-Runtime
```
Throw it in a bash while loop to hit the page a few times.
```shell
while true; do
  curl --silent --head \
    --cookie "_carepages_session=eb52948f037bda387dfa9d83b6bad62986a624cf" \
    http://undisclosed.carepages.com/forums/cancer | grep X-Runtime
done
```
You should get a list of runtimes.
```
X-Runtime: 0.32921
X-Runtime: 0.33260
X-Runtime: 0.33032
X-Runtime: 0.33234
X-Runtime: 0.33143
X-Runtime: 0.33070
X-Runtime: 0.33100
X-Runtime: 0.32865
X-Runtime: 0.33608
```
You can also use apache bench, which will calculate the mean response time per request.
```shell
ab -c 1 -n 20 \
  -C "_carepages_session=eb52948f037bda387dfa9d83b6bad62986a624cf" \
  http://undisclosed.carepages.com/forums/cancer
```
That command will make 20 requests, 1 at a time. Look for the time per request in the output. It should be fairly close to the X-Runtime values.
```
...
Time per request:       336.413 [ms] (mean)
...
```
Hopefully you get fairly consistent response times. If not, you'll need to keep the inconsistency in mind while making changes.
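To put a number on that consistency, you can run the X-Runtime samples through a few lines of Ruby to get the mean and standard deviation. A small sketch, using the sample values from my run above:

```ruby
# Mean and (population) standard deviation of benchmark samples.
def runtime_stats(samples)
  mean = samples.sum / samples.size
  stddev = Math.sqrt(samples.sum { |s| (s - mean)**2 } / samples.size)
  [mean, stddev]
end

# The X-Runtime values captured earlier.
samples = [0.32921, 0.33260, 0.33032, 0.33234, 0.33143,
           0.33070, 0.33100, 0.32865, 0.33608]
mean, stddev = runtime_stats(samples)
puts "mean=%.5f stddev=%.5f" % [mean, stddev]
```

A standard deviation that's small relative to the mean means you can trust a handful of requests as a baseline; a large one means you'll need more samples per measurement.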
Now you have a baseline. In my output, I'm hitting the /forums/cancer page for CarePages. It consistently takes around 0.33 seconds to process.
The next question is: why is this page taking X seconds to process? You need to profile the code to find out where you're spending those CPU cycles. To do this, you'll need to install the ruby-prof gem.
It's easiest to start by profiling the entire processing of a request, and then narrow the focus only if necessary. I use an around_filter in my ApplicationController to make profiling easier.
require "ruby-prof" class ApplicationController < ActionController::Base around_filter :profile def profile return yield if params[:profile].nil? result = RubyProf.profile { yield } printer = RubyProf::GraphPrinter.new(result) out = StringIO.new printer.print(out, 0) response.body.replace out.string response.content_type = "text/plain" end end
This allows you to profile a request by tacking ?profile=true onto the end of a URL. The response will then contain the profiling results instead of the normal response body. You may want to add a line of code so that this is only available in non-production environments. You could also update this code to embed the profiling in the response body rather than replacing the entire body with it.
I'm going to use this to profile the cancer forum page that I benchmarked by opening http://undisclosed.carepages.com/forums/cancer?profile=true in my browser. If you do the same for your web app, you should get the profiling results back in plain text call graph form.
Here's a snippet of output from the profiling that I just did that looks quite suspicious. (I removed a few columns to make the output narrower).
```
--------------------------------------------------------------------------------
                    0.55      96/96      Rsm::SeoUrls::InstanceMethods#to_param
  65.48%    0.00%   0.55      96         Rsm::SeoUrls::InstanceMethods#seo_urls_permalink
                    0.00      96/281     String#tr
                    0.00      96/408     String#downcase
                    0.55      192/192    ActiveRecord::Base#attributes
                    0.00      96/283     String#+
                    0.00      96/1914    String#strip
                    0.00      192/12138  Hash#[]
                    0.00      96/2035    String#gsub
                    0.00      96/10887   Kernel#nil?
                    0.00      96/7006    String#to_s
                    0.00      96/96      String#squeeze
                    0.00      192/192    Topic#attribute_for_url
--------------------------------------------------------------------------------
```
The percentage in the left column is the percent of the total processing time that a certain method took. You want to look for anything that jumps out as taking longer than you think it should.
This chunk of profiling means that 65.48% of the time spent processing /forums/cancer was inside the Rsm::SeoUrls::InstanceMethods#seo_urls_permalink method. Spending more than half of the processing time generating permalinks is way too long.
Let's take a look at the seo_urls_permalink method from the seo_urls plugin.
```ruby
def seo_urls_permalink
  if !self.attributes[attribute_for_url]
    begin
      string = self.title
    rescue
      begin
        string = self.name
      rescue
        raise("No attribute was specified for seo_urls to use and 'name' and 'title' weren't available")
      end
    end
  else
    string = self.attributes[attribute_for_url]
  end
  if string.nil?
    return ""
  else
    t = "-" + (string.to_s.downcase.strip.gsub(/[^-_\s[:alnum:]]/, '').squeeze(' ').tr(' ', '-'))
  end
end
```
There are definitely a few inefficiencies here. The first is the multiple rescues. They're pointless - Ruby's respond_to? method should be used instead. The other inefficiency is the call to attributes. We only need to access one attribute, but we're accessing the entire hash. This wouldn't be a problem, except that Rails does typecasting when accessing the attributes. In particular the time zone conversion on time columns is a little slow. Let's rewrite this method.
```ruby
def seo_urls_permalink
  attr = attribute_for_url
  attr = :title if attr.nil? && respond_to?(:title)
  attr = :name  if attr.nil? && respond_to?(:name)
  unless attr
    raise("No attribute was specified for seo_urls to use and 'name' and 'title' weren't available")
  end
  string = send(attr)
  return "" if string.nil?
  "-" + (string.to_s.downcase.strip.gsub(/[^-_\s[:alnum:]]/, '').squeeze(' ').tr(' ', '-'))
end
```
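The slug transformation at the end of the method is plain String work, so it can be sanity-checked outside Rails. A quick standalone sketch (the slugify name is mine, not the plugin's):

```ruby
# Isolates the String chain from seo_urls_permalink so it can be run
# and tested outside Rails. The slugify name is hypothetical.
def slugify(string)
  return "" if string.nil?
  "-" + string.to_s.downcase.strip.gsub(/[^-_\s[:alnum:]]/, '').squeeze(' ').tr(' ', '-')
end

puts slugify("Cancer Support & Discussion!") # => "-cancer-support-discussion"
```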
I ran the tests for the seo_urls plugin, and they passed. It's time to find out if this change makes a difference by running the benchmarks again.
```shell
while true; do
  curl --silent --head \
    --cookie "_carepages_session=eb52948f037bda387dfa9d83b6bad62986a624cf" \
    http://undisclosed.carepages.com/forums/cancer | grep X-Runtime
done
```
```
X-Runtime: 0.07943
X-Runtime: 0.16852
X-Runtime: 0.07982
X-Runtime: 0.16367
X-Runtime: 0.07924
X-Runtime: 0.16349
X-Runtime: 0.07905
X-Runtime: 0.16404
X-Runtime: 0.07916
X-Runtime: 0.16441
X-Runtime: 0.07875
```
Awesome. The processing time is now half of what it was before. I'll submit a patch for the seo_urls plugin. Interestingly, some response times are quite a bit faster than others. I may be hitting the point where garbage collection kicks in more during some requests than during others.
Here's the new profiling for my /forums/cancer page, looking at the seo_urls_permalink method.
```
--------------------------------------------------------------------------------
                    0.01      96/96      Rsm::SeoUrls::InstanceMethods#to_param
   4.76%    0.00%   0.01      96         Rsm::SeoUrls::InstanceMethods#seo_urls_permalink
                    0.00      58/333     Kernel#send-3
                    0.00      96/2207    String#to_s
                    0.00      96/96      String#squeeze
                    0.01      10/13      Kernel#send-5
                    0.00      96/282     String#tr
                    0.00      96/217     String#downcase
                    0.00      96/283     String#+
                    0.00      96/186     String#strip
                    0.00      96/2041    String#gsub
                    0.00      96/1766    Kernel#nil?
                    0.00      192/192    ActiveRecord::AttributeMethods#respond_to?
                    0.00      28/578     Kernel#send-2
--------------------------------------------------------------------------------
```
I'm now spending 4.76% of the total processing time in seo_urls_permalink, compared to 65.48% before.
Repeat this process until the processing time distribution per controller and action is mostly even. At that point, it's time to improve the performance of the pages with the slowest mean processing time, or to look for ways to improve the performance of every page.
Don't get too caught up on the distribution, though. If you have a page that takes 20% of the overall processing for your site but you don't see any easy wins, take a look at the next page. Doubling the speed of a page that takes 10% of the overall time will still increase capacity by 5%.