A Little Cache Goes a Long Way

Drupal is the target of load testing for this series of articles. If this is your first time reading any of the articles in this series, please review the introduction and summary, Load Testing Drupal, for context on what we are doing.

Our first test plan focuses on anonymous users coming to a Drupal site and browsing content. We believe this covers the majority of Drupal traffic in the real world, and reflecting actual user traffic patterns is a critical component of effective web load testing. This plan is a good place to put a stake in the ground.

As with all of our Drupal load testing, we will use a LAMP stack. The test results presented in this article come from tests run on a small Amazon EC2 server instance. Our Drupal application has 1,000 users and 10,000 pages.

In the beginning, we didn’t change any of the default settings of Drupal, Apache, or MySQL. We left everything as it comes out of the box. We will make some simple adjustments later.

We set a benchmark of a 5-second average response time as the upper limit of acceptable performance. Anything above 5 seconds was considered a failure.

Using LoadStorm, we created a test plan with two scenarios. The first specifies which pages the virtual user visits, and in what order. The second has our load testing tool randomly select pages on the site.

In the first scenario, we visited 5 different stories and returned to the home page between stories. There were 10 steps in total, with a random pause of 10-60 seconds between each step to simulate realistic user behavior (reading the page content before clicking again).
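LoadStorm configures these steps through its web interface rather than code, but the scenario logic can be sketched in Python. The story paths below are hypothetical stand-ins, not the actual pages from our test site:

```python
import random

# Hypothetical story paths -- placeholders for the 5 stories in the scenario.
STORIES = ["/node/%d" % n for n in (101, 102, 103, 104, 105)]

def build_scenario(stories):
    """Visit the home page before each story: 5 home visits + 5 stories = 10 steps."""
    steps = []
    for story in stories:
        steps.append("/")      # return to the home page between stories
        steps.append(story)
    return steps

def think_time():
    """Random 10-60 second pause between steps, like a user reading the page."""
    return random.uniform(10, 60)

steps = build_scenario(STORIES)
```

The random pause matters: without it, every virtual user hammers the server in lockstep, which is not how real readers behave.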

We ran three iterations of the test plan to confirm accuracy and eliminate statistical anomalies.
Each load test iteration ran for 30 minutes, starting at 10 users and ramping up to 100 users. Here is a screenshot of the load test parameters.
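Assuming the ramp is linear (exactly how LoadStorm schedules new users is a tool detail), the number of active virtual users at any point in the test can be sketched as:

```python
def active_users(t_min, start=10, end=100, ramp_min=30.0):
    """Linear ramp: users grow from `start` to `end` over `ramp_min` minutes."""
    if t_min >= ramp_min:
        return end
    return round(start + (end - start) * (t_min / ramp_min))
```

So at the 15-minute mark the test is running roughly 55 concurrent users, which is useful to know when reading the graphs below: the x-axis is time, but the real variable of interest is load.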

The averaged results were disappointing, to say the least. We reached an average of only 67 users before our average response time climbed above 5 seconds – our benchmark for failure.

Here are the graphs that LoadStorm produced during one of the test runs.


The above graph gives us a couple of data points of interest. At just over 14 minutes into the test, you will notice in the top graph that both the requests per second and the throughput start to vacillate, rising and falling in a somewhat irregular pattern. This indicates that a limit of some kind has been reached. In a well-performing system, both of those values should continue to climb as the users increase. Obviously, at some point every system will level off, and that point marks the maximum number of users the system can handle. So even without the bottom graph, you would know that something wasn't quite right.

However, without the bottom graph you wouldn't know what your users were experiencing. In the second screenshot, you can see the average response time start to climb rapidly at the same point, and not long after that, errors begin to appear. Some systems will just slow down without necessarily throwing errors – the pages simply take a very long time to load. In our case, we both slow down and throw errors, and all of the errors are 408 – Request Timeout.
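Finding the failure point in the results amounts to scanning the samples for the first crossing of the 5-second limit. A minimal sketch, using synthetic sample data (only the 67-user crossing reflects our actual result):

```python
def failure_point(samples, limit=5.0):
    """Return the user count at the first sample whose average response
    time exceeds `limit` seconds, or None if the limit is never crossed."""
    for users, avg_response in samples:
        if avg_response > limit:
            return users
    return None

# Synthetic (users, avg response time in seconds) samples -- illustrative only.
samples = [(10, 0.4), (30, 0.9), (50, 2.1), (67, 5.3), (80, 9.8)]
```

Here `failure_point(samples)` reports 67, matching the point where our test crossed the benchmark.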

In this first scenario we averaged 1.63 requests/sec and 34 KB/sec of throughput.
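Those two averages together imply the mean response size. Dividing throughput by request rate:

```python
throughput_kb_per_s = 34.0   # average throughput from the test
requests_per_s = 1.63        # average request rate from the test

# Average size of a response: roughly 20.9 KB per request.
avg_kb_per_request = throughput_kb_per_s / requests_per_s
```

Knowing the average response size is handy when deciding whether the bottleneck is bandwidth or server-side processing; at roughly 21 KB per page, bandwidth is clearly not the problem here.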

So… in view of our dismal out-of-the-box results, we knew some changes had to be made. We are all about small changes for big improvements. We also knew that adding more memory or another processor was not an option; we wanted to get THIS configuration running as well as possible. One of the difficulties with performance engineering is that there are literally dozens of things that can be changed and millions of possible combinations. Because of that, we start with small changes and see where we can get some big performance increases.

Turn On Caching

The first thing we did was change the Drupal configuration to enable the normal page cache. We left the minimum cache lifetime at ‘None’ and left compression enabled. We then ran the same test again, and the results were impressive. We were able to run all 100 users with no issues, and our average response time stayed well below 0.5 seconds. In fact, our average response time was 0.034 seconds.

Here are a couple of screen shots from one of the test runs.

[Screenshot: load test results with caching enabled]
