Response times and other performance measurements will be greatly affected by what pages are requested, what forms are submitted, and what buttons are clicked during a load test. Thus, a key aspect of being a good load tester is the ability to create test scenarios well.
How should you develop test cases? Hopefully this post will give you some useful suggestions you can put into practice.
The primary purpose of load testing is usually to find the bottlenecks that degrade performance, and then to mitigate or eliminate them. CEOs, CTOs, Vice Presidents of Marketing, and Product Managers want to make sure customers are not impacted negatively when the site becomes very successful. As developers, we need to make the app fast and efficient; happy customers make for happy bosses. If we run load tests whose results have no correlation to what happens in production, those tests are failures: the metrics are useless, no value is gained, and we have wasted our time.
It is important to simulate realistic behavior by virtual users, in the appropriate proportions, because the performance of your system will be affected not only by the increase in load, but also by the types of processing needed to deliver the responses. Each page can differ significantly in the resources needed to satisfy the user’s request. So if you run a load test that hits your home page 100,000 times per hour, you probably won’t answer many performance questions about your e-commerce application. The home page might have some images and Flash videos, but I suspect it won’t make any complex queries against your database. Realistic load testing needs to trigger the interactions between your architectural layers in order to find the bottlenecks that will decrease performance.
Get Real – Where to Invest Your Time
Some of the most significant aspects of getting realistic load testing scenarios are:
- Actions taken
- Volume allocation
- Think time
To “get real” in your performance testing, it’s a good idea to run load tests that are as similar as possible to the real world traffic your site experiences (or will experience).
Actions Taken and Types of Users
A test scenario should simulate how a particular type of user moves through the web application. For example, a common type of user on my e-commerce site is an anonymous shopper perusing the product catalog. Another type is a buyer who is going through the shopping cart experience. Those two types of users represent two different test scenarios, and we need to script the actions of each separately.
If you have an existing application, then it’s relatively easy to figure out what users normally do. Analyze your server logs to see what transactions are being requested. I like to sort the file by request and run a simple frequency distribution to see how many times each page is requested. That makes the top five traffic zones obvious, but it doesn’t tell me much about how users enter and exit those zones. I recommend that you also analyze the patterns of how people move around your site.
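A frequency distribution like this takes only a few lines of scripting. Here is a minimal sketch in Python, assuming your logs use a quoted request line as in the Apache/Nginx combined log format (the `top_pages` name and the regex are illustrative, so adjust them for your own log format):

```python
import re
from collections import Counter

# Matches the request path inside a quoted request line such as
# "GET /products/42 HTTP/1.1" (common/combined log format assumed).
LINE_RE = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')

def top_pages(log_lines, n=5):
    """Count how often each path is requested and return the n most common."""
    counts = Counter()
    for line in log_lines:
        match = LINE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)
```

Running this over an access log quickly surfaces the top traffic zones; tracing entrance and exit paths, however, requires following individual sessions (for example, by grouping log lines on client IP or session cookie).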
If you have a new application without production log files, it is a good idea to capture traffic patterns and frequencies by getting alpha/beta testers to try your site without any guidance from you. See where they go, what they do, and so on, then make your best guess based on these new users. There are usability testing services that will record new users’ actions so you can see how your load test should replicate realistic user activity.
Volume Allocation and Proper Ratios of Traffic
Back to our e-commerce site… Unfortunately, we normally have about a 20:1 ratio of lookers to buyers. Since a credit card purchase requires much more processing in my web application, it is logical to want to test that path thoroughly. Yes, indeed.
However, I want to control the load carefully so that buyers don’t appear in an unrealistically high proportion to lookers. That wouldn’t be realistic, and it would put too much burden on the system relative to the number of virtual users in the test execution.
I may have other types of users too. For instance, system administrators who are actively managing the site, making product price changes. There may be customers posting comments about their favorite products in our catalog. Our marketing manager may be blogging on our site about the upcoming 50% sale for Black Friday. All of these can be realistic user types, and they are not all equal for our load testing.
My suggestion is to pay close attention to how you allocate volume between scenarios. To carry our example a little farther, I would set up my load test so that 73% of traffic represents lookers clicking through our products. Perhaps I would configure 15% of virtual users to be putting items in their shopping carts, and 5% to be posting comments about their customer service experience or how much they enjoyed the products. Another 4% are my employees adding new winter boots or hats to our catalog, lowering prices on summer bathing suits, and removing pet rocks from the products table. Finally, 3% are actually checking out by entering their credit card information and clicking that wonderful BUY NOW button.
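If your load testing tool lets you script the user allocation, the mix above can be expressed as a simple weighted split. This is a hypothetical sketch (the scenario names and weights are illustrative, with the percentages adjusted to sum to 100), not a particular tool’s API:

```python
# Illustrative scenario weights, as percentages that sum to 100.
SCENARIO_MIX = {
    "browse_catalog": 73,   # lookers clicking through products
    "fill_cart": 15,        # shoppers adding items to their carts
    "post_comment": 5,      # customers leaving comments
    "admin_updates": 4,     # employees changing the catalog
    "checkout": 3,          # buyers completing a purchase
}

def allocate_users(total_users, mix=SCENARIO_MIX):
    """Split a virtual-user count across scenarios by percentage weight."""
    allocation = {name: total_users * pct // 100 for name, pct in mix.items()}
    # Integer division can leave a few users unassigned; give the
    # remainder to the highest-volume scenario.
    remainder = total_users - sum(allocation.values())
    allocation[max(mix, key=mix.get)] += remainder
    return allocation
```

For a 1,000-user test this would assign 730 virtual users to browsing and only 30 to checkout, keeping the expensive purchase path in realistic proportion.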
By allocating the volume to each type of user, you will get a much more realistic picture of the performance of your application under load because the expensive queries will be in the correct proportion with the inexpensive cached static images.
Also related to volume: I can do the math and figure out that my company closes a product sale for about every 75 site visitors, and we can calculate the average number of products in each sale. So when I build a test scenario for a buyer, I’ll have them browse the catalog, put the right number of products in their cart, and then check out. Again, this has a real performance impact because database queries are some of the slowest interactions in our system.
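The arithmetic here is simple but worth making explicit. A tiny sketch, using the hypothetical one-sale-per-75-visitors figure from the example:

```python
# Hypothetical conversion figure from the example: one sale per ~75 visitors.
VISITORS_PER_SALE = 75

def buyer_count(total_users, visitors_per_sale=VISITORS_PER_SALE):
    """How many virtual users should run the full buyer (checkout) scenario."""
    return round(total_users / visitors_per_sale)
```

So in a 3,000-user test, roughly 40 virtual users should complete a purchase; the rest should be lookers, commenters, and other user types in their observed proportions.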
Think Time and Pauses Between User Actions
Customers frequently ask me why LoadStorm has a minimum pause time; they want the load test to click from step to step in less than a second. My reply is that real users simply don’t behave that way.
People have to think about what they see, read it, make a decision, then click. Each person will respond to the delivered page in a slightly different time frame. Think time is variable because real users have a large deviation in their ability to process information. Some are visual learners, some are analytical, some are contemplative, some are driving personalities, some are impatient, some are slow readers, and some are indecisive.
So when you are load testing, it will be beneficial to have a randomized pause between steps in your scenario. This will realistically represent the length of time from when a person receives a response (page) from your server to the time that person requests a new page.
I used to believe that interpolating performance metrics was acceptable. I have since found that I was wrong, especially when it comes to think time. Please don’t fall into the trap of thinking that a web application will respond linearly to additional load regardless of think time. For example, just because your app has a two-second average response time for 1,000 concurrent users with a 15-second think time does NOT mean you will get the same response time for 3,000 users with a 45-second think time. The algebra may suggest the equation is valid, but it rarely holds up in real test results. There are simply too many variables in the performance equation to extrapolate based on pauses between steps.
Specifically, it is common for the virtual users in your load test with longer think times to consume more resources than expected. The way you have coded your application will greatly affect the resources, and thus the requests per page, attributable to each virtual user. Your pages may keep requesting data from the server while the user is simply reading; not all requests require a user action. Conversely, if a user clicks as soon as they spot an interesting link, chances are that several resources won’t yet have been requested from the server, resulting in fewer requests per page for aggressive clickers.
I recommend letting the tool generate a random think time for your scenarios. Most of the books I’ve read and performance engineers I’ve talked to will generally put the pauses in a range between 10 and 120 seconds. My suggestion is to review some of your web server logs to get a rough idea of how much time people spend on each of your important pages, then add about 20-30 seconds on each side of that average. For instance, if I find that people spend an average of 40 seconds on my site pages, I will set a random think time between 20 seconds and 60 seconds.
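If your tool supports scripted pauses, this randomization is one line. A minimal sketch (the `think_time` helper and its parameters are hypothetical, not any specific tool’s API):

```python
import random

def think_time(avg_seconds, spread=20):
    """Pick a randomized pause (in seconds) centered on the observed
    average page time, e.g. avg_seconds=40 gives a value in [20, 60]."""
    low = max(1, avg_seconds - spread)   # never drop below a 1-second floor
    return random.uniform(low, avg_seconds + spread)
```

Between scenario steps you would then sleep for the chosen duration, for example `time.sleep(think_time(40))`, so each virtual user paces itself a little differently, just as real visitors do.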
Summary – Realistic Test Scenarios Get More Accurate Results
Some web developers are happy to run a load test with a single scenario; we have a few customers who do. Apparently they just want to see how many times their home page can be hit per minute before the web server stops responding. In some situations, with certain system requirements, that may be fine. But most web developers want test results that predict what their application will do when 1,000 concurrent users are actually using the site in a normal way.
By planning well and gathering good information about current user types, we can make our requests match real-world traffic patterns closely enough to get useful metrics. By applying load in the appropriate proportions, we can be assured that the correct number of database queries are triggered. By intelligently pausing virtual user actions between requests, we can generate requests at intervals such that our measurements of the application layer’s interactions with storage and the web server reflect true CPU and memory consumption.
To disregard any of these key elements of constructing a realistic test scenario is to potentially skew the resulting metrics by as much as 10,000%. It is simply bad load testing! Please don’t ignore them. If you do, you are probably giving some manager false hope about the upcoming marketing campaign, which could jeopardize your job. At the very least, it jeopardizes the reputation of your web development skills.
As with any software development technique, there are places where “too much of a good thing” can be a detriment. Sometimes performance engineers will tell you to invest many hours in arcane details of load test creation in order to create the most realistic scenarios possible. I’m a strong believer in the Pareto principle: 20% of the effort to set up a good load test will reap 80% of the benefit. I’ve seen professional testers spend 80% of their time fussing over minutiae that generated about 20% of the value attributable to load testing. Be careful where you invest your time. In part 2 of this article, we cover four items that I deem insignificant and that probably will not bring ROI to your load testing.