Yesterday a customer called me to ask about how his current traffic levels translate to the load he should use for performance testing. We discussed it in several ways, and he said I had been helpful. This morning he sent me an email with a fantastic suggestion to write a blog post about our discussion.
The focus was to determine a virtual user calculation to use when setting up load tests. Sounded good to me, and here is how he framed the question in the email:
For example, while looking at Google Analytics for a given average day, during a peak hour we had:
- 2000 visitors in 60 minutes
- 10,000 page views
- avg page views 5
- avg time on site 7 minutes
So I wanted to figure out how many users should we feed the LoadStorm system to simulate this traffic as a base line. Does this math look correct for this case?
2000 users in 1 hour (60 minutes), 7 min time on site
60 minutes / 7 min = 8.5
2000 / 8.5 = 235 Users
Establishing an Algorithm
Well, his calculations seem logical and the math is accurate. I would first say that a test of 235 concurrent users seems like a good baseline based on the Analytics numbers. Each test scenario should have an average duration of 7 minutes to reflect the “avg time on site” metric. That leads to the conclusion that the user population turned over about 8.5 times during the hour. Put another way, the total 2,000 visitors do not translate to 2,000 concurrent users because each user only stays a few minutes. If they stayed for an hour each, then we would need to test for 2,000 concurrent users. They don’t stay that long; therefore, dividing the 2,000 users by 8.5 (the number of 7-minute visits that fit into an hour) tells you that approximately 235 users were using the site concurrently.
This approach is going to tell us how many users we have on average. Let’s put this into a mathematical formula:
U = V / (60/D)
U is the number of load test virtual users (that’s what we are trying to figure out)
V is the average number of visitors per hour
D is the average duration of a visit, in minutes
60 is the number of minutes in an hour
Let’s state this formula again in English like a math word problem:
Load Test Virtual Users is equal to the Average Visitors per Hour divided by the User Turnover Rate per Hour
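As a quick check of the formula, here is a minimal Python sketch. The function name virtual_users is just an illustrative choice, not part of any tool. Note that without rounding 60 / 7 to 8.5, the customer's numbers come out at roughly 233 rather than 235, which is close enough for a baseline.

```python
def virtual_users(visitors_per_hour: float, visit_minutes: float) -> float:
    """U = V / (60 / D): average concurrent users during one hour of traffic."""
    if visit_minutes <= 0:
        raise ValueError("visit duration must be positive")
    # How many back-to-back visits of this length fit into one hour.
    turnover_per_hour = 60 / visit_minutes
    return visitors_per_hour / turnover_per_hour


# The customer's peak hour: 2,000 visitors, 7 minutes average time on site.
print(round(virtual_users(2000, 7)))  # 233
```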
Average Can Be a Good Baseline for Load Testing
One flaw in this calculation is that it presumes the traffic to be evenly spread across the hour. From my experience, it is more common to see larger spikes and troughs in concurrent users. It could be that of the 2,000 visitors per hour, 900 of them hit the site in the first 10 minutes, followed by 400 users spread over the next 40 minutes, and then 700 users in the last 10 minutes. In this example, we would want to load test for 900 concurrent users because that is the “high water mark”. 900 users is the level of load that the target web application needs to support at a minimum.
That to me is a better baseline, but it isn’t always possible to establish that number from a source like Google Analytics. It would be useful to review the web server log files to determine the peak number of users in a span of time. For instance, in the above example the time span is about 10 minutes (roughly the visit duration). I recommend putting your log files into a spreadsheet and running some simple calculations to identify the high water mark for the visit duration. That number will be higher than the average, and it will be a better reflection of what your system needs to handle on a regular basis.
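If a spreadsheet gets unwieldy, the same high water mark can be found with a short script. This is only a sketch under assumptions: it presumes you have already reduced your logs to one arrival time per visitor (in minutes from the start of the hour), and it treats everyone who arrives within one visit-duration window as concurrent, as in the example above. The function name and the evenly spaced sample data are made up for illustration.

```python
from bisect import bisect_left


def high_water_mark(arrival_minutes, window_minutes):
    """Largest number of visitor arrivals inside any window of the given length."""
    arrivals = sorted(arrival_minutes)
    peak = 0
    for i, start in enumerate(arrivals):
        # Index of the first arrival at or after the end of [start, start + window).
        end = bisect_left(arrivals, start + window_minutes)
        peak = max(peak, end - i)
    return peak


# Hypothetical hour matching the example: 900 arrivals in the first 10 minutes,
# 400 over the next 40 minutes, and 700 in the last 10 minutes.
first = [i * 10 / 900 for i in range(900)]
middle = [10 + i * 40 / 400 for i in range(400)]
last = [50 + i * 10 / 700 for i in range(700)]

print(high_water_mark(first + middle + last, 10))  # 900
```

Checking only the windows that begin at an arrival is enough, because any window can be slid forward to the first arrival it contains without losing a visitor.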
With a good baseline in hand, you now need to set some targets for higher volume. It’s great to know that you usually have bursts of traffic around 900 concurrent users, but a performance test at that level of load doesn’t tell you anything about growth. I recommend running load tests to confirm that your volume goals can be reached. We should be seeking knowledge about our system’s performance at peak times, and that includes making an educated guess about future peaks.
Many times these objectives should be set with your executives or marketing managers involved. They know what the company is expecting for revenue and customer growth. The marketing department leaders should know what campaigns are going to drive larger numbers of prospects to the site. Product managers should know what new companies are signing contracts to use your web applications.
So let’s say that your marketing department is going to run some social network marketing campaigns on Twitter, Facebook, and LinkedIn. They should be able to supply you with an estimate or upper goal they want to drive to the site. Then you can take those numbers and plug them into our formula. It’s likely they will not have expressed their goals in terms of “visit duration”, but you have every right to ask them to come up with a reasonable number.
Now you should have a concurrent user target of something like 5 or 100 times the number of concurrent users in a typical hour. Load test against the target to see if your site can handle it. If not, the good news is that you have time to find bottlenecks and make performance improvements before the campaigns begin. Web performance tuning really should be an ongoing task for your team. It isn’t always in the budget, but if you present some of the revenue impact of performance tuning to your executives, they will quickly see the value of putting testing/tuning into your IT budget.
Putting the Traffic into Realistic User Types
Just having an accurate overall traffic volume number is a big deal. Another step toward excellent load testing is to make sure you put the right volume against each test scenario. A scenario may be thought of as a type of user – anonymous, administrator, buyer, blogger, etc. You will create these scenarios in your test plan to give the load testing realistic system usage. For example, an administrator will need to log in with appropriate credentials and take actions that are restricted to only a few users, such as moderating a comment post.
Once you have the user types clearly identified, you need to decide what percentage of your traffic is represented by each user type. Administrators are usually a very small group of users in a web application. I would think that less than 5% of traffic would be from admins, possibly far less, but it depends on what your app does.
By allocating percentages to each scenario, you get an accurate mix of activity that reflects the real world demands on your system. For example, anonymous users will put much less stress on your system resources because they will be getting many of their requests fulfilled from cache (jpgs, static CSS, etc.), whereas buyers will have much more personalization on their pages and require more database access (slowest part of most systems).
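To turn a traffic mix into per-scenario user counts, a small helper like the one below can do the arithmetic. It is a sketch, not a feature of any load testing tool: the function name and the percentages (70/20/7/3) are invented for illustration, and you should substitute the mix your own analytics suggest. Leftover users from rounding go to the scenarios with the largest fractional share so the total always matches.

```python
def allocate_users(total_users, mix):
    """Split total_users across scenarios by whole-number percentages."""
    if sum(mix.values()) != 100:
        raise ValueError("scenario percentages must add up to 100")
    raw = {name: total_users * pct / 100 for name, pct in mix.items()}
    counts = {name: int(share) for name, share in raw.items()}
    # Hand out users lost to rounding down, largest fractional part first.
    leftover = total_users - sum(counts.values())
    by_fraction = sorted(raw, key=lambda name: raw[name] - counts[name], reverse=True)
    for name in by_fraction[:leftover]:
        counts[name] += 1
    return counts


# Hypothetical traffic mix; replace with percentages from your own data.
mix = {"anonymous": 70, "buyer": 20, "blogger": 7, "administrator": 3}
print(allocate_users(900, mix))
# {'anonymous': 630, 'buyer': 180, 'blogger': 63, 'administrator': 27}
```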