What do Load Testing Metrics tell us about Performance?

The load testing metrics described here are key performance indicators for your web application or web site. Response metrics show the performance measurement from a user perspective while volume metrics show the traffic generated by the load testing tool against the target web application.

Response Metrics

Average Response Time
Peak Response Time
Error Rate

Volume Measurements

Concurrent Users
Requests per Second
Throughput

In LoadStorm, the load testing metrics are plotted in one-minute intervals. All of the load generating servers feed data back to the LoadStorm reporting engine. Calculations are applied to the raw data from every request and response, producing objective metrics that show how well your target web application handles the load.

Average Response Time

When you measure the start of every request and the end of every response to those requests, you have data for the full round trip: what is sent from a browser and how long the target web application takes to deliver what was needed. For example, one request will be a web page…let’s say the home page of the web site. The load testing system will simulate the user’s browser in sending a request for the “home.html” resource. On the target’s side, the request is received by the web server, which makes further requests of the application to dynamically build the page. When the full HTML document is compiled, the web server returns that document along with a response header.

The Average Response Time takes into consideration every round trip request/response cycle in the interval being reported and calculates the mathematical mean of all response times for that interval. The resulting metric is a reflection of the speed of the web application being tested, and it is the BEST indicator of how the target site is performing from the users’ perspective. The Average Response Time includes the delivery of HTML, images, CSS, XML, JavaScript files, and any other resource being used. Thus, the average will be significantly affected by any slow components. Geographic location can also have a small impact on response times if the end user is thousands of miles away from the target web server.
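To make the calculation concrete, here is a minimal sketch of a per-minute average. It is not LoadStorm's implementation. The function name, the (start, duration) sample format, and the choice to bucket each request by its start time are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def average_response_time_by_minute(samples):
    """Return {minute_index: mean duration in seconds}.

    samples: iterable of (start_seconds, duration_seconds) pairs, where
    start_seconds is measured from the beginning of the test.
    Assumption: a request belongs to the minute in which it started.
    """
    buckets = defaultdict(list)
    for start, duration in samples:
        buckets[int(start // 60)].append(duration)
    return {minute: mean(durations) for minute, durations in sorted(buckets.items())}


# Hypothetical samples: three requests in minute 0, two in minute 1.
samples = [(1.0, 0.2), (15.0, 0.4), (59.0, 0.6), (61.0, 1.0), (90.0, 3.0)]
averages = average_response_time_by_minute(samples)
```

Notice how the single 3-second response pulls the second minute's average up to 2 seconds, which is the "slow component" effect described above.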

Response times can be measured as either:

Time to First Byte
Time to Last Byte

Some people like to know when the first byte of the response is received by the load generator (simulated browser). This shows how long the request took to get there and how long the server took to start replying. However, that is only part of the real equation. It seems to be much more valuable to know the entire cycle of response that encompasses the duration of download for the resource. Meaning, why would I want to know only part of the response time? What is most important is what the user experiences, and that includes the delivery of the full payload from the server. A user wants to see the HTML page – which requires receipt of the full document. So the Time to Last Byte would be preferred as a Key Performance Indicator (KPI) over Time to First Byte.
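The difference between the two measurements can be sketched with a small timing helper. This is an illustration only: `time_response` and `fake_request` are hypothetical names, the in-memory stream stands in for a real network response, and the first-byte reading here is taken at the first byte of the body rather than the first byte of the headers.

```python
import io
import time


def time_response(send_request):
    """Return (time_to_first_byte, time_to_last_byte, size_in_bytes).

    send_request: a zero-argument callable that sends the request and
    returns a file-like object for the response body.
    """
    started = time.perf_counter()
    body = send_request()
    first = body.read(1)                    # the first byte has arrived
    ttfb = time.perf_counter() - started
    rest = body.read()                      # read until the payload is complete
    ttlb = time.perf_counter() - started
    return ttfb, ttlb, len(first) + len(rest)


# Stand-in for a real request: an in-memory "response" for home.html.
PAGE = b"<html>home</html>"


def fake_request():
    return io.BytesIO(PAGE)


ttfb, ttlb, size = time_response(fake_request)
```

Time to Last Byte can never be smaller than Time to First Byte, and the gap between them grows with the size of the resource being downloaded.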

Peak Response Time

Similar to the previous metric, Peak Response Time is measuring the round trip of a request/response cycle. However, the peak will tell us what the LONGEST cycle was within a given one-minute interval of the test. For example, if the graph shows a Peak Response Time of 12 seconds at 5 minutes into the load test, then we know that one of the requests in that interval took that long. The average may still be less than one second because many of the other resources had speedy response times.
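A quick sketch with made-up numbers shows how a 12-second peak can hide behind a sub-second average. The URLs and durations below are hypothetical.

```python
from statistics import mean

# Hypothetical one-minute interval: 49 fast static resources and one slow page.
requests = [("/img/logo.png", 0.3)] * 49 + [("/search?q=shoes", 12.0)]

durations = [seconds for _, seconds in requests]
average = mean(durations)            # roughly 0.53 seconds
peak = max(durations)                # 12.0 seconds
slowest_url, _ = max(requests, key=lambda item: item[1])
```

Looking at the average alone, you would never suspect the search page; the peak points straight at it.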

The Peak Response Time shows us that at least one of our resources is potentially problematic. It can reflect an anomaly in the web application where a specific request was mishandled by the target system. For example, an “expensive” database query involved in fulfilling a certain request, such as a search results page, can make that request take much longer, and this metric is great at exposing those issues.

Typically images and stylesheets are not the slowest (although they can be when a mistake is made like using a BMP file). In a web application, the process of dynamically building the HTML document from application logic and database queries is usually the most time intensive part of the system. It is less common, yet occurs more often with open source apps, to have very slow JavaScript files because of their enormous size. Large files can produce slow responses that will show up in Peak Response Time, so be careful when using big images or calling big JS libraries. Many times, you really only need less than 20% of the JavaScript inside those libraries. Lazy coders won’t take the trouble to clean out the other 80%, and that will hurt their system performance.

Error Rate

It is to be expected that some errors may occur when processing requests, especially under load. Most of the time you will see errors begin to be reported when the load has reached a point that exceeds the web application’s ability to deliver what is necessary.

The Error Rate is the mathematical calculation that produces a percentage of problem requests compared to all requests. The percentage reflects how many responses are HTTP status codes indicating an error on the server, as well as any request that times out before receiving or completing its response.
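As a rough sketch of that calculation, assume each request is recorded as either an HTTP status code or `None` when it timed out. Treating every status of 400 and above as an error is an assumption here; tools differ on whether 4xx responses count.

```python
def error_rate(outcomes):
    """Return the percentage of failed requests.

    outcomes: list of HTTP status codes (int), or None for a request that
    timed out before receiving or completing its response.
    Assumption: any status >= 400 counts as an error.
    """
    if not outcomes:
        return 0.0
    errors = sum(1 for outcome in outcomes if outcome is None or outcome >= 400)
    return 100.0 * errors / len(outcomes)


# Hypothetical interval: 97 good responses, one 500, one 404, one timeout.
rate = error_rate([200] * 97 + [500, 404, None])
```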

The web server will return an HTTP Status Code in the response header. Normal codes are usually 200 (OK) or something in the 3xx range indicating a redirect on the server. A common error code is 500, which means the web server knows it has a problem with fulfilling that request. That of course doesn’t tell you what caused the problem, but at least you know that the server is aware that there is a technical issue in the system somewhere.

It is much trickier to measure something you never receive, so an error code can be reported by the load testing tool for a condition not indicated by the server. Specifically, the tool must wait for some period of time before it quits “listening” for a response. The tool must determine when it will “give up” on a request and declare a timeout condition. Response timeouts will usually not receive a status code from a web server, so the load testing tool must assign a custom error message “Request Read Timeout” to indicate the timeout.

Other errors can be hard to describe because they do not occur at the HTTP level. A good example is when the web server refuses a connection at the TCP level. There is no way to receive an HTTP Status Code for this, thus the load testing tool must assign an error message “Request Connection Timeout” to display for reporting this condition back to you in the load testing results.
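One way a tool might label these outcomes is sketched below. The two quoted messages come from the text above; the function name, the mapping from Python exception types to those messages, and the "HTTP nnn" label format are assumptions for illustration, not how LoadStorm does it internally.

```python
import socket


def describe_outcome(status=None, error=None):
    """Return a reporting label for one request.

    status: HTTP status code if a response arrived, else None.
    error:  exception raised while connecting or reading, else None.
    """
    if error is not None:
        if isinstance(error, ConnectionRefusedError):
            # Refused at the TCP level, so no HTTP status code exists.
            return "Request Connection Timeout"
        if isinstance(error, socket.timeout):
            # The tool gave up waiting for the response.
            return "Request Read Timeout"
        return "Other error: " + type(error).__name__
    if status is not None and status >= 400:
        return "HTTP %d" % status
    return "OK"


labels = [
    describe_outcome(status=200),
    describe_outcome(status=500),
    describe_outcome(error=socket.timeout()),
    describe_outcome(error=ConnectionRefusedError()),
]
```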

Error Rate is a significant metric because it measures “performance failure” in the application. It tells you how many failed requests are occurring at a particular point in time of your load test. The value of this metric is most evident when you can easily see the percentage of problems increase significantly as the higher load produces more errors. In many load tests, this climb in Error Rate will be drastic. This rapid rise in errors tells you where the target system is stressed beyond its ability to deliver adequate performance.

No one can define the tolerance for Error Rate in your web application. Some testers consider less than 1% Error Rate successful if the test is delivering greater than 95% of the maximum expected traffic. However, other testers consider any errors to be a big problem and work to eliminate them. It is not uncommon to have a few errors in web applications – especially when you are dealing with thousands of concurrent users.

Concurrent Users

Concurrent users is the most common way to express the load being applied during a test. This metric is measuring how many virtual users (VUsers) are active at any particular point in time. It does not equate to requests per second (RPS) because one user can generate a high number of requests, and each VUser will not constantly be generating requests.

A virtual user does what a “real” user does as specified by the script that you have created in the load testing tool. If there are 1,000 VUsers, then there are 1,000 scripts executing at that particular time. Many of those 1,000 VUsers are making requests at the same time, but there are many VUsers that are not because of “think time”. Simply put, think time is the pause after each page that simulates what happens with a real user as he or she reads the page received before clicking again.
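A back-of-the-envelope estimate shows why concurrent users and RPS are different numbers. This is a simplified model, not a formula from any tool: it assumes every VUser repeats the same cycle of loading one page and then pausing, and all of the parameter names and values are hypothetical.

```python
def estimate_rps(vusers, requests_per_page, page_load_seconds, think_seconds):
    """Rough requests per second generated by a pool of virtual users.

    Each VUser is assumed to loop: load a page (requests_per_page requests
    over page_load_seconds), then pause for think_seconds.
    """
    cycle_seconds = page_load_seconds + think_seconds
    return vusers * requests_per_page / cycle_seconds


def busy_fraction(page_load_seconds, think_seconds):
    """Share of VUsers that are mid-request at a typical instant."""
    return page_load_seconds / (page_load_seconds + think_seconds)


# 1,000 VUsers, 20 resources per page, 2 s to load, 18 s of think time.
rps = estimate_rps(1000, 20, 2.0, 18.0)
busy = busy_fraction(2.0, 18.0)
```

Under these assumptions only about one in ten VUsers is making requests at any instant, yet the pool as a whole still generates about 1,000 requests per second.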

Requests per Second

RPS is the measurement of how many requests are being sent to the target server. It includes requests for HTML pages, CSS stylesheets, XML documents, JavaScript libraries, images, Flash/multimedia files, and any other requested resource.

RPS will be affected by how many resources are called from the site’s pages. Some sites can have between 50 and 100 images per page, and as long as these images are small in size (e.g. <25KB), the RPS will be higher than for long text pages with few images that are dynamically generated from database queries. The reason for this is that images and other static resources are served by the web server or a Content Delivery Network, and there is virtually no expensive processing that must take place before that resource is sent to the browser (i.e. LoadStorm).
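Counting RPS from raw data is straightforward. The sketch below assumes the tool records the send time of every request as seconds since the start of the test; the function name and the sample timestamps are hypothetical.

```python
from collections import Counter


def requests_per_second(send_times):
    """Count how many requests were sent in each whole second of the test."""
    return Counter(int(t) for t in send_times)


# Hypothetical send times for the first four seconds of a test.
send_times = [0.1, 0.4, 0.9, 1.2, 1.8, 3.5]
per_second = requests_per_second(send_times)

# Average RPS over the four-second window.
average_rps = len(send_times) / 4
```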

Throughput

Throughput is measured in units of Kilobytes Per Second, and it is the measurement of bandwidth consumed during the test. It shows how much data is flowing back and forth from your servers.

Throughput will often vary from test to test relative to the concurrent users, but there can be other reasons for these shifts. If your throughput is very high, this could indicate that your site is successfully transferring lots of response data, but it could also be a signal that your site has several resources, such as images, that could be compressed to save bandwidth. Very low throughput could indicate that your site was failing to respond before requests timed out. It could also mean that many of your resources were treated as cached because they appear on every page, which is good.
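Here is a minimal sketch of the throughput calculation for one interval. It assumes the tool records the bytes sent and received for each request, counts traffic in both directions, and treats a kilobyte as 1,024 bytes; the names and figures are illustrative.

```python
def throughput_kb_per_second(transfers, interval_seconds):
    """Return kilobytes per second moved during one interval.

    transfers: list of (bytes_sent, bytes_received) pairs, one per request.
    Assumption: 1 KB = 1024 bytes, and both directions are counted.
    """
    total_bytes = sum(sent + received for sent, received in transfers)
    return total_bytes / 1024 / interval_seconds


# Hypothetical minute: 60 requests, each 1 KB sent and 9 KB received.
transfers = [(1024, 9216)] * 60
throughput = throughput_kb_per_second(transfers, 60)
```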

Other Thoughts on Load Testing Metrics

The SOA Testing blog lists the most important load testing metrics in its context as:

Response time: It’s the most important parameter to reflect the quality of a Web Service. Response time is the total time it takes after the client sends a request till it gets a response. This includes the time the message remains in transit on the network, which can’t be measured exclusively by any load-testing tool. So we’re restricted to testing Web Services deployed on a local machine. The result will be a graph measuring the average response time against the number of virtual users.
Number of transactions passed/failed: This parameter simply shows the total number of transactions passed or failed.
Throughput: It’s measured in bytes and represents the amount of data that the virtual users receive from the server at any given second. We can compare this graph to the response-time graph to see how the throughput affects transaction performance.
Load size: The number of concurrent virtual users trying to access the Web Service at any particular instance in an interval of time.
CPU utilization: The amount of CPU time used by the Web Service while processing the request.
Memory utilization: The amount of memory used by the Web Service while processing the request.
Wait Time (Average Latency): The time it takes from when a request is sent until the first byte is received.