Today I read an article by Omer Brandis on IT Toolbox about SAP stress testing versus performance testing. Maybe his perspective differs from mine because he is not focused on web applications. I just haven’t seen other technologists use the definitions and goals of the two types of testing the way Omer uses them.
It has been helpful to me in understanding our software testing industry to read others’ blogs and articles. Comparing, contrasting, and merging the thoughts of testing leaders helps me communicate better with my peers through more precise use of terms.
My observations and opinions that follow are not meant to be argumentative or arrogant; rather, they express where I agree or disagree with the items Omer shared with us today. My hope is that they bring some illumination to the performance engineering industry for our LoadStorm users.
Stress Testing
Omer believes stress and performance testing are “two very different things”, while I see stress as a type (or subset) of perf testing.
Stress testing according to Omer:
- “the goal of stress testing is usually to evaluate the hardware and or the system’s configuration”
- “these tests are intended to find hardware and/or software and/or configuration related bottlenecks”
- “should be conducted on the hardware you wish to test, be it your actual production servers or an exact replica of them”
- “stress testing is usually conducted before a major rollout of a new release, or before a major hardware change”
- “stress testing usually requires using some sort of automation tool that can simulate concurrent executions of various programs”
- “during the testing you monitor the utilization of the various hardware/software components…and the system’s throughput and average response times are measured in order to see if the test was successful or not”
- “personally, I find that conducting several tests of increasing intensity ( x, 2x, 10x, 50x, 100x….) makes it easier to locate the bottlenecks (conducting one massive test on full capacity can only generate a yes/no answer to the question – ‘can these servers handle the expected load’?)”
Amen Brother!
First, I absolutely wholeheartedly agree with the last point. Stress testing should be iterative because you will find different trouble spots in your application or infrastructure depending on the load at a given point in time.
I also agree that stress testing requires a tool. There is no other way to simulate hundreds or tens of thousands of concurrent users working their way through your system.
Additionally, it seems quite accurate in point #6 that one should monitor response times and throughput during a stress test.
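To make those points concrete, here’s a rough sketch of what I mean by a stepped test – plain Python for illustration only (the target URL and step sizes are invented), not a substitute for a real load testing tool:

```python
# Sketch only: a stepped stress test that raises concurrency level by level
# and records throughput and average response time at each step.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TARGET = "http://example.com/"        # hypothetical URL under test
STEPS = [10, 20, 100, 500, 1000]      # x, 2x, 10x, 50x, 100x concurrent users

def hit(url):
    start = time.time()
    try:
        urllib.request.urlopen(url, timeout=10).read()
        return time.time() - start, True
    except Exception:
        return time.time() - start, False

for level in STEPS:
    work = [TARGET] * (level * 5)     # enough requests to keep every worker busy
    began = time.time()
    with ThreadPoolExecutor(max_workers=level) as pool:
        results = list(pool.map(hit, work))
    elapsed = time.time() - began
    good = [t for t, ok in results if ok]
    errors = len(results) - len(good)
    avg = sum(good) / len(good) if good else float("nan")
    print(f"{level:>5} users: {len(results)/elapsed:7.1f} req/s, "
          f"avg {avg:.3f}s, {errors} errors")
```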
Yessir! I agree about testing on production servers (point #3). There is truly no other way to know what the system will do under load and/or stress. Creating an “exact replica” is expensive and very difficult to achieve, but if you can, that is a great alternative to hitting production at 1:00 am on a Sunday morning.
Hold On a Minute
My concise definition of stress testing is, “Break the system.” More details on my understanding of stress testing follow below.
So, I’m not sure I fully agree with Omer on some aspects such as his stated goal. Hardware and configuration are definitely important to test, but I want to find my inefficient database queries too. In my experience, most of the problems from stressing a system are caused by bad coding. Hardware is rarely the biggest offender for crashing software under load.
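During a stress test I like to have something on the application side that flags slow queries as they happen, because that’s usually where the bad coding shows up. Here’s a minimal sketch of the idea, using Python and sqlite3 purely as a stand-in for whatever database layer you actually run:

```python
# Sketch: wrap query execution and flag anything slower than a threshold.
# sqlite3 is only a stand-in here for your real database layer.
import sqlite3
import time

SLOW_THRESHOLD = 0.05   # seconds; tune to your own tolerance

def timed_query(conn, sql, params=()):
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    if elapsed > SLOW_THRESHOLD:
        print(f"SLOW ({elapsed:.3f}s): {sql}")
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1_000_000)])

# An unindexed lookup like this is the kind of query a stress test exposes.
timed_query(conn, "SELECT * FROM users WHERE name = ?", ("user999999",))
```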
Point #4 is a good idea, yet it seems to be overly optimistic. By that I mean “major releases” aren’t the only time a stress test is appropriate. I’ve seen relatively small coding changes crater performance of the whole system. My recommendation is to run a stress test after every build. Test early and test often is my motto. Most people don’t because of the cost associated with tools and hardware needed. Newer cloud load testing tools make iterative test runs affordable.
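One way to make “test after every build” stick is to treat a performance regression like a failing unit test. Here’s a rough sketch of a check you could wire into a build pipeline – the metrics file and budget numbers are made up for illustration:

```python
# Sketch: fail the build if key load-test numbers regress past a budget.
# The metrics file and budgets below are hypothetical.
import json
import sys

BUDGET = {"avg_response_s": 1.5, "error_rate": 0.01, "min_req_per_s": 100}

with open("loadtest_results.json") as f:   # written by your load test run
    results = json.load(f)

failures = []
if results["avg_response_s"] > BUDGET["avg_response_s"]:
    failures.append("average response time over budget")
if results["error_rate"] > BUDGET["error_rate"]:
    failures.append("error rate over budget")
if results["req_per_s"] < BUDGET["min_req_per_s"]:
    failures.append("throughput under budget")

if failures:
    print("Performance regression:", "; ".join(failures))
    sys.exit(1)   # non-zero exit fails the build
print("Performance budget met.")
```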
Regarding his point that “average response times are measured in order to see if the test was successful or not”, I don’t like the implication that good throughput and response times equate to success. Since I define stress testing as “breaking the system”, my tests can be quite successful when response times are very bad – that’s exactly what I’m hoping for. I want to see errors skyrocket, servers time out, and throughput dwindle. That’s when the results of a stress test are useful to me. That’s how I spot the breaking point.
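Concretely, what I’m hunting for in the results is the first load level where errors jump and throughput stops climbing. A small sketch of how I’d flag that from stepped results (the sample numbers are invented):

```python
# Sketch: find the "breaking point" from stepped stress-test results.
# The sample data is invented; feed in your own per-step measurements.
steps = [
    {"users": 100,  "req_per_s": 95,  "error_rate": 0.00},
    {"users": 200,  "req_per_s": 180, "error_rate": 0.01},
    {"users": 500,  "req_per_s": 390, "error_rate": 0.02},
    {"users": 1000, "req_per_s": 410, "error_rate": 0.15},  # errors skyrocket
    {"users": 2000, "req_per_s": 150, "error_rate": 0.60},  # throughput dwindles
]

for prev, curr in zip(steps, steps[1:]):
    errors_spiked = curr["error_rate"] > 0.05
    throughput_flat = curr["req_per_s"] < prev["req_per_s"] * 1.1
    if errors_spiked or throughput_flat:
        print(f"Breaking point around {curr['users']} concurrent users")
        break
```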
Performance Testing
Performance testing excerpts from Omer’s article:
- “the goal of standard application performance testing is to test a program for performance related errors like inefficiencies in the application’s algorithm or sql statements that can be made to run faster (consume less resources)”
- “you should be able to perform perf-testing on ‘any’ hardware – as long as it meets other relevant requirements, like having all the data that will be processed by the program in the future (in the production system)”
- “application perf-testing should usually be conducted after any major change in the application itself”
- “app perf testing is a lot simpler and cheaper to perform, there is [usually] no need to execute the program in parallel”
- “usually all you need to do is execute the program with indicative input – a few tests with the most common input, a few tests with the input that represents the worst case”
- “during these tests you should use the relevant tracing tools to ‘look into’ what the program is doing, and make sure it all makes sense. It has been my experience that many people concentrate on the programs elapsed time, I have written before about the downside of this method”
Me Too!
I agree with point #3. If you use a good automated performance testing tool, then running a test after application modifications should be relatively easy and inexpensive. Agile development methodology has proven that testing early and often helps uncover code defects sooner and lowers the cost of software. Don’t you want to find performance problems immediately after some coder injected a bad query that will grind your database server to a halt?!
Point #6 is true as long as “tracing” and “look into” mean watching server side metrics like CPU utilization, memory usage, or similar system dynamics. It does appear to be crossing the line into performance engineering though. I’m fine with that.
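For what it’s worth, “looking into” the server during a test doesn’t have to be fancy. Here’s a minimal sketch of sampling CPU and memory while a test runs, using the third-party psutil library on the server itself (just one way to do it, not the only way):

```python
# Sketch: sample basic server-side metrics while a test is running.
# Requires the third-party psutil package (pip install psutil).
import psutil

SAMPLE_SECONDS = 60  # how long to watch; adjust to your test length

for _ in range(SAMPLE_SECONDS):
    cpu = psutil.cpu_percent(interval=1)     # percent CPU over the last second
    mem = psutil.virtual_memory().percent    # percent of RAM in use
    print(f"cpu={cpu:5.1f}%  mem={mem:5.1f}%")
```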
Elapsed time of program execution probably is not useful unless you are testing performance of a COBOL batch program running in a static partition. So, I agree with Omer.
I Can’t Get There From Here
Maybe I’m being picky, but I don’t exactly see the goal of performance testing as finding errors. I conduct perf tests to measure indicators like response time, throughput, concurrent users, and requests per second in the same way I would with stress testing.
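To be specific about what I mean by indicators, here’s a small sketch of the numbers I pull out of a test run, computed from a list of per-request timings (the sample data is made up):

```python
# Sketch: the indicators I care about, computed from per-request samples.
# Each sample is (start_time_s, duration_s, succeeded); the data is invented.
samples = [(i * 0.01, 0.2 + (i % 7) * 0.05, i % 50 != 0) for i in range(5000)]

durations = sorted(d for _, d, _ in samples)
test_length = max(s + d for s, d, _ in samples) - min(s for s, _, _ in samples)
errors = sum(1 for _, _, ok in samples if not ok)

print(f"requests per second: {len(samples) / test_length:.1f}")
print(f"average response:    {sum(durations) / len(durations):.3f}s")
print(f"95th percentile:     {durations[int(len(durations) * 0.95)]:.3f}s")
print(f"error rate:          {errors / len(samples):.2%}")
```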
I see performance testing as a symbiotic part of engineering that can’t live without application tuning. Test and tune; test and tune. That’s how I see web development. Get the speed necessary to make the app useful to people.
I do understand how performance can be measured for a single user. For example, calculating the page render speed on a Firefox browser. However, I cannot completely agree with point #2. Using ‘any’ hardware will not give you relevant results regarding the system’s ability to deliver what is needed.
If I run my system on an old server I’ve got sitting in my closet at the office, I will certainly get performance metrics to satisfy my curiosity. But it takes too big a mathematical extrapolation to accurately correlate those numbers with how my system will run on a large Amazon EC2 instance. Sorry, it just does not seem possible to get useful measurements that way. That’s why I strongly recommend using your production environment for perf engineering (including testing).
Point #5 strikes me as too narrow a slice of reality, and that will produce misleading results. For example, if I use a few test scenarios to match a common case and a worst case, then I probably won’t be getting the normal mix of user traffic.
Whenever I run performance tests I am looking to get as close as possible to the actual activity my application experiences. This reality is baked into the types of users represented by my scenarios. If 80% of the traffic on my site is anonymous, then I want that to be accurately represented in the way I create my test. Otherwise, the test will not be triggering the same conditions relative to efficiency in my system’s processing (database connections, thread pools, paging, virtual machine memory management, etc.).
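In practice that means weighting the scenarios in the test to match real traffic. A tiny sketch of the idea in Python, with an invented 80/15/5 split:

```python
# Sketch: pick scenarios with weights that match real traffic.
# The split below (80% anonymous, 15% logged in, 5% checkout) is invented.
import random

scenarios = {
    "anonymous_browse": 0.80,
    "logged_in_browse": 0.15,
    "checkout":         0.05,
}

def pick_scenario():
    return random.choices(list(scenarios),
                          weights=list(scenarios.values()), k=1)[0]

# Each simulated user picks a scenario according to the real-world mix.
mix = [pick_scenario() for _ in range(1000)]
for name in scenarios:
    print(f"{name}: {mix.count(name) / len(mix):.0%}")
```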
Sum It Up
Omer is obviously smart and has software engineering skills. I may not agree with everything, but his article has been helpful in refining my thinking. Hopefully you have also stretched your mind relative to software testing in the past 5 minutes while reading this blog post.
His insight into the engineering aspects of performance is worth repeating here:
“it should be clear that not all programs “were created equal”, they can’t all be winners in the 100 meter dash (they can’t all have subsecond response times), but sometimes you may be able to help them with a small change of the sql code, or add an index, sometimes you may be forced to perform a massive rewrite, and sometimes you may decide to run them during the weekend so that they don’t have an adverse effect on system.”
Wise words. You never know the root cause of a poorly performing application until you dig in and hunt for it. Performance testing is how you find out whether a rewrite is necessary or a single SQL statement is the culprit.
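As a small illustration of the “single SQL statement” case, here’s a sketch of timing the same query before and after adding an index – sqlite3 stands in for whatever database you actually run:

```python
# Sketch: measure the same query before and after adding an index.
# sqlite3 stands in for whatever database you actually run.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, f"cust{i % 10000}", i * 1.5) for i in range(200000)])

def time_query():
    start = time.perf_counter()
    for i in range(200):
        conn.execute("SELECT total FROM orders WHERE customer = ?",
                     (f"cust{i}",)).fetchall()
    return time.perf_counter() - start

before = time_query()
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = time_query()
print(f"before index: {before:.3f}s   after index: {after:.3f}s")
```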
Performance testing lets you measure speed. Stress testing tries to break the system by pushing it beyond its limits in some way.