Here are a few good insights about software performance from Robert Read's eBook, How to be a Programmer: A Short, Comprehensive, and Personal Summary. Robert dedicated the book to his colleagues at Hire.com.
My favorite parts are listed here as excerpts and then included below in their original context:
- Bottlenecks in performance can be an example of counting cows by counting legs and dividing by four instead of counting heads.
- The purpose of stress testing is to figure out where the wall is, and then figure out how to move the wall further out.
- If the wall is too close to satisfy your needs, figure out which resource is the bottleneck (there is usually a dominant one). Is it memory, processor, I/O, network bandwidth, or data contention?
- Performance is a part of usability.
- Most software can be made (with relatively little effort) 10 to 100 times faster than it is at the time it is first released.
- If a well-isolated component that uses a slightly fancy algorithm can decrease hardware cost or increase performance by a factor of two across an entire system, then it would be criminal not to consider it.
- A plan for stress testing should be developed early in the project, because it often helps to clarify exactly what is expected. Is two seconds for a web page request a miserable failure or a smashing success? Is 500 concurrent users enough?
- I’ve made errors such as failing to provide a relational database system with a proper index on a column I look up a lot, which probably made it at least 20 times slower.
- Other examples include doing unnecessary I/O in inner loops, leaving in debugging statements that are no longer needed, and unnecessary memory allocation.
- Stress testing is fun.
- Who has particular knowledge about a component also changes constantly, and that can have an order-of-magnitude effect on performance.
- Finding the expensive I/O and the expensive 10% of the code is a good first step.
- There is not much sense in optimizing a function that accounts for only 1% of the computation time.
- Each change brings a test burden with it, so it is much better to have a few big changes.
How to Stress Test
Stress testing is fun. At first it appears that the purpose of stress testing is to find out if the system works under a load. In reality, it is common that the system does work under a load but fails to work in some way when the load is heavy enough. I call this hitting the wall or bonking[1]. There may be some exceptions, but there is almost always a ‘wall’. The purpose of stress testing is to figure out where the wall is, and then figure out how to move the wall further out.
A plan for stress testing should be developed early in the project, because it often helps to clarify exactly what is expected. Is two seconds for a web page request a miserable failure or a smashing success? Is 500 concurrent users enough? That, of course, depends, but one must know the answer when designing the system that answers the request. The stress test needs to model reality well enough to be useful. It isn’t really possible to simulate 500 erratic and unpredictable humans using a system concurrently very easily, but one can at least create 500 simulations and try to model some part of what they might do.
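As a rough sketch of what such a simulation driver might look like, here is a minimal Python example that spins up a pool of simulated users against a hypothetical endpoint. The URL, user count, and think-time model are placeholders, not anything from the book; real user behavior would need a richer model than repeatedly fetching the same page.

```python
# Minimal stress-test driver sketch: N simulated users each issue a handful of
# requests with random pauses, and we report median and 95th-percentile latency.
import random
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TARGET_URL = "http://localhost:8080/search"   # hypothetical endpoint
NUM_USERS = 500
REQUESTS_PER_USER = 10

def simulated_user(user_id: int) -> list[float]:
    latencies = []
    for _ in range(REQUESTS_PER_USER):
        start = time.perf_counter()
        with urllib.request.urlopen(TARGET_URL, timeout=30) as resp:
            resp.read()
        latencies.append(time.perf_counter() - start)
        time.sleep(random.uniform(0.1, 2.0))  # crude "think time" between clicks
    return latencies

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=NUM_USERS) as pool:
        results = list(pool.map(simulated_user, range(NUM_USERS)))
    latencies = sorted(t for user in results for t in user)
    median = latencies[len(latencies) // 2]
    p95 = latencies[int(0.95 * len(latencies)) - 1]
    print(f"median={median:.3f}s  p95={p95:.3f}s")
```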
In stress testing, start out with a light load and load the system along some dimension—such as input rate or input size—until you hit the wall. If the wall is too close to satisfy your needs, figure out which resource is the bottleneck (there is usually a dominant one). Is it memory, processor, I/O, network bandwidth, or data contention? Then figure out how you can move the wall. Note that moving the wall, that is, increasing the maximum load the system can handle, might not help or might actually hurt the performance of a lightly loaded system. Usually performance under heavy load is more important than performance under a light load.
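Here is a minimal sketch of that ramp. The system under test is faked with a toy latency model so the script runs on its own; in practice measure_latency_at would drive the real system along whatever dimension you chose (input rate, input size, concurrent users).

```python
# Load-ramp sketch: increase the offered load step by step and stop when
# latency blows past an acceptable threshold or degrades sharply (the "wall").
def measure_latency_at(requests_per_second: float) -> float:
    """Toy stand-in for a real measurement: latency grows without bound as the
    load approaches a made-up capacity of about 200 requests per second."""
    capacity, base_latency = 200.0, 0.05
    if requests_per_second >= capacity:
        return float("inf")               # saturated: requests queue without bound
    return base_latency / (1.0 - requests_per_second / capacity)

def find_the_wall(start_rate=10.0, factor=1.5, max_acceptable_latency=2.0):
    rate, previous = start_rate, None
    while True:
        latency = measure_latency_at(rate)
        print(f"{rate:8.1f} req/s -> {latency:.3f} s")
        degraded_sharply = previous is not None and latency > 3 * previous
        if latency > max_acceptable_latency or degraded_sharply:
            return rate                   # roughly where the wall is
        previous, rate = latency, rate * factor

if __name__ == "__main__":
    print(f"the wall is somewhere near {find_the_wall():.0f} req/s")
```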
You may have to get visibility into several different dimensions to build up a mental model of it; no single technique is sufficient. For instance, logging often gives a good idea of the wall-clock time between two events in the system, but unless carefully constructed, doesn’t give visibility into memory utilization or even data structure size. Similarly, in a modern system, a number of computers and many software systems may be cooperating. Particularly when you are hitting the wall (that is, the performance is non-linear in the size of the input) these other software systems may be a bottleneck. Visibility into these systems, even if only measuring the processor load on all participating machines, can be very helpful.
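As one small example of that kind of visibility, the sketch below logs the wall-clock time between two events and samples processor and memory load on the local machine. It assumes the third-party psutil package is available; on a multi-machine system you would run a sampler like this on every participating host and correlate the logs.

```python
# Visibility sketch: log elapsed wall-clock time per request plus CPU and
# memory utilization. handle_request() is a made-up stand-in for real work.
import logging
import time

import psutil  # third-party: pip install psutil

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("perf")

def handle_request(payload: bytes) -> None:
    start = time.perf_counter()
    time.sleep(0.05)  # stand-in for the actual processing
    elapsed = time.perf_counter() - start
    log.info(
        "handled %d bytes in %.3f s  cpu=%.0f%%  mem=%.0f%%",
        len(payload),
        elapsed,
        psutil.cpu_percent(interval=None),
        psutil.virtual_memory().percent,
    )

if __name__ == "__main__":
    handle_request(b"example payload")
```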
Knowing where the wall is is essential not only to moving the wall, but also to providing predictability so that the business can be managed effectively.
How to Understand Performance Problems
Learning to understand the performance of a running system is unavoidable for the same reason that learning debugging is. Even if you understand perfectly and precisely the cost of the code you write, your code will make calls into other software systems that you have little control over or visibility into. However, in practice performance problems are a little different and a little easier than debugging in general.
Suppose that you or your customers consider a system or a subsystem to be too slow. Before you try to make it faster, you must build a mental model of why it is slow. To do this you can use a profiling tool or a good log to figure out where the time or other resources are really being spent. There is a famous dictum that 90% of the time will be spent in 10% of the code. I would add to that the importance of input/output expense (I/O) to performance issues. Often most of the time is spent in I/O in one way or another. Finding the expensive I/O and the expensive 10% of the code is a good first step to building your mental model.
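With Python's standard-library profiler, for instance, a few lines are enough to see which handful of functions dominate; build_report below is just a throwaway stand-in for whatever entry point you suspect is slow.

```python
# Profiling sketch: run the suspect code under cProfile and print the ten most
# expensive entries by cumulative time. A few entries usually dominate.
import cProfile
import pstats

def build_report() -> int:
    total = 0
    for i in range(200_000):
        total += sum(divmod(i, 7))  # stand-in for real work
    return total

if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()
    build_report()
    profiler.disable()
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```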
There are many dimensions to the performance of a computer system, and many resources consumed. The first resource to measure is wall-clock time, the total time that passes for the computation. Logging wall-clock time is particularly valuable because it can inform you about unpredictable circumstances that arise in situations where other profiling is impractical. However, this may not always represent the whole picture. Sometimes something that takes a little longer but doesn’t burn up so many processor seconds will be much better in the computing environment you actually have to deal with. Similarly, memory, network bandwidth, database or other server accesses may, in the end, be far more expensive than processor seconds.
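Here is a small sketch of measuring several of those resources for the same piece of work, using only the standard library. The workload is a throwaway example, but it shows how wall-clock time, processor time, and peak memory allocation can tell quite different stories.

```python
# Measure three resources for one workload: wall-clock time, CPU time, and
# peak traced memory. The sleep stands in for I/O: it costs wall-clock time
# but burns almost no processor seconds.
import time
import tracemalloc

def workload() -> int:
    data = [list(range(1_000)) for _ in range(1_000)]  # allocates noticeable memory
    time.sleep(0.2)                                    # stand-in for waiting on I/O
    return sum(map(sum, data))

if __name__ == "__main__":
    tracemalloc.start()
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    workload()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"wall={wall:.3f}s  cpu={cpu:.3f}s  peak_mem={peak / 1e6:.1f} MB")
```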
Contention for shared resources that are synchronized can cause deadlock and starvation. Deadlock is the inability to proceed because of improper synchronization or resource demands. Starvation is the failure to schedule a component properly. If it can be at all anticipated, it is best to have a way of measuring this contention from the start of your project. Even if this contention does not occur, it is very helpful to be able to assert that with confidence.
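One way to build that measurement in from the start is to wrap the shared lock so it records how long each acquisition waits. The sketch below is only illustrative, but it yields concrete numbers you can point at when asserting that contention is, or is not, a problem.

```python
# Contention-measurement sketch: a lock wrapper that records how long each
# acquisition had to wait before it got the lock.
import threading
import time

class MeasuredLock:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.wait_times: list[float] = []

    def __enter__(self):
        start = time.perf_counter()
        self._lock.acquire()
        self.wait_times.append(time.perf_counter() - start)
        return self

    def __exit__(self, *exc) -> None:
        self._lock.release()

shared_lock = MeasuredLock()

def worker() -> None:
    for _ in range(100):
        with shared_lock:
            time.sleep(0.001)  # stand-in for work done while holding the lock

if __name__ == "__main__":
    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    waits = shared_lock.wait_times
    print(f"max wait {max(waits) * 1e3:.1f} ms, mean {sum(waits) / len(waits) * 1e3:.1f} ms")
```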
How to Fix Performance Problems
Most software projects can be made, with relatively little effort, 10 to 100 times faster than they are at the time they are first released. Under time-to-market pressure, it is both wise and effective to choose a solution that gets the job done simply and quickly, but less efficiently than some other solution. However, performance is a part of usability, and often it must eventually be considered more carefully.
The key to improving the performance of a very complicated system is to analyze it well enough to find the bottlenecks, or places where most of the resources are consumed. There is not much sense in optimizing a function that accounts for only 1% of the computation time. As a rule of thumb you should think carefully before doing anything unless you think it is going to make the system or a significant part of it at least twice as fast. There is usually a way to do this. Consider the test and quality assurance effort that your change will require. Each change brings a test burden with it, so it is much better to have a few big changes.
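The arithmetic behind that rule of thumb is Amdahl's law: the overall speedup from making a fraction f of the runtime s times faster is 1 / ((1 - f) + f / s), which stays small no matter how large s gets when f is small. A two-line calculation makes the point.

```python
# Amdahl's law: overall speedup from making a fraction of the runtime faster.
def overall_speedup(fraction: float, local_speedup: float) -> float:
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

if __name__ == "__main__":
    # Making code that accounts for 1% of the runtime infinitely fast barely helps...
    print(f"{overall_speedup(0.01, 1e9):.3f}x")  # about 1.010x
    # ...while merely doubling the speed of a dominant 80% cuts runtime by 40%.
    print(f"{overall_speedup(0.80, 2.0):.3f}x")  # about 1.667x
```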
After you’ve made a two-fold improvement in something, you need to at least rethink and perhaps reanalyze to discover the next-most-expensive bottleneck in the system, and attack that to get another two-fold improvement.
Often, the bottlenecks in performance will be an example of counting cows by counting legs and dividing by four, instead of counting heads. For example, I’ve made errors such as failing to provide a relational database system with a proper index on a column I look up a lot, which probably made it at least 20 times slower. Other examples include doing unnecessary I/O in inner loops, leaving in debugging statements that are no longer needed, unnecessary memory allocation, and, in particular, inexpert use of libraries and other subsystems that are often poorly documented with respect to performance. This kind of improvement is sometimes called low-hanging fruit, meaning that it can be easily picked to provide some benefit.
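The index mistake in particular is easy to reproduce. The sketch below uses Python's standard-library sqlite3 module to time the same lookup before and after adding an index on the searched column; the table, column, and row counts are made up, and the exact speedup will vary, but the gap is typically dramatic.

```python
# Missing-index sketch: repeated lookups on an unindexed column force full
# table scans; adding an index turns each lookup into a cheap tree search.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(200_000)],
)
conn.commit()

def time_lookups(n: int = 200) -> float:
    start = time.perf_counter()
    for i in range(n):
        conn.execute(
            "SELECT id FROM users WHERE email = ?", (f"user{i * 997}@example.com",)
        ).fetchone()
    return time.perf_counter() - start

print(f"without index: {time_lookups():.3f} s")
conn.execute("CREATE INDEX idx_users_email ON users (email)")
print(f"with index:    {time_lookups():.3f} s")
```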
What do you do when you start to run out of low-hanging fruit? Well, you can reach higher, or chop the tree down. You can continue making small improvements or you can seriously redesign a system or a subsystem. (This is a great opportunity to use your skills as a good programmer, not only in the new design but also in convincing your boss that this is a good idea.) However, before you argue for the redesign of a subsystem, you should ask yourself whether or not your proposal will make it five to ten times better.
How to Know When to Apply Fancy Computer Science
There is a body of knowledge about algorithms, data structures, mathematics, and other gee-whiz stuff that most programmers know about but rarely use. In practice, this wonderful stuff is too complicated and generally unnecessary. There is no point in improving an algorithm when most of your time is spent making inefficient database calls, for instance. An unfortunate amount of programming consists of getting systems to talk to each other and using very simple data structures to build a nice user interface.
When is high technology the appropriate technology? When should you crack a book to get something other than a run-of-the-mill algorithm? It is sometimes useful to do this but it should be evaluated carefully.
The three most important considerations for the potential computer science technique are:
- Is it well encapsulated so that the risk to other systems is low and the overall increase in complexity and maintenance cost is low?
- Is the benefit startling (for example, a factor of two in a mature system or a factor of ten in a new system)?
- Will you be able to test and evaluate it effectively?
If a well-isolated component that uses a slightly fancy algorithm can decrease hardware cost or increase performance by a factor of two across an entire system, then it would be criminal not to consider it. One of the keys to arguing for such an approach is to show that the risk is really quite low: since the proposed technology has probably been well studied, the only remaining issue is the risk of integration. Here a programmer’s experience and judgment can truly synergize with the fancy technology to make integration easy.