Queueing Theory and The Science of Waiting
The more you learn in the field of web development and web application load (and technology in general), the more that the scenarios in ‘Num3ers’ seem possible. ‘Rubbish!’ I thought when I first saw the show. ‘Maths can’t predict that! There are too many variables! There’s human error to account for! You’d have to be omniscient to calculate that!’. Well, queueing theory is making me start to think about whether there is a literal way to eat my words. Web developers use it to do things like calculate the probability that a given amount of traffic will arrive at a website at any given time, and to estimate delays in each stage of the queue, from accepting a request, to waiting to be served, to the time taken to actually be served. We look at what queueing theory entails.
Queueing theory encompasses the mathematical models and theories that allow us to predict how long a packet of data, or a human, will take to move through a queue. It is usually undertaken as an operations research function in business, no matter whether the operation is technological, an ordinary people-based service like a supermarket or bank, or physical item-based like a post office.
Types of delay in queueing theory
There are several types of delay that can be studied in queueing theory. Looking at each of these allows us to build a picture of total delay dependant on the number of customers.
Processing delay: The first delay that a person, item of mail, or packet of data encounters in a queue is processing delay. This is the time between receipt of the item, and placement in the queue. With people, it might represent the time taken to walk from the door to the queueing area. For an item of mail it might represent the time a letter waits in the letterbox before being delivered. For a packet of data it is relatively short, and would be the time between a mouse click or keystroke, and the request being queued in the processor. In computing, it is dependant on CPU speed and CPU load.
Queueing delay: The second delay encountered is often the longest, and is the wait in the queue itself. In computing, it depends on the load on the server.
Transmission delay: This third delay refers to the time taken for the host machine to transmit the first piece of data, to the last piece of data. It depends on server speed.
Propagation Delay: This refers to the time taken from the final point of the packet transmission to the reception of that last piece on the destination end. This depends on the hardware characteristics of the server or communications link itself.
Retransmission delay: Occurs when the packet is lost, and must be transmitted again. The actual time taken here depends on the protocol that the communications link/server uses for retransmissions, as well as the error rate.
Testing for the average case – using the Poisson distribution
Queueing theory explains various ways to predict load on a server. One system for predicting the load is to use an average case scenario. This information is useful as a starting point, even though best practices point us consistently to the need to test for both best- and worst-case scenarios as a matter of necessity, dependant on the risk that an outage would engender. Poisson distribution analysis is a great tool, and simple to calculate, as long as users recognize its limitations.
Poisson distribution makes four assumptions:
1. The time period being studied can be divided into many smaller sub-periods – like dividing an hour into seconds.
2. The probability of an occurrence is random, and therefore relatively constant throughout the tested period
3. The probability of having two or more occurrences in a sub-period is negligible
4. Occurrences are independent of one another.
Your can see from assumptions 2 and 4 that the Poisson distribution is inapplicable in many real life and web application situations. For example, the Slashdot effect has given us numerous examples of traffic items not being independent of each other, but all joined through that common link. The link can also be non-internet related, like the load on a disaster hotline during a natural disaster, or on Internal Revenue sites during tax time. However, the Poisson distribution gives you an idea of how to engineer for average load outside these extraordinary times. You will obtain an accurate picture of how likely it is that you’ll get a worst-case scenario on an average day, but there are plenty of non-average factors to account for that each business will need to analyse independently. The equation for the Poisson distribution is as follows:
F (k; lambda) = lambdak . e-x / k!
Where lambda is the mean number of occurrences in a given interval, k is the number of occurrences in an interval that you want to test for, and e is the base of the natural logarithm, or 2.71828182845904523… to whatever accuracy you require.
M/M/1 Queueing Theory; Finding Best and Worst Case Scenarios
When we talk of a best-case scenario, we mean how long a server will take to respond if there are no other requests in the queue. The worst-case scenario refers to the point (number of customers) at which the response time grows so close to infinity as to become unacceptable.
Testing the response time when there is just one request is easy. Hop on there with your ‘stopwatch’ going, through whichever load testing tool you generally use. Testing for your worst case scenario requires a little more mathematics.
You’ll first need to determine the percentage of time that the server is busy, using the formula: = * s
Where is the percentage of time the server is busy, is the average number of requests per second, and s is the time taken to process a request.
Extrapolate this using the following formula to find the maximum number of requests that can be processed per second:
max = 1/s
You will also need to determine:
- the average number of requests in the server (waiting and being processed) (denoted by q)
- the average time a request spends in the server (response time as perceived by the user) (denoted by tq)
- average number of requests waiting in the server (denoted by w)
To obtain those values for your system, you can solve the following formulae: q = / (1 – ) tq = s/(1 – )
w = 2 / (1 – )
When the fraction of the time the server is busy approaches 1, tq, the total response time perceived by the user (incorporating every type of delay mentioned at the beginning of the article), approaches infinity quite sharply.
When a server becomes busy, there is the obvious problem of RAM to deal with as well. Simple queueing theory usually makes the assumption that the waiting queue has an infinite size. So, bank customers could spill out onto the street and all around the world if they wanted to be served badly enough. Your post office could theoretically rent more space to hold all of the incoming mail that it stores before it is delivered. And you could theoretically add more and more RAM to handle all of the requests waiting in the system.
However in practice, memory is not infinite, bank customers don’t queue past the door, and it would take longer to rent more storage space than it would to just stop accepting new parcels for the postage service. So, your absolute worst-case scenario in queueing theory for computing is determined by your RAM. This is the point at which you won’t be able to store any more requests in the server. The formulae above relate to M/M/1 queueing theory, where there is only 1 server (whether that be a machine or a human), the number of arrivals is average and the service time follows an exponential curve.
The Human Factor
The other important factor in the equation is less predictable, and refers to the fact that as your response times grows substantially from the norm, people may assume that there has been a very short-term glitch in the system and will resubmit their request. When thousands of people do this at once, it is almost sure to cause a crash.
These tools should help you determine both your average wait time and your best and worst case times, under queueing theory models. And while they do help the believability for the guys on ‘Num3ers’, they also prove their foibles!