Meltdown under load is hard to forget

I was recently reading an article on Yahoo titled “MS: No repeat of Xbox Live holiday meltdown (hopefully)”. What immediately struck me was that Microsoft is still relying on hope to avoid another black eye with its customers.

Last year’s failure of the Xbox Live system under load has not been forgotten. It cost Microsoft money and customer loyalty. People are still thinking about it…and writing about it. And it certainly is puzzling to me that one of the world’s most dominant application providers continues to live in “Hopeville” when it comes to preventing a recurrence.

The article states:

“Last year, the Xbox Live servers were slammed for nearly a month—starting a few days before Christmas and through New Year’s Day—after untold thousands of new users tried to create Live accounts. The Live Marketplace and online multiplayer games slowed to crawl, and millions of users found themselves locked out of Live altogether. Microsoft ultimately apologized and offered users a free Live Arcade game for their trouble.”

Aaron Greenberg is quoted as saying that his team has been shoring up the Live servers in preparation for the holiday rush. Greenberg said, “We will have people standing by … knock on wood we will hopefully not have any issues.”

That type of talk doesn’t make me feel warm and fuzzy. And it would seem that Greenberg has no clear data on whether the issues (i.e., more user volume than the system can handle) are resolved for this holiday season.

And having people “standing by” is not a solution. Were those same people standing by last year for a month? Were they knocking on wood too?

Millions of users being locked out would sure seem like a great reason to invest in load testing. If you were the Product Manager, wouldn’t you want to know for sure the number of concurrent users that can safely engage with the system? As technologists, don’t they have enough pride in their work to figure out a key factor in the usability of their product? As business people, don’t they have a clue regarding the lifetime value of a customer?

I’m not a Microsoft bigot, nor am I a Microsoft hater. But failures such as last year’s month-long outage for millions of paying customers definitely caught my attention. And I didn’t forget it this year when I went shopping. I’ll be enjoying my Wii on Christmas Day.

My humble recommendation to Greenberg’s quality assurance team is simply: load test Xbox Live with tens of millions of users before you enter another peak period “hoping for the best”. And if over ten million users are expected this year, then simulate one hundred million. The load testing technology available can do this. It is a technological reality. Perhaps they would prefer to let the customers load test the system. Bad plan.
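To make “know for sure the number of concurrent users” a little more concrete, here is a minimal sketch in Python of a stepped load ramp: push more simulated users at the system at each step, and record the last level at which the error rate stays acceptable. Everything named here is my own assumption for illustration: `FakeService` is a hypothetical stand-in for the system under test, and its capacity, the ramp levels, and the 1% error threshold are made-up numbers. A real test would drive actual sign-up and multiplayer traffic with a proper load testing tool, at far larger scale.

```python
import asyncio


class FakeService:
    """Hypothetical stand-in for the system under test.

    It rejects any request that arrives while more than `capacity`
    requests are already in flight.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_flight = 0

    async def handle(self):
        self.in_flight += 1
        try:
            if self.in_flight > self.capacity:
                raise RuntimeError("overloaded")
            await asyncio.sleep(0.01)  # simulated work for one request
        finally:
            self.in_flight -= 1


async def error_rate(service, users):
    """Fire `users` concurrent requests and return the fraction that failed."""
    results = await asyncio.gather(
        *(service.handle() for _ in range(users)),
        return_exceptions=True,
    )
    failures = sum(isinstance(r, Exception) for r in results)
    return failures / users


async def find_safe_concurrency(service, levels, max_error_rate=0.01):
    """Step through the ramp and return the last level that stayed healthy."""
    safe = 0
    for users in levels:
        if await error_rate(service, users) > max_error_rate:
            break  # the system fell over at this level; stop ramping
        safe = users
    return safe


def run_ramp(capacity=500, levels=(100, 200, 400, 800, 1600)):
    """Run the whole ramp against a fresh fake service (illustrative numbers)."""
    return asyncio.run(find_safe_concurrency(FakeService(capacity), levels))
```

With the default numbers above, `run_ramp()` reports 400 as the last safe level, because the fake service starts rejecting requests once 800 users arrive at the same time. The point is not the toy numbers but the shape of the exercise: the breaking point is found in a test, before the holidays, instead of by the customers.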

Hey Aaron, can your system weather the storm this year?
