(Pre-scriptum: This is a developing story. This post was originally titled "Two days without Skype: the price of free". I changed it to "Two days without Skype: Blame Microsoft" after adding the correction below: if Windows Update indeed triggered the outage, then the problem had nothing to do with the fact that Skype is free. Then Skype clarified that "the update patches were not the cause of the disruption", so I changed the headline again, to "Blame Microsoft, or maybe not".)
Quick summary: For about two days, from Thursday to Saturday last week, Skype's free or cheap voice-over-Internet services were unavailable to the 200+ million people who have signed up to use them. The technical explanation, as given on the Skype blog, is: "The disruption was initiated by a massive restart of our user’s computers across the globe within a very short timeframe as they re-booted after receiving a routine set of patches through Windows Update. The abnormally high number of restarts affected Skype’s network resources. This caused a flood of log-in requests, which, combined with the lack of peer-to-peer network resources, prompted a chain reaction that had a critical impact." In other words, Skype sent out a software update which, apparently because of a 4-year-old bug in the client software that had gone undetected so far, prompted many computers to restart, temporarily depriving the peer-to-peer system of substantial network resources, which started the chain reaction.
CORRECTION 20 Aug: I got the previous sentence wrong in the original version of this post, because the original version of Skype's post did not include the words "through Windows Update", suggesting that the whole thing was triggered by an update of Skype's own software. Skype now blames Microsoft for the crash. The explanation is not entirely convincing, though: earlier, in an interview with the New York Times, Skype's engineers had said that a bug present in all Skype clients since 2003, which had gone undetected so far, could have started the disruption. So the latest (partial) version could be: Microsoft sent out a routine Windows software update (as it does every second Tuesday of the month) which required computers to restart, temporarily depriving the Skype peer-to-peer system of substantial network resources and creating a flood of re-log-in requests, which triggered the chain reaction, with the Skype bug possibly magnifying the problem. That would explain why this didn't happen on other Windows Update Tuesdays. (Thanks to Amir for pointing out the mistake.)
UPDATE 22 Aug: Or it could be different, as a new Skype blog post says: there was a critical weakness in the Skype software that was simply made worse by the Windows update.
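To make the "chain reaction" in Skype's explanation a bit more concrete, here is a deliberately crude toy model in Python. It is an illustration under stated assumptions, not Skype's actual protocol: it assumes that one online client in every `clients_per_supernode` acts as a peer-to-peer relay (a "supernode"), that each relay can accept `logins_per_supernode` log-ins per time step, and that restarted machines contribute no capacity until they are back online. All names and numbers are hypothetical. What it shows is that when many machines restart at once, log-in demand rises exactly when the capacity to serve it falls, so recovery takes several steps instead of one; and if nearly everyone is offline, the network cannot bootstrap itself at all. It does not model retries or the client bug Skype mentioned.

```python
def ticks_to_recover(total_clients: int, restarted: int,
                     clients_per_supernode: int = 100,
                     logins_per_supernode: int = 5,
                     max_ticks: int = 10_000) -> int:
    """Toy model: time steps until all restarted clients are logged in again.

    Hypothetical assumptions: one online client in every
    `clients_per_supernode` acts as a supernode, each supernode accepts
    `logins_per_supernode` log-ins per step, and offline machines
    provide no capacity at all.
    """
    if not 0 <= restarted <= total_clients:
        raise ValueError("restarted must be between 0 and total_clients")
    pending = restarted  # clients waiting to log back in
    ticks = 0
    while pending > 0:
        if ticks >= max_ticks:
            raise RuntimeError("network did not recover within max_ticks")
        online = total_clients - pending
        # Only online machines can serve as supernodes, so capacity is
        # lowest exactly when the flood of log-in requests is highest.
        supernodes = online // clients_per_supernode
        capacity = supernodes * logins_per_supernode
        if capacity == 0:
            raise RuntimeError("no supernodes online: the network cannot bootstrap")
        pending -= min(pending, capacity)
        ticks += 1
    return ticks


if __name__ == "__main__":
    n = 1_000_000
    for share in (1, 10, 50):
        print(f"{share}% restarted -> {ticks_to_recover(n, n * share // 100)} steps")
```

With these made-up parameters and a million clients, a 1% restart clears in a single step and a 10% restart takes three, while a 50% restart takes considerably longer, because capacity only grows back as clients themselves come back online.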
The Skype system is now back to normal; when I used it this morning, it worked fine. But questions remain.
In August 2004, I interviewed Niklas Zennström, one of the two Skype founders. At the time, there were about 550'000 concurrent Skype users on average. I asked him about scaling to serve a larger crowd:
With this design, can you scale up to a vast number of users, or is there a point where you need to redesign your architecture?
We won't need to invest in infrastructure. What we will need to do at some point is to make some changes in the technology to be able to scale more. If we didn’t do anything, when we reach 10 million concurrent users (20 times more than now) we believe there will be problems. So before that happens we will have to spend some time to make some changes in the architecture. But there is no investment needed in hardware etc. Just in development.
The 10-million mark has been reached. So one may wonder whether Skype's problem is larger than its blog post says (I don't have more information on this, I am just wondering; but here is what the Skype blog says: "Skype’s peer-to-peer core was not properly tuned to cope with the load and core size changes").
There are a couple of other things to consider:
- The whole Internet seems to be bursting at the seams. Bandwidth is abundant, but the growth of transferred data is phenomenal: just think of video and music uploads, downloads and streaming, and of VoIP calls and videoconferencing such as Skype's. Some are even wondering whether the network is reaching saturation and whether the upcoming online television services (Joost, BBC's iPlayer, etc.) are going to crash the Internet. A report by Cisco, quoted by iTnews, found that American video websites currently transmit more data per month than the entire amount of traffic sent over the Internet in 2000.
- Some have argued that the Skype outage is a sign of the unreliability of Internet telephony. But what actually surprises me is that the Skype outage didn't happen sooner: that the system has worked so well for five years (the company was founded in 2002) is a testament to the genius of the application. In the meantime, dozens of other VoIP providers have closed down. Traditional telecom services are by no means perfect, either (not to mention cell phones and BlackBerries).
And let's not forget that Skype is a free service (so the inconvenience of two days with no service is actually the price of free).
Bruno Giussani is a writer, the European Director of the 

The sad thing is that this is not just the price of free. Service in all sectors of telecom has gone downhill.
I pay good money to Wanadoo (former France Telecom) but still find I need to reboot my router several times a month and pay to wait on hold when I have more serious problems.
Not sure what the solution is, but free services on the one hand and the death throes of former telecom monopolies on the other make for a bad spot to be a consumer!!
Wonder if there are equivalent examples in other industries or situations hit by a paradigm shift in technology?
Posted by: Thomas Crampton | August 20, 2007 at 04:16 PM
In the case of this recent outage at Skype, the fact that the app costs nothing is irrelevant, because the outage was caused by an external factor which, speaking of costs, is not free (MS Windows).
What is unacceptable here is the opposite: that a mass-market product for which customers pay a premium (think of the total cost of ownership of your production Windows PC...) causes trouble for a third party's product.
What is remarkable here is that Skype has been able to stay up and running all this time without any major problem for its customers, I mean, for its first five years of operation until last week's bug.
Posted by: Marc Duchesne | August 20, 2007 at 05:34 PM
Skype's issues last week are instructive for the entire industry. On the one hand, Skype has done a remarkable thing in generating a large user base very quickly. There are, however, important concerns about their architecture and approach, and questions have quite rightly been raised about peer-to-peer networks.
In fact, not all peer-to-peer models are created equal. Skype uses a different type of peer-to-peer network than most companies, based on SuperNodes. A SuperNode peer-to-peer system is one in which you rely on your customers rather than your own servers to handle the majority of your traffic. SuperNodes are just normal computers which get promoted by the Skype software to serve as the traffic cops for the entire network. In theory this is a good idea, but it does have unique vulnerabilities. Skype, as a company, has no physical or programmatic control over the most vital piece of its product when the network destabilizes for any reason.
Another issue with SuperNode models concerns system recovery after a crash. A SuperNode-based network can only recover as fast as new SuperNodes can be identified. Skype's formal post on Monday about the cause of its crash essentially confirmed this point.
Skype's model also creates usage issues. A Skype user who installs Skype on a university or corporate network agrees in the End-User License Agreement to let Skype route calls through his or her PC (and by extension the organization's network). In many cases this is a violation of the terms of use the student or employee has agreed to with the university or corporate IT department. It can cause legal and bandwidth issues.
Other companies such as SightSpeed use a standards-based peer-to-peer architecture built on SIP (the standard protocol, as opposed to Skype's proprietary protocol) that allows them to manage all the core functionality themselves. Telephony protocols such as SIP (which SightSpeed uses) were designed from the outset to be fault tolerant. Companies such as Microsoft, Cisco, Sprint/Nextel, Verizon, AT&T, Comcast, Time Warner and SightSpeed all ship standards-based SIP software and hardware.
Skype's proprietary SuperNode architecture is what is risky. Peer-to-peer CAN be done right.
Aron Rosenberg
CTO SightSpeed
http://www.sightspeed.com
Posted by: Aron Rosenberg | August 22, 2007 at 11:29 PM
@Aron: thanks for the detailed explanation. Now I understand the difference between traditional SIP and Skype's own protocol. Do you think Skype could switch to SIP easily? Any guess how much effort (technical or otherwise) it would take them? I know you're a rival, but Sun Tzu said: know your enemy better than yourself ;-)
Posted by: Marc Duchesne | August 24, 2007 at 10:01 AM