I'm having coffee with Karlheinz Brandenburg and realizing that without this man and his team there would be no iPod, no iTunes, no jogging playlists, no rip-mix-and-burn, no file-sharing, no podcasts, and there would have been neither Napster nor Kazaa, and possibly overnight music sensations such as the Arctic Monkeys and Kamini would still be young unknown wannabes.
Brandenburg (photo right) is the inventor of the MP3 music compression format, which has made possible all of the above and much more. Sure, if MP3 hadn't been developed, other formats would have appeared to serve the same purpose. But history is not made of "ifs".
I ask him to tell me the story in his own words, and it goes back to the 1970s, when his thesis adviser, professor Seitzer in Erlangen, Germany, wanted to explore future uses of ISDN lines. Seitzer had applied for a patent on ways to transmit music over the telephone, and the patent examiners had turned him down, saying that it could not be done (it was clearly a different era for granting patents). "So Seitzer looked for a PhD student to do some research", and that's how the topic got tossed at Brandenburg. He started to look into different compression options, and it took a couple of years before he got to his first breakthrough: a novel way to encode and compress signals using psychoacoustic masking. That's the idea that when music is played, louder sounds mask weaker ones, and Brandenburg's approach was to transmit different frequencies at different accuracy levels, "almost taking away what is not audible". The team in Erlangen was not the only one exploring psychoacoustics for compression, but they did the most efficient work. The typical compression ratio of MP3 is between 8:1 and 12:1.
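To put those compression figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the uncompressed source is standard CD audio (44.1 kHz, 16 bits, stereo), which the interview does not spell out, and the function name is made up for illustration.

```python
# Assumption: the uncompressed source is standard CD audio.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

# Uncompressed bitrate: 44,100 * 16 * 2 = 1,411,200 bits per second.
CD_BITRATE_BPS = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS


def compressed_kbps(ratio):
    """Bitrate in kbit/s after compressing CD audio by `ratio`:1."""
    return CD_BITRATE_BPS / ratio / 1000


for ratio in (8, 12):
    print(f"{ratio}:1 -> about {compressed_kbps(ratio):.0f} kbit/s")
```

Under that assumption the range works out to roughly 118 to 176 kbit/s, which brackets the 128 kbit/s commonly used for MP3 files.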
Standardization work was ongoing for the video format MPEG. "They were looking for proposals for the audio compression for MPEG, and working together with Thomson and AT&T we submitted ours". "We" being the Fraunhofer Institute, Germany's foremost research institution. The adopted standard ended up comprising three different modes (called layers) and Brandenburg's work went into layer 3 - that's where it got the name: MP3, short for "MPEG audio layer 3".
The standard was finalized in 1992. For a while, MP3 was just a good technology and a technical standard. "People thought that it was so complex, that there would be no widespread usage", says Brandenburg. MP3 was used - in line with Seitzer's original idea - by broadcasters to transfer audio via ISDN rather than via leased phone lines, hence saving money. That was MP3's first real application.
But then, within a couple of years, a perfect storm came together that propelled MP3 into a public sensation. Fraunhofer created a shareware demo. PCs (we're in 1995 or so) became powerful enough to do the decoding, so that music could be listened to directly on the PC (before that, users needed special hardware). Macromedia took the first license on the technology. A student in Australia, using a stolen credit card number from Taiwan, bought the software from Fraunhofer online, did some reverse engineering, found out that they had used APIs specified by Microsoft, wrote a different user interface to the encoding kernel of the software, and started distributing it as freeware (tagged "thank you Fraunhofer"). It was illegal, but people got the software, and it spread like wildfire. In 1997, starting from US colleges and universities, students (who had at their disposal the last ingredient of the perfect storm: high-speed Internet connections) started ripping music from CDs and putting it on websites. The music majors (represented by the RIAA) didn't like that at all, and through the courts managed to close down many sites.
It was in 1997 that MP3's reputation with the wider public was made, through an article in USA Today (likely inspired by the RIAA) describing it as "a technology used to steal music". The RIAA probably hoped this would become a negative tag on MP3, "but of course everyone went for it, it was fabulous advertising", says Brandenburg. A few months later (and years before the iPod) a Korean company called Saehan built the first flash-based MP3 player, called the "MPman". It came in two versions, with memory sizes of 16 or 32 MB, enough for 15 or 30 minutes of music. Diamond, a US company, followed with the "Rio". The RIAA tried to stop them by court order, and the news of the trial made the front pages, multiplying the advertising effect.
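Those capacity figures imply an average bitrate, which a few lines of Python can make explicit. This is only a sanity check of the arithmetic: it assumes 1 MB means 2**20 bytes and that the whole memory holds audio, and the function name is hypothetical.

```python
def implied_kbps(megabytes, minutes):
    """Average bitrate (kbit/s) needed to fit `minutes` of audio into `megabytes`.

    Assumption: 1 MB = 2**20 bytes, and all of it is used for audio data.
    """
    total_bits = megabytes * 2**20 * 8
    return total_bits / (minutes * 60) / 1000


for mb, mins in ((16, 15), (32, 30)):
    print(f"{mb} MB for {mins} min -> about {implied_kbps(mb, mins):.0f} kbit/s")
```

Both versions come out at roughly 149 kbit/s under these assumptions (about 142 kbit/s if a megabyte is taken as 10**6 bytes), so the quoted playing times are plausible for files encoded somewhat above 128 kbit/s.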
Then Napster came along, the first wildly successful file-sharing system. It was designed as an attempt to bypass the music industry's successful legal attacks against sites that were hosting ripped music: it was based on a peer-to-peer scheme, which means that no central repository of music existed that could be closed down. Napster's weakness, however, turned out to be the fact that it did create a central catalog of the music hosted on the users' ("peers") computers. Other systems that were developed in the following years - the best-known being probably Kazaa - also got rid of the central catalog.
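The split between a central catalog and decentralized storage can be illustrated with a toy Python sketch. This is not Napster's actual protocol; the class, method and peer names are all hypothetical and only model the idea that the index knows who has what while the files stay on the peers.

```python
from collections import defaultdict


class CentralCatalog:
    """Hypothetical stand-in for a central index of shared music.

    It stores only which peers offer which titles; the music itself
    never passes through the catalog.
    """

    def __init__(self):
        self._index = defaultdict(set)  # title -> set of peer names

    def register(self, peer, titles):
        """Record that `peer` is offering each title in `titles`."""
        for title in titles:
            self._index[title].add(peer)

    def lookup(self, title):
        """Return a copy of the set of peers offering `title`."""
        return set(self._index.get(title, set()))


catalog = CentralCatalog()
catalog.register("peer-a", ["song-1", "song-2"])
catalog.register("peer-b", ["song-2"])

# The requester would then fetch the file directly from one of these peers.
print(sorted(catalog.lookup("song-2")))  # ['peer-a', 'peer-b']
```

In this picture the catalog is the one component that can be switched off, which is the weakness described above; systems without a central catalog have to spread the lookup step across the peers as well.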
"We at Fraunhofer always had a very clear stance that intellectual property should be respected", says Brandenburg, adding that back in the late 1990s he did have some meetings with RIAA representatives.
Today there are a number of patents on the format, and different companies claim to hold licenses. The Fraunhofer research institute (where Brandenburg still works) took in some 100 million euros last year from its patent.
I ask for his take on Apple's iPod and iTunes: "They did a great job in packaging and marketing things that were already around. The first MP3 player was developed in 1998; Deutsche Telekom had an online service called MOD for music-on-demand that same year; etc. But Apple's timing and bundling - and design of course - was perfect."
An expanded version of MP3 has in the meantime been developed, called MP3 Surround, which takes only a few KB more but offers a "surround sound" experience.
What are Brandenburg and his team currently working on? They still do work on audio compression, of course, and audio search - things like automatically deriving metadata for audio (and video), querying by humming (feeding some melody to the computer and letting it recognize the tune), identification of audio (fingerprinting), etc. He says that "the next big topic" is recommendation engines: "we have access to so much multimedia data, that we need new ways to find the right thing". So far, audio and video search has basically been done through text: either through tags or through captioning (collaborative or automatic). Semantic identification is a different story and requires a different set of technologies. "At the end we will be using a combination of the two".
The other area they're researching is immersive audio worlds, things like wavefield synthesis, which is based on the Huygens principle and renders the soundwaves in a room in a way that allows listeners to feel the exact direction and distance of the sound. Possible applications range from theme parks (such as the Bavaria Filmstadt, where people get glasses and sit on moving seats to get a "full immersion experience") to movies to home theatres.