From Threads, Fall 2009 issue

Around the world, Akamai Technologies operates 50,000 servers in over 1,500 locations, handling twenty percent of today’s Web traffic. So it’s no wonder that as Vice President of Research and Development at Akamai, Bruce Maggs became interested in studying large distributed systems and massive data sets.
“The key to innovation in Internet infrastructure is the proper engineering of large distributed systems. Working at Akamai, I started to think this would be an interesting area in which to do research,” says Maggs. As a new faculty member in the Department (see his profile on page two), Maggs arrives on campus with several research projects that examine large distributed systems from a variety of angles, from businesses’ energy costs to database bottlenecks to denial-of-capability attacks.
In much of his research, Maggs uses an innovative strategy to analyze data: He collects massive data sets on networking behavior from Akamai’s content delivery network (CDN) and combines them with other data sets to gain new perspective on a particular problem. For example, in a paper presented this summer at the ACM SIGCOMM 2009 conference, Maggs and colleagues at Akamai and MIT analyzed the energy costs of existing distributed systems. Massive Internet-scale and cloud computing systems today utilize hundreds of thousands of servers, which can require megawatts of electricity, enough to power thousands of homes. The cost of such energy can range from $3.7 million per year (eBay) to $38 million per year (Google), the researchers estimate. By comparing historical electricity prices with network traffic data from the Akamai CDN, the team found that companies might save millions in electricity costs each year by routing client requests to locations where energy is cheapest, since energy prices regularly fluctuate from area to area and hour to hour. The routing could be done using the request-routing systems that most large distributed systems already have in place.
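To make the idea concrete, here is a minimal sketch of price-aware request routing: assign request load to the regions whose electricity is currently cheapest, up to each region’s capacity. The location names, prices, capacities, and the simple greedy assignment are all hypothetical stand-ins for the far more sophisticated routing policies a production CDN would actually use.

    # Illustrative sketch of electricity-price-aware request routing.
    # Locations, prices, and capacities are hypothetical examples, not Akamai data.

    def route_by_energy_price(demand, locations):
        """Greedily place request load at the locations with the cheapest
        current electricity, respecting each location's capacity.

        demand    -- total request load to place (arbitrary units)
        locations -- dict: name -> (price_per_mwh, capacity)
        Returns a dict: name -> load assigned.
        """
        assignment = {}
        remaining = demand
        # Consider the cheapest energy first.
        for name, (price, capacity) in sorted(locations.items(), key=lambda kv: kv[1][0]):
            if remaining <= 0:
                break
            load = min(capacity, remaining)
            assignment[name] = load
            remaining -= load
        return assignment

    if __name__ == "__main__":
        prices_now = {            # hypothetical hourly spot prices ($/MWh) and capacities
            "virginia": (62.0, 400),
            "chicago":  (41.0, 300),
            "texas":    (35.0, 250),
        }
        print(route_by_energy_price(600, prices_now))
        # -> {'texas': 250, 'chicago': 300, 'virginia': 50}: most load lands
        #    in the cheapest regions this hour

As prices shift hour by hour, rerunning the assignment shifts load toward whichever regions have become cheapest, which is the effect the SIGCOMM study quantified using historical price and traffic data.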
Another ongoing project in Maggs’ repertoire is a study of how to relieve bottlenecks at the central database servers of websites that use dynamic content: images and text generated in real time and customized for each user. As more and more websites add dynamic content, from welcome screens to advertisements, the technology is becoming harder to scale. “We need a database scalability service,” says Maggs. He and colleagues at Carnegie Mellon and Google are working to build just that: layers of caches that store dynamic content locally, instead of at distant, backlogged databases. “But it’s tricky,” Maggs warns, as researchers must figure out how to keep the local caches consistent as the database is updated. Last summer, at the 34th International Conference on Very Large Data Bases (VLDB), the group introduced Ferdinand, a first-of-its-kind cooperative proxy architecture for dynamic content delivery that uses distributed database query result caching and scalable consistency maintenance.
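The general idea behind query-result caching with invalidation-based consistency can be sketched in a few lines. The class, method names, the database object with an execute method, and the coarse table-level invalidation rule below are all invented for illustration; they are not Ferdinand’s actual interfaces or protocol.

    # Minimal sketch of query-result caching with invalidation-based consistency.
    # Names and the table-level invalidation rule are simplifications for
    # illustration, not the Ferdinand design.

    class QueryCacheProxy:
        def __init__(self, database):
            self.database = database          # assumed object with execute(query) -> result
            self.cache = {}                   # query string -> cached result
            self.queries_by_table = {}        # table name -> set of cached queries

        def read(self, query, tables):
            """Serve a read query from the local cache when possible."""
            if query in self.cache:
                return self.cache[query]      # cache hit: no round trip to the database
            result = self.database.execute(query)
            self.cache[query] = result
            for table in tables:
                self.queries_by_table.setdefault(table, set()).add(query)
            return result

        def on_update(self, table):
            """When the database reports an update to `table`, drop every
            cached query that might depend on it."""
            for query in self.queries_by_table.pop(table, set()):
                self.cache.pop(query, None)

Table-level invalidation is simple but throws away more cached results than strictly necessary on each update; maintaining consistency more precisely, and doing so across many cooperating proxies, is exactly the tricky part Maggs describes.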
Maggs also has a strong interest in network security and has another project in the works that follows on the heels of previous research by Professor Xiaowei Yang on network capabilities, a defense against denial-of-service attacks, in which a malicious host floods a server with unwanted traffic. Capabilities are like golden tickets to a website, establishing priority connections for “good” traffic. But attackers have learned to prevent any capabilities from being issued at all, in so-called “denial of capability” attacks. Maggs has developed a partial solution to such attacks. “It would require all users to provide evidence that they are legitimate by performing some kind of computational work, like solving a puzzle,” says Maggs. Even if attackers figure out how to solve the puzzles, they must invest some of their own resources into doing so, effectively slowing the attack. Together with Yang and Professor Landon Cox, Maggs plans to continue exploring network security issues.
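A common way to realize such a puzzle is a hash-based proof of work: the server hands out a random challenge, and the client must find a nonce whose hash together with the challenge has a required number of leading zero bits before its request is considered. The sketch below is only a generic illustration of that idea under assumed parameters; it is not the specific scheme Maggs proposes, and the function names and difficulty setting are invented for this example.

    # Generic proof-of-work puzzle sketch (hedged illustration, not a published design).
    # Solving is expensive for the client; checking the solution is cheap for the server.

    import hashlib
    import os

    DIFFICULTY_BITS = 18   # each extra bit roughly doubles the client's expected work

    def leading_zero_bits(digest: bytes) -> int:
        bits = 0
        for byte in digest:
            if byte == 0:
                bits += 8
                continue
            bits += 8 - byte.bit_length()
            break
        return bits

    def issue_challenge() -> bytes:
        """Server side: hand out a fresh random challenge."""
        return os.urandom(16)

    def solve_puzzle(challenge: bytes) -> int:
        """Client side: search for a nonce that proves computational work."""
        nonce = 0
        while True:
            digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
            if leading_zero_bits(digest) >= DIFFICULTY_BITS:
                return nonce
            nonce += 1

    def verify(challenge: bytes, nonce: int) -> bool:
        """Server side: one hash suffices to check the client's work."""
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        return leading_zero_bits(digest) >= DIFFICULTY_BITS

Because verification costs one hash while solving costs many, legitimate clients pay a small, bounded price per request, and an attacker trying to flood the capability channel must spend its own compute for every request it sends, which is the slowing effect Maggs describes.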