"Big Data at Large" Workshop Held by Pankaj Agarwal and Shivnath Babu

September 15, 2012

Professor Mauro Maggioni (Math and Computer Science) gives a talk at the "Big Data at Large" Workshop

Big data is a big deal — and so was this past summer’s "Big Data at Large" Workshop.

“This seminar offered an exceptional opportunity for interaction,” said Shivnath Babu, Duke associate professor and one of three organizers for the workshop sponsored by the Army Research Office. The workshop — held June 14-15 at the Millennium Hotel near campus — didn’t focus on just one discipline but instead drew about 45 participants from a broad spectrum of disciplines within academia, research labs, industry and government agencies.

“One of the motivations for organizing this is the realization that every field has a problem dealing with large data sets,” Babu said. “The data might be different, but sometimes the techniques can actually apply broadly.”

That sharing of ideas across fields was a major goal of the workshop and the reason Pankaj Agarwal — Duke professor of computer science and mathematics, whose research interests focus on studying algorithmic challenges in dealing with big data sets — enlisted help in organizing the workshop.

“Each group is looking at a somewhat different piece of the puzzle,” Agarwal said. “I don’t think we even understand the whole puzzle yet. Right now it’s the Wild West. People know this is something that is important. As we hear, we are in the information age. The problem is that we don’t have actionable information; we have a lot of data. And how does one generate and synthesize knowledge from this data? That is the challenge.”

In addition to Babu, who is focusing on improving the architecture of data-intensive systems, Agarwal had help in organizing the workshop from Muthu Muthukrishnan, a Rutgers computer science professor who works closely with industry on applications of big data.

The three organized a workshop that examined the current state of the big data problem through presentations from participants and looked ahead, through smaller breakout sessions each day, to where energy and resources should be focused. The four breakout sessions focused on scientific opportunities of big data; the human/big data interaction; systems infrastructure for big data; and when algorithms can declare success. The groups worked on concretely defining problems that would have a significant impact if solved, along with how those problems could be approached and who could contribute. Collaborations were formed, which both Agarwal and Babu pointed to as an important outcome of the workshop.

“I learned a lot about what interesting things are happening in machine learning,” Babu noted as an example. “I felt that my research could actually benefit their area, and I can take some ideas from them and use them in my own work. That wouldn’t have happened without a venue like this.”

Agarwal noted that while there is no consensus yet on what the biggest challenges are, it is clear from the workshop that big data is a fertile area for deep research and that many more such multidisciplinary workshops are needed.

“There’s a lot of need right now — both in scientific disciplines and economic domains — to understand, and extract useful knowledge from, these large sets that are being generated at an unprecedented rate,” he said, adding that this election year is the first in which data mining is truly playing a big role.

“What comes out of this research will have a significant impact in every aspect of our society, whether it’s economic development, whether it’s scientific discovery or whether it’s social interactions,” he said. “It is just the beginning of a new era.”