Analytics on Graphs with a Trillion Edges

Willy Zwaenepoel (EPFL)
Friday 08 May 2015, 11:00-12:00
FW26, Computer Laboratory, William Gates Building.

If you have a question about this talk, please contact Eiko Yoneki.

Big graphs occur naturally in many applications, most obviously in social networks, but also in many other areas such as biology and forensics. Current approaches to processing large graphs use either supercomputers or very large clusters. In both cases the entire graph must reside in memory before it can be processed. We are pursuing an alternative approach, processing graphs from secondary storage. While this comes with some performance penalty, it makes analytics on very large graphs feasible on a small number of commodity machines. It also has the pleasing property that “if you can store a graph, you can compute on it”.

We have developed two systems, one for a single machine and one for a cluster of machines. X-Stream, the single-machine solution, aims to make all secondary storage access sequential. It uses two techniques to achieve this goal: edge-centric processing and streaming partitions. X-Stream outperforms the state-of-the-art GraphChi system, because it achieves better sequentiality and because it requires less preprocessing. Slipstream, the cluster solution, starts from the observation that there is little benefit to locality when accessing secondary storage over a high-speed network. As a result, we use lightweight dynamic partitioning, focusing on achieving load balance and sequential access to secondary storage. The resulting system achieves good scaling and outperforms other systems. With Slipstream we have also been able to process a trillion-edge graph, a new milestone for graph size on a small cluster. I will describe both systems and their performance on a number of benchmarks and in comparison to the state-of-the-art alternatives.
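The abstract does not spell out how edge-centric processing works, so the following is only a minimal sketch of the general idea, not X-Stream's actual implementation. It assumes a directed graph given as an in-memory list of `(src, dst)` pairs standing in for an edge file on secondary storage, and the function name `edge_centric_bfs` is hypothetical. Each iteration reads the whole edge list front to back (a scatter pass that emits updates, then a gather pass that applies them), so edges are never looked up at random by vertex. Streaming partitions are not modelled here.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]  # (source vertex, destination vertex); directed, by assumption


def edge_centric_bfs(num_vertices: int, edges: Sequence[Edge], source: int) -> List[int]:
    """Compute BFS levels by repeatedly streaming over the full edge list.

    Returns a list where entry v is the BFS level of vertex v,
    or -1 if v is unreachable from `source`.
    """
    level = [-1] * num_vertices
    level[source] = 0
    current = 0
    changed = True
    while changed:
        changed = False
        # Scatter: one sequential pass over all edges. An edge emits an
        # update only if its source vertex is on the current frontier.
        updates = [dst for src, dst in edges if level[src] == current]
        # Gather: apply the updates to vertex state.
        for dst in updates:
            if level[dst] == -1:
                level[dst] = current + 1
                changed = True
        current += 1
    return level


if __name__ == "__main__":
    example_edges = [(0, 1), (1, 2), (0, 3), (3, 2), (4, 5)]
    print(edge_centric_bfs(6, example_edges, source=0))
```

The trade-off illustrated is that every pass touches every edge, including those that contribute nothing, in exchange for purely sequential access to the edge data.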

This is joint work with Laurent Bindschaedler, Jasmina Malicevic and Amitabha Roy.

Bio: Willy Zwaenepoel received his BS/MS from the University of Gent, Belgium, and his PhD from Stanford University. He is currently a Professor of Computer Science at EPFL. He previously held appointments as Professor of Computer Science and Electrical Engineering at Rice University, and as Dean of the School of Computer and Communication Sciences at EPFL. His interests are in operating systems and distributed systems. He is a Fellow of the ACM and the IEEE, has received the IEEE Kanai Award and several best paper awards, and is a member of the Belgian and European Academies. He has also been involved in a number of startups, including iMimic (now part of Cisco), Midokura and Nutanix.

This talk is part of the Computer Laboratory Systems Research Group Seminar series.
