Microsoft Research Peer-to-peer projects

Microsoft Research Cambridge - Peer-to-peer research related to structured overlays

We currently have a number of ongoing projects relating to structured overlays.

Recent publications:

Defending against Eclipse attacks on overlay networks [Postscript | PDF]
Atul Singh, Miguel Castro, Peter Druschel and Antony Rowstron
SIGOPS European Workshop, Leuven, Belgium, Sept. 2004

Overlay networks are widely used to deploy functionality at edge nodes without changing network routers. Each node in an overlay network maintains pointers to a set of neighbor nodes. These pointers are used both to maintain the overlay and to implement application functionality, for example, to locate content stored by overlay nodes. If an attacker controls a large fraction of the neighbors of correct nodes, it can ``eclipse'' correct nodes and prevent correct overlay operation. This Eclipse attack is more general than the Sybil attack. Attackers can use a Sybil attack to launch an Eclipse attack
by inventing a large number of seemingly distinct overlay nodes. However, defenses against Sybil attacks do not prevent Eclipse attacks
because attackers may manipulate the overlay maintenance algorithm to mount an Eclipse attack. This paper discusses the impact of the Eclipse attack on several types of overlay and it proposes a novel defense that prevents the attack by bounding the degree of overlay nodes. Our defense can be applied to any overlay and it enables secure implementations of overlay optimizations that choose neighbors according to metrics like proximity. We present preliminary results that demonstrate the importance of defending against the Eclipse attack and show that our defense is effective.

Peer-to-peer overlays: structured, unstructured, or both? [Postscript | PDF]
Miguel Castro, Manuel Costa and Antony Rowstron
MSR-TR-2004-73 (July 2004)
Extended version of (Should we build Gnutella on a structured overlay? HotNets II Paper).

We compare structured and unstructured overlays and derive a hybrid overlay that can outperform both. Unstructured overlays build a random graph and use flooding or random walks on that graph to discover data stored by overlay nodes. Structured overlays assign keys to data items and build a graph that maps each key to the node that stores the corresponding data. Unstructured overlays are widely used in popular applications because they can perform complex queries more efficiently than structured overlays. It is also commonly believed that structured graphs are more expensive to maintain than unstructured graphs and that the constraints imposed by the structure make it harder to exploit heterogeneity to improve scalability. This is not a fundamental problem. We describe techniques that exploit structure to achieve low maintenance overhead, and we present a modified proximity neighbor selection algorithm that can exploit heterogeneity effectively. We performed detailed comparisons of structured and unstructured graphs using simulations driven by real-world traces. Inspired by these results, we developed a hybrid system that uses the graph from structured overlays with the data placement and search strategies of unstructured overlays. The results show that our hybrid system supports complex queries more efficiently than unstructured overlays in realistic scenarios.

Performance and Dependability of structured peer-to-peer overlays [Postscript | PDF]
Miguel Castro, Manuel Costa and Antony Rowstron
DSN-2004, Florence, Italy, (June 2004) (Also available as MSR-TR-2003-94 (December 2003) [Postscript | PDF])

Structured peer-to-peer overlay networks provide a useful substrate for building distributed applications. They map object keys to overlay nodes and offer a primitive to send a message to the node responsible for a key. They can implement, for example, distributed harsh tables and multicast trees. However, there are concerns about the performance and dependability of these overlays in realistic environments. Several studies have shown that current peer-to-peer environments have high churn rates: nodes join and leave the overlay continuously. This paper presents techniques that continuously detect faults and repair the overlay to achieve high dependability and good performance in realistic environments. The techniques are evaluated using large-scale network simulation experiments with fault injection guided by real traces of node arrivals and departures. The results show that previous concerns are unfounded; our techniques can achieve dependable routing in realistic environments with an average delay stretch below two and maintenance overhead of less than half a message per second per node.

PIC: Practical Internet Coordinates for Distance Estimation [Postscript | PDF]
Manuel Costa, Miguel Castro, Antony Rowstron, and Peter Key
ICDCS 2004 (March 2004)

This paper introduces PIC, a practical coordinate-based mechanism to estimate Internet network distance (i.e., round-trip delay or network hops). Network distance estimation is important in many applications, for example, network-aware overlay construction and server selection. There are several proposals for distance estimation in the Internet but they all suffer from problems that limit their benefit. Most rely on a small set of infrastructure nodes that are a single point of failure and limit scalability. Others use sets of peers to compute coordinates but these coordinates can be arbitrarily wrong if one of these peers is malicious. While it may be reasonable to secure a small set of infrastructure nodes, it is unreasonable to secure all peers. PIC addresses these problems: it does not rely on infrastructure nodes and it can compute accurate coordinates even when some peers are malicious. We present PIC's design, experimental evaluation, and an application to network-aware overlay construction and maintenance.

Should we build Gnutella on a structured overlay? [Postscript | PDF]
Miguel Castro, Manuel Costa and Antony Rowstron
HotNets-II (November 2003) (MSR-TR-2004-73 (July 2003) is an extended version of this workshop paper.)

There has been much interest in both unstructured and structured overlays recently. Unstructured overlays, like Gnutella, build a random graph and use flooding or random walks on the graph to discover data stored by overlay nodes. Structured overlays assign keys to data items and build a graph that maps each key to a specific node. The structure of the graph enables efficient discovery of data items given their keys but it does not support complex queries.

Should we build Gnutella on a structured overlay? We believe the answer is yes. We replaced the random graph in Gnutella by a structured overlay while retaining the content placement and discovery mechanisms of unstructured overlays to support complex queries. Our preliminary results indicate that we can use structure to improve the performance of floods and random walks. They also indicate that structure can be used to reduce maintenance overhead, which is surprising because it is commonly believed that unstructured overlays have lower maintenance overhead than structured overlays.

Links to other projects:

Pastry

SplitStream

Scribe

Squirrel

PAST

Software distribution:

We are able to licence the C# source code for our simulator, stand-alone run-time, Pastry (MS Pastry), SplitStream, Squirrel and Scribe implementations to academic institutions. If you are a bona fide academic institution and would like to licence to code base then please contact us directly.

Summer 2005 internships:

We will be seeking internship applications for 2005 in December/January 2005.