Project CamCube
Why do we build data center clusters the way we build them? We started this project in 2008 wondering whether there was a better hardware platform, with a co-designed software stack, that would make it easier and more performant to write the sort of applications that run in data centers. We wanted to address the many issues that we saw as largely stemming from the hard separation between the network and the processing elements of a cluster: applications had to treat the network as a black box and then work around its limitations.
We took ideas from distributed systems, networking, and High Performance Computing (HPC) to create the CamCube platform and the software stack that we now call CamCubeOS.
We have demonstrated the potential performance gains for many applications, including a MapReduce-like framework (CamDoop). The CamCube platform used a 3D torus topology (frequently used in the HPC community) for several reasons, including that the physical topology closely matched the virtual topology of the CAN structured overlay. This allowed us to borrow ideas from structured overlays to make designing distributed services easier.
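As a rough illustration (and not the actual CamCubeOS API), the Python sketch below shows the key-based routing idea that CamCube borrows from the CAN structured overlay: every server owns an (x, y, z) coordinate in the torus, a key is hashed to a coordinate, and each hop greedily forwards along one axis in the shorter wrap-around direction. The cube size, hash choice, and function names here are assumptions made for the example.

# Illustrative sketch only, not the CamCubeOS API: key-based routing on a small 3D torus.
import hashlib

SIDE = 3  # example 3x3x3 torus (27 servers)

def key_to_coord(key: bytes) -> tuple:
    """Hash a key into the 3D coordinate space, CAN-overlay style."""
    h = hashlib.sha256(key).digest()
    return (h[0] % SIDE, h[1] % SIDE, h[2] % SIDE)

def next_hop(current: tuple, dest: tuple) -> tuple:
    """Greedy step: fix one axis at a time, taking the shorter wrap-around direction."""
    hop = list(current)
    for axis in range(3):
        if current[axis] != dest[axis]:
            forward = (dest[axis] - current[axis]) % SIDE
            backward = (current[axis] - dest[axis]) % SIDE
            step = 1 if forward <= backward else -1
            hop[axis] = (current[axis] + step) % SIDE
            return tuple(hop)
    return dest  # already at the destination

def route(src: tuple, key: bytes) -> list:
    """Return the sequence of server coordinates a packet for `key` traverses from `src`."""
    dest = key_to_coord(key)
    path = [src]
    while path[-1] != dest:
        path.append(next_hop(path[-1], dest))
    return path

print(route((0, 0, 0), b"some-application-key"))

Because every server along such a path handles the packet, services can intercept and process data in flight; this is the property that CamDoop exploits to aggregate intermediate data inside the network.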
We had to make a number of compromises in CamCube, as we needed to build an experimental system and did not want to design custom hardware or ASICs. However, in the last couple of years we have seen many exciting advances in industry in the CamCube direction, in particular the development and deployment of in-rack fabrics that closely integrate the network and CPUs using direct-connect networks, effectively embedding the switch across all the servers. Examples include the AMD SeaMicro platform, which is targeted at commodity data centers and uses a 3D torus topology, and platforms like the Boston Viridis, which uses Calxeda's EnergyCore processors that support several topologies.
As we begin to see these systems deployed, there remains the important question of how visible the underlying network/fabric topology will be to the applications running on these racks. At the moment there is a trend to try to hide the topology and make it look and feel like a traditional ToR Ethernet switch-based network. One of the many lessons from Project CamCube is that there is value in not treating the network as a black box.
Contributors: Paolo Costa, Ant Rowstron, Austin Donnelly, Greg O'Shea, H. Abu-Libdeh, Thomas Zahn, Simon Schubert.
Conference papers:
Paolo Costa, Austin Donnelly, Ant Rowstron, Greg O'Shea. "CamCubeOS: A Key-based Network Stack for 3D Torus Cluster Topologies". Proceedings of HPDC, June 2013. [pdf]
Paolo Costa, Austin Donnelly, Ant Rowstron, Greg O'Shea. "CamDoop: Exploiting In-network Aggregation for Big Data Applications". Proceedings of NSDI, April 2012. [pdf]
H. Abu-Libdeh, P. Costa, A. Rowstron, G. O'Shea and A. Donnelly. "Symbiotic routing in future data centers". Proceedings of ACM SIGCOMM, August 2010. [pdf]
Workshop paper describing early ideas:
P. Costa, T. Zahn, A. Rowstron, G. O'Shea and S. Schubert. "Why should we integrate services, servers, and networking in a Data Center?". Proceedings of WREN, August 2009. [pdf]