学位论文详细信息
Fault-tolerant cluster of networking elements
Communications.;Computer Networks;Distributed Protocols;Distributed Systems;Fault tolerance;Group Membership;High Availability;Load Balancing;Multistage Interconnection Networks
Fan, Chenggong Charles ; Bruck, Jehoshua
University:California Institute of Technology
Department:Engineering and Applied Science
关键词: Communications.;    Computer Networks;    Distributed Protocols;    Distributed Systems;    Fault tolerance;    Group Membership;    High Availability;    Load Balancing;    Multistage Interconnection Networks;   
Others  :  https://thesis.library.caltech.edu/3121/1/Thesis.pdf
美国|英语
来源: Caltech THESIS
PDF
【 摘 要 】

The explosive growth of the Internet demands higher reliability and performance than what the current networking infrastructure can provide.This dissertation explores novel architectures and protocols that provide a methodology for grouping together multiple networking elements, such as routers, gateways, and switches, to create a more reliable and performant distributed networking system.Clustering of networking elements is a novel concept that requires the invention of distributed computing protocols that facilitate efficient and robust support of networking protocols.We introduce the Raincore protocol architecture that achieves these goals by bridging the fields of computer networks and distributed systems.In designing Raincore, we paid special attention to the unique requirements from the networking environment.First, networking clusters need to scale up the networking throughput in addition to the scaling up of computing power. Second, task switching between the different services supported by a networking element has a major negative impact on performance.Third, fast fail-over time is critical for maintaining network connections in the event of failures.We discuss in depth the design of Raincore Group Communication Manager that addresses the forgoing requirements and provides group membership management and reliable multicast transport.It is based on a novel token-ring protocol.We prove that this protocol is formally correct, namely, it satisfies the set of formal specifications that defines the Group Membership problem. The creation of Raincore has already made a substantial impact both at Caltech and the academic community as well as in the industry.The first application isSNOW, a scalable web server cluster that is part of RAIN, a collaborative project between Caltech and JPL/NASA.The second application is RainWall, a commercial solution created by Rainfinity, a Caltech spin-off company, that provides the first fault-tolerant and scalable firewall cluster.These applications exhibit the fast fail-over response, low overhead, and near-linear scalability of the Raincore protocols.In addition, we studied fault-tolerant networking architectures.In particular, we considered efficient constructions of extra-stage fault-tolerant Multistage Interconnection Networks.Multistage Interconnection Networks provide a way to construct a larger switching network using smaller switching elements.We discovered an optimal family of constructions, in the sense that it requires the least number of extra components to tolerate multiple switching element failures.We prove that this is the only family of constructions that has this optimal fault-tolerance property.

【 预 览 】
附件列表
Files Size Format View
Fault-tolerant cluster of networking elements 1610KB PDF download
  文献评价指标  
  下载次数:8次 浏览次数:15次