Cloud applications have burgeoned over the last few years, but they are typically written for loosely-coupled clusters such as datacenters. In this thesis we investigate how one can run cloud applications in tightly-coupled clusters and network topologies, namely supercomputers. Specifically, we look at a class of distributed machine learning systems called distributed graph processing systems, and run them on NCSA Blue Waters. Partitioning the graph is key to achieving performance in distributed graph processing systems. We present new topology-aware partitioning techniques that better exploit the structure of the network topologies in supercomputers. Compared to existing work, our new Restricted Oblivious and Grid Centroid partitioning approaches produce a 25-33% improvement in makespan, along with a sizable reduction in network traffic. We also discuss optimizations such as smart network buffers that further amplify the improvement. To help operators select the best graph partitioning technique, we distill our experimental results into a decision tree.
Topology-aware distributed graph processing for tightly-coupled clusters