Krylov solvers are key kernels for solving sparse linear systems in many large-scale science and engineering applications. On extreme-scale systems, many factors increase communication costs and cause performance variation across cores, both of which can reduce performance at scale. Many Krylov solvers require frequent blocking allreduce collective operations. These can limit performance at scale for two reasons: the cost of the collective grows as the node count increases, and every allreduce synchronizes all processes.

This thesis investigates non-blocking Krylov solver variations designed to reduce communication costs by overlapping communication and computation using non-blocking allreduces. These variations can hide most of the allreduce cost and avoid synchronizing all processes, producing better performance at scale. This work addresses gaps in the literature to give a more thorough understanding of the performance and robustness of these solvers and of how they can be used to solve linear systems efficiently at scale in practice.

A variety of blocking and non-blocking Krylov solvers are analyzed in detail with multiple preconditioners on multiple leadership-class supercomputers. Performance analysis tools and performance models are developed to provide deeper insight into the performance barriers encountered by these algorithms and to show how those barriers relate to observed performance. These tools guide a variety of optimizations that further improve solver performance. The Nek5000 and Quda applications are used to analyze the effectiveness of these solvers in practice. Both applications are designed to perform well at scale; however, they need further improvements to reach their desired performance. The resulting tools and analysis provide a better understanding of how to improve performance at scale, which can benefit a wider range of applications.
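The benefit of overlapping an allreduce with computation, as described above, can be illustrated with a toy per-iteration cost model. This is only a sketch under stated assumptions: the tree-based allreduce cost, the latency value, the count of two blocking reductions per iteration, and all function names are hypothetical choices made for illustration, not the performance models developed in the thesis.

```python
import math


def allreduce_time(nodes, latency=2e-6):
    """Assumed cost of one allreduce: latency * ceil(log2(nodes)).

    A tree-based reduction is an illustrative assumption; real
    networks and MPI implementations behave differently.
    """
    if nodes <= 1:
        return 0.0
    return latency * math.ceil(math.log2(nodes))


def blocking_iteration(compute, nodes, reductions=2):
    """Blocking solver: every allreduce is fully exposed.

    `reductions` is the assumed number of blocking allreduces
    per iteration.
    """
    return compute + reductions * allreduce_time(nodes)


def nonblocking_iteration(compute, nodes, overlap_fraction=1.0):
    """Non-blocking solver: one allreduce overlapped with computation.

    Only the part of the allreduce that outlasts the overlapped
    computation adds to the iteration time.
    """
    overlapped = compute * overlap_fraction
    exposed = max(0.0, allreduce_time(nodes) - overlapped)
    return compute + exposed


if __name__ == "__main__":
    compute = 1e-5  # assumed seconds of local work per iteration
    for nodes in (16, 1024, 65536):
        t_block = blocking_iteration(compute, nodes)
        t_nonblock = nonblocking_iteration(compute, nodes)
        print(f"{nodes:6d} nodes: blocking {t_block:.2e} s, "
              f"non-blocking {t_nonblock:.2e} s")
```

Under this model the allreduce is hidden entirely while it is shorter than the overlapped computation, and only the excess is exposed once it grows past that point. The model deliberately ignores the per-core performance variation and synchronization delays that the abstract identifies as additional costs of blocking collectives.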
Scalable non-blocking Krylov solvers for extreme-scale computing