24th IUPAP Conference on Computational Physics | |
Exploiting parallelism in many-core architectures: Lattice Boltzmann models as a test case | |
Physics; Computer Science | |
Mantovani, F.^1 ; Pivanti, M.^2 ; Schifano, S.F.^3 ; Tripiccione, R.^4 | |
Department of Physics, Universität Regensburg, Germany^1 | |
Department of Physics, Università di Roma la Sapienza, Italy^2 | |
Department of Mathematics and Informatics, Università di Ferrara and INFN, Italy^3 | |
Department of Physics and CMCS, Università di Ferrara and INFN, Italy^4 | |
Keywords: Computational kernels; Large scale simulations; Lattice Boltzmann algorithms; Lattice Boltzmann models; Many-core architecture; Many-core processors; Micro architectures; Rayleigh-Taylor instabilities | |
Others : https://iopscience.iop.org/article/10.1088/1742-6596/454/1/012015/pdf DOI : 10.1088/1742-6596/454/1/012015 |
Subject classification: Computer Science (General) | |
Source: IOP | |
【 Abstract 】
In this paper we address the problem of identifying and exploiting techniques that optimize the performance of large-scale scientific codes on many-core processors. As a test bed we consider a state-of-the-art Lattice Boltzmann (LB) model that accurately reproduces the thermo-hydrodynamics of a 2D fluid obeying the equations of state of a perfect gas. The regular structure of Lattice Boltzmann algorithms makes it relatively easy to identify a large degree of available parallelism; the challenge is to map this parallelism onto processors whose architecture is becoming more and more complex, both in the increasing number of independent cores and, within each core, in vector instructions that operate on longer and longer data words. We take as an example the Intel Sandy Bridge micro-architecture, which supports AVX instructions operating on 256-bit vectors. We address the problem of efficiently implementing the key computational kernels of LB codes, streaming and collision, on this family of processors; we introduce several successive optimization steps and quantitatively assess the impact of each on performance. Our final result is a production-ready code already in use for large-scale simulations of the Rayleigh-Taylor instability. We analyze both raw performance and scaling figures, and compare them with GPU-based implementations of similar codes.
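To make the two kernels named in the abstract concrete, the following is a minimal sketch in plain scalar C, not the authors' code. It assumes a textbook D2Q9 BGK model on a small periodic lattice, chosen for brevity and not taken from the paper. All names (`lattice`, `lb_init`, `lb_stream`, `lb_collide`, `lb_total_mass`) and the sizes `NX`, `NY` are hypothetical. The structure-of-arrays layout is one common way to expose the site-level parallelism that 256-bit vector units can exploit; the sketch uses no AVX intrinsics.

```c
#include <assert.h>

/* Hypothetical lattice size, chosen for illustration only. */
enum { NX = 16, NY = 16, NSITES = NX * NY, Q = 9 };

/* D2Q9 velocity set and weights (standard textbook values). */
static const int CX[Q] = { 0, 1, 0, -1, 0, 1, -1, -1, 1 };
static const int CY[Q] = { 0, 0, 1, 0, -1, 1, 1, -1, -1 };
static const double W[Q] = { 4.0/9, 1.0/9, 1.0/9, 1.0/9, 1.0/9,
                             1.0/36, 1.0/36, 1.0/36, 1.0/36 };

/* Structure-of-arrays layout: f[p][site]. Consecutive sites of one
   population are adjacent in memory, so a vector unit could load
   several sites at once (4 doubles fit in one 256-bit vector). */
typedef struct { double f[Q][NSITES]; } lattice;

/* Fluid at rest with a small density bump at site 0. */
static void lb_init(lattice *l)
{
    for (int p = 0; p < Q; p++)
        for (int s = 0; s < NSITES; s++)
            l->f[p][s] = W[p] * (s == 0 ? 1.1 : 1.0);
}

/* Streaming (pull scheme, periodic boundaries): each site reads
   population p from the neighbour that population arrives from. */
static void lb_stream(const lattice *src, lattice *dst)
{
    for (int p = 0; p < Q; p++)
        for (int y = 0; y < NY; y++) {
            int ys = (y - CY[p] + NY) % NY;
            for (int x = 0; x < NX; x++) {
                int xs = (x - CX[p] + NX) % NX;
                dst->f[p][y * NX + x] = src->f[p][ys * NX + xs];
            }
        }
}

/* Collision (BGK): relax each population towards the local
   equilibrium built from the site's density and velocity. */
static void lb_collide(lattice *l, double tau)
{
    assert(tau > 0.5); /* BGK relaxation time must exceed 1/2 */
    for (int s = 0; s < NSITES; s++) {
        double rho = 0.0, ux = 0.0, uy = 0.0;
        for (int p = 0; p < Q; p++) {
            rho += l->f[p][s];
            ux += CX[p] * l->f[p][s];
            uy += CY[p] * l->f[p][s];
        }
        ux /= rho;
        uy /= rho;
        double usq = ux * ux + uy * uy;
        for (int p = 0; p < Q; p++) {
            double cu = CX[p] * ux + CY[p] * uy;
            double feq = W[p] * rho
                       * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq);
            l->f[p][s] += (feq - l->f[p][s]) / tau;
        }
    }
}

/* Sum of all populations; both kernels conserve it up to rounding. */
static double lb_total_mass(const lattice *l)
{
    double m = 0.0;
    for (int p = 0; p < Q; p++)
        for (int s = 0; s < NSITES; s++)
            m += l->f[p][s];
    return m;
}
```

In this sketch one time step is `lb_stream(&a, &b)` followed by `lb_collide(&b, tau)`, with the roles of the two lattices swapped for the next step. Streaming only moves data and collision only computes on site-local data, which is why the two kernels stress memory bandwidth and floating-point throughput respectively.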