| 24th IUPAP Conference on Computational Physics | |
| Exploiting parallelism in many-core architectures: Lattice Boltzmann models as a test case | |
| Physics; Computer Science | |
| Mantovani, F.^1 ; Pivanti, M.^2 ; Schifano, S.F.^3 ; Tripiccione, R.^4 | |
| Department of Physics, Universität Regensburg, Germany^1 | |
| Department of Physics, Università di Roma la Sapienza, Italy^2 | |
| Department of Mathematics and Informatics, Università di Ferrara and INFN, Italy^3 | |
| Department of Physics and CMCS, Università di Ferrara and INFN, Italy^4 | |
| Keywords: Computational kernels; Large scale simulations; Lattice Boltzmann algorithms; Lattice Boltzmann models; Many-core architecture; Many-core processors; Micro architectures; Rayleigh-Taylor instabilities | |
| Others: https://iopscience.iop.org/article/10.1088/1742-6596/454/1/012015/pdf ; DOI: 10.1088/1742-6596/454/1/012015 | |
| Subject classification: Computer Science (General) | |
| Source: IOP | |
【 Abstract 】
In this paper we address the problem of identifying and exploiting techniques that optimize the performance of large-scale scientific codes on many-core processors. We consider as a test bed a state-of-the-art Lattice Boltzmann (LB) model that accurately reproduces the thermo-hydrodynamics of a 2D fluid obeying the equations of state of a perfect gas. The regular structure of Lattice Boltzmann algorithms makes it relatively easy to identify a large degree of available parallelism; the challenge is that of mapping this parallelism onto processors whose architecture is becoming more and more complex, both in terms of an increasing number of independent cores and, within each core, of vector instructions on longer and longer data words. We take as an example the Intel Sandy Bridge micro-architecture, which supports AVX instructions operating on 256-bit vectors. We address the problem of efficiently implementing the key computational kernels of LB codes, streaming and collision, on this family of processors. We introduce several successive optimization steps and quantitatively assess the impact of each of them on performance. Our final result is a production-ready code already in use for large-scale simulations of the Rayleigh-Taylor instability. We analyze both raw performance and scaling figures, and compare with GPU-based implementations of similar codes.
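The two kernels named in the abstract, streaming and collision, can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: it uses the simple isothermal D2Q9 BGK scheme as a stand-in for the authors' thermal model, periodic boundaries, and plain Python rather than AVX code. All function names (`equilibrium`, `stream`, `collide`, `init`, `total_mass`) are hypothetical. Populations are stored as one flat list per lattice velocity, a structure-of-arrays layout of the kind that typically lets vector units process neighbouring sites together.

```python
# Illustrative D2Q9 BGK lattice Boltzmann step.
# ASSUMPTION: this is a stand-in scheme, NOT the thermal model used in the paper.
# Layout: f[k][i] holds population k at site i = y*nx + x (structure of arrays).

# Standard D2Q9 velocity set and weights.
CX = [0, 1, 0, -1, 0, 1, -1, -1, 1]
CY = [0, 0, 1, 0, -1, 1, 1, -1, -1]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium for the 9 populations at one site."""
    usq = ux * ux + uy * uy
    feq = []
    for k in range(9):
        cu = CX[k] * ux + CY[k] * uy
        feq.append(W[k] * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def stream(f, nx, ny):
    """Move each population one site along its velocity (periodic boundaries)."""
    out = [[0.0] * (nx * ny) for _ in range(9)]
    for k in range(9):
        for y in range(ny):
            for x in range(nx):
                xd = (x + CX[k]) % nx
                yd = (y + CY[k]) % ny
                out[k][yd * nx + xd] = f[k][y * nx + x]
    return out


def collide(f, nx, ny, tau):
    """Relax every site toward its local equilibrium; sites are independent."""
    out = [[0.0] * (nx * ny) for _ in range(9)]
    for i in range(nx * ny):
        rho = sum(f[k][i] for k in range(9))
        ux = sum(CX[k] * f[k][i] for k in range(9)) / rho
        uy = sum(CY[k] * f[k][i] for k in range(9)) / rho
        feq = equilibrium(rho, ux, uy)
        for k in range(9):
            out[k][i] = f[k][i] - (f[k][i] - feq[k]) / tau
    return out


def total_mass(f):
    """Sum of all populations over all sites."""
    return sum(sum(pop) for pop in f)


def init(nx, ny):
    """Fluid at rest with a small density bump at site (0, 0)."""
    f = [[0.0] * (nx * ny) for _ in range(9)]
    for i in range(nx * ny):
        rho = 1.1 if i == 0 else 1.0
        feq = equilibrium(rho, 0.0, 0.0)
        for k in range(9):
            f[k][i] = feq[k]
    return f
```

In this sketch the collision loop touches each site independently and the streaming loop only copies data, which is the regular structure the abstract refers to as a source of parallelism. A time step is `f = collide(stream(f, nx, ny), nx, ny, tau)`, and total mass should stay constant up to rounding.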