11th International Conference on "Mesh methods for boundary-value problems and applications" | |
Scalability of parallel finite element algorithms on multi-core platforms | |
Kopysov, S.P.^1 ; Novikov, A.K.^1 ; Nedozhogin, N.S.^1 ; Rychkov, V.N.^1 | |
Institute of Mechanics, Ural Branch of the Russian Academy of Sciences, 34 T. Baramzinoy, Izhevsk 426067, Russia^1 | |
Keywords: Computational scalability; Finite element algorithms; Multi-core platforms; Multi-core processor; Multicore architectures; Numerical experiments; Processor performance; Unstructured meshes | |
Others : https://iopscience.iop.org/article/10.1088/1757-899X/158/1/012055/pdf DOI : 10.1088/1757-899X/158/1/012055 | |
Source: IOP | |
【 Abstract 】
The speedup of element-by-element FEM algorithms depends not only on peak processor performance but also on the access time to shared mesh data. Removing the memory-bound bottleneck would significantly speed up unstructured-mesh computations on hybrid multi-core architectures, where the gap between processor and memory performance continues to grow. The speedup can be achieved by ordering the unknowns so that only elements with no common nodes are processed in parallel, which minimizes memory conflicts. FEM assembly follows this ordering, which defines how the vectors are composed. The mesh can be partitioned into disjoint subdomains using different layer-by-layer schemes. In this work, we evaluated several partitioning schemes (block, odd, even, and their modifications) on multi-core platforms using Gunther's Universal Law of Computational Scalability. We performed numerical experiments with element-by-element matrix-vector multiplication on unstructured meshes on multi-core processors accelerated by MIC and GPU. With ordering, we achieved a 5-fold speedup on CPU, a 40-fold speedup on MIC, and a 200-fold speedup on GPU.
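The conflict-free ordering idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: a generic greedy element coloring stands in for the paper's layer-by-layer block/odd/even schemes, and the names `greedy_element_coloring`, `ebe_matvec`, and `usl_speedup` are hypothetical. The last function evaluates the standard form of Gunther's Universal Scalability Law, with assumed parameter names `sigma` (contention) and `kappa` (coherency).

```python
def greedy_element_coloring(elements):
    """Give each element a color so that elements sharing a node differ in color.

    Assumption: a plain greedy coloring, used here only as a stand-in for the
    layer-by-layer partitioning schemes evaluated in the paper.
    """
    node_colors = {}  # node index -> colors already used by elements at that node
    colors = []
    for elem in elements:
        used = set()
        for node in elem:
            used |= node_colors.get(node, set())
        color = 0
        while color in used:
            color += 1
        colors.append(color)
        for node in elem:
            node_colors.setdefault(node, set()).add(color)
    return colors


def ebe_matvec(elements, element_matrices, x, colors):
    """Element-by-element product y = K x without assembling the global matrix K."""
    y = [0.0] * len(x)
    for color in range(max(colors) + 1):
        # Elements of one color share no nodes, so this inner loop could be
        # distributed over threads without conflicting writes to y.
        for e, elem in enumerate(elements):
            if colors[e] != color:
                continue
            ke = element_matrices[e]
            for i, gi in enumerate(elem):
                y[gi] += sum(ke[i][j] * x[gj] for j, gj in enumerate(elem))
    return y


def usl_speedup(p, sigma, kappa):
    """Universal Scalability Law: relative capacity on p processing units."""
    return p / (1.0 + sigma * (p - 1) + kappa * p * (p - 1))


# Toy 1D mesh: four two-node elements in a row, each with the same local matrix.
elements = [(0, 1), (1, 2), (2, 3), (3, 4)]
element_matrices = [[[1.0, -1.0], [-1.0, 1.0]] for _ in elements]
colors = greedy_element_coloring(elements)
y = ebe_matvec(elements, element_matrices, [0.0, 1.0, 2.0, 3.0, 4.0], colors)
```

On this toy mesh the greedy coloring alternates between two colors along the row of elements, which resembles an odd/even split; real unstructured meshes generally need more colors.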