Conference paper

【Abstract】

The speedup of element-by-element FEM algorithms depends not only on peak processor performance but also on the access time to shared mesh data. Eliminating memory-boundness would significantly speed up unstructured mesh computations on hybrid multi-core architectures, where the gap between processor and memory performance continues to grow. The speedup can be achieved by ordering the unknowns so that only elements with no common nodes are processed in parallel, which minimizes memory conflicts. FEM assembly is performed with respect to this ordering, which defines how the vectors are composed. The mesh can be partitioned into disjoint subdomains using different layer-by-layer schemes. In this work, we evaluated several partitioning schemes (block, odd, even, and their modifications) on multi-core platforms using Gunther's Universal Law of Computational Scalability. We performed numerical experiments with element-by-element matrix-vector multiplication on unstructured meshes on multi-core processors accelerated by MIC and GPU. With ordering, we achieved a 5-fold speedup on CPU, a 40-fold speedup on MIC, and a 200-fold speedup on GPU.
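
The idea of processing in parallel only elements that share no nodes can be illustrated with a minimal sketch. It assumes a simple greedy element coloring, which is not one of the layer-by-layer schemes (block, odd, even) evaluated in the paper, and all function names here are hypothetical. The last function evaluates the standard form of Gunther's Universal Scalability Law; the parameters sigma and kappa would have to be fitted to measurements and are not taken from the paper.

```python
from collections import defaultdict


def color_elements(elements):
    """Greedy coloring (illustrative assumption): elements that share a node
    receive different colors, so each color group is conflict-free."""
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].append(e)
    colors = [-1] * len(elements)
    for e, nodes in enumerate(elements):
        # Colors already taken by neighbours (elements sharing any node with e).
        used = {colors[o] for n in nodes for o in node_to_elems[n] if colors[o] >= 0}
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors


def ebe_matvec(elements, elem_matrices, x, colors):
    """Compute y = A x element by element, without assembling A."""
    y = [0.0] * len(x)
    for c in range(max(colors) + 1):
        # Elements of one color write to disjoint entries of y, so this inner
        # loop could be distributed over threads without locks or atomics.
        for e, nodes in enumerate(elements):
            if colors[e] != c:
                continue
            Ke = elem_matrices[e]
            for i, ni in enumerate(nodes):
                y[ni] += sum(Ke[i][j] * x[nj] for j, nj in enumerate(nodes))
    return y


def usl_speedup(p, sigma, kappa):
    """Universal Scalability Law: C(p) = p / (1 + sigma*(p-1) + kappa*p*(p-1)),
    with sigma modelling contention and kappa modelling coherency delay."""
    return p / (1.0 + sigma * (p - 1) + kappa * p * (p - 1))


# Toy 1D mesh: four two-node elements on nodes 0..4 (hypothetical data).
elements = [(0, 1), (1, 2), (2, 3), (3, 4)]
elem_matrices = [[[1.0, -1.0], [-1.0, 1.0]] for _ in elements]
colors = color_elements(elements)
y = ebe_matvec(elements, elem_matrices, [0.0, 1.0, 2.0, 3.0, 4.0], colors)
```

On this toy mesh the coloring alternates between two groups, so neighbouring elements never update the same node at the same time. On real unstructured meshes the number of groups and their sizes depend on the mesh connectivity and on the ordering scheme chosen.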

【Preview】

Attachments
Files	Size	Format	View
Scalability of parallel finite element algorithms on multi-core platforms	1112KB	PDF	download

11th International Conference on "Mesh methods for boundary-value problems and applications"
Scalability of parallel finite element algorithms on multi-core platforms

Kopysov, S.P.^1 ; Novikov, A.K.^1 ; Nedozhogin, N.S.^1 ; Rychkov, V.N.^1
Institute of Mechanics, Ural Branch of the Russian Academy of Sciences, 34 T. Baramzinoy, Izhevsk 426067, Russia^1
Keywords: Computational scalability; Finite element algorithms; Multi-core platforms; Multi-core processor; Multicore architectures; Numerical experiments; Processor performance; Unstructured meshes
Others : https://iopscience.iop.org/article/10.1088/1757-899X/158/1/012055/pdf DOI : 10.1088/1757-899X/158/1/012055

Source: IOP


