Journal article

【Abstract】

We describe an implementation of the conjugate gradient method on heterogeneous platforms (multiple nodes with GPU accelerators) for use in OpenFOAM. Several optimizations are described. For the conjugate gradient method itself, we suggest using device memory for scalars used only on the GPU and pinned memory for scalars used in MPI reductions. For preconditioning, we choose AINV as a suitable preconditioner for GPUs and describe ways to make it more efficient, such as storing it in single precision, laying out factors in upper-left triangular form, and computing it on the CPU asynchronously. We describe how multi-GPU computing can be supported together with arbitrary boundary conditions by copying only boundary coefficients from the accelerator to host memory and then using existing OpenFOAM methods on the CPU. To improve the overlap of computation and communication, we suggest using a pipelined variant of the conjugate gradient method and describe GPU-specific adjustments. In experimental evaluation, we obtain a 1.75x speedup in the linear solver by using a Tesla K20X accelerator in addition to a 10-core Xeon CPU, but only for sufficiently large problem sizes: below 1 million cells per accelerator, the efficiency of GPU computations diminishes.
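
The abstract does not say which pipelined variant is used. As a rough illustration only, the following pure-Python sketch follows one commonly cited formulation of pipelined preconditioned conjugate gradient (the Ghysels and Vanroose recurrences), which is an assumption and not necessarily the authors' choice. The names `pipelined_pcg`, `apply_prec`, `matvec`, `dot`, and `axpy` are hypothetical helpers, and dense lists stand in for the sparse OpenFOAM matrices and GPU kernels. The point it shows is that all dot products of an iteration are grouped together, so that in an MPI + GPU code a single reduction could overlap with the preconditioner application and the matrix-vector product that follow it.

```python
def matvec(A, x):
    """Dense matrix-vector product (stand-in for a sparse GPU kernel)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    """Local dot product; a distributed code would reduce this over MPI."""
    return sum(a * b for a, b in zip(x, y))


def axpy(alpha, x, y):
    """Return y + alpha * x as a new list."""
    return [yi + alpha * xi for xi, yi in zip(x, y)]


def pipelined_pcg(A, b, apply_prec, tol=1e-10, max_iter=200):
    """Solve A x = b for symmetric positive definite A.

    apply_prec(v) applies the preconditioner M^{-1} to v (hypothetical
    callback; the paper uses AINV, this sketch accepts any SPD choice).
    Returns (x, number_of_iterations_performed).
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)              # r = b - A x with the initial guess x = 0
    u = apply_prec(r)        # u = M^{-1} r
    w = matvec(A, u)         # w = A u
    z = q = s = p = [0.0] * n
    gamma_old = alpha = 1.0
    bb = dot(b, b)

    for i in range(max_iter):
        # All scalars of this iteration: in a distributed code these would
        # form one non-blocking reduction ...
        gamma = dot(r, u)
        delta = dot(w, u)
        rr = dot(r, r)
        # ... overlapped with these two operations, which do not need them.
        m = apply_prec(w)
        nv = matvec(A, m)

        if rr <= tol * tol * bb:
            return x, i

        if i == 0:
            beta = 0.0
            alpha = gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha)

        # Recurrences for the auxiliary vectors and the search direction.
        z = axpy(beta, z, nv)    # z = A M^{-1} s
        q = axpy(beta, q, m)     # q = M^{-1} s
        s = axpy(beta, s, w)     # s = A p
        p = axpy(beta, p, u)     # search direction

        x = axpy(alpha, p, x)
        r = axpy(-alpha, s, r)
        u = axpy(-alpha, q, u)
        w = axpy(-alpha, z, w)
        gamma_old = gamma

    return x, max_iter
```

For example, calling `pipelined_pcg(A, b, lambda v: [vi / 2.0 for vi in v])` with a small tridiagonal matrix `A` (2 on the diagonal, -1 next to it) uses a simple Jacobi preconditioner in place of AINV. The extra vectors `z`, `q`, and `s` are the price of the reordering: they add memory traffic per iteration in exchange for hiding the reduction latency.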

【License】

Unknown

Труды Института системного программирования РАН (Proceedings of the Institute for System Programming of the RAS)
Optimizations for linear solvers in OpenFOAM for MPI + CUDA platform

V. A. Platonov¹, A. V. Monakov¹
[1] ISP RAS;
Keywords: conjugate gradient method; AINV preconditioning; OpenFOAM; GPU; MPI;
DOI : 10.15514/ISPRAS-2014-26(3)-4
Source: DOAJ

