Citation: ZHANG Qianyi, WEI Huajian, HE Yinan, LI Huabing. CUDA accelerated optimization based on lattice Boltzmann method[J]. Journal of Guilin University of Electronic Technology, 2022, 42(3): 240-244.

CUDA-accelerated optimization based on the lattice Boltzmann method

Abstract: To improve the efficiency of fluid simulation while preserving the accuracy of the results, Poiseuille flow simulation based on the lattice Boltzmann method is accelerated using the CUDA programming platform and the floating-point computing power of the GPU. Two addressing schemes, linear addressing and subscript addressing, are designed and applied to the collision, streaming, and macroscopic-quantity calculation steps of the lattice Boltzmann program, and their effect on the program's computational efficiency is examined. The program also uses unified memory management: variables allocated in this way are accessible from both the host and the device, which simplifies the code and reduces the overhead of repeatedly allocating memory for variables. Running on an Intel(R) Xeon(R) E5-2620 v4 CPU and an Nvidia Quadro GP100 GPU, the linear addressing and subscript addressing methods achieve speedups of 71 times and 25 times, respectively, over the serial CPU code.
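The abstract does not define the two addressing schemes, so the following is only a minimal host-side C++ sketch under an assumed reading: "linear addressing" uses one flat node index (as a 1D CUDA launch would derive from `blockIdx.x * blockDim.x + threadIdx.x`), and "subscript addressing" uses an explicit (x, y) pair (as a 2D launch would). The grid size, the D2Q9 layout `f[q * n + i]`, and the function names `init_rest`, `density_linear`, and `density_subscript` are all hypothetical and not taken from the paper. The sketch shows only the index arithmetic for the macroscopic density step, not the GPU kernels or their performance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical grid size and D2Q9 direction count (assumptions, not from the paper).
constexpr int NX = 8;
constexpr int NY = 4;
constexpr int Q = 9;

// Standard D2Q9 lattice weights: rest, four axis, four diagonal directions.
constexpr double W[Q] = {4.0 / 9,  1.0 / 9,  1.0 / 9,  1.0 / 9, 1.0 / 9,
                         1.0 / 36, 1.0 / 36, 1.0 / 36, 1.0 / 36};

// Fill the distributions with the zero-velocity equilibrium f_q = w_q * rho0.
// Assumed layout: direction-major, f[q * n + i] with n = NX * NY.
std::vector<double> init_rest(double rho0) {
    const int n = NX * NY;
    std::vector<double> f(static_cast<std::size_t>(Q) * n);
    for (int q = 0; q < Q; ++q)
        for (int i = 0; i < n; ++i)
            f[q * n + i] = W[q] * rho0;
    return f;
}

// "Linear addressing" (assumed meaning): one flat index i per lattice node.
// On the GPU, i would be the 1D global thread index.
std::vector<double> density_linear(const std::vector<double>& f) {
    const int n = NX * NY;
    std::vector<double> rho(n, 0.0);
    for (int i = 0; i < n; ++i)
        for (int q = 0; q < Q; ++q)
            rho[i] += f[q * n + i];
    return rho;
}

// "Subscript addressing" (assumed meaning): each node is identified by (x, y),
// and the memory offset is computed from the two subscripts.
// On the GPU, x and y would come from a 2D grid of threads.
std::vector<double> density_subscript(const std::vector<double>& f) {
    const int n = NX * NY;
    std::vector<double> rho(n, 0.0);
    for (int y = 0; y < NY; ++y)
        for (int x = 0; x < NX; ++x) {
            const int i = y * NX + x;  // row-major offset of node (x, y)
            for (int q = 0; q < Q; ++q)
                rho[i] += f[q * n + i];
        }
    return rho;
}
```

Both functions visit the same memory locations and return the same densities; under this reading the schemes differ only in how each thread derives its node offset, which is where any difference in GPU efficiency would have to originate.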

     
