• China Academic Journals Full-text Database
  • Chinese Academic Journal Comprehensive Evaluation Database
  • Chinese Science and Technology Paper and Citation Database
  • Chinese Core Journals (Selection) Database
WEI Huajian, ZHANG Qianyi, ZHANG Jingjing, et al. Optimization of LBM computing program based on object-oriented CUDA architecture[J]. Journal of Guilin University of Electronic Technology, 2024, 44(6): 579-584. DOI: 10.16725/j.1673-808X.2021452

Optimization of LBM computing program based on object-oriented CUDA architecture

  • Abstract: The lattice Boltzmann method (LBM) is a novel and promising computational fluid dynamics method. From an algorithmic perspective, its iterative process can be decomposed into many parallel sub-problems, making it well suited to computation on high-performance graphics processing units (GPUs) for very fast data processing; efficient GPU-based LBM implementations have been widely reported. With C++ as the programming language, the CUDA program structure is optimized using object-oriented design, which reduces coupling between program components and makes the program easier to maintain and extend. The Poiseuille flow model is used to verify the stability and accuracy of the optimized program. During execution, CUDA kernel functions handle the iterative process of collision, streaming (migration), and computation of macroscopic quantities within the model, while shared memory is used to store GPU runtime data to improve computational efficiency. Data analysis shows that computation is up to 70 times faster than on a central processing unit (CPU), thanks to the GPU's high-performance parallel computing capability.
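The per-cell work that the abstract describes offloading to CUDA kernels (relaxing each distribution toward a local equilibrium before streaming) can be sketched on the CPU for the standard D2Q9 lattice. This is a minimal illustration of the textbook BGK collision step, not the paper's actual code; the function and variable names are assumptions:

```cpp
#include <array>
#include <cassert>
#include <cmath>

// D2Q9 lattice: 9 discrete velocities and their standard weights.
constexpr int Q = 9;
constexpr int cx[Q] = {0, 1, 0, -1, 0, 1, -1, -1, 1};
constexpr int cy[Q] = {0, 0, 1, 0, -1, 1, 1, -1, -1};
constexpr double w[Q] = {4.0 / 9,
                         1.0 / 9, 1.0 / 9, 1.0 / 9, 1.0 / 9,
                         1.0 / 36, 1.0 / 36, 1.0 / 36, 1.0 / 36};

// Equilibrium distribution f_i^eq for density rho and velocity (ux, uy):
// f_i^eq = w_i * rho * (1 + 3 c_i.u + 4.5 (c_i.u)^2 - 1.5 u^2)
std::array<double, Q> equilibrium(double rho, double ux, double uy) {
    std::array<double, Q> feq{};
    const double usq = ux * ux + uy * uy;
    for (int i = 0; i < Q; ++i) {
        const double cu = cx[i] * ux + cy[i] * uy;
        feq[i] = w[i] * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq);
    }
    return feq;
}

// BGK collision: relax each population toward equilibrium with
// relaxation time tau. On the GPU this loop body becomes the
// per-cell work of a collision kernel, one thread per lattice cell.
void collide(std::array<double, Q>& f,
             double rho, double ux, double uy, double tau) {
    const auto feq = equilibrium(rho, ux, uy);
    for (int i = 0; i < Q; ++i)
        f[i] -= (f[i] - feq[i]) / tau;
}
```

Because each cell's collision depends only on that cell's own populations, the step is embarrassingly parallel, which is what makes a one-thread-per-cell CUDA mapping natural; with `tau = 1` the populations relax to equilibrium in a single step.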

     
