THE performance of a Monte Carlo model for the simulation of electromagnetic wave propagation in particle-filled atmospheres has been conducted for different CUDA versions and design approaches. The proposed algorithm exhibits a high degree of parallelism, which allows favorable implementation in a GPU. Practical implementation aspects of the model have been also explained and their impact assessed, such as the use of the different types of memories present in a GPU.
A number of setups have been chosen in order to compare performance for manually optimized vs. UVM (Unified Virtual Memory) implementations for different CUDA versions. Features and relative performance impact of the different options have been discussed, extracting practical hints and rules useful to speed up CUDA programs.