High-fidelity simulations of nuclear power plant cores require solving the Boltzmann transport equation. In discrete ordinates methods, the most computationally demanding part of solving this equation is the sweep operation. Considering the evolution of computer architectures, we propose in this paper, as a first step toward heterogeneous distributed architectures, a hybrid parallel implementation of the sweep operation on top of PaRSEC, a generic task-based runtime system. This implementation targets three nested levels of parallelism: message passing, multi-threading, and vectorization.
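To illustrate why the sweep is hard to parallelize, here is a minimal sketch of a sweep for a single angular direction on a 2D Cartesian grid. It assumes positive direction cosines `mu` and `eta`, a constant total cross section `sigma`, and a simple step (upwind) closure. These choices, and all function names, are hypothetical illustrations rather than the discretization used in this paper or in DOMINO. Each cell needs the angular flux of its left and bottom neighbours, so cells can only be updated in wavefronts (anti-diagonals `i + j = d`). Cells on the same wavefront are mutually independent, and this is the parallelism a task-based runtime can exploit.

```python
def wavefronts(nx, ny):
    """Yield the cells (i, j) of an nx-by-ny grid grouped by anti-diagonal d = i + j."""
    for d in range(nx + ny - 1):
        # Keep only indices that fall inside the grid: 0 <= i < nx and 0 <= j < ny.
        yield [(i, d - i) for i in range(max(0, d - ny + 1), min(nx, d + 1))]


def update_cell(psi, q, i, j, ax, ay, sigma, left_bc, bottom_bc):
    """Step (upwind) balance for one cell; needs the left and bottom upwind fluxes."""
    left = psi[i - 1][j] if i > 0 else left_bc
    bottom = psi[i][j - 1] if j > 0 else bottom_bc
    psi[i][j] = (q[i][j] + ax * left + ay * bottom) / (sigma + ax + ay)


def sweep_wavefront(q, sigma, mu, eta, dx, dy, left_bc=0.0, bottom_bc=0.0):
    """Sweep one direction wavefront by wavefront; cells of one wavefront are independent."""
    nx, ny = len(q), len(q[0])
    psi = [[0.0] * ny for _ in range(nx)]
    ax, ay = mu / dx, eta / dy
    for front in wavefronts(nx, ny):
        for i, j in front:  # these updates could run concurrently
            update_cell(psi, q, i, j, ax, ay, sigma, left_bc, bottom_bc)
    return psi


def sweep_lexicographic(q, sigma, mu, eta, dx, dy, left_bc=0.0, bottom_bc=0.0):
    """Reference serial sweep in plain row-major order."""
    nx, ny = len(q), len(q[0])
    psi = [[0.0] * ny for _ in range(nx)]
    ax, ay = mu / dx, eta / dy
    for i in range(nx):
        for j in range(ny):
            update_cell(psi, q, i, j, ax, ay, sigma, left_bc, bottom_bc)
    return psi


q = [[1.0] * 4 for _ in range(3)]
psi = sweep_wavefront(q, sigma=1.0, mu=0.5, eta=0.5, dx=1.0, dy=1.0)
print(psi[0][0], psi[1][0])
```

Both orderings compute exactly the same fluxes, because every cell reads the same upwind values. On an `nx`-by-`ny` grid there are `nx + ny - 1` wavefronts, so the available concurrency grows and shrinks along the sweep. In a distributed, task-based setting one would typically group cells (and angles) into blocks and let each block become a task whose dependencies are its upwind neighbours. This is a general observation, not a description of how the paper maps the sweep onto PaRSEC.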
A theoretical performance model was designed to validate the approach and to help tune its many parameters. The proposed parallel implementation of the sweep achieves a sustained performance of 6.1 Tflop/s, corresponding to 33.9% of the peak performance of the targeted supercomputer. This implementation compares favourably with state-of-the-art solvers such as PartiSN, and can therefore serve as a building block for a massively parallel version of the neutron transport solver DOMINO developed at EDF.