The wide application of omics research has produced a burst of biological data in recent years, which in turn has increased the need to infer biological networks from data. Learning biological networks from experimental data can help detect and analyze aberrant signaling pathways, which can aid in diagnosing diseases at an early stage. Many biological networks can be modeled as Bayesian networks (BNs). However, because of the combinatorial nature of the search space, learning the dependency relationships underlying complex networks is NP-complete. To make the problem tractable, researchers have proposed using Markov chain Monte Carlo (MCMC) methods to sample the solution space.
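As a rough illustration of what sampling the solution space means, the sketch below runs a generic Metropolis-Hastings sampler over DAG structures. Each step toggles one directed edge, rejects proposals that would create a cycle, and accepts the rest with probability min(1, exp(Δscore)). This is a textbook structure-MCMC variant, not necessarily the algorithm optimized in this work. The functions `structure_mcmc`, `is_acyclic`, and `edge_frequencies` and the toy score `toy_log_score` are hypothetical; a real application would replace the toy score with a Bayesian score computed from data.

```python
import math
import random
from collections import Counter
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Edge = Tuple[int, int]


def is_acyclic(n: int, edges: Set[Edge]) -> bool:
    """Kahn's algorithm: a directed graph is a DAG iff every node gets removed."""
    indegree = [0] * n
    children: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        indegree[v] += 1
        children[u].append(v)
    stack = [i for i in range(n) if indegree[i] == 0]
    visited = 0
    while stack:
        u = stack.pop()
        visited += 1
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                stack.append(v)
    return visited == n


def structure_mcmc(
    n: int,
    log_score: Callable[[Set[Edge]], float],
    iters: int,
    burn_in: int = 0,
    seed: int = 0,
) -> List[FrozenSet[Edge]]:
    """Metropolis-Hastings over DAGs with a symmetric single-edge toggle proposal."""
    rng = random.Random(seed)
    edges: Set[Edge] = set()  # start from the empty graph
    current = log_score(edges)
    samples: List[FrozenSet[Edge]] = []
    for step in range(iters):
        u, v = rng.sample(range(n), 2)  # ordered pair of distinct nodes
        proposal = set(edges)
        if (u, v) in proposal:
            proposal.remove((u, v))  # delete an existing edge
        else:
            proposal.add((u, v))  # add a new edge
        # Cyclic graphs have zero target probability, so such moves are rejected.
        if is_acyclic(n, proposal):
            new = log_score(proposal)
            # Accept with probability min(1, exp(new - current)).
            if rng.random() < math.exp(min(0.0, new - current)):
                edges, current = proposal, new
        if step >= burn_in:
            samples.append(frozenset(edges))
    return samples


def edge_frequencies(samples: List[FrozenSet[Edge]]) -> Dict[Edge, float]:
    """Posterior edge probabilities estimated from the sampled graphs."""
    counts = Counter(e for graph in samples for e in graph)
    return {e: c / len(samples) for e, c in counts.items()}


# Hypothetical toy score: rewards edges of an assumed "true" chain 0 -> 1 -> 2
# and penalizes every other edge. It is not a real Bayesian score.
TRUE_EDGES = {(0, 1), (1, 2)}


def toy_log_score(edges: Set[Edge]) -> float:
    return 2.0 * len(edges & TRUE_EDGES) - 2.0 * len(edges - TRUE_EDGES)


samples = structure_mcmc(3, toy_log_score, iters=5000, burn_in=500)
print(edge_frequencies(samples))
```

Toggling a single ordered edge keeps the proposal symmetric, so the Hastings correction cancels and only the score difference matters. Every accepted or rejected move still requires rescoring a candidate graph, and that repeated scoring is what makes plain MCMC expensive as the number of nodes grows.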
MCMC methods guarantee convergence and traversability. However, because of its computational complexity, MCMC does not scale to networks with more than 40 nodes. In this work, we optimize an MCMC-based learning algorithm and implement it on a general-purpose graphics processing unit (GPGPU). Optimizing the algorithm yields a 2.46-fold speedup, and implementing it on a GPU yields an additional 58-fold acceleration, for a total speedup of about 143-fold. As a result, we can apply this system to networks with up to 125 nodes, a size that is of interest to many biologists. Furthermore, we add artificial interventions to the scores to incorporate prior knowledge of interactions into the Bayesian inference, which increases the accuracy of the results. Our system provides biologists with a more computationally efficient tool at a lower cost than previous works.
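One simple way to fold prior knowledge into a structure score, sketched here under our own assumptions, is to add a weighted bonus or penalty for each edge that prior knowledge supports or contradicts. The names `with_prior`, `prior`, `strength`, and the sparsity-only base score `sparsity_score` are hypothetical; the exact form of the interventions used in this work is not specified here.

```python
from typing import Callable, Dict, Set, Tuple

Edge = Tuple[int, int]
LogScore = Callable[[Set[Edge]], float]


def with_prior(base_log_score: LogScore, prior: Dict[Edge, float], strength: float = 1.0) -> LogScore:
    """Wrap a structure score with an additive prior-knowledge term.

    `prior` maps an edge to a weight: positive for a known interaction,
    negative for an interaction believed to be absent. Unlisted edges add 0.
    """
    def scored(edges: Set[Edge]) -> float:
        bonus = sum(prior.get(e, 0.0) for e in edges)
        return base_log_score(edges) + strength * bonus
    return scored


def sparsity_score(edges: Set[Edge]) -> float:
    """Hypothetical base score that only penalizes the number of edges."""
    return -float(len(edges))


prior = {(0, 1): 2.0, (2, 1): -3.0}
score = with_prior(sparsity_score, prior)
print(score({(0, 1)}), score({(2, 1)}), score(set()))
```

Because the extra term is a sum over edges, it can be dropped into any MCMC sampler that accepts a score function, and `strength` controls how strongly the prior pulls the sampler relative to the data.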