SparkSW: Large-Scale Biological Sequence Alignment

The Smith-Waterman (SW) algorithm is universally used for database searches owing to its high sensitivity. Its widespread impact is reflected in the more than 8000 citations it has received over the past decades. However, the algorithm's time and space complexity is prohibitively high, which poses significant computational challenges. Apache Spark is an increasingly popular, fast big-data analytics engine that has been highly successful in implementing large-scale, data-intensive applications on commercial hardware. This paper presents SparkSW, the first reported system that implements the SW algorithm on an Apache Spark based distributed computing framework, using a couple of off-the-shelf workstations.
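To make the cost of the SW algorithm concrete, the following is a minimal sketch of its scoring recurrence, not SparkSW's implementation. The function name, the scoring values (match, mismatch, gap) and the linear gap penalty are assumptions chosen for illustration; production aligners typically use a substitution matrix such as BLOSUM62 with affine gaps. The nested loops show why one comparison costs time proportional to the product of the two sequence lengths. This score-only version keeps just two rows in memory; recovering the alignment itself would require the full matrix.

```python
def smith_waterman_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score (linear gap penalty, illustrative scores)."""
    prev = [0] * (len(subject) + 1)  # previous row of the scoring matrix
    best = 0
    for q in query:
        curr = [0]  # first column is always 0 for local alignment
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)  # align q with s
            up = prev[j] + gap        # gap in the subject
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # 0 lets a local alignment restart
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

A database search repeats this computation for the query against every database sequence, which is the part a distributed framework can spread across workers.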

The scalability and load-balancing efficiency of the system are investigated using a realistic ultra-large database, the state-of-the-art UniRef100. The experimental results indicate that 1) SparkSW balances load adaptively across parallel workloads and scales extremely well as computing resources increase, and 2) SparkSW provides a fast and universal option for highly sensitive biological sequence alignment. The success of SparkSW also shows that the Apache Spark framework offers an efficient way to cope with the ever-increasing sizes of biological sequence databases, especially those generated by second-generation sequencing technologies.
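As an illustration of what load balancing means for this kind of workload, the sketch below assigns database sequences to workers so that each worker receives a similar total number of residues, since the SW cost of a comparison grows with sequence length. This is a generic greedy heuristic (longest sequence first, always to the least-loaded worker) and is an assumption for illustration only; the text above does not describe how SparkSW itself partitions the database. The function name and parameters are hypothetical.

```python
import heapq

def balance_partitions(seq_lengths, num_workers):
    """Greedily assign sequence indices to workers to even out total residues."""
    heap = [(0, w) for w in range(num_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_workers)]
    # Place the longest sequences first; each goes to the least-loaded worker.
    order = sorted(range(len(seq_lengths)), key=lambda i: seq_lengths[i], reverse=True)
    for idx in order:
        load, w = heapq.heappop(heap)
        assignment[w].append(idx)
        heapq.heappush(heap, (load + seq_lengths[idx], w))
    return assignment
```

For example, `balance_partitions([100, 90, 50, 40, 10, 10], 2)` splits the six sequences into two groups of 150 residues each.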

