
Hadoop Performance Modeling for Job Estimation and Resource Provisioning

MapReduce has become a major computing model for data-intensive applications. Hadoop, an open-source implementation of MapReduce, has been adopted by a steadily growing user community. Cloud service providers such as the Amazon EC2 Cloud offer Hadoop users the opportunity to lease a certain amount of resources and pay for their use. However, a key challenge is that cloud service providers do not have a resource provisioning mechanism to satisfy user jobs with deadline requirements. Currently, it is solely the user's responsibility to estimate the required amount of resources for running a job in the cloud. This paper presents a Hadoop job performance model that accurately estimates job completion time and further provisions the required amount of resources for a job to be completed within a deadline.

The proposed model builds on historical job execution records and employs the Locally Weighted Linear Regression (LWLR) technique to estimate the execution time of a job. Furthermore, it employs the Lagrange Multipliers technique for resource provisioning to satisfy jobs with deadline requirements. The proposed model is initially evaluated on an in-house Hadoop cluster and subsequently evaluated in the Amazon EC2 Cloud. Experimental results show that the accuracy of the proposed model in job execution estimation ranges from 94.97% to 95.51%, and that jobs are completed within the required deadlines when the resource provisioning scheme of the proposed model is followed.
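As a rough illustration of the LWLR step, the sketch below (an assumption-laden example, not the paper's code) fits a locally weighted linear model to hypothetical historical records and predicts the execution time of a new job; the feature choice (input data size) and the kernel bandwidth tau are placeholders.

```python
# Illustrative sketch: estimating a Hadoop job's execution time with
# Locally Weighted Linear Regression (LWLR) from historical records.
# The feature (input size) and bandwidth tau are assumptions.
import numpy as np

def lwlr_predict(x_query, X, y, tau=4.0):
    """Predict execution time at x_query using LWLR.

    X: (n, d) matrix of historical job features (e.g., input size in GB).
    y: (n,) vector of observed job execution times.
    tau: Gaussian kernel bandwidth controlling locality of the fit.
    """
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])          # add intercept
    xq = np.hstack([1.0, np.atleast_1d(x_query)])
    # Gaussian weights: historical jobs similar to the query count more.
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)
    # Weighted least squares: theta = (X^T W X)^{-1} X^T W y
    theta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y
    return xq @ theta

# Example: predict completion time for a hypothetical 12 GB job.
X_hist = np.array([[2.0], [4.0], [8.0], [16.0], [32.0]])   # input sizes (GB)
y_hist = np.array([110.0, 180.0, 330.0, 640.0, 1250.0])    # times (s)
print(lwlr_predict(np.array([12.0]), X_hist, y_hist))
```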

A Learning Algorithm for Bayesian Networks and Its Efficient Implementation on GPU

The wide application of omics research has produced a burst of biological data in recent years, which has in turn increased the need to infer biological networks from data. Learning biological networks from experimental data can help detect and analyze aberrant signaling pathways, which can be used in diagnosis of diseases at an early stage. Most networks can be modeled as Bayesian Networks (BNs). However, because of its combinatorial nature, computational learning of dependent relationships underlying complex networks is NP-complete. To reduce the complexity, researchers have proposed to use Markov chain Monte Carlo (MCMC) methods to sample the solution space.

MCMC methods guarantee convergence and traversability. However, MCMC is not scalable to networks with more than 40 nodes because of its computational complexity. In this work, we optimize an MCMC-based learning algorithm and implement it on a general-purpose graphics processing unit (GPGPU). We achieve a 2.46-fold speedup by optimizing the algorithm and an additional 58-fold acceleration by implementing it on a GPU, for an overall speedup of 143-fold. As a result, we can apply this system to networks with up to 125 nodes, a size that is of interest to many biologists. Furthermore, we add artificial interventions to the scores in order to incorporate prior knowledge of interactions into the Bayesian inference, which increases the accuracy of the results. Our system provides biologists with a more computationally efficient tool at a lower cost than previous works.
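For readers unfamiliar with structure MCMC, the following minimal sketch (under stated assumptions, not the paper's GPU implementation) samples Bayesian-network DAGs by toggling single edges and accepting moves with the Metropolis-Hastings ratio of log network scores; the `score` callable (e.g., a BDeu or BIC log-score) is assumed to be supplied by the user.

```python
# Minimal sketch of structure MCMC over Bayesian-network DAGs.
# A proposal toggles one directed edge; acyclic proposals are accepted
# with the Metropolis-Hastings ratio of log-scores.
import math
import random

def has_cycle(adj, n):
    """Detect a cycle in a directed graph given as {node: set(children)}."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = [WHITE] * n
    def dfs(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == GRAY or (color[v] == WHITE and dfs(v)):
                return True
        color[u] = BLACK
        return False
    return any(color[u] == WHITE and dfs(u) for u in range(n))

def structure_mcmc(n, score, iters=10000, seed=0):
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}           # current DAG (empty start)
    cur = score(adj)
    for _ in range(iters):
        u, v = rng.sample(range(n), 2)
        proposal = {k: set(s) for k, s in adj.items()}
        if v in proposal[u]:
            proposal[u].discard(v)                # delete edge u -> v
        else:
            proposal[u].add(v)                    # add edge u -> v
        if has_cycle(proposal, n):
            continue                              # reject cyclic proposals
        new = score(proposal)
        # MH acceptance on log-scores (symmetric single-edge proposal).
        if new >= cur or rng.random() < math.exp(new - cur):
            adj, cur = proposal, new
    return adj, cur
```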

Service Operator-Aware Trust Scheme for Resource Matchmaking across Multiple Clouds

This paper proposes a service operator-aware trust scheme (SOTS) for resource matchmaking across multiple clouds. By analyzing the built-in relationships between the users, the broker, and the service resources, this paper proposes a middleware framework for trust management that can effectively reduce user burden and improve system dependability. Based on multidimensional resource service operators, we model the problem of trust evaluation as a process of multi-attribute decision-making and develop an adaptive trust evaluation approach based on information entropy theory.

This adaptive approach can overcome the limitations of traditional trust schemes, whereby the trusted operators are weighted manually or subjectively. As a result, using SOTS, the broker can efficiently and accurately prepare the most trusted resources in advance, and thus provide more dependable resources to users. Our experiments yield interesting and meaningful observations that can facilitate the effective utilization of SOTS in a large-scale multi-cloud environment.
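The entropy-based weighting idea can be illustrated with the small sketch below (hypothetical attribute names and values, not the paper's exact formulation): attributes whose values differ more across candidate resources receive larger weights automatically, rather than being assigned manually.

```python
# Illustrative sketch of information-entropy weighting for
# multi-attribute trust evaluation. Attribute values are hypothetical.
import numpy as np

def entropy_weights(M):
    """M: (resources x attributes) matrix of non-negative operator scores."""
    P = M / M.sum(axis=0)                        # column-normalize to proportions
    n = M.shape[0]
    with np.errstate(divide="ignore", invalid="ignore"):
        logP = np.where(P > 0, np.log(P), 0.0)
    e = -(P * logP).sum(axis=0) / np.log(n)      # entropy of each attribute
    d = 1.0 - e                                   # degree of diversification
    return d / d.sum()                            # adaptive attribute weights

def trust_scores(M):
    w = entropy_weights(M)
    return M @ w                                  # weighted multi-attribute score

# Rows: candidate resources; columns: e.g., availability, response-time
# score (larger is better), data reliability.
M = np.array([[0.9, 0.7, 0.8],
              [0.6, 0.9, 0.7],
              [0.8, 0.8, 0.9]])
print(trust_scores(M))
```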

HEADS-JOIN: Efficient Earth Mover’s Distance Similarity Joins on Hadoop

The Earth Mover’s Distance (EMD) similarity join has a number of important applications, such as near-duplicate image retrieval and distribution-based pattern analysis. However, the computational cost of EMD is super-cubic, and consequently the EMD similarity join operation is prohibitively expensive even for datasets of medium size. We propose to employ the Hadoop platform to speed up the operation. Simply porting the state-of-the-art metric-distance similarity join algorithms to Hadoop results in inefficiency, because they involve excessive distance computations and are vulnerable to skewed data distributions.

We propose a novel framework, named HEADS-JOIN, which transforms data into the space of EMD lower bounds and performs pruning and partitioning at a low cost because computing these EMD lower bounds has constant or linear complexity. We investigate both range and top-k joins, and design efficient algorithms on three popular Hadoop computation paradigms, i.e., MapReduce, Bulk Synchronous Parallel, and Spark. We conduct extensive experiments on both real and synthetic datasets. The results show that HEADS-JOIN outperforms the state-of-the-art metric similarity join technique, i.e., Quickjoin, by up to an order of magnitude and scales out well.
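A simple way to picture the lower-bound filtering idea is the sketch below (an illustration under assumptions, not the HEADS-JOIN algorithm itself): for equal-mass histograms with a Euclidean ground distance, the distance between weighted centroids lower-bounds EMD and costs only linear time, so candidate pairs whose bound exceeds the join radius are pruned before any exact EMD computation; `exact_emd` is assumed to be provided by the caller.

```python
# Sketch of filter-and-refine EMD range join using the centroid lower bound.
import numpy as np

def centroid(bins, weights):
    """Weighted centroid of a histogram (bins: (m, d), weights: (m,))."""
    w = weights / weights.sum()
    return (bins * w[:, None]).sum(axis=0)

def emd_range_join(records, radius, exact_emd):
    """records: list of (bins, weights); returns index pairs with EMD <= radius."""
    cents = [centroid(b, w) for b, w in records]
    result = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            lb = np.linalg.norm(cents[i] - cents[j])   # linear-time lower bound
            if lb > radius:
                continue                               # pruned without exact EMD
            if exact_emd(records[i], records[j]) <= radius:
                result.append((i, j))
    return result
```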

Accelerating Irregular Computation in Massive Short Reads Mapping on FPGA Co-processor

Because of the enormous amount of genomic data, next-generation sequencing (NGS) applications pose significant challenges to current computing systems. In this study, we investigate both algorithmic and architectural strategies to accelerate an NGS data analysis algorithm, short-read mapping, on a commodity multi-core platform and on a customizable FPGA (field-programmable gate array) co-processor architecture, respectively.

A workload analysis reveals that conventional memory optimization is of limited benefit for this irregular computation, which has low arithmetic intensity and a non-contiguous memory access pattern. To mitigate the inherent irregularity of mapping, we have developed an FPGA co-processor based on the Convey computer, which employs a scatter-gather memory mechanism that exploits both bit-level and word-level parallelism. The customized FPGA co-processor achieves a throughput of 947 Gbp per day, about 189 times higher than that of current mapping tools on a single CPU core. Moreover, the co-processor's power efficiency is 29 times higher than that of a conventional 64-core multi-processor.

Routing Pressure: A Channel-Related and Traffic-Aware Metric of Routing Algorithm

How to precisely measure the performance of a routing algorithm is an important issue in the study of network-on-chip (NoC) routing. The degree of adaptiveness is the most widely used metric in the literature. However, our study shows that the degree of adaptiveness cannot precisely measure the performance of a routing algorithm: it cannot explain why a routing algorithm with a high degree of adaptiveness may still perform poorly. As a result, simulation has to be carried out to evaluate routing-algorithm performance.

In this paper, we propose a new metric, routing pressure, for measuring the performance of routing algorithms. It measures routing-algorithm performance more precisely than the degree of adaptiveness, and it allows performance to be evaluated without simulation. Routing pressure can explain why congestion takes place in the network and can also point out where and when congestion occurs, again without simulation.

Neighbor Similarity Trust against Sybil Attack in P2P E-Commerce

Peer-to-peer (P2P) e-commerce applications exist at the edge of the Internet and are vulnerable to passive and active attacks. These attacks have pushed away potential business firms and individuals whose aim is to get the best benefit from e-commerce with minimal losses. The attacks occur during interactions between the trading peers as a transaction takes place. In this paper, we propose how to address the Sybil attack, an active attack in which peers assume multiple bogus identities to disguise their real ones.

Most existing work, which concentrates on social networks and trusted certification, has not been able to prevent Sybil peers from carrying out transactions. Our work exploits the neighbor similarity trust relationship to address the Sybil attack. In our approach, duplicated Sybil peers can be identified as neighboring peers become acquainted with, and hence more trusted by, each other. Security and performance analysis shows that the Sybil attack can be minimized by our proposed neighbor similarity trust.
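One simple way to interpret neighbor similarity is sketched below (a hypothetical illustration, not the paper's exact scheme): identities controlled by the same attacker tend to share nearly identical neighbor sets, so pairs of peers with very high Jaccard similarity between their neighbor sets are flagged as suspected duplicates; the threshold is an assumption.

```python
# Hypothetical sketch: flag suspected Sybil duplicates by neighbor-set similarity.
def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def suspected_sybil_pairs(neighbors, threshold=0.9):
    """neighbors: dict peer_id -> set of neighbor peer ids."""
    peers = sorted(neighbors)
    return [(p, q) for i, p in enumerate(peers) for q in peers[i + 1:]
            if jaccard(neighbors[p], neighbors[q]) >= threshold]

# Example: s1 and s2 share all neighbors and are flagged as a suspected pair.
nbrs = {"a": {"b", "c", "d"}, "b": {"a", "c"},
        "s1": {"x", "y", "z"}, "s2": {"x", "y", "z"}}
print(suspected_sybil_pairs(nbrs))
```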

Distributed Topological Convex Hull Estimation of Event Region in Wireless Sensor Networks without Location Information

In critical-event (e.g., fire or gas) monitoring applications of wireless sensor networks (WSNs), the convex hull of the event region is an efficient tool for common tasks such as event reporting, route reconstruction, and human motion planning. Existing works on estimating the convex hull of an event region usually require location information of sensor nodes, which incurs high communication or hardware cost. In this paper, to avoid the requirement of location information, we define the topological convex hull (T-convex hull), which represents the convex contour of an event region directly as a route passing through nodes and hence handles the above tasks more efficiently.

To obtain the T-convex hull of the event region in the absence of location information, we propose a lightweight (in terms of computation and storage requirements) distributed algorithm, with which sensor nodes only need to count hop distances from a few reference nodes. The communication cost of the algorithm is also low and independent of the network size. Comprehensive and large-scale simulations show the effectiveness and much lower communication cost of the proposed algorithm compared with related methods.
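The hop-counting building block can be pictured with the sketch below (a centralized simulation of the distributed flooding over a connectivity graph, not the paper's full T-convex hull construction): a reference node floods a beacon, and each node records the hop count at which the beacon first reaches it.

```python
# Sketch: hop counts from a reference node, simulated as BFS over the
# connectivity graph (in the network this is a simple beacon flood).
from collections import deque

def hop_counts(adjacency, reference):
    """adjacency: dict node -> iterable of neighbor nodes."""
    hops = {reference: 0}
    queue = deque([reference])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in hops:                 # first beacon heard wins
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops

# Example: a small line-plus-branch topology.
adj = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2, 5], 5: [4]}
print(hop_counts(adj, reference=1))   # {1: 0, 2: 1, 3: 2, 4: 2, 5: 3}
```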

Towards Distributed Optimal Movement Strategy for Data Gathering in Wireless Sensor Networks

In this paper, we address how to design a distributed movement strategy for mobile collectors, which can be either physical mobile agents or query/collector packets periodically launched by the sink, to achieve successful data gathering in wireless sensor networks. Formulating the problem as general random walks on a graph composed of sensor nodes, we analyze how much data can be successfully gathered in time under any Markovian random-walk movement strategy for mobile collectors moving over the graph (or network), when each sensor node is equipped with limited buffer space and data arrival rates are heterogeneous across sensor nodes.

In particular, from the analysis we obtain the optimal movement strategy among a class of Markovian strategies that minimizes the data loss rate over all sensor nodes, and explain how such an optimal movement strategy can be made to work in a distributed fashion. We demonstrate that our distributed optimal movement strategy can achieve roughly half the loss rate of a standard random-walk strategy under diverse scenarios. In particular, compared with the standard random-walk strategy, our strategy results in up to 70% cost savings when deploying multiple collectors to achieve a target data loss rate.
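As an illustration of biasing a Markovian collector walk toward heavily loaded nodes (a sketch under assumptions, not the paper's derived optimum), the code below uses a Metropolis-Hastings walk whose stationary visit frequency at each node is proportional to that node's data arrival rate.

```python
# Sketch: a Metropolis-Hastings collector walk whose stationary visit
# frequency is proportional to each node's data arrival rate.
import random

def mh_transition(u, adjacency, rate, rng):
    """One collector step from node u; rate: node -> data arrival rate."""
    v = rng.choice(adjacency[u])                       # propose a neighbor
    # Acceptance targets pi(x) proportional to rate(x), corrected for degree.
    accept = min(1.0, (rate[v] * len(adjacency[u])) /
                      (rate[u] * len(adjacency[v])))
    return v if rng.random() < accept else u

# Example walk on a small network with heterogeneous arrival rates.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
rates = {1: 0.1, 2: 0.5, 3: 0.3, 4: 0.1}
rng = random.Random(0)
node, visits = 1, {n: 0 for n in adj}
for _ in range(10000):
    node = mh_transition(node, adj, rates, rng)
    visits[node] += 1
print(visits)   # visit frequencies roughly track the arrival rates
```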

Switch-Centric Data Center Network Structures Based on Hypergraphs and Combinatorial Block Designs

Fat trees are considered suitable structures for data center interconnection networking. However, such structures are rigid and hard to scale up and scale out. A good data center network structure should have high scalability, efficient switch utilization, and high reliability. In this paper, we present a class of data center network structures based on hypergraph theory and combinatorial block design theory.

We show that our data center network structures are more flexible and scalable than fat trees. Using switches of the same size, our data center network structures can connect more nodes than fat trees, and it is possible to construct different structures with tradeoffs among inter-cluster communication capacity, reliability, the number of switches used, and the number of connected nodes.
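To make the block-design idea concrete, the sketch below uses the Fano plane, a 2-(7,3,1) balanced incomplete block design, as a hypothetical construction (the mapping of blocks to switches and points to server groups is an assumption, not the paper's specific structure): each block becomes a switch and each point a server group, so every pair of server groups shares exactly one switch.

```python
# Sketch: wiring a switch-centric network from a combinatorial block design.
# Blocks of the Fano plane act as switches; points act as server groups.
from itertools import combinations

FANO_BLOCKS = [
    {1, 2, 3}, {1, 4, 5}, {1, 6, 7},
    {2, 4, 6}, {2, 5, 7}, {3, 4, 7}, {3, 5, 6},
]

def switches_between(g1, g2, blocks=FANO_BLOCKS):
    """Switches (blocks) connecting server groups g1 and g2."""
    return [i for i, b in enumerate(blocks) if g1 in b and g2 in b]

# Every pair of the 7 server groups is joined by exactly one switch.
assert all(len(switches_between(a, b)) == 1
           for a, b in combinations(range(1, 8), 2))
print(switches_between(2, 5))   # -> [4]  (switch index 4 = block {2, 5, 7})
```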