HEADS-JOIN: Efficient Earth Mover's Distance Similarity Joins on Hadoop

The Earth Mover’s Distance (EMD) similarity join has a number of important applications, such as near-duplicate image retrieval and distribution-based pattern analysis. However, the computational cost of EMD is super-cubic, so the EMD similarity join is prohibitively expensive even for medium-sized datasets. We propose to employ the Hadoop platform to speed up the operation. Simply porting state-of-the-art metric-distance similarity join algorithms to Hadoop is inefficient, because they involve excessive distance computations and are vulnerable to skewed data distributions.

We propose a novel framework, named HEADS-JOIN, which transforms data into the space of EMD lower bounds and performs pruning and partitioning at a low cost because computing these EMD lower bounds has constant or linear complexity. We investigate both range and top-k joins, and design efficient algorithms on three popular Hadoop computation paradigms, i.e., MapReduce, Bulk Synchronous Parallel, and Spark. We conduct extensive experiments on both real and synthetic datasets. The results show that HEADS-JOIN outperforms the state-of-the-art metric similarity join technique, i.e., Quickjoin, by up to an order of magnitude and scales out well.
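The filter-and-verify idea behind joining in a space of cheap EMD lower bounds can be sketched in a few lines of single-machine Python. This is not the HEADS-JOIN implementation. It assumes 1-D histograms with equal total mass and unit bin spacing, and it substitutes the classical centroid (mean-position) bound, |mean_a - mean_b| <= EMD(a, b), for whichever lower bounds the paper actually uses. The function names (`normalize`, `centroid`, `emd_1d`, `range_join`, `topk_join`) are hypothetical, and the Hadoop-side partitioning is not modeled.

```python
import heapq
from itertools import combinations


def normalize(hist):
    """Scale a non-negative 1-D histogram so its bins sum to 1 (equal total mass)."""
    total = sum(hist)
    return [x / total for x in hist]


def centroid(hist):
    """Mean bin position of a normalized histogram; O(n), computed once per record.

    Assumption: this centroid bound is an illustrative stand-in, not
    necessarily one of the lower bounds HEADS-JOIN uses.
    """
    return sum(i * w for i, w in enumerate(hist))


def emd_1d(a, b):
    """Exact EMD between normalized 1-D histograms with unit bin spacing.

    In one dimension EMD equals the L1 distance between the two cumulative
    distributions, so this is O(n). General (multi-dimensional) EMD needs a
    transportation solver, which is what makes the real join expensive.
    """
    cum, total = 0.0, 0.0
    for x, y in zip(a, b):
        cum += x - y
        total += abs(cum)
    return total


def range_join(records, eps):
    """All pairs (i, j), i < j, with EMD <= eps, filtered by the centroid lower bound."""
    hists = [normalize(r) for r in records]
    cents = [centroid(h) for h in hists]  # the "lower-bound space": one number per record
    result = []
    for i, j in combinations(range(len(hists)), 2):
        if abs(cents[i] - cents[j]) > eps:
            continue  # lower bound already exceeds eps: prune without computing EMD
        if emd_1d(hists[i], hists[j]) <= eps:  # verify surviving candidates exactly
            result.append((i, j))
    return result


def topk_join(records, k):
    """The k pairs with the smallest EMD, as sorted (emd, i, j) tuples."""
    if k <= 0:
        return []
    hists = [normalize(r) for r in records]
    cents = [centroid(h) for h in hists]
    # Visit candidate pairs in increasing order of their cheap lower bound.
    cands = sorted(
        (abs(cents[i] - cents[j]), i, j)
        for i, j in combinations(range(len(hists)), 2)
    )
    best = []  # max-heap via negated distances: the k smallest exact EMDs seen so far
    for lb, i, j in cands:
        if len(best) == k and lb >= -best[0][0]:
            break  # every remaining pair has a lower bound >= the current k-th EMD
        d = emd_1d(hists[i], hists[j])
        if len(best) < k:
            heapq.heappush(best, (-d, i, j))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i, j))
    return sorted((-nd, i, j) for nd, i, j in best)


# Usage: four toy histograms over 4 bins.
records = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [1, 1, 0, 0]]
print(range_join(records, 1.0))  # [(0, 1), (0, 3), (1, 3)]
print(topk_join(records, 2))     # [(0.5, 0, 3), (0.5, 1, 3)]
```

Pairs whose bound already exceeds the threshold, or the current k-th best distance, never reach `emd_1d`. Because the bound is a single number per record, it is also cheap to sort or bucket by. That is the kind of low-cost pruning and partitioning key the abstract refers to, although this sketch does not model how HEADS-JOIN distributes the work across MapReduce, BSP, or Spark.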
