Intelligent and Distributed Computing Laboratory
 

Seminar Announcement: Senior Lecturer Rui Zhang, The University of Melbourne, Australia
Date: 2017-09-27



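At the invitation of the Intelligent and Distributed Computing Laboratory of the School of Computer Science, Senior Lecturer Rui Zhang of the University of Melbourne, Australia, will give an academic talk at the School of Computer Science on the morning of Thursday, October 31, 2013. All faculty and students are welcome to attend.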

Title: MELODY-JOIN: Efficient Earth Mover's Distance Similarity Joins using MapReduce

Time: Thursday, October 31, 2013, starting at 10:00 a.m.
Venue: School of Computer Science conference room (Room 433, South Building 1)

Rui Zhang
Department of Computing and Information Systems
The University of Melbourne
Australia

Abstract:
The Earth Mover’s Distance (EMD) similarity join retrieves pairs of records with EMD below a given threshold. It has a number of important emerging applications, such as near-duplicate image retrieval and pattern analysis in probabilistic datasets. However, the computational cost of EMD is super-cubic in the number of bins of the histogram used to represent a data record. Therefore, the EMD similarity join operation is prohibitively expensive for large datasets. This is the first paper that specifically addresses the EMD similarity join, and we propose to use MapReduce to approach this problem. MapReduce algorithms designed for generic metric distance similarity joins are inefficient for the EMD similarity join because they involve a large number of distance computations and have unbalanced workloads on reducers when dealing with skewed datasets. We propose a novel framework, named MELODY-JOIN, to transform data into the space of EMD lower bounds and perform pruning and partitioning in this space at a linear cost. Based on MELODY-JOIN, we further propose to integrate the quantile grid and estimation-based load balancing techniques to obtain balanced workloads on reducers. To achieve high pruning power, we plug multiple EMD lower bounds into MELODY-JOIN, which is possible thanks to its generic design. We conduct experiments on the COREL and MIRFLICKR image collections, and the results show that MELODY-JOIN outperforms the best existing method by up to an order of magnitude.
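
The pruning idea in the abstract can be illustrated with a small single-machine sketch. This is not the MELODY-JOIN algorithm and does not use MapReduce; it only shows the general filter-and-refine pattern of discarding a pair when a cheap EMD lower bound already reaches the threshold. It assumes 1-D histograms of equal total mass with ground distance |i - j|, for which the exact EMD is the sum of absolute cumulative differences and the difference of centroids is a valid lower bound. The function names (`emd_1d`, `centroid`, `emd_join`) are hypothetical.

```python
from itertools import accumulate, combinations


def emd_1d(p, q):
    """Exact EMD of two equal-mass 1-D histograms, ground distance |i - j|."""
    # In 1-D the EMD equals the sum of absolute differences of the prefix sums.
    diffs = (a - b for a, b in zip(p, q))
    return sum(abs(c) for c in accumulate(diffs))


def centroid(p):
    """Mass-weighted mean bin index of a histogram."""
    return sum(i * w for i, w in enumerate(p))


def emd_join(records, threshold):
    """Return index pairs with EMD below threshold, plus the exact-EMD call count."""
    # Each record is projected once to its centroid (linear in the number of bins).
    centroids = [centroid(r) for r in records]
    results, exact_calls = [], 0
    for i, j in combinations(range(len(records)), 2):
        # |centroid difference| <= EMD, so such a pair cannot be below the threshold.
        if abs(centroids[i] - centroids[j]) >= threshold:
            continue
        exact_calls += 1
        if emd_1d(records[i], records[j]) < threshold:
            results.append((i, j))
    return results, exact_calls


records = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.5, 0.5],
]
pairs, exact_calls = emd_join(records, threshold=1.0)
print(pairs, exact_calls)  # [(0, 1), (2, 3)] 2
```

In this toy input, four of the six candidate pairs are discarded by the lower bound alone, so the exact distance is computed only twice. The histograms in the talk are general ones, for which the exact EMD is far more expensive than in this 1-D special case, which is why such pruning matters.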

Speaker biography:
Dr Rui Zhang is currently a senior lecturer at the University of Melbourne. He obtained his bachelor's degree from Tsinghua University in 2001 and his PhD from the National University of Singapore in 2006. He is a regular visiting researcher at Microsoft Research Asia in Beijing. He has authored and co-authored over 60 publications in prestigious conferences and journals. His research interests are data and information management in general, particularly indexing techniques, moving object management, web services, data streams and sequence databases. He regularly serves as a PC member of top conferences in data management and mining, such as SIGMOD, VLDB, ICDE and KDD. He is an Australian Future Fellow (the Australian equivalent of the mid-career "Thousand Talent Plan").

http://ww2.cs.mu.oz.au/~rui/