Cost-aware load balancing for multilingual record linkage using MapReduce

Author's Department

Computer Science & Engineering Department

Find in your Library

https://www.sciencedirect.com/science/article/pii/S2090447919301145?via%3Dihub

All Authors

Doaa Medhat; Ahmed H. Yousef; Cherif Salama

Document Type

Research Article

Publication Title

Ain Shams Engineering Journal

Publication Date

1-1-2019

DOI

10.1016/j.asej.2019.08.009

Abstract

The amount of data being gathered and processed increases every day. Record linkage is one of the most complex data-intensive tasks: it accurately matches records from different data sources that contain information about the same entity, such as a person, especially when the records do not share a common identifier. As more resources become available in more than one language, new methods are required that are capable of matching records expressed in more than one language. In this paper, we present a scalable, cost-aware load balancing technique over MapReduce that links records from different multilingual data sources accurately and efficiently by redistributing the multilingual matching tasks across the available machines based on their cost. We evaluate our approach on a Hadoop cluster on cloud infrastructure against state-of-the-art blocking-based load balancing techniques; our approach outperforms the other approaches in terms of execution time and scalability.

First Page

419

Last Page

433
