Cost-aware load balancing for multilingual record linkage using MapReduce
Author's Department
Computer Science & Engineering Department
Find in your Library
https://www.sciencedirect.com/science/article/pii/S2090447919301145?via%3Dihub
Document Type
Research Article
Publication Title
Ain Shams Engineering Journal
Publication Date
1-1-2019
doi
10.1016/j.asej.2019.08.009
Abstract
The volume of data gathered and processed is increasing every day. Record linkage is one of the most complex data-intensive tasks; it is used to accurately match records from different data sources that contain information about the same entity, such as a person, especially when they do not share a common identifier. As more resources become available in more than one language, new methods are required that are capable of matching records expressed in more than one language. In this paper, we present a scalable, cost-aware load balancing technique over MapReduce that is capable of linking records from different multilingual data sources accurately and efficiently by redistributing the multilingual matching tasks across the available machines based on their cost. We evaluate our approach on a Hadoop cluster on cloud infrastructure against state-of-the-art blocking-based load balancing techniques; our approach outperforms the other approaches in terms of execution time and scalability.
First Page
419
Last Page
433
Recommended Citation
APA Citation
Salama, C. (2019). Cost-aware load balancing for multilingual record linkage using MapReduce. Ain Shams Engineering Journal, 11(2), 419–433.
10.1016/j.asej.2019.08.009
https://fount.aucegypt.edu/faculty_journal_articles/113
MLA Citation
Salama, Cherif. "Cost-aware load balancing for multilingual record linkage using MapReduce." Ain Shams Engineering Journal, vol. 11, no. 2, 2019, pp. 419–433.
https://fount.aucegypt.edu/faculty_journal_articles/113