Entity Resolution with Deep Interactions and Fine-Grained Difference Extraction Based on BERT

Authors: Huiting Yuan, Liang Zhu, Yu Wang, and Zhouyang Liu
Conference: ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Pages: 715-732
Keywords: entity resolution, matching, deep interaction, fine-grained information, blocking

Abstract

Entity Resolution (ER) is crucial for data integration. It aims to determine whether a pair of records from one or more datasets refers to the same real-world entity. As record structures grow more complex and diverse, traditional ER methods rely on coarse-grained features, which makes it difficult to capture subtle semantic associations and differences between records and in turn limits model performance. Furthermore, processing each record pair individually increases computational costs. To overcome these limitations, we propose DIBER, a novel ER model based on a Siamese network structure and a pre-trained language model (PLM) that generates contextually rich representations of records. DIBER uses co-attention to discern inter-record relationships and applies fusion and weighted attention to pinpoint subtle but significant distinctions. It further integrates a feature extractor that captures fine-grained, pivotal matching information, complementing the global context provided by the PLM. The result is a richer, more discriminative entity representation. DIBER can also be flexibly applied to blocking. Extensive experiments on benchmark datasets, compared against state-of-the-art (SOTA) methods, show superior performance on small-scale datasets without injecting domain-specific knowledge.
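The abstract does not give DIBER's equations, so the following is only a minimal sketch of what co-attention, difference features, and weighted-attention pooling commonly look like in ER matchers. It assumes each record has already been turned into per-token embeddings by one shared (Siamese) encoder such as BERT; the toy vectors, the scaled dot-product affinity, the |a - ã| and a * ã comparison features, and the scoring vector `w` are all hypothetical choices for illustration, not the paper's method.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]  # one row per token embedding


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, keys: Matrix) -> Matrix:
    """For each query token, return the attention-weighted sum of the key tokens."""
    d = len(keys[0])
    scale = math.sqrt(d)
    aligned = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        aligned.append([sum(w * k[j] for w, k in zip(weights, keys)) for j in range(d)])
    return aligned


def co_attention(a: Matrix, b: Matrix) -> Tuple[Matrix, Matrix]:
    """Align record A to record B and B to A (hypothetical scaled dot-product co-attention)."""
    return attend(a, b), attend(b, a)


def difference_features(tokens: Matrix, aligned: Matrix) -> Matrix:
    """Per token: |x - x_aligned| concatenated with x * x_aligned (element-wise)."""
    return [
        [abs(p - q) for p, q in zip(x, y)] + [p * q for p, q in zip(x, y)]
        for x, y in zip(tokens, aligned)
    ]


def weighted_pool(features: Matrix, w: Vector) -> Vector:
    """Attention pooling: tokens whose features score higher under w get more weight."""
    weights = softmax([dot(f, w) for f in features])
    return [sum(a * f[j] for a, f in zip(weights, features)) for j in range(len(features[0]))]


# Toy 2-dimensional "embeddings" standing in for encoder outputs (hypothetical values).
record_a = [[1.0, 0.0], [0.0, 1.0]]
record_b = [[1.0, 0.1], [0.2, 0.9], [0.5, 0.5]]

a_aligned, b_aligned = co_attention(record_a, record_b)
feats_a = difference_features(record_a, a_aligned)
feats_b = difference_features(record_b, b_aligned)

w = [1.0, 1.0, 0.0, 0.0]  # hypothetical scorer that emphasizes the |difference| half
pair_vec = weighted_pool(feats_a, w) + weighted_pool(feats_b, w)
print(len(pair_vec))  # 8: two pooled vectors of length 2 * d, with d = 2
```

In a full matcher, a vector like `pair_vec` would typically be concatenated with the encoder's global (e.g. [CLS]) representation and passed to a classifier that predicts match or non-match; that final step is omitted here.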