Large scale instance matching via multiple indexes and candidate selection

Juanzi Li; Zhichun Wang; Xiao Zhang; Jie Tang

Abstract

Instance matching aims to discover links between different descriptions of the same real-world objects across heterogeneous data sources. With the rapid development of the Semantic Web, and of linked data in particular, automatic instance matching has become a fundamental issue for ontological data sharing and integration. Ontologies often contain instances at large scale, with millions or even hundreds of millions of objects, so directly applying earlier schema-level ontology matching methods is infeasible. In this paper, we systematically investigate the characteristics of instance matching and then propose a scalable and efficient instance matching approach named VMI. VMI generates multiple vectors for the different kinds of information contained in the ontology instances and uses a set of rules based on inverted indexes to obtain the primary matching candidates. It then employs user-customized property values to further eliminate incorrect matches. Finally, the similarities of the matching candidates are computed as integrated vector distances, and the matching results are extracted. Experiments on the instance matching track of OAEI 2009 and OAEI 2010 show that the proposed method is both more efficient and more effective than the top performer RiMOM on most of the datasets: it is more than 100 times faster and achieves a slightly better F1-score (+3.0% to 5.0%). Experiments on LinkedMDB and DBpedia show that VMI obtains results comparable to those of the SILK system (about 26,000 results with good quality).
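
The pipeline described in the abstract (vectors per kind of information, candidate selection through an inverted index, a property-value filter, then a combined vector similarity) can be illustrated with a small sketch. This is not the VMI implementation: the field names (`name`, `description`, `year`), the equal weighting of the two vectors, the cosine measure, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def build_inverted_index(instances):
    """Map each name token to the set of instance ids whose name contains it."""
    index = defaultdict(set)
    for inst_id, inst in instances.items():
        for token in tokenize(inst["name"]):
            index[token].add(inst_id)
    return index


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def match(source, target, filter_property="year", threshold=0.5):
    """Return (source_id, target_id, score) for the best match of each source instance."""
    index = build_inverted_index(target)
    results = []
    for s_id, s in source.items():
        # Two vectors per instance: one for the name, one for the description.
        s_name = Counter(tokenize(s["name"]))
        s_desc = Counter(tokenize(s["description"]))

        # Candidate selection: only targets sharing at least one name token.
        candidates = set()
        for token in s_name:
            candidates |= index.get(token, set())

        best = None
        for t_id in sorted(candidates):
            t = target[t_id]
            # Property filter (assumed rule): discard candidates whose
            # chosen property value differs from the source instance's.
            if s.get(filter_property) != t.get(filter_property):
                continue
            # Combined similarity: equal-weight average of the two cosines.
            score = 0.5 * cosine(s_name, Counter(tokenize(t["name"]))) \
                + 0.5 * cosine(s_desc, Counter(tokenize(t["description"])))
            if score >= threshold and (best is None or score > best[1]):
                best = (t_id, score)
        if best is not None:
            results.append((s_id, best[0], best[1]))
    return results


# Hypothetical toy data, invented for this example.
source = {
    "s1": {"name": "The Matrix", "description": "A hacker discovers reality is a simulation", "year": 1999},
    "s2": {"name": "Heat", "description": "A detective pursues a crew of robbers", "year": 1995},
}
target = {
    "t1": {"name": "Matrix, The", "description": "hacker learns that reality is a simulation", "year": 1999},
    "t2": {"name": "The Matrix Reloaded", "description": "Neo fights the machines", "year": 2003},
    "t3": {"name": "Heat", "description": "Robbers crew pursued by a detective", "year": 1995},
}

pairs = [(s_id, t_id) for s_id, t_id, _ in match(source, target)]
print(pairs)
```

The point of the index is that each source instance is compared only with targets that share a name token, rather than with every target, which is where the scalability of this kind of approach comes from. In the toy data, `t2` is retrieved as a candidate for `s1` but dropped by the property filter before any similarity is computed.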

Bibliographic details

Source
Knowledge-Based Systems | 2013, Issue 9 | pp. 112-120 | 9 pages
Authors
Juanzi Li; Zhichun Wang; Xiao Zhang; Jie Tang
Author affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, China;

Department of Computer Science and Technology, Tsinghua University, Beijing, China, and College of Information Science and Technology, Beijing Normal University, Beijing, China;

Department of Computer Science and Technology, Tsinghua University, Beijing, China;

Department of Computer Science and Technology, Tsinghua University, Beijing, China

Original format: PDF
Language: English
Keywords
Heterogeneous data; Semantic web; Instance matching; Ontology matching; Linked data
