International Symposium on Computer and Information Sciences (ISCIS 2005); October 26-28, 2005; Istanbul (TR)

Effective Early Termination Techniques for Text Similarity Join Operator


Abstract

The text similarity join operator joins two relations if their join attributes are textually similar to each other. It has a variety of application domains, including the integration and querying of data from heterogeneous resources, data cleansing, and data mining. Although the text similarity join operator is widely used, its processing is expensive due to the huge number of similarity computations performed. In this paper, we incorporate several short-cut evaluation techniques from the Information Retrieval domain, namely the Harman, quit, continue, and maximal similarity filter heuristics, into previously proposed text similarity join algorithms to reduce the number of similarity computations needed during the join operation. We experimentally evaluate the original and the heuristic-based similarity join algorithms using real data obtained from the DBLP Bibliography database, and observe performance improvements with the continue and maximal similarity filter heuristics.
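The abstract names the quit and continue heuristics without describing them. As they are commonly presented in the Information Retrieval literature, both cap the number of score accumulators during term-at-a-time evaluation over an inverted index: quit stops processing once the cap is reached, while continue keeps processing the remaining terms but only updates accumulators that already exist. The sketch below is a simplified illustration of that idea, not the paper's implementation. The function name `score_with_accumulator_limit`, the index layout, the term ordering by query weight, and checking the cap at term boundaries are all assumptions made for the example. In a join setting, each tuple of one relation would presumably play the role of the query against an index built over the other relation.

```python
def score_with_accumulator_limit(query_weights, inverted_index,
                                 max_accumulators, strategy="continue"):
    """Term-at-a-time scoring with a cap on the number of accumulators.

    query_weights:    dict term -> weight of the term in the query tuple
    inverted_index:   dict term -> list of (doc_id, weight) postings
    max_accumulators: cap on the number of candidate documents (assumed knob)
    strategy:         "quit" or "continue" (simplified versions)
    """
    if strategy not in ("quit", "continue"):
        raise ValueError("strategy must be 'quit' or 'continue'")

    accumulators = {}  # doc_id -> partial similarity score

    # Assumption: process the highest-weighted query terms first, so the
    # accumulators that do get created belong to promising candidates.
    terms = sorted(query_weights, key=query_weights.get, reverse=True)

    for term in terms:
        limit_reached = len(accumulators) >= max_accumulators
        if limit_reached and strategy == "quit":
            # quit: ignore all remaining terms once the cap is hit.
            break
        for doc_id, doc_weight in inverted_index.get(term, []):
            contribution = query_weights[term] * doc_weight
            if doc_id in accumulators:
                # Existing candidates are always updated.
                accumulators[doc_id] += contribution
            elif len(accumulators) < max_accumulators:
                # New candidates are admitted only while below the cap.
                accumulators[doc_id] = contribution
    return accumulators


if __name__ == "__main__":
    # Hypothetical toy index: term -> postings of (doc_id, weight).
    index = {
        "join": [(1, 0.5), (2, 0.4)],
        "text": [(2, 0.3), (3, 0.9)],
    }
    query = {"join": 1.0, "text": 0.5}
    for strategy in ("quit", "continue"):
        print(strategy, score_with_accumulator_limit(query, index, 2, strategy))
```

With a cap of two accumulators, both strategies consider only documents 1 and 2 in this toy example; continue additionally refines the score of document 2 using the second term, whereas quit leaves it at its partial value. The Harman and maximal similarity filter heuristics are not sketched here because the abstract gives no detail on how they are applied.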
