Published in: International conference on Asian-Pacific digital libraries

Finding 'Similar but Different' Documents Based on Coordinate Relationship


Abstract

Traditional search technologies rely on similarity: given a document, they return documents with similar content. Such similarity-based search does not always produce good results, because highly similar documents add little new information and therefore yield little information gain. In this paper, we propose a method for finding documents that are similar to, but different from, a user-given document by distinguishing the coordinate relationship from the similarity relationship between documents. Put simply, a similar but different document shares the topic of the given document but describes different events or concepts. For example, given a news article reporting the Oregon school shooting as input, articles reporting other school shootings, such as the Virginia Tech shooting, are detected and returned to the user. Experiments on the New York Times Annotated Corpus verify the effectiveness of our method and illustrate the importance of incorporating the coordinate relationship when finding similar but different documents.
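The definition above (same topic, different event) can be illustrated with a toy scoring function. This is only a sketch under stated assumptions, not the method proposed in the paper, which the abstract does not detail. The names `jaccard`, `similar_but_different_score`, `topic_terms` and `event_entities` are hypothetical, and the hand-written term sets stand in for whatever topic and event extraction a real system would use.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_but_different_score(query, candidate):
    """Toy score (an assumption, not the paper's formula):
    high when topic terms overlap but event entities do not."""
    topic_overlap = jaccard(query["topic_terms"], candidate["topic_terms"])
    event_overlap = jaccard(query["event_entities"], candidate["event_entities"])
    return topic_overlap * (1.0 - event_overlap)


# Hand-made example documents; real inputs would come from text analysis.
query = {
    "topic_terms": {"school", "shooting", "gunman", "campus"},
    "event_entities": {"Oregon"},
}
candidates = {
    "same_event": {
        "topic_terms": {"school", "shooting", "gunman", "campus"},
        "event_entities": {"Oregon"},
    },
    "coordinate_event": {
        "topic_terms": {"school", "shooting", "gunman", "students"},
        "event_entities": {"Virginia Tech"},
    },
    "unrelated": {
        "topic_terms": {"election", "senate"},
        "event_entities": {"Ohio"},
    },
}

# Rank candidates from most to least "similar but different".
ranked = sorted(
    candidates,
    key=lambda name: similar_but_different_score(query, candidates[name]),
    reverse=True,
)
print(ranked[0])
```

In this toy setup, an article about the same event scores zero despite being the most similar, while an article on a different event within the same topic ranks first. That mirrors the contrast the abstract draws between the similarity relationship and the coordinate relationship.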