Conference: IEEE International Conference on Data Science and Advanced Analytics
Document similarity analysis via involving both explicit and implicit semantic couplings

Abstract

Document similarity analysis is increasingly critical since roughly 80% of big data is unstructured. Accordingly, semantic couplings (relatedness) have been recognized as valuable for capturing the relationships between terms (words or phrases). Existing work focuses mostly on explicit relatedness, with corresponding models built for it. In this paper, we propose a comprehensive semantic similarity measure, Semantic Coupling Similarity (SCS), which (1) captures intra-term pair couplings, that is, couplings within term pairs, represented by patterns of explicit term co-occurrences in a document set; (2) extracts inter-term pair couplings, that is, implicit couplings between term pairs, indicated by indirectly linked terms and paths between terms once the term connections are converted to a graph representation; and (3) integrates the intra- and inter-term pair couplings into a semantic coupling similarity that comprehensively captures explicit and implicit couplings between terms across documents. SCS caters for both synonymy and polysemy, and consistently outperforms baseline methods on all real data sets.
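
The abstract does not give the formulas behind SCS, so the following is only a rough sketch of the general idea, not the authors' method. It assumes three things that are not stated in the source: explicit (intra) coupling is approximated by the Jaccard overlap of the documents in which two terms occur, implicit (inter) coupling is approximated by the strongest two-step path through one linking term, and the two are mixed with a hypothetical weight `alpha`. All function names are illustrative.

```python
def build_index(docs):
    """Map each term to the set of document indices that contain it."""
    index = {}
    for i, doc in enumerate(docs):
        for term in set(doc):
            index.setdefault(term, set()).add(i)
    return index

def intra_coupling(index, a, b):
    """Explicit coupling (assumption): Jaccard overlap of co-occurrence sets."""
    if a == b:
        return 1.0
    da, db = index.get(a, set()), index.get(b, set())
    union = da | db
    return len(da & db) / len(union) if union else 0.0

def inter_coupling(index, a, b):
    """Implicit coupling (assumption): best path a - c - b via one link term c."""
    best = 0.0
    for c in index:
        if c == a or c == b:
            continue
        # A path is only as strong as its weakest edge.
        path = min(intra_coupling(index, a, c), intra_coupling(index, c, b))
        best = max(best, path)
    return best

def term_coupling(index, a, b, alpha=0.5):
    """Mix explicit and implicit coupling; alpha is a hypothetical weight."""
    if a == b:
        return 1.0
    explicit = intra_coupling(index, a, b)
    implicit = inter_coupling(index, a, b)
    return alpha * explicit + (1 - alpha) * implicit

def doc_similarity(index, doc_a, doc_b, alpha=0.5):
    """Average best-match term coupling, taken in both directions."""
    ta, tb = set(doc_a), set(doc_b)
    if not ta or not tb:
        return 0.0
    total = sum(max(term_coupling(index, a, b, alpha) for b in tb) for a in ta)
    total += sum(max(term_coupling(index, a, b, alpha) for a in ta) for b in tb)
    return total / (len(ta) + len(tb))

# Toy corpus: "car" and "automobile" never co-occur, but both occur with "engine".
docs = [["car", "engine"], ["automobile", "engine"], ["banana", "fruit"]]
index = build_index(docs)
print(intra_coupling(index, "car", "automobile"))  # explicit coupling only
print(inter_coupling(index, "car", "automobile"))  # implicit coupling via "engine"
print(doc_similarity(index, docs[0], docs[1]))
print(doc_similarity(index, docs[0], docs[2]))
```

In this toy corpus the explicit coupling between "car" and "automobile" is zero, while the implicit coupling through "engine" is not, which is the kind of synonymy case the abstract says explicit-only measures miss.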