Journal: 《计算机工程与应用》 (Computer Engineering and Applications)

Research on a Text Similarity Calculation Method Based on Semantic Combination of Terms

         

Abstract

Text similarity comparison relies mainly on keyword matching and lacks an in-depth analysis of how keywords combine. This work focuses on keyword combinations: the more keywords that appear together in order, the more they contribute to the similarity between two texts. It proposes a non-linear semantic relevance function based on the number of keywords in a combination and, building on LCS, extracts all keyword combination blocks in the texts. The method is applied to short-text similarity comparison, using the Microsoft Research Paraphrase Corpus (MSRP) as an open benchmark dataset. Experimental results show that, compared with traditional algorithms, both accuracy and F1 improve, with the gain in F1 being the more pronounced.
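The abstract does not give the relevance function itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that LCS means longest common subsequence, that a "combination block" is a run of matched tokens that are adjacent in both texts, and that the non-linear weight is a power `alpha` of the block length. The names `lcs_pairs`, `block_sizes`, `block_similarity`, and `alpha` are all hypothetical, and keyword extraction is skipped (every token is treated as a keyword).

```python
def lcs_pairs(a, b):
    """Return index pairs (i, j) of one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of the suffixes a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def block_sizes(pairs):
    """Group matched pairs into runs that are contiguous in both sequences."""
    sizes, run, prev = [], 0, None
    for p in pairs:
        if prev is not None and p[0] == prev[0] + 1 and p[1] == prev[1] + 1:
            run += 1
        else:
            if run:
                sizes.append(run)
            run = 1
        prev = p
    if run:
        sizes.append(run)
    return sizes


def block_similarity(a, b, alpha=2.0):
    """Assumed non-linear score: longer ordered blocks count more than linearly."""
    if not a or not b:
        return 0.0
    sizes = block_sizes(lcs_pairs(a, b))
    # For alpha >= 1 the score stays in [0, 1]; one full-length block gives 1.0.
    return sum(k ** alpha for k in sizes) / max(len(a), len(b)) ** alpha


a = ["a", "b", "c", "d"]
b = ["a", "b", "x", "c", "d"]
sizes = block_sizes(lcs_pairs(a, b))
score = block_similarity(a, b)
```

In this sketch the inserted token `x` splits the match into two blocks of two tokens each, so the score drops well below the plain LCS ratio of 4/5. That illustrates the abstract's claim that longer ordered combinations should contribute more. The sketch also follows a single LCS alignment, whereas the abstract speaks of extracting all combination blocks.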
