General Representation Model for Text Similarity

Abstract

Text similarity is a central issue in multiple information access tasks. Generally speaking, most existing similarity models focus on one particular kind of text feature, such as words, n-grams, linguistic features, or distributional semantics units. In this paper, we introduce a general theoretical model, called the Feature Projection Information model, for integrating multiple sources in the text feature representation. The proposed model allows us to integrate traditional features such as words with other sources, such as the output of classifiers over different categories or distributional semantics information. The theoretical analysis shows that traditional approaches can be seen as particularizations of the model. Our first empirical results support the idea that additional features in the representation step outperform the predictive power of similarity measures.
