Improving the retrieval performance by using distance-based bigram

Abstract

In this paper, we discuss a new scheme for forming and weighting terms, called the distance-based bigram. In this scheme, the distance between two words is taken into account both when forming a new term and when assigning its weight. The scheme is applied to vector formation in the vector space model, alongside two standard term-forming schemes: unigram and bigram. The test domains are English and Thai medical corpora. The results show that the proposed method performs well on the Thai corpus when only a few returned documents are needed. Within the first ten percent of recall, our method improves precision over the standard unigram by nearly 30%.
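The abstract does not give the term-forming or weighting formulas, so the following is only a minimal sketch of how a distance-aware bigram term could enter a vector space model. Everything specific in it is an assumption made for illustration: the function names `distance_bigram_vector` and `cosine`, the window size `max_distance`, and the choice to let a word pair `d` positions apart contribute `1/d` to its term weight. It is not the authors' method.

```python
from collections import defaultdict
from math import sqrt


def distance_bigram_vector(tokens, max_distance=3):
    """Build a sparse vector of unigrams plus distance-weighted word pairs.

    Assumption (not from the paper): an ordered pair of words that are
    d positions apart adds 1/d to the weight of the term (w1, w2).
    """
    vec = defaultdict(float)
    for i, word in enumerate(tokens):
        vec[(word,)] += 1.0  # plain unigram count
        for d in range(1, max_distance + 1):
            if i + d < len(tokens):
                # Closer pairs contribute more than distant ones.
                vec[(word, tokens[i + d])] += 1.0 / d
    return dict(vec)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


query = distance_bigram_vector(["heart", "attack", "risk"], max_distance=2)
doc = distance_bigram_vector(["risk", "of", "heart", "attack"], max_distance=2)
score = cosine(query, doc)  # shared unigrams and the adjacent pair both count
```

In this sketch, setting `max_distance=1` reduces the pair terms to ordinary adjacent bigrams, which makes it easy to compare against the standard bigram baseline mentioned in the abstract.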