Malaysian Journal of Computer Science

Improving Document Relevancy Using Integrated Language Modeling Techniques


Abstract

This paper presents an integrated language model for improving document relevancy for text queries. Specifically, an integrated stemming-lemmatization (S-L) model was developed, and its retrieval performance was compared at three document cutoff levels: the top 5, 10, and 15 retrieved documents. A prototype search engine was developed and fifteen queries were executed. The mean average precision values showed that the S-L model outperformed the baseline (i.e., no language processing), the stemming model, and the lemmatization model at all three levels. These results were supported by the precision histograms, which also showed the integrated model improving document relevancy. However, the precision differences between the models were insignificant. Overall, the study found that combining the language processing techniques of stemming and lemmatization retrieves more relevant documents.
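The abstract does not say exactly how mean average precision was computed at each cutoff. The following is a minimal sketch of one common definition, in which average precision at cutoff k is the mean of the precision values at each rank where a relevant document appears within the top k. The function names and the relevance judgments are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def average_precision_at_k(relevance: Sequence[bool], k: int) -> float:
    """Average precision over the top-k results of one query.

    relevance[i] is True if the document at rank i + 1 is relevant.
    Assumption: normalize by the number of relevant documents found
    in the top k (other definitions normalize differently).
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance[:k], start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[bool]], k: int) -> float:
    """Mean of the per-query average precision values at cutoff k."""
    if not runs:
        return 0.0
    return sum(average_precision_at_k(run, k) for run in runs) / len(runs)


# Hypothetical relevance judgments for two queries (not data from the paper).
example_runs = [
    [True, False, True, False, False],
    [False, True, False, False, True],
]

for cutoff in (5, 10, 15):
    print(cutoff, round(mean_average_precision(example_runs, cutoff), 4))
```

Under this definition, comparing the baseline, stemming, lemmatization, and S-L models would amount to computing this value once per model at each of the three cutoffs, using the ranked results of the same set of queries.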
