Conference: 2014 International Conference on Advanced Networking Distributed Systems and Applications

A New Term-Term Similarity Measure for Selecting Expansion Features in Big Data

Abstract

The massive growth of information and the exponential increase in the number of documents published and uploaded online each day have led to the appearance of new words on the Internet. Because the meanings of these new terms, which play a central role in retrieving the desired information, are difficult to determine, it becomes necessary to give more importance to the sites and topics where these new words appear, or rather, to give value to the words that occur frequently with them. For this purpose, in this paper, we propose a new term-term similarity measure based on the co-occurrence and closeness of words. It relies on finding, for each query feature, the locations where it appears, then selecting from these locations the words that often neighbor and co-occur with the query features, and finally using the selected words in the retrieval process. Our experiments were performed using the OHSUMED test collection and show a significant performance improvement over the state of the art.
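
The abstract does not state the formula of the measure, so the following is only a minimal sketch of the general idea (locate each query term, look at its neighbors, keep the words that co-occur often and closely). The inverse-distance weighting, the `window` parameter, and the function names `proximity_scores` and `select_expansion_terms` are assumptions made for illustration, not the authors' definition.

```python
from collections import defaultdict


def proximity_scores(docs, query_terms, window=3):
    """Score candidate terms by how often and how closely they occur near query terms.

    Assumed scoring (not taken from the paper): each co-occurrence inside the
    window contributes 1 / distance, so frequent and adjacent neighbors score highest.
    """
    query = set(query_terms)
    scores = defaultdict(float)
    for tokens in docs:
        for i, tok in enumerate(tokens):
            if tok not in query:
                continue
            # Inspect the locations surrounding this occurrence of a query term.
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                cand = tokens[j]
                if j == i or cand in query:
                    continue
                scores[cand] += 1.0 / abs(i - j)
    return dict(scores)


def select_expansion_terms(docs, query_terms, k=2, window=3):
    """Return the k highest-scoring candidates (ties broken alphabetically)."""
    scores = proximity_scores(docs, query_terms, window)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy corpus, invented for this example only.
docs = [
    "acute renal failure treatment dialysis".split(),
    "renal failure patients receive dialysis".split(),
    "heart failure treatment".split(),
]
expanded_query = ["renal"] + select_expansion_terms(docs, ["renal"], k=2)
print(expanded_query)
```

In this sketch the expanded query would then be passed to the retrieval step in place of the original query. A real system would also need stop-word filtering and normalization by term frequency, which are omitted here.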
