Locating meaningful stopwords or stop-phrases in keyword-based retrieval systems

Abstract

A stopword detection component detects stopwords (also stop-phrases) in search queries input to keyword-based information retrieval systems. Potential stopwords are initially identified by comparing the terms in the search query to a list of known stopwords. Context data is then retrieved based on the search query and the identified stopwords. In one implementation, the context data includes documents retrieved from a document index. In another implementation, the context data includes categories relevant to the search query. Sets of retrieved context data are compared to one another to determine if they are substantially similar. If the sets of context data are substantially similar, this fact may be used to infer that the removal of the potential stopword(s) is not material to the search. If the sets of context data are not substantially similar, the potential stopword can be considered material to the search and should not be removed from the query.
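
The comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the stopword list, the toy document index, the `toy_retrieve` function, the use of Jaccard overlap as the similarity measure, and the 0.8 threshold are all hypothetical choices made for the example. The abstract only requires that the sets of context data be "substantially similar"; it does not specify how similarity is measured.

```python
from typing import Callable, List, Set

# Hypothetical list of known stopwords (assumption for this sketch).
KNOWN_STOPWORDS = {"a", "an", "the", "of", "in", "who"}

# Hypothetical toy document index standing in for real context data.
TOY_INDEX = {
    "d1": "the who live at leeds",
    "d2": "who discovered penicillin",
    "d3": "the history of penicillin",
}


def toy_retrieve(terms: List[str]) -> Set[str]:
    """Return ids of documents that contain every query term."""
    return {
        doc_id
        for doc_id, text in TOY_INDEX.items()
        if all(term in text.split() for term in terms)
    }


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two result sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def removable_stopwords(
    query: str,
    retrieve: Callable[[List[str]], Set[str]],
    threshold: float = 0.8,  # assumed cutoff for "substantially similar"
) -> List[str]:
    """Return potential stopwords whose removal leaves the results similar."""
    terms = query.lower().split()
    baseline = retrieve(terms)
    removable = []
    for term in terms:
        if term not in KNOWN_STOPWORDS:
            continue  # only terms on the known list are candidates
        reduced = [t for t in terms if t != term]
        if jaccard(baseline, retrieve(reduced)) >= threshold:
            removable.append(term)  # removal is not material to the search
    return removable


print(removable_stopwords("history of penicillin", toy_retrieve))
print(removable_stopwords("the who", toy_retrieve))
```

In this toy index, dropping "of" from "history of penicillin" leaves the result set unchanged, so the word is treated as removable. Dropping either word from "the who" changes the result set substantially, so both are kept as material to the search.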
