Finite-state transduction of related word forms for text indexing and retrieval

Abstract

The present invention solves a number of problems in using stems (canonical indicators of word meanings) in full-text retrieval of natural language documents, and thus permits recall to be improved without sacrificing precision. It uses various arrangements of finite-state transducers to accurately encode a number of desirable ways of mapping back and forth between words and stems, taking into account both the systematic aspects of a language's morphological rule system and the word-by-word irregularities that also occur. The techniques described apply generally across the languages of the world and are not limited to simple suffixing languages like English. Although the resulting transducers can have many states and transitions (arcs), they can be compacted by finite-state compression algorithms so that they can be used effectively in resource-limited applications. The invention contemplates an information retrieval system comprising the novel finite-state transducer as a database and a processor for responding to user queries, searching the database, and outputting proper responses, if they exist, as well as the novel database used in such a system and methods for constructing the novel database.
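
The two-way mapping between words and stems described in the abstract can be illustrated with a toy example. The sketch below is not the patent's construction: it assumes a small hand-written lexicon of (word, stem) pairs and builds a letter-pair trie, with no rule composition and no compression. The names `Transducer`, `add_pair`, `apply`, and `LEXICON` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

EPS = ""  # epsilon: the arc consumes or emits nothing on that side


class Transducer:
    """Toy finite-state transducer built as a trie over (surface, stem) letter pairs."""

    def __init__(self):
        self.arcs = defaultdict(list)  # state -> [(surface_sym, stem_sym, next_state)]
        self.finals = set()
        self.n_states = 1              # state 0 is the start state

    def add_pair(self, surface, stem):
        """Add one path mapping surface -> stem, padding the shorter side with epsilon."""
        state = 0
        for i in range(max(len(surface), len(stem))):
            a = surface[i] if i < len(surface) else EPS
            b = stem[i] if i < len(stem) else EPS
            # Reuse an arc with the same label pair so shared prefixes share states.
            for x, y, nxt in self.arcs[state]:
                if (x, y) == (a, b):
                    state = nxt
                    break
            else:
                new_state = self.n_states
                self.n_states += 1
                self.arcs[state].append((a, b, new_state))
                state = new_state
        self.finals.add(state)

    def apply(self, text, inverse=False):
        """Return all outputs for text: word -> stems, or stem -> words if inverse."""
        results = set()
        stack = [(0, 0, "")]  # (state, input position, output so far)
        while stack:
            state, pos, out = stack.pop()
            if pos == len(text) and state in self.finals:
                results.add(out)
            for a, b, nxt in self.arcs[state]:
                if inverse:
                    a, b = b, a  # read the stem side, emit the surface side
                if a == EPS:
                    stack.append((nxt, pos, out + b))
                elif pos < len(text) and text[pos] == a:
                    stack.append((nxt, pos + 1, out + b))
        return results


# Assumed sample lexicon mixing regular suffixing and irregular forms.
LEXICON = [
    ("swim", "swim"), ("swims", "swim"), ("swimming", "swim"),
    ("swam", "swim"), ("swum", "swim"),
    ("walk", "walk"), ("walks", "walk"), ("walked", "walk"),
]

fst = Transducer()
for word, stem in LEXICON:
    fst.add_pair(word, stem)

print(fst.apply("swam"))                         # stems for an irregular form
print(sorted(fst.apply("swim", inverse=True)))   # all word forms of a stem
```

Running the same machine in the inverse direction is what lets a stem expand into every related word form at query time, while the forward direction reduces a document word to its stem at indexing time.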
