Information Retrieval and Large Text Structured Corpora

Abstract

First, when managing large amounts of information, the documents of the corpora must be transformed into a common format. This allows all documents to be queried with a single query and avoids problems with system performance and result management. Furthermore, the technologies currently used to build information retrieval systems (IRSs) are not prepared to satisfy the requirements of corpus users, so new add-ons that take those requirements into account need to be developed in the near future. There have been some tentative attempts to include basic, localization-based linguistic operations (sensitivity to accents and umlauts, theme searches, etc.), but it is time to incorporate syntactic techniques into commercial systems to enable the building of more versatile corpus-based IRSs.
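The two ideas in the abstract, a common document format and accent-insensitive matching, can be illustrated with a minimal Python sketch. Nothing here comes from the paper itself: the `Record` type, the `to_record`, `fold` and `search` helpers, and the choice of plain text and HTML as input formats are all assumptions made for illustration, and a real IRS would use an index rather than a linear scan.

```python
import unicodedata
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Record:
    """Hypothetical common format: one plain-text record per document."""
    doc_id: str
    text: str


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def to_record(doc_id: str, raw: str, kind: str) -> Record:
    """Convert a source document ('html' or 'plain') into the common format."""
    if kind == "html":
        parser = _TextExtractor()
        parser.feed(raw)
        parser.close()
        raw = " ".join(parser.parts)
    # Collapse whitespace so every record has the same shape.
    return Record(doc_id, " ".join(raw.split()))


def fold(text: str) -> str:
    """Case-fold and strip combining marks (accents, umlauts)."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def search(records: list[Record], query: str) -> list[str]:
    """Run one query over all records, ignoring accents and case."""
    needle = fold(query)
    return [r.doc_id for r in records if needle in fold(r.text)]


if __name__ == "__main__":
    corpus = [
        to_record("a", "<p>Die Universität Köln</p>", "html"),
        to_record("b", "El corpus está en Málaga", "plain"),
    ]
    print(search(corpus, "koln"))    # ['a']
    print(search(corpus, "MALAGA"))  # ['b']
```

Because every source is reduced to the same `Record` shape first, the query logic is written once and applies to the whole corpus, which is the benefit the abstract attributes to a common format.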
