Conference on Document Recognition and Retrieval VIII, Jan 24-25, 2001, San Jose, USA

Recognition Techniques for Extracting Information from Semi-Structured Documents

Abstract

Optical document archives are increasingly widely used, with demand driven in part by new regulations that recognize the legal value of digital documents, provided they are stored on physically unalterable media. On the supply side there is now a vast and technologically advanced market, in which optical memories have solved the problem of data durability and permanence at costs comparable to those of magnetic memories. The remaining bottleneck in these systems is indexing. The indexing of documents with a variable structure, while still not completely automated, can be machine-supported to a large degree, with evident advantages both in the organization of the work and in the extraction of information, yielding data that are much more detailed and potentially significant for the user. We present a system for the automatic registration of correspondence to and from a public office. The system is based on a general methodology for the extraction, indexing, archiving, and retrieval of significant information from semi-structured documents. In our prototype application, this information is distributed among the database fields of sender, addressee, subject, date, and body of the document.
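
To make the five database fields concrete, here is a minimal Python sketch of what one registered record might look like and how it could be filled from plain text. It is not the method described in the paper, which the abstract does not detail. The `CorrespondenceRecord` class, the `extract_record` function, and the assumption that letters carry explicit `From:`, `To:`, `Subject:`, and `Date:` labels are all hypothetical and used only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class CorrespondenceRecord:
    """One registered letter; the field names follow the abstract."""
    sender: Optional[str] = None
    addressee: Optional[str] = None
    subject: Optional[str] = None
    date: Optional[str] = None
    body: str = ""


# Hypothetical header labels: the abstract does not say how fields are located.
HEADER_PATTERN = re.compile(r"^(From|To|Subject|Date)\s*:\s*(.*)$", re.IGNORECASE)
LABEL_TO_FIELD = {
    "from": "sender",
    "to": "addressee",
    "subject": "subject",
    "date": "date",
}


def extract_record(text: str) -> CorrespondenceRecord:
    """Fill a record from labeled header lines; everything else becomes the body."""
    record = CorrespondenceRecord()
    body_lines = []
    for line in text.splitlines():
        match = HEADER_PATTERN.match(line.strip())
        field = LABEL_TO_FIELD[match.group(1).lower()] if match else None
        if field and getattr(record, field) is None:
            # Only the first occurrence of each label is treated as a header.
            setattr(record, field, match.group(2).strip())
        else:
            body_lines.append(line)
    record.body = "\n".join(body_lines).strip()
    return record
```

A real semi-structured document would rarely be this tidy. The labels may be missing, reordered, or damaged by OCR, which is why the abstract describes indexing as machine-supported rather than fully automated.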
