
Automatic transformation of complex tables in documents into computer understandable structured format with mapped dependencies and providing schema-less query support for searching table data


Abstract

An information processing system, a computer readable storage medium, and a computer-implemented method collect tables from a corpus of documents, convert the collected tables to a flattened table format, and organize them to be searchable by schema-less queries. The method collects tables and extracts feature values from the collected table data and collected table meta-data for each collected table. A table classifier classifies each collected table as being a type of table. Based on the classification, the collected table is converted to a flattened table including table values that are the table data and the table meta-data of the collected table. Dependencies of the data values are mapped. The flattened table and mapped dependencies are stored in a triple store searchable by schema-less queries. The table classifier learns and improves its accuracy and reliability. Dependency information is maintained among a plurality of database tables. The dependency information can be updated at a variable update frequency.
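The flatten-and-store step described in the abstract can be illustrated with a minimal sketch. This is not the patented method: it assumes the simplest table type (one header row, first column used as the row key), skips the classifier entirely, and uses a plain Python list as a stand-in for a triple store. The names `flatten_table`, `query`, and the `inTable` predicate are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

# A triple is (subject, predicate, object), all plain strings in this sketch.
Triple = Tuple[str, str, str]


def flatten_table(table_id: str, header: List[str], rows: List[List[str]]) -> List[Triple]:
    """Flatten a simple table into triples.

    Assumption: the first column is the row key. Every other cell becomes
    (row_subject, column_name, cell_value). One extra triple per row records
    the dependency of that row on its source table.
    """
    triples: List[Triple] = []
    for row in rows:
        subject = f"{table_id}/{row[0]}"
        # Dependency triple: which table this row came from.
        triples.append((subject, "inTable", table_id))
        # Data triples: one per non-key cell, labelled by its column header.
        for column, value in zip(header[1:], row[1:]):
            triples.append((subject, column, value))
    return triples


def query(
    store: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return triples matching the given pattern; None acts as a wildcard.

    The caller needs no knowledge of the original table layout, which is the
    sense in which this lookup is schema-less.
    """
    return [
        t
        for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# Hypothetical example data, not taken from the patent.
store = flatten_table(
    "t1",
    ["drug", "dose_mg", "route"],
    [["aspirin", "500", "oral"], ["ibuprofen", "200", "oral"]],
)
oral_rows = query(store, predicate="route", obj="oral")
```

Here `oral_rows` holds the two `route` triples, one for each drug. A production system would more likely use a real triple store with a query language such as SPARQL, and the classifier would choose a different flattening rule for each table type.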
