Journal: 電子情報通信学会技術研究報告. パターン認識·メディア理解. Pattern Recognition and Media Understanding

Graph grammar based analysis system of complex table form document and its representation based on XML



Abstract

Structure analysis of table-form documents is important because both printed and electronic documents explicitly provide only geometric layout and lexical information. To process these documents automatically, logical structure information is necessary. In this paper, we first propose a general XML-based representation of table-form documents that contains both structure and layout information. Next, we present a structure analysis system based on a graph grammar that represents knowledge of document structure. Because the relations between adjacent fields in table-form documents are two-dimensional, a two-dimensional notation is needed to express this structural knowledge; we therefore adopt a two-dimensional graph grammar. Because the rules are relatively simple, the grammar notation makes this knowledge easy to modify and keep consistent. Another advantage of the grammar notation is that it can also be used to generate documents from logical structure alone. Experimental results show that the system successfully analyzed several kinds of table forms.
