StructuralLM: Structural Pre-training for Form Understanding

Abstract

Large pre-trained language models achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, they almost exclusively focus on text-only representation, while neglecting cell-level layout information that is important for form image understanding. In this paper, we propose a new pre-training approach, StructuralLM, to jointly leverage cell and layout information from scanned documents. Specifically, we pre-train StructuralLM with two new designs to make the most of the interactions of cell and layout information: 1) each cell as a semantic unit; 2) classification of cell positions. The pre-trained StructuralLM achieves new state-of-the-art results in different types of downstream tasks, including form understanding (from 78.95 to 85.14), document visual question answering (from 72.59 to 83.94), and document image classification (from 94.43 to 96.08).
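
The two designs can be pictured concretely. Below is a minimal, hypothetical sketch (not the authors' released code) of both ideas in PyTorch: every token in a cell receives the same cell-level 2D position embedding, and a pre-training head classifies which region of an N x N grid over the page each cell falls into. All names (`CellLayoutEmbedding`, `coord_bins`, `grid_size`, etc.) are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class CellLayoutEmbedding(nn.Module):
    """Add one shared 2D cell-position embedding to every token in a cell."""
    def __init__(self, hidden=768, coord_bins=1024):
        super().__init__()
        self.x_emb = nn.Embedding(coord_bins, hidden)
        self.y_emb = nn.Embedding(coord_bins, hidden)

    def forward(self, token_emb, cell_boxes, token2cell):
        # token_emb:  (seq_len, hidden) word embeddings
        # cell_boxes: (num_cells, 4) integer (x0, y0, x1, y1) in [0, coord_bins)
        # token2cell: (seq_len,) index of the cell each token belongs to
        boxes = cell_boxes[token2cell]  # broadcast cell boxes to tokens
        layout = (self.x_emb(boxes[:, 0]) + self.y_emb(boxes[:, 1])
                  + self.x_emb(boxes[:, 2]) + self.y_emb(boxes[:, 3]))
        # Tokens sharing a cell get identical layout embeddings,
        # so the cell behaves as one semantic unit.
        return token_emb + layout

class CellPositionClassifier(nn.Module):
    """Pre-training head: predict which grid region a cell lies in."""
    def __init__(self, hidden=768, grid_size=4):
        super().__init__()
        self.grid_size = grid_size
        self.head = nn.Linear(hidden, grid_size * grid_size)

    def region_labels(self, cell_boxes, page_w=1024, page_h=1024):
        # Label = index of the grid region containing the cell center.
        cx = (cell_boxes[:, 0] + cell_boxes[:, 2]) // 2
        cy = (cell_boxes[:, 1] + cell_boxes[:, 3]) // 2
        col = cx * self.grid_size // page_w
        row = cy * self.grid_size // page_h
        return row * self.grid_size + col  # in [0, grid_size**2)

    def forward(self, cell_states):
        # cell_states: (num_cells, hidden) pooled encoder outputs per cell
        return self.head(cell_states)  # logits over grid regions

# Toy usage: three tokens, the first two in cell 0, the third in cell 1.
emb = CellLayoutEmbedding()
tokens = torch.zeros(3, 768)
boxes = torch.tensor([[10, 10, 200, 40], [10, 50, 200, 80]])
out = emb(tokens, boxes, torch.tensor([0, 0, 1]))
```

In this sketch the cell-position classification task would be trained with a standard cross-entropy loss between `CellPositionClassifier` logits and `region_labels`, which is one plausible reading of "classification of cell positions" in the abstract.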
