Method and system for identifying citations within regulatory content

Abstract

A computer-implemented system and method for identifying citations within regulatory content is disclosed. The method involves receiving image data representing a format and layout of the regulatory content, receiving a language embedding including a plurality of tokens representing words or characters in the regulatory content, and generating a token mapping associating each of the tokens with a portion of the image data. The method also involves receiving the plurality of tokens and token mapping at an input of a citation classifier, the citation classifier having been trained to generate a classification output for each token based on the language embedding and the token mapping, the classification output identifying a plurality of citation tokens within the plurality of tokens. The method further involves processing the plurality of citation tokens to determine a hierarchical relationship between citation tokens, the hierarchical relationship being established based at least in part on the token mapping for the citation tokens.
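The pipeline described in the abstract (token mapping, per-token citation classification, then hierarchy from layout) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the `Token` and `CitationNode` types, the bounding-box form of the token mapping, the use of left indentation as the layout cue, and the `tolerance` parameter. In particular, `classify_citations` is a regex stand-in for the trained citation classifier, which per the abstract operates on the language embedding and the token mapping rather than on fixed patterns.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Token:
    text: str
    # Assumed form of the token mapping: the bounding box
    # (x0, y0, x1, y1) of the token within the page image.
    bbox: Tuple[float, float, float, float]


@dataclass
class CitationNode:
    token: Token
    children: List["CitationNode"] = field(default_factory=list)


# Hypothetical stand-in for the trained classifier: matches labels
# such as "1.", "2.3", "(a)" or "A.". A real model would be learned.
CITATION_PATTERN = re.compile(r"^(\d+(\.\d+)*\.?|\([a-z0-9]+\)|[A-Z]\.)$")


def classify_citations(tokens: List[Token]) -> List[bool]:
    """Return one classification output per token (True = citation token)."""
    return [bool(CITATION_PATTERN.match(t.text)) for t in tokens]


def build_hierarchy(citation_tokens: List[Token],
                    tolerance: float = 2.0) -> List[CitationNode]:
    """Nest citation tokens by left indentation taken from the token mapping."""
    roots: List[CitationNode] = []
    stack: List[Tuple[float, CitationNode]] = []  # (indent, node) of open ancestors
    # Visit tokens top to bottom on the page.
    for tok in sorted(citation_tokens, key=lambda t: t.bbox[1]):
        indent = tok.bbox[0]
        node = CitationNode(tok)
        # Close ancestors indented at least as far as this token.
        while stack and stack[-1][0] >= indent - tolerance:
            stack.pop()
        if stack:
            stack[-1][1].children.append(node)
        else:
            roots.append(node)
        stack.append((indent, node))
    return roots


# Toy page: two top-level sections with indented sub-items.
tokens = [
    Token("1.", (10, 10, 20, 20)),
    Token("Scope", (30, 10, 70, 20)),
    Token("(a)", (30, 30, 45, 40)),
    Token("(b)", (30, 50, 45, 60)),
    Token("2.", (10, 70, 20, 80)),
    Token("(a)", (30, 90, 45, 100)),
]
flags = classify_citations(tokens)
citation_tokens = [t for t, is_cit in zip(tokens, flags) if is_cit]
roots = build_hierarchy(citation_tokens)
```

In this toy input the two `(a)` labels have identical text and are told apart only by their position in the image, which is the reason the hierarchy step relies on the token mapping rather than on the token text alone.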