Determining the likelihood that an input descriptor and associated text content match a target field using natural language processing techniques in preparation for an extract, transform and load process

Abstract

Embodiments presented herein disclose techniques for transforming input documents having disparate formats into a normalized format (e.g., Atom, RSS, HTML, or customized XML). According to one embodiment, a plurality of fields is identified in an input document that has a given format. Each field includes a descriptor and text content associated with the descriptor. For each field, semantic properties of the descriptor and text content are evaluated against a plurality of mapping rules to determine whether the field is consistent with one of a plurality of fields of a target format. Each mapping rule specifies characteristics associated with one of the fields in the target format. Once a field is so determined, a mapping from that input field to the corresponding field of the target format is defined.
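The flow described in the abstract (identify fields, score each descriptor and its text content against per-target mapping rules, then record a mapping) can be sketched roughly as follows. This is a minimal illustration and not the patented method: the `MappingRule` class, the rule contents, the equal weighting, and the 0.75 threshold are all assumptions, and plain keyword and regular-expression checks stand in for the natural language processing techniques named in the title.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRule:
    """Hypothetical rule describing one field of the target format."""
    target_field: str
    descriptor_terms: frozenset  # words expected in the input descriptor
    content_pattern: str         # regex the text content is expected to match


# Illustrative rules only; the source does not list concrete rules.
RULES = [
    MappingRule("title", frozenset({"title", "headline", "subject"}),
                r"^.{1,120}$"),
    MappingRule("updated", frozenset({"date", "updated", "modified"}),
                r"^\d{4}-\d{2}-\d{2}"),
    MappingRule("author", frozenset({"author", "by", "creator"}),
                r"^[A-Z][a-z]+( [A-Z][a-z.]+)+$"),
]


def score(descriptor: str, text: str, rule: MappingRule) -> float:
    """Return a crude likelihood in [0, 1] that the field matches the rule."""
    words = set(re.findall(r"[a-z]+", descriptor.lower()))
    descriptor_hit = 1.0 if words & rule.descriptor_terms else 0.0
    content_hit = 1.0 if re.search(rule.content_pattern, text.strip()) else 0.0
    # Assumed weighting: descriptor and content evidence count equally.
    return 0.5 * descriptor_hit + 0.5 * content_hit


def map_fields(fields: dict, threshold: float = 0.75) -> dict:
    """Map each input descriptor to its best-scoring target field, if any."""
    mapping = {}
    for descriptor, text in fields.items():
        best = max(RULES, key=lambda rule: score(descriptor, text, rule))
        if score(descriptor, text, best) >= threshold:
            mapping[descriptor] = best.target_field
    return mapping


if __name__ == "__main__":
    input_fields = {
        "Headline": "Quarterly results released",
        "Last modified": "2014-10-23T09:00:00Z",
        "Written by": "Jane Doe",
        "Misc": "???",
    }
    print(map_fields(input_fields))
```

With the assumed threshold, a field is mapped only when both its descriptor and its content agree with the same rule, so the `Misc` field in the usage example is left unmapped. A real system would presumably replace the keyword and regex checks with richer semantic analysis.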

Bibliographic data

Publication number: US10120844B2

Publication date: 2018-11-06

Original format: PDF
Applicant/assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Application number: US201414522397
Inventors: ELIZABETH T. DETTMAN; JOEL C. DUBBELS; ANDREW R. FREED; MICHAEL T. PAYNE; MICHAEL W. SCHROEDER

Filing date: 2014-10-23
Classification: G06F17; G06F17/22; G06F17/27; G06F17/30; G06F17/21
Country: US
Database entry time: 2022-08-21 13:04:22
