
Determining the likelihood that an input descriptor and associated text content match a target field using natural language processing techniques in preparation for an extract, transform and load process


Abstract

Embodiments presented herein disclose techniques for transforming input documents having disparate formats into a normalized format (e.g., Atom, RSS, HTML, or customized XML). According to one embodiment, a plurality of fields is identified in an input document that has a given format. Each field includes a descriptor and text content associated with the descriptor. For each field, semantic properties of the descriptor and text content are evaluated against a plurality of mapping rules to determine whether the field is consistent with one of a plurality of fields of a target format. Each mapping rule specifies characteristics associated with one of the fields in the target format. Once an input field is determined to be consistent with a target field, a mapping from the input field to that target field is defined.
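The rule-matching step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `MappingRule` class, the `map_fields` function, the three sample rules, the equal weighting of descriptor and content evidence, and the 0.75 threshold are all hypothetical. Keyword overlap and regular expressions stand in here for the semantic (natural language processing) evaluation that the abstract refers to.

```python
import re
from dataclasses import dataclass


@dataclass
class MappingRule:
    """Hypothetical rule: characteristics expected of one target-format field."""
    target_field: str
    descriptor_terms: frozenset   # words expected in the input descriptor
    content_pattern: re.Pattern   # shape expected of the associated text

    def score(self, descriptor: str, text: str) -> float:
        # Split the descriptor into lowercase words ("pub_date" -> pub, date).
        tokens = set(re.findall(r"[a-z]+", descriptor.lower()))
        descriptor_hit = 1.0 if tokens & self.descriptor_terms else 0.0
        content_hit = 1.0 if self.content_pattern.search(text) else 0.0
        # Assumed weighting: descriptor and content count equally.
        return 0.5 * descriptor_hit + 0.5 * content_hit


# Illustrative rules for an Atom-like target format (assumed, not from the source).
RULES = [
    MappingRule("title", frozenset({"title", "headline", "subject"}),
                re.compile(r"^.{1,120}$")),
    MappingRule("updated", frozenset({"date", "updated", "modified", "published"}),
                re.compile(r"\d{4}-\d{2}-\d{2}")),
    MappingRule("author", frozenset({"author", "byline", "creator"}),
                re.compile(r"^[A-Z][a-z]+(?: [A-Z][a-z]+)+$")),
]


def map_fields(input_fields: dict, rules=RULES, threshold: float = 0.75) -> dict:
    """Return {input descriptor: target field} for fields that match a rule."""
    mapping = {}
    for descriptor, text in input_fields.items():
        best = max(rules, key=lambda rule: rule.score(descriptor, text))
        # Only define a mapping when the best rule is a sufficiently strong match.
        if best.score(descriptor, text) >= threshold:
            mapping[descriptor] = best.target_field
    return mapping


if __name__ == "__main__":
    document = {
        "Headline": "Quarterly results released",
        "pub_date": "2015-03-01",
        "Author Name": "Jane Doe",
        "misc": "lorem ipsum dolor",
    }
    print(map_fields(document))
```

In this sketch a field is mapped only when both its descriptor and its text content agree with a rule, so the `misc` field is left unmapped. A real system would likely replace the binary hits with graded likelihood scores.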
