首页>
外国专利>
Determining the likelihood that an input descriptor and associated text content match a target field using natural language processing techniques in preparation for an extract, transform and load process
Determining the likelihood that an input descriptor and associated text content match a target field using natural language processing techniques in preparation for an extract, transform and load process
Embodiments presented herein disclose techniques for transforming input documents having disparate formats into a normalized format (e.g., Atom, RSS, HTML, customized XML, etc.). According to one embodiment, a plurality of fields is identified in an input document that has a given format. Each field includes a descriptor and text content associated with the descriptor. For each field, semantic properties are evaluated for the descriptor and text content against a plurality of mapping rules to determine whether the field is consistent with one of a plurality of fields of a target format. Each mapping rule specifies characteristics associated with one of the fields in the target format. Once so determined, a mapping from the first field to the second field is defined.
展开▼