Journal: IEEE Transactions on Knowledge and Data Engineering

Inference of Regular Expressions for Text Extraction from Examples

Abstract

A large class of entity extraction tasks from text that is either semistructured or fully unstructured may be addressed by regular expressions, because in many practical cases the relevant entities follow an underlying syntactical pattern and this pattern may be described by a regular expression. In this work, we consider the long-standing problem of synthesizing such expressions automatically, based solely on examples of the desired behavior. We present the design and implementation of a system capable of addressing extraction tasks of realistic complexity. Our system is based on an evolutionary procedure carefully tailored to the specific needs of regular expression generation by examples. The procedure executes a search driven by a multiobjective optimization strategy aimed at simultaneously improving multiple performance indexes of candidate solutions while at the same time ensuring an adequate exploration of the huge solution space. We assess our proposal experimentally in great depth, on a number of challenging datasets. The accuracy of the obtained solutions seems to be adequate for practical usage and improves over earlier proposals significantly. Most importantly, our results are highly competitive even with respect to human operators. A prototype is available as a web application at .
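
The abstract describes a search that improves several performance indexes of candidate expressions at once. The Python sketch below illustrates only that general idea, not the authors' system: it scores a fixed list of candidate regular expressions against labeled examples and keeps the Pareto-optimal ones. The objectives chosen here (precision, recall, pattern length), the function names, and the sample data are all assumptions made for illustration; the abstract does not name the indexes the system actually uses, and no evolutionary step (mutation, crossover, selection over generations) is shown.

```python
import re
from collections import Counter

def extraction_scores(pattern, examples):
    """Score a candidate regex as (precision, recall, length).

    `examples` is a list of (text, expected_extractions) pairs.
    Matching extracted strings against expected strings (rather than
    character spans) is a simplification made for this sketch.
    """
    try:
        rx = re.compile(pattern)
    except re.error:
        return (0.0, 0.0, len(pattern))  # invalid candidates score worst
    found = expected_total = correct = 0
    for text, expected in examples:
        # Ignore empty matches, which carry no extracted text.
        got = [m.group(0) for m in rx.finditer(text) if m.group(0)]
        found += len(got)
        expected_total += len(expected)
        # Multiset intersection counts each expected string at most once.
        correct += sum((Counter(got) & Counter(expected)).values())
    precision = correct / found if found else 0.0
    recall = correct / expected_total if expected_total else 0.0
    return (precision, recall, len(pattern))

def dominates(a, b):
    """True if score a is no worse than b on every objective and differs from b.

    Precision and recall are maximized; length is minimized.
    """
    ka = (a[0], a[1], -a[2])
    kb = (b[0], b[1], -b[2])
    return all(x >= y for x, y in zip(ka, kb)) and ka != kb

def pareto_front(candidates, examples):
    """Return the candidates that no other candidate dominates."""
    scored = {p: extraction_scores(p, examples) for p in candidates}
    return [
        p for p in candidates
        if not any(dominates(scored[q], scored[p]) for q in candidates if q != p)
    ]

# Hypothetical examples: extract phone-like tokens from short texts.
examples = [
    ("call 555-1234 or 555-9876", ["555-1234", "555-9876"]),
    ("room 42, ext 555-0000", ["555-0000"]),
]
candidates = [r"\d+", r"\d{3}-\d{4}", r"\d\d\d-\d\d\d\d", r"[0-9]{3}-[0-9]{4}", r".*"]

front = pareto_front(candidates, examples)
```

In this toy run the front keeps `\d{3}-\d{4}`, the shortest fully accurate pattern, alongside `.*`, which extracts nothing correct but is shorter than every other candidate. A multiobjective search typically carries such trade-offs forward and makes a final choice by accuracy.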
