Conference: Advances in Artificial Intelligence

Post-supervised Template Induction for Dynamic Web Sources

Abstract

Dynamic web sites commonly return information in the form of lists and tables. Although hand-crafting an extraction program for a specific template is straightforward, it is time-consuming, so it is desirable to generate template extraction programs automatically from examples of lists and tables in HTML documents. We describe a novel technique, Post-supervised Learning, which exploits unsupervised learning to avoid the need for training examples while involving the user only minimally to achieve high accuracy. We have developed unsupervised algorithms to extract the number of rows and adopted a dynamic programming algorithm for extracting columns. Our system, called TIDE (Template Induction for web Data Extraction), achieves high performance with minimal user input compared to fully supervised techniques.
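The abstract does not specify the dynamic programming algorithm used for column extraction, so the following is only an illustrative sketch of the general idea and not TIDE's actual method. It assumes each row is already tokenized into tags and text, aligns two rows with a longest-common-subsequence table, and treats the tokens that are not shared as column values. The function names `lcs_align` and `split_columns` are hypothetical.

```python
from typing import List, Tuple


def lcs_align(a: List[str], b: List[str]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) forming a longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover one optimal alignment.
    pairs = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def split_columns(row: List[str], other: List[str]) -> List[str]:
    """Group the tokens of `row` that are not shared with `other` into column values.

    Shared tokens are assumed to be template markup and act as column separators.
    """
    matched = {i for i, _ in lcs_align(row, other)}
    columns: List[str] = []
    current: List[str] = []
    for i, token in enumerate(row):
        if i in matched:
            if current:
                columns.append(" ".join(current))
                current = []
        else:
            current.append(token)
    if current:
        columns.append(" ".join(current))
    return columns


# Hypothetical tokenized rows from the same table template.
row1 = ["<tr>", "<td>", "Alice", "</td>", "<td>", "42", "</td>", "</tr>"]
row2 = ["<tr>", "<td>", "Bob", "Smith", "</td>", "<td>", "7", "</td>", "</tr>"]
cols1 = split_columns(row1, row2)
cols2 = split_columns(row2, row1)
```

One limitation of this simplification is that a value appearing identically in both rows would be mistaken for template markup; comparing against more than one other row would reduce that risk.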
