FiVaTech: Page-Level Web Data Extraction from Template Pages

Kayed Mohammed; Chang Chia-Hui

Abstract

Web data extraction is an important part of many Web data analysis applications. In this paper, we formulate the data extraction problem as the decoding process of page generation based on structured data and tree templates. We propose an unsupervised, page-level data extraction approach to deduce the schema and templates for each individual deep Web site, which contains either singleton or multiple data records in one Web page. FiVaTech applies tree matching, tree alignment, and mining techniques to accomplish this challenging task. In experiments, FiVaTech has much higher precision than EXALG and is comparable with other record-level extraction systems such as ViPER and MSE. The experiments show encouraging results for the test pages used in many state-of-the-art Web data extraction works.
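
The abstract names tree matching as one of the building blocks. As a rough illustration only, the sketch below implements the generic Simple Tree Matching dynamic program on two small DOM-like trees and turns the score into a normalized similarity. This is an assumption about what such a matching step might look like, not the matching score defined in the paper; the `Node` class, the function names, and the two sample trees are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical minimal DOM-like node: a tag name and ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def tree_size(t: Node) -> int:
    """Number of nodes in the tree rooted at t."""
    return 1 + sum(tree_size(c) for c in t.children)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Size of the maximum order-preserving matching between two trees.

    Generic Simple Tree Matching: roots must carry the same tag, and the
    children are aligned with an LCS-style dynamic program.
    """
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # dp[i][j] = best score using the first i children of a and first j of b
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = simple_tree_matching(a.children[i - 1], b.children[j - 1])
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1] + pair)
    return dp[m][n] + 1  # +1 counts the matched roots


def similarity(a: Node, b: Node) -> float:
    """Matching score normalized to [0, 1] by the two tree sizes."""
    return 2 * simple_tree_matching(a, b) / (tree_size(a) + tree_size(b))


# Two sample rows that differ by one extra cell (made-up data).
row_a = Node("tr", [Node("td"), Node("td", [Node("a")])])
row_b = Node("tr", [Node("td"), Node("td", [Node("a")]), Node("td")])

score = simple_tree_matching(row_a, row_b)
sim = similarity(row_a, row_b)
```

A high similarity between two subtrees suggests that they were generated by the same part of a template, which is the kind of signal an alignment step could then use to group repeated records.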


Bibliographic details

Source: Knowledge and Data Engineering, IEEE Transactions on | 2010, No. 2 | pp. 249-263 | 15 pages
Authors: Kayed Mohammed; Chang Chia-Hui
Author affiliation: Beni-Suef University, Giza
Original format: PDF
Language: English
Keywords: Semistructured data; Web data extraction; multiple trees merging; wrapper induction

Similar literature

1. Unsupervised Structured Data Extraction from Template-generated Web Pages [J]. Tomas Grigalis, Antanas Čenys. Journal of Universal Computer Science. 2014, No. 2
2. Optimized Template Detection and Extraction Algorithm for Web Scraping of Dynamic Web Pages [J]. Xin Luo. Journal of wavelet theory and applications. 2017, No. 2
3. Implementation of a weblog extraction system with an improved template extraction technique [J]. E CHANG. Chinese Journal of Library and Information Science. 2013, No. 001
4. FiVaTech: Page-Level Web Data Extraction from Template Pages [C]. Mohammed Kayed, Khaled Shaalan, Chia-Hui Chang. International Conference on Data Mining. 2008
5. Post-supervised template induction for information extraction from lists and tables in Web sources [D]. Shi, Zhongmin. 2002
6. Automated reaction database and reaction network analysis: extraction of reaction templates using cheminformatics [O]. Pieter P. Plehiers, Guy B. Marin, Christian V. Stevens. 2018
7. FiVaTech: Page-level web data extraction from template pages [O]. Mohammed Kayed, Chia-hui Chang. 2010
8. Mapping the footsteps of the green anole: A template for publishing ecological data on the World Wide Web [R]. Carnes, E. T., Truett, D. F., Truett, L. F. 1996
