A Unified Approach for Extracting Multiple News Attributes from News Pages


Abstract

Most previous works on web news article extraction focus only on the article's content and title. To meet the growing demand from web data integration applications, more news attributes, such as publication date and author, need to be extracted and stored in structured form for further processing. In this paper, we study the problem of automatically extracting multiple news attributes from news pages. Unlike traditional approaches (e.g., extracting each news attribute separately or generating template-dependent wrappers), we propose an automatic, unified approach that extracts them based on the visual features of news attributes, which include independent visual features and dependent visual features. The basic idea of our approach is as follows: first, candidates for each news attribute are extracted from the news page based on their independent visual features; then, the true value of each attribute is identified from the candidates based on dependent visual features (the layout relations among news attributes). Extensive experiments on a large number of news pages show that the proposed approach is highly effective and efficient.
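The two-stage idea in the abstract (generate candidates per attribute from independent features, then pick the combination whose layout relations fit best) can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not specify the actual features, so the `Block` type, the font-size threshold, the date regex, the "By " author cue, and the distance-based `layout_score` are all hypothetical stand-ins chosen only to show the structure of the approach.

```python
import re
from dataclasses import dataclass
from itertools import product

# Hypothetical rendered text block: its text plus a few visual properties.
@dataclass(frozen=True)
class Block:
    text: str
    top: float        # vertical position of the block's top edge, in pixels
    font_size: float  # rendered font size, in pixels

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")  # assumed date format

# Stage 1: candidates from assumed *independent* features of each attribute.
def title_candidates(blocks):
    return [b for b in blocks if b.font_size >= 18]  # assumed threshold

def date_candidates(blocks):
    return [b for b in blocks if DATE_RE.search(b.text)]

def author_candidates(blocks):
    return [b for b in blocks if b.text.lower().startswith("by ")]

# Stage 2: score a (title, date, author) combination with an assumed
# *dependent* (layout) relation: date and author sit just below the title
# and on the same row as each other. Higher score is better.
def layout_score(title, date, author):
    gap_date = date.top - title.top
    gap_author = author.top - title.top
    if gap_date <= 0 or gap_author <= 0:
        return float("-inf")  # reject combinations that violate the layout
    return -(gap_date + gap_author + abs(date.top - author.top))

def extract(blocks):
    combos = product(title_candidates(blocks),
                     date_candidates(blocks),
                     author_candidates(blocks))
    best = max(combos, key=lambda c: layout_score(*c), default=None)
    if best is None or layout_score(*best) == float("-inf"):
        return None  # no candidate combination satisfies the layout relation
    title, date, author = best
    return {"title": title.text, "date": date.text, "author": author.text}

# Usage on a toy page: a site banner and a footer archive link also match
# stage 1, but stage 2 prefers the combination with the tightest layout.
page = [
    Block("Site Header News", top=10, font_size=20),
    Block("Big Story Happens", top=100, font_size=24),
    Block("By Jane Doe", top=140, font_size=12),
    Block("2010-08-30", top=140, font_size=12),
    Block("By the way, see our archive", top=600, font_size=12),
    Block("Archive: 2009-01-01", top=900, font_size=12),
]
print(extract(page))
# {'title': 'Big Story Happens', 'date': '2010-08-30', 'author': 'By Jane Doe'}
```

Scoring whole combinations, rather than picking each attribute's best candidate in isolation, is what lets the layout relations resolve ambiguity: the banner is a plausible title on its own, but it loses once its distance to the date and author is taken into account.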

Bibliographic Details

Source
PRICAI 2010: Trends in Artificial Intelligence | 2010 | pp. 157-169 | 13 pages
Conference location: Daegu (KR)
Authors
Wei Liu; Hualiang Yan; Jianwu Yang; Jianguo Xiao
Author affiliations

Institute of Computer Science and Technology, Peking University; Key Laboratory of Computational Linguistics (Peking University), MOE, China, 100871

Institute of Computer Science and Technology, Peking University

Institute of Computer Science and Technology, Peking University; Key Laboratory of Computational Linguistics (Peking University), MOE, China, 100871

Institute of Computer Science and Technology, Peking University

Original format: PDF
Language: English
CLC classification: Artificial intelligence theory
Keywords
web data extraction; news attribute; visual feature

Similar Literature

1. Extracting multiple news attributes based on visual features [J]. Wei Liu, Hualiang Yan, Jianguo Xiao. Journal of Intelligent Information Systems. 2012, no. 2

2. The PFAD-HEC Model: Impacts of News Attributes and Use Motivations on Selective News Exposure [J]. Mothes Cornelia, Knobloch-Westerwick Silvia, Pearson George D. H. Communication Theory. 2019, no. 3

3. Defining Obesity: Second-Level Agenda Setting Attributes in Black Newspapers and General Audience Newspapers [J]. Lee Hyunmin, Len-Rios Maria E. Journal of Health Communication. 2014, no. 10a12

4. A Unified Approach for Extracting Multiple News Attributes from News Pages [C]. Wei Liu, Hualiang Yan, Jianwu Yang. Pacific Rim Conference on Artificial Intelligence. 2010

5. TOWARDS A UNIFIED FIELD THEORY OF TELEVISION NEWS: A COMPARATIVE ANALYSIS OF PUBLIC AND COMMERCIAL TELEVISION NEWS IN THE UNITED STATES AND BRITAIN. (VOLUMES I AND II). [D]. LABASCHIN, SUSAN JANE. 1987

6. Reliable and valid NEWS for Chinese seniors: measuring perceived neighborhood attributes related to walking [O]. Ester Cerin, Cindy HP Sit, Man-chin Cheung. 2010

7. A Study on Extracting News Contents from News Web Pages [O]. Yong-Gu Lee. 2009
