Conference: Intelligence and Security Informatics, 2009. ISI '09

ContentEx: A framework for automatic content extraction programs

Abstract

Web pages are often cluttered with extraneous material such as navigation bars, branding banners, JavaScript, and advertisements. This material can distract users from the content they are actually interested in and can reduce the effectiveness of many advanced Web applications. Automatic content extraction has many applications, ranging from providing data for Web mining to enabling better Web access on mobile devices. In this paper, we propose ContentEx, a framework for automatic content extraction programs, which we use to organize the code of such programs and to facilitate the development of related solutions. We also describe how we extract content from forum pages within this framework to meet the requirements of our own application.
