Conference: IEEE International Conference on Tools with Artificial Intelligence

Web Page Segmentation and Its Application for Web Information Crawling

Abstract

Web page segmentation aims to break a page into sections that reveal its information presentation structure and appear coherent to readers. In this paper, we propose a new web page segmentation framework based on analyzing and understanding web page structure. After extracting the segmentation graph structure, we formulate label assignment on the graph, which determines whether each boundary should split the current block, as a structured learning problem. The highest-scoring label assignment is computed with the Viterbi algorithm, and a joint feature function captures the dependencies among boundaries. To learn the parameters, we adopt a learning model based on the perceptron algorithm. Furthermore, building on this framework, we propose a web information crawling application framework that integrates web page segmentation with semantic block classification.
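
The decoding and learning steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the boundaries form a simple linear chain (the paper works on a segmentation graph), that each boundary has a hypothetical numeric feature vector, and that the joint feature function decomposes into per-boundary and adjacent-pair terms. The names `viterbi`, `train_perceptron`, `W`, and `T` are illustrative only.

```python
from typing import List, Sequence, Tuple

import numpy as np

N_LABELS = 2  # assumed encoding: 0 = do not split at this boundary, 1 = split


def viterbi(unary: np.ndarray, pairwise: np.ndarray) -> List[int]:
    """Return the highest-scoring label sequence for a chain of boundaries.

    unary[i, y]     : score of giving boundary i the label y
    pairwise[y, y2] : score of label y being followed by label y2
    """
    n = unary.shape[0]
    score = unary[0].copy()
    back = np.zeros((n, N_LABELS), dtype=int)
    for i in range(1, n):
        # cand[y_prev, y] = best score ending in y_prev, then moving to y
        cand = score[:, None] + pairwise + unary[i][None, :]
        back[i] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # Follow the back-pointers from the best final label.
    labels = [int(score.argmax())]
    for i in range(n - 1, 0, -1):
        labels.append(int(back[i, labels[-1]]))
    return labels[::-1]


def train_perceptron(
    data: Sequence[Tuple[np.ndarray, Sequence[int]]],
    n_features: int,
    epochs: int = 10,
) -> Tuple[np.ndarray, np.ndarray]:
    """Structured perceptron: decode, then move weights toward the gold labels."""
    W = np.zeros((N_LABELS, n_features))  # per-label weights on boundary features
    T = np.zeros((N_LABELS, N_LABELS))    # weights on adjacent label pairs
    for _ in range(epochs):
        for X, gold in data:
            pred = viterbi(X @ W.T, T)
            if pred == list(gold):
                continue  # no mistake, no update
            for i, (g, p) in enumerate(zip(gold, pred)):
                W[g] += X[i]  # reward features under the gold label
                W[p] -= X[i]  # penalize them under the predicted label
                if i > 0:
                    T[gold[i - 1], g] += 1.0
                    T[pred[i - 1], p] -= 1.0
    return W, T
```

With toy one-hot features, training on a single page and decoding it again might look like this:

```python
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])  # hypothetical boundary features
W, T = train_perceptron([(X, [0, 1, 0])], n_features=2, epochs=5)
labels = viterbi(X @ W.T, T)
```

The pairwise term is what lets one boundary's label influence its neighbor's, which is the role the abstract assigns to the joint feature function.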
