Journal: International journal of knowledge engineering and soft data paradigms

Extraction of the contents in the web texts by content-density distribution



Abstract

In recent years, users searching the internet for useful information have relied on the result snippets of a web search engine to grasp the content of web pages. However, these snippets are often too short for users to determine whether the content of each web page is relevant. To address this problem, we propose a method for grasping the content of each web page and extracting the part of the page related to the query keywords. The method is more effective than conventional snippet-based methods because it regards the content as a set of words in the text of a web page and generates a content-density distribution from both the position and the influence of each word. Our experiments show that the method is useful for grasping the influence of the extracted web text.
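
The abstract does not give the formula behind the content-density distribution, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that each occurrence of a query keyword contributes a Gaussian bump centred on its token position, scaled by an influence weight, and that the passage to extract is a window around the density peak. The function names, the `bandwidth` and `window` parameters, and the uniform weight of 1.0 per keyword are all hypothetical.

```python
import math
import re


def content_density(text, keywords, bandwidth=5.0):
    """Return (tokens, density), where density[i] is the weighted keyword
    density at token position i.

    Assumption: density is a sum of Gaussian kernels, one per keyword
    occurrence. The paper may define it differently.
    """
    tokens = re.findall(r"\w+", text.lower())
    # Hypothetical influence weights: every query keyword counts 1.0.
    weights = {k.lower(): 1.0 for k in keywords}
    hits = [(i, weights[t]) for i, t in enumerate(tokens) if t in weights]
    density = [
        sum(w * math.exp(-((pos - i) ** 2) / (2 * bandwidth ** 2))
            for i, w in hits)
        for pos in range(len(tokens))
    ]
    return tokens, density


def extract_passage(text, keywords, window=8, bandwidth=5.0):
    """Return the tokens within `window` positions of the density peak."""
    tokens, density = content_density(text, keywords, bandwidth)
    if not tokens:
        return ""
    peak = max(range(len(tokens)), key=density.__getitem__)
    start = max(0, peak - window)
    end = min(len(tokens), peak + window + 1)
    return " ".join(tokens[start:end])
```

Under these assumptions, a longer passage than a fixed-length snippet can be returned simply by widening `window`, and keywords that matter more could be given larger weights than 1.0.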
