A Multi Criteria Function to Concept Extraction in HTML Environment

Abstract

The core of the Internet and World Wide Web revolution is the capacity to share huge quantities of data efficiently. However, the rapid and chaotic growth of the Net has made the task of sharing or mining useful information extremely complicated. Every inference process that draws on Internet information requires an adequate characterization of Web pages. The textual content of a page is one of the most important aspects to consider when characterizing it. Textual characterization should be achieved by extracting a set of relevant concepts that properly represent the text contained in the Web page. This paper presents a method, based mainly on extracting features from the HTML markup, that obtains a set of relevant concepts from a Web page. To validate the proposed approach, a comparative study is presented. It shows that the representations generated by the proposed method are of higher quality than those produced by a commercial tool.
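The abstract does not list the criteria the method combines, so the following is only a minimal sketch of the general idea of scoring words with several HTML-derived criteria. It is not the paper's function. It assumes four hypothetical criteria: overall term frequency, occurrence inside `<title>`, inside emphasis tags (`<b>`, `<strong>`, `<em>`, `<i>`), and inside `<h1>` to `<h3>` headings. Each criterion is normalized by its maximum count and combined linearly with made-up weights (`WEIGHTS`). The stopword list, tag sets, sample page, and the names `concept_scores` and `top_concepts` are also illustrative. The sketch uses only Python's standard `html.parser` module.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical criteria and weights for illustration only; not from the paper.
WEIGHTS = {"frequency": 0.3, "title": 0.3, "emphasis": 0.2, "heading": 0.2}
EMPHASIS_TAGS = {"b", "strong", "em", "i"}
HEADING_TAGS = {"h1", "h2", "h3"}
VOID_TAGS = {"br", "img", "meta", "link", "hr", "input"}  # elements never closed
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}
TOKEN_RE = re.compile(r"[a-z]+")


class CriteriaCollector(HTMLParser):
    """Counts each word overall and inside the HTML elements tied to each criterion."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open element names
        self.counts = {name: Counter() for name in WEIGHTS}

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag; ignore stray end tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        inside = set(self.open_tags)
        if inside & {"script", "style"}:
            return  # skip non-visible text
        for word in TOKEN_RE.findall(data.lower()):
            if word in STOPWORDS:
                continue
            self.counts["frequency"][word] += 1
            if "title" in inside:
                self.counts["title"][word] += 1
            if inside & EMPHASIS_TAGS:
                self.counts["emphasis"][word] += 1
            if inside & HEADING_TAGS:
                self.counts["heading"][word] += 1


def concept_scores(html: str) -> dict[str, float]:
    """Weighted sum of per-criterion counts, each scaled to [0, 1] by its maximum."""
    collector = CriteriaCollector()
    collector.feed(html)
    collector.close()
    scores = Counter()
    for name, weight in WEIGHTS.items():
        counts = collector.counts[name]
        if not counts:
            continue
        top = max(counts.values())
        for word, n in counts.items():
            scores[word] += weight * n / top
    return dict(scores)


def top_concepts(html: str, k: int = 5) -> list[str]:
    """Return the k highest-scoring words, breaking ties alphabetically."""
    scores = concept_scores(html)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


SAMPLE_PAGE = """<html><head><title>Concept extraction</title></head>
<body><h1>Extraction of concepts</h1>
<p>Concept extraction uses <b>HTML</b> markup. The markup marks emphasis.</p>
</body></html>"""

if __name__ == "__main__":
    print(top_concepts(SAMPLE_PAGE, k=2))  # ['extraction', 'concept']
```

A weighted linear sum is simply the most basic way to merge several criteria into one relevance score. The sketch does no stemming, so "concept" and "concepts" are counted as separate terms.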