Journal of Intelligent Information Systems

An Analytical Approach to Concept Extraction in HTML Environments



Abstract

The core of the Internet and World Wide Web revolution lies in their capacity to share huge quantities of data efficiently, but the rapid and chaotic growth of the Net has greatly complicated the task of sharing or mining useful information. Any inference process over Internet information requires an adequate characterization of the Web pages involved. The textual part of a page is one of the most important aspects to consider when characterizing it. This textual characterization should be made by extracting an appropriate set of relevant concepts that properly represent the text of the Web page. This paper presents a method to obtain such a set of relevant concepts from a Web page, based essentially on a relevance estimate for each word in the page's text. Word relevance is defined by a combination of criteria that take into account characteristics of the HTML language as well as more classical measures such as the frequency and position of a word in a document. In addition, heuristic rules for obtaining the most suitable fusion of criteria are derived from a statistical study. Several experiments are conducted to compare the performance of the proposed concept extraction method with other approaches, including a commercial tool. The results show that the proposed technique extracts concepts more successfully than the other methods tested.
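
The abstract names three kinds of evidence for word relevance (HTML characteristics, frequency, and position) but does not give the criteria or how they are fused. The following is a minimal sketch of the general idea only, not the paper's method: the tag weights, the stopword list, the linear combination, and the names `TAG_WEIGHTS`, `word_relevance`, and `top_concepts` are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical emphasis weights per HTML tag (illustrative, not from the paper).
TAG_WEIGHTS = {"title": 3.0, "h1": 2.5, "h2": 2.0, "b": 1.5, "strong": 1.5, "em": 1.2}
# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "and", "for", "from", "this", "that", "with", "are", "was"}
# Void elements never get an end tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class WordCollector(HTMLParser):
    """Collects (word, emphasis weight) pairs in document order."""

    def __init__(self):
        super().__init__()
        self.open_tags = []    # stack of currently open tags
        self.occurrences = []  # (word, weight) for every kept word

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching tag, tolerating unclosed inner tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if "script" in self.open_tags or "style" in self.open_tags:
            return
        # A word inherits the strongest weight among its enclosing tags.
        weight = max((TAG_WEIGHTS.get(t, 1.0) for t in self.open_tags), default=1.0)
        for word in re.findall(r"[a-z]+", data.lower()):
            if len(word) > 2 and word not in STOPWORDS:
                self.occurrences.append((word, weight))


def word_relevance(html, w_freq=1.0, w_tag=1.0, w_pos=0.5):
    """Score each word by a weighted sum of frequency, tag emphasis, and position.

    The linear fusion and its default weights are assumptions; the paper
    derives its fusion rules from a statistical study not reproduced here.
    """
    parser = WordCollector()
    parser.feed(html)
    parser.close()
    total = len(parser.occurrences)
    if total == 0:
        return {}

    counts = defaultdict(int)
    best_tag = defaultdict(lambda: 1.0)
    first_pos = {}
    for index, (word, weight) in enumerate(parser.occurrences):
        counts[word] += 1
        best_tag[word] = max(best_tag[word], weight)
        first_pos.setdefault(word, index)

    max_count = max(counts.values())
    max_tag = max(TAG_WEIGHTS.values())
    scores = {}
    for word in counts:
        freq = counts[word] / max_count       # normalized frequency
        tag = best_tag[word] / max_tag        # normalized HTML emphasis
        pos = 1.0 - first_pos[word] / total   # earlier first use scores higher
        scores[word] = w_freq * freq + w_tag * tag + w_pos * pos
    return scores


def top_concepts(html, k=5):
    """Return the k highest-scoring words, ties broken alphabetically."""
    scores = word_relevance(html)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

Calling `top_concepts(page_html, k=5)` on a page's HTML string returns its five highest-scoring words under these assumed weights. A linear sum is used here only because it is the simplest fusion to read; any other combination rule could be swapped into `word_relevance` without changing the collection step.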
