Journal of Intelligent Information Systems

Guest Editor's Introduction: Special Issue on Web Content Mining


Abstract

Research in Web mining is moving the World Wide Web toward a more useful environment in which users can quickly and easily find the information they need. Web mining refers to the discovery and analysis of data, documents, and multimedia from the World Wide Web. It includes hyperlink structure mining, statistical usage mining, and document content mining. Structure mining discovers information by analyzing a Web page's inbound and outbound links. This information can establish the authority of a Web page and help in page categorization. Usage mining applies data mining techniques to discover patterns in Web logs, which is useful in defining collaboration between users and refining users' personal preferences. Content mining extracts concepts from the content of Web pages. Information retrieval techniques are applied to unstructured (text), semi-structured (HTML, XML), and structured (database) Web pages to extract semantic meaning. This journal issue presents current research in Web content mining of unstructured and semi-structured Web pages.

Search engines are responsible for extracting semantic meaning from the content of Web pages. So much information is now available that a searcher must depend on search engines to locate possible information sources. Because Web content is as diverse as the authors who create Web pages, a search engine must understand the content of individual pages for a searcher to find information effectively. This is not a trivial task. Authors of unstructured and semi-structured text may not be concerned with the automatic extraction of meaning. Text is typically written for a human audience, which is naturally capable of extracting meaning. Extracting semantic meaning requires an understanding of the elements of the Web page and of the relationships between those elements. The extracted meaning must then be placed in a structure that is easily searchable in response to a query.
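As a rough illustration of the last two points (it is not taken from the special issue itself), the Python sketch below parses semi-structured HTML with the standard-library `html.parser` module, records each page's title words, body words, and outbound links, and stores the words in an inverted index that answers simple keyword queries. The class and function names, the sample pages, and the example URLs are all hypothetical, and keyword matching is only a stand-in for the much richer semantic extraction the abstract describes.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects title words, body words, and outbound links from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []        # href values of <a> elements (outbound links)
        self.title_words = []  # words found inside <title>
        self.body_words = []   # words found anywhere else
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        words = re.findall(r"[a-z0-9]+", data.lower())
        target = self.title_words if self._in_title else self.body_words
        target.extend(words)


def build_index(pages):
    """Return (inverted index: word -> set of URLs, outbound links per URL)."""
    index = defaultdict(set)
    out_links = {}
    for url, html in pages.items():
        extractor = PageExtractor()
        extractor.feed(html)
        extractor.close()
        for word in extractor.title_words + extractor.body_words:
            index[word].add(url)
        out_links[url] = extractor.links
    return index, out_links


def search(index, query):
    """Return the URLs that contain every word of the query (boolean AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical sample pages, used only to exercise the sketch.
PAGES = {
    "a.example/mining": (
        "<html><head><title>Web Mining</title></head><body>"
        "<p>Content mining extracts concepts.</p>"
        "<a href='b.example/links'>structure</a></body></html>"
    ),
    "b.example/links": (
        "<html><head><title>Link Analysis</title></head><body>"
        "<p>Structure mining studies links.</p></body></html>"
    ),
}

index, out_links = build_index(PAGES)
print(sorted(search(index, "web mining")))
print(out_links["a.example/mining"])
```

The `out_links` mapping is the raw material for structure mining, while the inverted index is one simple example of a structure that can be searched in response to a query.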
