
Automatic Semantic Annotation of Web Documents.


Abstract

Ontologies are the most important construct of the Semantic Web. From the first attempts at using simplified RDF syntax to the advanced features of the OWL languages, ontologies have emerged as the most viable technology for integrating various Web resources into a more intelligent Web. The work presented in this thesis is a contribution to the new generation of the Web, which should be readable and interpretable not only by humans but also by machines, such as software agents. For ontologies to fulfil their role of "animating" the traditional Web into this next-generation Web, it is essential to find an efficient way to map all existing Web resources onto their corresponding ontology classes. In this thesis, we propose an approach for automatic semantic annotation of Web documents, which is an effective way to make the Semantic Web a reality. Such an integrated Web would greatly improve the accuracy of search engines, bring a new generation of intelligent Web services, push the limits of multi-agent technologies and improve many other areas of human activity that we cannot even imagine today. Considering the size of the Web and the speed at which it grows, it is clear that this task cannot be achieved manually. Semi-automatic and automatic annotation of Web documents using statistical text classification methods seems to be the most promising solution. This work focuses on an approach based on Naive Bayes text classification adapted to some characteristics that are particular to Web documents. A complete software solution is developed to test the feasibility of such an approach. Furthermore, different variations of the text classification algorithms are tested and analysed in order to identify the best approach to semantically annotating Web documents. Notably, the use of the Web document hierarchy is explored as an option to improve the accuracy of semi-automatic and automatic annotation of Web documents. The results of each tested method are presented and discussed. Finally, some aspects that could be improved or approached in a different way are identified for future work.
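
The abstract names Naive Bayes text classification as the basis for mapping Web documents onto ontology classes. As a rough illustration only, the following is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not the thesis's implementation: the class name `NaiveBayesAnnotator`, the whitespace tokenizer, and the ontology class labels `Course` and `Person` are all hypothetical, and the Web-specific adaptations and the document-hierarchy variant mentioned in the abstract are not modelled here.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization; a real system
    # would first strip HTML markup from the Web document.
    return text.lower().split()


class NaiveBayesAnnotator:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_doc_counts = Counter()        # training documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocabulary = set()

    def train(self, labelled_docs):
        # labelled_docs: iterable of (text, ontology_class_label) pairs
        for text, label in labelled_docs:
            self.class_doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def annotate(self, text):
        # Return the class with the highest log posterior score.
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.class_doc_counts.items():
            score = math.log(doc_count / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Illustrative toy data; the labels stand in for ontology classes.
training = [
    ("course syllabus lecture exam", "Course"),
    ("lecture notes assignment exam", "Course"),
    ("professor research publications office", "Person"),
    ("professor biography publications contact", "Person"),
]
annotator = NaiveBayesAnnotator()
annotator.train(training)
label = annotator.annotate("exam schedule for the lecture")
```

In this sketch, an annotation is simply the most probable class label for a document; a semi-automatic workflow could instead present the top-scoring classes to a human for confirmation.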

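Bibliographic record

  • Author

    Vujicic, Milos

  • Author affiliation

    Concordia University (Canada)

  • Degree-granting institution: Concordia University (Canada)
  • Discipline: Engineering Computer
  • Degree: M.A.Sc.
  • Year: 2009
  • Pages: 82 p.
  • Total pages: 82
  • Original format: PDF
  • Language of text: English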