
Ontology generation, information harvesting and semantic annotation for machine-generated Web pages.

Abstract

The current World Wide Web is a web of pages. Users have to guess which keywords might lead, through search engines, to pages that contain the information of interest, and then browse hundreds or even thousands of returned pages to obtain what they want. This frustrating problem motivates an approach that turns the web of pages into a web of knowledge, so that web users can query the information of interest directly. This dissertation provides a step in this direction and a way to partially overcome the challenges. Specifically, it shows how to turn machine-generated web pages, like those on the hidden web, into semantic web pages for the web of knowledge. We design and develop three systems to address the challenge of turning web pages into web-of-knowledge pages: TISP (Table Interpretation for Sibling Pages), TISP++, and FOCIH (Form-based Ontology Creation and Information Harvesting). TISP can automatically interpret hidden-web tables. Given interpreted tables, TISP++ can automatically generate ontologies and semantically annotate the information present in those tables. In this way, we offer a means of making hidden information publicly accessible. We also provide users with a way to generate personalized ontologies. FOCIH provides an interface through which users can express their own view by creating a form that specifies the information they want. Based on the form, FOCIH can generate user-specific ontologies, and based on patterns in hidden-web pages, FOCIH can harvest information and annotate these pages with respect to the generated ontology. Users can then query the annotated information directly. With these contributions, this dissertation serves as a foundational pillar for turning the current web of pages into a web of knowledge.
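
The abstract does not describe how TISP works internally. As a rough illustration of what "table interpretation for sibling pages" could look like, the Python sketch below compares the same table on two pages generated from one template. Its rule is an assumption made here, not the dissertation's algorithm: cells that are identical on both pages are treated as labels, and cells that differ are treated as values. The function names (`interpret_sibling_tables`, `annotate`), the triple format, and the sample data are all hypothetical.

```python
from typing import Dict, List, Tuple

# A table is a list of rows; each row is a list of cell strings.
Table = List[List[str]]


def interpret_sibling_tables(table_a: Table, table_b: Table) -> Dict[str, str]:
    """Infer label/value pairs for table_a by comparing it with table_b.

    Assumed heuristic (not taken from the dissertation): a cell whose text
    is the same on both sibling pages is a label; a cell whose text differs
    is a value belonging to the nearest preceding label in the same row.
    """
    record: Dict[str, str] = {}
    for row_a, row_b in zip(table_a, table_b):
        label = None
        for cell_a, cell_b in zip(row_a, row_b):
            if cell_a == cell_b:
                label = cell_a          # constant across siblings: label
            elif label is not None:
                record[label] = cell_a  # varies across siblings: value
    return record


def annotate(subject: str, record: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Turn an interpreted record into (subject, attribute, value) triples."""
    return [(subject, label, value) for label, value in record.items()]


# Hypothetical sibling pages produced by the same hidden-web template.
page_a = [["Make", "Honda"], ["Price", "$5,000"]]
page_b = [["Make", "Toyota"], ["Price", "$7,200"]]

record = interpret_sibling_tables(page_a, page_b)
triples = annotate("listing-1", record)
```

A heuristic this simple misreads any value that happens to be identical on both pages as a label, so a realistic system would need more than two sibling pages or additional structural evidence.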

Bibliographic record

  • Author

    Tao, Cui.;

  • Author affiliation

    Brigham Young University.;

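  • Degree-granting institution Brigham Young University.;
  • Discipline Computer Science.
  • Degree Ph.D.
  • Year 2009
  • Pages 108 p.
  • Total pages 108
  • Original format PDF
  • Language eng
  • Chinese Library Classification Automation technology, computer technology;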