Bootstrapping Semantic Annotation for Content-Rich HTML Documents

Abstract

An enormous amount of semantic data is still encoded in HTML documents. Identifying and annotating the semantic concepts implicit in such documents makes them directly amenable to Semantic Web processing. In this paper we describe a highly automated technique for annotating HTML documents, especially template-based, content-rich documents that contain many different semantic concepts per document. Starting with a small seed of hand-labeled instances of semantic concepts in a set of HTML documents, we bootstrap an annotation process that automatically identifies unlabeled concept instances present in other documents. The bootstrapping technique exploits the observation that semantically related items in content-rich documents exhibit consistency in presentation style and spatial locality. It uses this observation to learn a statistical model that accurately identifies different semantic concepts in HTML documents drawn from a variety of Web sources. We also present experimental results on the effectiveness of the technique.
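The bootstrapping loop described in the abstract can be illustrated with a minimal self-training sketch. This is not the authors' implementation. The feature names (tag, font, block), the naive Bayes model, the 0.8 confidence threshold, and all function names are assumptions chosen for illustration. The sketch only shows the general idea of growing a labeled set from a small seed by using presentation style and a crude locality cue.

```python
import math
from collections import Counter, defaultdict


def features(node):
    # Hypothetical features: presentation style (tag, font) plus the index of
    # the page block the node sits in, a crude stand-in for spatial locality.
    return [f"tag={node['tag']}", f"font={node['font']}", f"block={node['block']}"]


def train(labeled):
    """Count features per concept (a small naive Bayes model)."""
    counts = defaultdict(Counter)
    totals = Counter()
    for node, concept in labeled:
        totals[concept] += 1
        counts[concept].update(features(node))
    return counts, totals


def predict(model, node):
    """Return (most likely concept, its posterior probability) for one node."""
    counts, totals = model
    n = sum(totals.values())
    log_scores = {}
    for concept, total in totals.items():
        score = math.log(total / n)
        for f in features(node):
            # Add-one smoothing so unseen features do not zero out a concept.
            score += math.log((counts[concept][f] + 1) / (total + 2))
        log_scores[concept] = score
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


def bootstrap(seed, unlabeled, threshold=0.8, max_rounds=5):
    """Grow the labeled set from the seed, one confident batch per round."""
    labeled = list(seed)
    pending = list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        confident, rest = [], []
        for node in pending:
            concept, prob = predict(model, node)
            if prob >= threshold:
                confident.append((node, concept))
            else:
                rest.append(node)
        if not confident:
            break  # nothing new was learned in this round
        labeled.extend(confident)
        pending = rest
    return labeled, pending
```

Calling `bootstrap(seed, unlabeled)` with a seed of `(node, concept)` pairs returns the enlarged labeled set and the nodes that never reached the confidence threshold. Keeping low-confidence nodes unlabeled is a design choice in this sketch that limits the spread of early mistakes to later rounds.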


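Bibliographic record

Source: Proof of Designed Reliability | 1994 | pp. 583-593 | 11 pages
Authors: Mukherjee S.; Ramakrishnan I.V.; Singh A.
Affiliation: State University of New York at Stony Brook
Original format: PDF
Language: English
Classification (Chinese Library Classification): Automation technology, computer technology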

