Conference: 10th Annual Joint Conference on Digital Libraries, 2010

oreChem ChemXSeer: A Semantic Digital Library for Chemistry


Abstract

Representing the semantics of unstructured scientific publications will certainly facilitate access and search and hopefully lead to new discoveries. However, current digital libraries are usually limited to classic flat structured metadata, even for scientific publications that potentially contain rich semantic metadata. In addition, how to search scientific literature with linked semantic metadata remains an open problem. We have developed a semantic digital library, oreChem ChemXSeer, that models chemistry papers with semantic metadata. It stores and indexes metadata extracted from a chemistry paper repository, ChemXSeer, using "compound objects". We use the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) standard to define a compound object that aggregates metadata fields related to a digital object. Aggregated metadata can be managed and retrieved easily as one unit, which improves ease of use and has the potential to improve the semantic interpretation of shared data. We show how metadata can be extracted from documents and aggregated using OAI-ORE. ORE objects are created on demand; thus, we are able to search for a set of linked metadata with one query. We were also able to model new types of metadata easily. For example, chemists are especially interested in finding information related to experiments in documents. We show how paragraphs containing experiment information in chemistry papers can be extracted and tagged based on a chemistry ontology with 470 classes, and then represented in ORE along with other document-related metadata. Our algorithm uses a classifier whose features are words typically used only to describe experiments, such as "apparatus" and "prepare". Using a dataset of documents from the Royal Society of Chemistry digital library, we show that our proposed method performs well in extracting experiment-related paragraphs from chemistry documents.
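The idea of using experiment-specific words as classifier features can be illustrated with a minimal Python sketch. This is not the authors' algorithm: only "apparatus" and "prepare" come from the abstract, while the other cue words, the function names, and the `min_hits` threshold are hypothetical, and a fixed threshold stands in for whatever trained classifier the paper actually uses.

```python
import re
from typing import Dict

# "apparatus" and "prepare" come from the abstract; the remaining cue words
# are hypothetical additions for illustration only.
CUE_WORDS = {"apparatus", "prepare", "prepared", "stirred", "filtered", "yield"}


def cue_features(paragraph: str) -> Dict[str, int]:
    """Count how often each cue word occurs in the paragraph."""
    tokens = re.findall(r"[a-z]+", paragraph.lower())
    counts = {word: 0 for word in CUE_WORDS}
    for token in tokens:
        if token in counts:
            counts[token] += 1
    return counts


def is_experiment_paragraph(paragraph: str, min_hits: int = 2) -> bool:
    """Stand-in decision rule (assumption): a fixed hit threshold,
    not a trained classifier."""
    return sum(cue_features(paragraph).values()) >= min_hits
```

In a real system the feature counts would more plausibly feed a learned model trained on labeled paragraphs; the threshold rule here only shows how word-level features separate experimental from narrative text.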