Metadata Extraction from Bibliographies Using Bigram HMM


Abstract

In recent years, huge volumes of research papers have become available on the World Wide Web. Metadata provides a good way to organize and retrieve these resources. Accordingly, automatic extraction of metadata from these papers and their bibliographies is meaningful and has been widely studied. In this paper, we use a bigram HMM (Hidden Markov Model) to automatically extract metadata (e.g., title, author, date, journal, pages) from bibliographies in various styles. Unlike the traditional HMM, which uses only word frequency, this model also considers the bigram sequential relation between words and their position information in text fields. We evaluated the model on a real corpus downloaded from the Web and compared it with other methods. Experiments show that the bigram HMM yields the best results and seems to be the most promising candidate for metadata extraction from bibliographies.
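
The abstract does not give the model's equations, so the following is only a minimal sketch of one way a bigram emission could be added to an HMM tagger for reference strings. It assumes the emission for a word is conditioned on the field label and the previous word, interpolated with a smoothed unigram estimate; the class name `BigramHMM`, the weights `lam` and `alpha`, and the toy data `TOY_REFS` are all hypothetical, and the position information mentioned in the abstract is not modeled here.

```python
import math
from collections import Counter


class BigramHMM:
    """Illustrative HMM tagger with emissions P(word | state, previous word)."""

    def __init__(self, states, lam=0.7, alpha=0.1):
        self.states = list(states)
        self.lam = lam      # weight on the bigram estimate (assumed value)
        self.alpha = alpha  # add-alpha smoothing constant (assumed value)
        self.start, self.trans, self.trans_from = Counter(), Counter(), Counter()
        self.uni, self.state_tokens = Counter(), Counter()
        self.bi, self.bi_ctx = Counter(), Counter()
        self.vocab, self.n_seqs = set(), 0

    def fit(self, sequences):
        """Collect counts from sequences of (word, state) pairs."""
        for seq in sequences:
            self.n_seqs += 1
            prev_state = prev_word = None
            for word, state in seq:
                self.vocab.add(word)
                self.uni[state, word] += 1
                self.state_tokens[state] += 1
                if prev_state is None:
                    self.start[state] += 1
                else:
                    self.trans[prev_state, state] += 1
                    self.trans_from[prev_state] += 1
                    self.bi[state, prev_word, word] += 1
                    self.bi_ctx[state, prev_word] += 1
                prev_state, prev_word = state, word
        return self

    def _emit(self, state, prev_word, word):
        """Interpolate the bigram estimate with a smoothed unigram estimate."""
        v = len(self.vocab) + 1  # one extra slot for unknown words
        p_uni = (self.uni[state, word] + self.alpha) / (
            self.state_tokens[state] + self.alpha * v)
        ctx = self.bi_ctx[state, prev_word]
        if prev_word is None or ctx == 0:
            return p_uni  # no bigram evidence: fall back to the unigram
        p_bi = self.bi[state, prev_word, word] / ctx
        return self.lam * p_bi + (1 - self.lam) * p_uni

    def _log_trans(self, a, b):
        k = len(self.states)
        return math.log((self.trans[a, b] + self.alpha) /
                        (self.trans_from[a] + self.alpha * k))

    def decode(self, words):
        """Viterbi decoding in log space; returns one state per word."""
        if not words:
            return []
        k = len(self.states)
        score = {s: math.log((self.start[s] + self.alpha) /
                             (self.n_seqs + self.alpha * k))
                 + math.log(self._emit(s, None, words[0]))
                 for s in self.states}
        back = []
        for t in range(1, len(words)):
            new_score, ptr = {}, {}
            for s in self.states:
                best = max(self.states,
                           key=lambda a: score[a] + self._log_trans(a, s))
                new_score[s] = (score[best] + self._log_trans(best, s)
                                + math.log(self._emit(s, words[t - 1], words[t])))
                ptr[s] = best
            score = new_score
            back.append(ptr)
        path = [max(self.states, key=lambda s: score[s])]
        for ptr in reversed(back):
            path.append(ptr[path[-1]])
        return path[::-1]


# Hypothetical toy data: two tokenized references with field labels.
TOY_REFS = [
    [("smith", "author"), ("j", "author"), ("hidden", "title"),
     ("markov", "title"), ("models", "title"), ("2001", "date")],
    [("lee", "author"), ("k", "author"), ("markov", "title"),
     ("chains", "title"), ("1999", "date")],
]
model = BigramHMM(["author", "title", "date"]).fit(TOY_REFS)
print(model.decode(["lee", "j", "hidden", "markov", "chains", "2001"]))
```

Because the previous word is observed, conditioning the emission on it leaves the Viterbi recursion unchanged; only the emission lookup differs from a unigram HMM. A real system would need tokenization, punctuation features, and a much larger labeled corpus.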
