Source: 《情报学报》 (Chinese-language journal)

Semantic Chunk Annotation Based on Mixed Content Clue Features

Abstract

In the big data era, achieving deeper semantic understanding of scientific and technical literature has become a prominent topic in information science research. This paper uses word-frequency statistics and co-word analysis to analyze three kinds of features of scientific literature content: shallow grammatical features, context features, and core clue-word features. From these we construct a mixed content clue feature set and use a conditional random field (CRF) model to annotate semantic chunks in project data from NSF-funded carbon nanotube research. The results show that, for the seven tags B-SUB, I-SUB, B-ACT, I-ACT, B-GOL, I-GOL, and B-IMP, precision after adding the mixed content clue features reaches 84.43%, 89.09%, 84.38%, 89.87%, 51.33%, 50.37%, and 37.83%, respectively, a clear improvement over annotation without those features. In particular, for the four tags B-SUB, I-SUB, B-ACT, and I-ACT, precision improves by more than 10% once the content clue features are added.
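To make the three feature groups concrete, here is a minimal Python sketch of how per-token features might be assembled before being passed to a CRF tagger. It is an illustration under stated assumptions, not the authors' implementation: the clue-word lexicon `CLUE_WORDS`, the feature names, the one-token context window, and the example sentence are all hypothetical. The abstract does not list the actual clue words or feature templates.

```python
from typing import Dict, List, Tuple

# Hypothetical clue-word lexicon. The paper derives its clue words from
# word-frequency statistics and co-word analysis, which is not reproduced here.
CLUE_WORDS = {"investigate", "develop", "goal", "aim", "impact"}

Token = Tuple[str, str]  # (word, part-of-speech tag)


def token_features(sent: List[Token], i: int) -> Dict[str, object]:
    """Build one feature dict for the token at position i."""
    word, pos = sent[i]
    feats: Dict[str, object] = {
        # Shallow grammatical features: the word form and its POS tag.
        "word.lower": word.lower(),
        "pos": pos,
        # Core clue-word feature: is the token in the (assumed) lexicon?
        "is_clue": word.lower() in CLUE_WORDS,
    }
    # Context features: the previous and next token (assumed window of 1).
    if i > 0:
        prev_word, prev_pos = sent[i - 1]
        feats["-1:word.lower"] = prev_word.lower()
        feats["-1:pos"] = prev_pos
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(sent) - 1:
        next_word, next_pos = sent[i + 1]
        feats["+1:word.lower"] = next_word.lower()
        feats["+1:pos"] = next_pos
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(sent: List[Token]) -> List[Dict[str, object]]:
    """Feature dicts for every token in a sentence."""
    return [token_features(sent, i) for i in range(len(sent))]


# Made-up example sentence with made-up POS tags.
example = [("We", "PRP"), ("investigate", "VBP"),
           ("carbon", "NN"), ("nanotubes", "NNS")]
features = sentence_features(example)
```

A list of such dicts per sentence, paired with a tag sequence drawn from labels such as B-SUB, I-SUB, B-ACT, and I-ACT, is the general input shape that dict-based CRF toolkits expect. Dropping the `is_clue` entry would give a baseline roughly analogous to the "without content clue features" condition that the abstract compares against.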
