Venue: International Conference on Web Information Systems Engineering

Semantic Entity Identification in Large Scale Data via Statistical Features and DT-SVM

Abstract

Semantic entities carry the most important semantics of text data. However, traditional approaches such as named entity recognition and new word identification can detect only certain specific types of entities. In addition, they generally adopt sequence annotation algorithms such as the Hidden Markov Model (HMM) and Conditional Random Field (CRF), which can exploit only limited context information. As a result, they are ineffective at extracting semantic entities that never appeared in the training data. In this paper we propose a strategy for extracting unknown semantic entities from text by integrating statistical features with Decision Tree (DT) and Support Vector Machine (SVM) algorithms. With the proposed statistical features and novel classification approach, our strategy detects more semantic entities than traditional approaches such as CRF and Bootstrapping-SVM methods, and it is highly sensitive to new entities that have only just appeared in fresh data. Our experimental results show that the precision, recall, and F1 score of our strategy are on average about 23.6%, 21.5%, and 25.8% higher, respectively, than those of the representative approaches.
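The abstract does not list which statistical features the strategy uses. Purely as a hedged illustration, the sketch below computes three measures that are common in new-word and entity discovery for every repeated candidate substring: frequency, pointwise mutual information (PMI) across the weakest internal split, and the branching entropy of the characters immediately to the left and right. The function name `candidate_features`, the thresholds `max_len` and `min_freq`, and the feature set itself are assumptions, not the authors' method. In the proposed strategy, features of this kind would be passed on to the DT and SVM classifiers, which this sketch does not implement.

```python
import math
from collections import Counter, defaultdict


def _entropy(neighbors: Counter) -> float:
    """Shannon entropy (natural log) of a neighbor-character distribution."""
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in neighbors.values())


def candidate_features(text: str, max_len: int = 4, min_freq: int = 2) -> dict:
    """Hypothetical statistical features for candidate entity strings.

    For every substring of length 2..max_len that occurs at least min_freq
    times, return its frequency, its PMI at the weakest split point, and the
    entropy of its left and right neighboring characters.
    """
    n = len(text)
    counts = Counter()
    left = defaultdict(Counter)   # substring -> counts of preceding characters
    right = defaultdict(Counter)  # substring -> counts of following characters
    for length in range(1, max_len + 1):
        for i in range(n - length + 1):
            s = text[i:i + length]
            counts[s] += 1
            if i > 0:
                left[s][text[i - 1]] += 1
            if i + length < n:
                right[s][text[i + length]] += 1

    features = {}
    for s, freq in counts.items():
        if len(s) < 2 or freq < min_freq:
            continue
        # PMI with p(x) approximated as count(x) / n; take the minimum over
        # all two-way splits so that a weak internal boundary lowers the score.
        pmi = min(
            math.log(freq * n / (counts[s[:k]] * counts[s[k:]]))
            for k in range(1, len(s))
        )
        features[s] = {
            "freq": freq,
            "pmi": pmi,
            "left_entropy": _entropy(left[s]),
            "right_entropy": _entropy(right[s]),
        }
    return features
```

For example, on the toy string `"abxabyabzab"` only `"ab"` passes the frequency threshold. It occurs 4 times, it is preceded by three different characters and followed by three different characters (branching entropy ln 3 ≈ 1.10 on each side), and its PMI is ln(11/4) ≈ 1.01. The intuition behind such features is that a cohesive unit tends to have high internal association and varied outside contexts. How the paper actually weights or combines its features is not stated in the abstract.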