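Automatic Hyponymy Extraction Method Based on Symptom Components (基于症状构成成分的上下位关系自动抽取方法)
Published in: 《计算机应用》 (Journal of Computer Applications)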




To exploit the strong structural characteristics of hyponymy (hypernym-hyponym) relations between symptoms, an automatic hyponymy extraction method based on symptom components is proposed. First, inspection of symptom entities shows that a symptom can be segmented into eight kinds of components, including atomic symptom words and modifiers, and that the sequence of components follows certain rules. Then, a lexical analysis system and a Conditional Random Field (CRF) model are used to segment symptoms and label their components. Finally, relation extraction between symptoms is treated as a classification problem: symptom constitution features, dictionary features, and general features are selected as classifier features, and models trained with several classification algorithms divide the relations between symptoms into hyponymy and non-hyponymy. Experimental results show that the best performance is obtained with a Support Vector Machine (SVM) using all three feature types together, with precision, recall, and F1 reaching 82.68%, 82.13%, and 82.40%, respectively. On this basis, the proposed extraction algorithm was used to extract 20,619 hyponymy relations and to build a symptom knowledge base with hyponymy relations.
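As a quick consistency check on the reported figures, F1 is the harmonic mean of precision and recall. The sketch below is illustrative only: the function name `f1_score` and the variable names are assumptions made for this example and are not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, in percent.
reported_precision = 82.68
reported_recall = 82.13

computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.2f}%")  # prints "computed F1 = 82.40%"
```

The computed value agrees with the reported F1 of 82.40% to two decimal places.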




