Journal: BMC Bioinformatics

pyMeSHSim: an integrative python package for biomedical named entity recognition, normalization, and comparison of MeSH terms

Abstract

Many disease-causing genes have been identified through different methods, but there are not yet uniform biomedical named entity (bio-NE) annotations of the disease phenotypes of these genes. Furthermore, semantic similarity comparison between two bio-NE annotations has become important for data integration and systems genetics analysis. The package pyMeSHSim recognizes bio-NEs by using MetaMap, which produces Unified Medical Language System (UMLS) concepts through natural language processing. To map the UMLS concepts to Medical Subject Headings (MeSH), pyMeSHSim embeds an in-house dataset containing the main headings (MHs), supplementary concept records (SCRs), and their relations in MeSH. Based on this dataset, pyMeSHSim implements four information content (IC)-based algorithms and one graph-based algorithm to measure the semantic similarity between two MeSH terms. To evaluate its performance, we used pyMeSHSim to parse OMIM and GWAS phenotypes. pyMeSHSim introduces SCRs and a curation strategy for non-MeSH-synonymous UMLS concepts, which improved its performance in recognizing OMIM phenotypes. In the curation of 461 GWAS phenotypes, pyMeSHSim showed recall > 0.94, precision > 0.56, and F1 > 0.70, demonstrating better performance than the state-of-the-art tools DNorm and TaggerOne in recognizing MeSH terms from short biomedical phrases. The semantic similarity between the MeSH terms recognized by pyMeSHSim and those from previous manual work was calculated with pyMeSHSim and with another semantic analysis tool, meshes. The correlation between the semantic similarities computed by the two tools reached 0.89–0.99. The integrative MeSH tool pyMeSHSim, embedded with the MeSH MHs and SCRs, realizes bio-NE recognition, normalization, and comparison in biomedical text mining.
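
To illustrate what an IC-based similarity measure computes, the following is a minimal sketch on a toy hierarchy. It does not use pyMeSHSim's API or real MeSH data: the `PARENTS` table and all function names are hypothetical, and the Resnik and Lin measures are chosen only as two well-known IC-based examples, since the abstract does not name the four measures the package implements. The information content here is estimated from descendant counts in the toy hierarchy, which is an assumption of this sketch.

```python
import math

# Hypothetical toy hierarchy (term -> list of parents); NOT real MeSH data.
PARENTS = {
    "Diseases": [],
    "Neoplasms": ["Diseases"],
    "Cardiovascular Diseases": ["Diseases"],
    "Breast Neoplasms": ["Neoplasms"],
    "Lung Neoplasms": ["Neoplasms"],
}


def ancestors(term):
    """Return the set containing the term and all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -log p(t), with p(t) the share of terms that are t or below it."""
    n_covered = sum(1 for t in PARENTS if term in ancestors(t))
    return math.log(len(PARENTS) / n_covered)


def resnik(a, b):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(a) & ancestors(b)
    return max(information_content(t) for t in common)


def lin(a, b):
    """Lin similarity: Resnik score normalized by the two terms' own IC."""
    denom = information_content(a) + information_content(b)
    # Sketch convention: return 0.0 when both terms carry no information.
    return 2 * resnik(a, b) / denom if denom > 0 else 0.0


if __name__ == "__main__":
    print(lin("Breast Neoplasms", "Lung Neoplasms"))
    print(lin("Breast Neoplasms", "Cardiovascular Diseases"))
```

In this toy setup, two sibling terms under "Neoplasms" score higher than two terms whose only shared ancestor is the root, which is the behavior an IC-based measure is meant to capture.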
