pyMeSHSim: an integrative python package for biomedical named entity recognition, normalization, and comparison of MeSH terms

Zhi-Hui Luo; Meng-Wei Shi; Zhuang Yang; Hong-Yu Zhang; Zhen-Xia Chen

Abstract

Many disease-causing genes have been identified through different methods, but the disease phenotypes of these genes still lack uniform biomedical named entity (bio-NE) annotations. Furthermore, semantic similarity comparison between two bio-NE annotations has become important for data integration and systems genetics analysis. The package pyMeSHSim recognizes bio-NEs using MetaMap, which produces Unified Medical Language System (UMLS) concepts through natural language processing. To map the UMLS concepts to Medical Subject Headings (MeSH), pyMeSHSim embeds an in-house dataset containing the main headings (MHs), supplementary concept records (SCRs), and their relations in MeSH. Based on this dataset, pyMeSHSim implements four information content (IC)-based algorithms and one graph-based algorithm to measure the semantic similarity between two MeSH terms. To evaluate its performance, we used pyMeSHSim to parse OMIM and GWAS phenotypes. pyMeSHSim introduces SCRs and a curation strategy for non-MeSH-synonymous UMLS concepts, which improved its performance in recognizing OMIM phenotypes. In the curation of 461 GWAS phenotypes, pyMeSHSim showed recall ≥0.94, precision ≥0.56, and F1 ≥0.70, demonstrating better performance than the state-of-the-art tools DNorm and TaggerOne in recognizing MeSH terms from short biomedical phrases. The semantic similarity between the MeSH terms recognized by pyMeSHSim and those from previous manual work was calculated with pyMeSHSim and with another semantic analysis tool, meshes. The correlation between the semantic similarities computed by the two tools reached 0.89–0.99. The integrative MeSH tool pyMeSHSim, embedded with the MeSH MHs and SCRs, performs bio-NE recognition, normalization, and comparison for biomedical text mining.
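To make the idea of information content (IC)-based semantic similarity concrete, here is a minimal sketch on a toy hierarchy. It does not use the pyMeSHSim API. The term names, the `PARENTS` table, and the helper functions are hypothetical, and the intrinsic IC estimate (based on descendant counts) is an assumption chosen for simplicity. Resnik and Lin are two widely used IC-based measures; the abstract does not say which four measures the package implements.

```python
import math

# Hypothetical toy hierarchy (term -> list of parents); not real MeSH data.
PARENTS = {
    "disease": [],
    "neoplasm": ["disease"],
    "carcinoma": ["neoplasm"],
    "sarcoma": ["neoplasm"],
    "infection": ["disease"],
}


def ancestors(term):
    """Return the set containing the term and all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """Intrinsic IC (an assumption): log(total terms / terms at or below `term`)."""
    n_below = sum(1 for t in PARENTS if term in ancestors(t))
    return math.log(len(PARENTS) / n_below)


def resnik(a, b):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(a) & ancestors(b)
    return max(information_content(t) for t in common)


def lin(a, b):
    """Lin similarity: shared IC normalized by the ICs of both terms."""
    denom = information_content(a) + information_content(b)
    return 2 * resnik(a, b) / denom if denom else 1.0


if __name__ == "__main__":
    for pair in [("carcinoma", "sarcoma"), ("carcinoma", "infection")]:
        print(pair, round(resnik(*pair), 4), round(lin(*pair), 4))
```

In this toy setup, two terms that share a specific ancestor ("neoplasm") score higher than two terms that share only the root, which is the behavior IC-based measures are designed to capture.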


Bibliographic details

Source: BMC Bioinformatics, 2020, Issue 1, 14 pages
Authors: Zhi-Hui Luo; Meng-Wei Shi; Zhuang Yang; Hong-Yu Zhang; Zhen-Xia Chen
Original format: PDF
Keywords: MeSH; UMLS; Named entity recognition; Semantic similarity; Supplementary concept records; Disease

