On Learning Better Word Embeddings from Chinese Clinical Records: Study on Combining In-Domain and Out-Domain Data

Abstract

High-quality word embeddings are of great significance for advancing applications of biomedical natural language processing. In recent years, interest in how to learn good embeddings and evaluate embedding quality from English medical text has surged; however, only a limited number of studies have been based on Chinese medical text, particularly Chinese clinical records. Herein, we proposed a novel approach for improving the quality of learned embeddings by using out-domain data as a supplement when Chinese clinical records are limited. Moreover, embedding quality was evaluated based on the Medical Conceptual Similarity Property. The experimental results revealed that selecting good training samples was necessary, and that collecting the right amount of out-domain data and trading off embedding quality against training time were essential factors for better embeddings.

Bibliographic details

Source: Annual Meeting of the Association for Computational Linguistics; Workshop on Biomedical Natural Language Processing | 2018 | pp. 177-182 | 6 pages
Authors: Yaqiang Wang; Yunhui Chen; Hongping Shu; Yongguang Jiang
Original format: PDF

Similar literature

1. Distracting users as per their knowledge: Combining linked open data and word embeddings to enhance history learning [J] . Blanco-Fernandez Yolanda, Gil-Solla Alberto, Pazos-Arias Jose J., Expert Systems with Applications . 2020, Issue Apra

2. Clinical Information Extraction Using Small Data: An Active Learning Approach Based on Sequence Representations and Word Embeddings [J] . Mahnoosh Kholghi, Lance De Vine, Laurianne Sitbon, Journal of the American Society for Information Science and Technology . 2017, Issue 11

3. Prediction of myopia development among Chinese school-aged children using refraction data from electronic medical records: A retrospective, multicentre machine learning study [J] . Haotian Lin, Erping Long, Xiaohu Ding, PLoS Medicine . 2018, Issue 11

4. On Learning Better Word Embeddings from Chinese Clinical Records: Study on Combining In-Domain and Out-Domain Data [C] . Yaqiang Wang, Yunhui Chen, Hongping Shu, Annual meeting of the Association for Computational Linguistics . 2018

5. Combined Word and Network Embeddings: An Analysis Framework of User Opinions on Social Media [D] . Singh, Tannu Dharmendra. 2020

6. Prediction of myopia development among Chinese school-aged children using refraction data from electronic medical records: A retrospective multicentre machine learning study [O] . Haotian Lin, Erping Long, Xiaohu Ding, 2018

7. Barriers and facilitators to data quality of electronic health records used for clinical research in China: a qualitative study [O] . Kaiwen Ni, Hongling Chu, Lin Zeng, 2019

