Venue: Annual Meeting of the Association for Computational Linguistics; Workshop on Biomedical Natural Language Processing

On Learning Better Word Embeddings from Chinese Clinical Records: Study on Combining In-Domain and Out-Domain Data

Abstract

High-quality word embeddings are of great significance for advancing applications of biomedical natural language processing. In recent years, interest in how to learn good embeddings and evaluate embedding quality from English medical text has surged; however, only a limited number of studies have been based on Chinese medical text, particularly Chinese clinical records. Herein, we propose a novel approach that improves the quality of learned embeddings by using out-domain data as a supplement when Chinese clinical records are limited. Moreover, embedding quality was evaluated based on the Medical Conceptual Similarity Property. The experimental results revealed that selecting good training samples was necessary, and that collecting the right amount of out-domain data and trading off embedding quality against training time were essential factors for better embeddings.
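The abstract does not spell out how the Medical Conceptual Similarity Property is computed. As a rough illustration only, one plausible reading is that terms belonging to the same medical concept category should lie closer together in embedding space than terms from different categories. The sketch below scores a set of embeddings on that assumption. The function names (`cosine`, `similarity_gap`), the toy vectors, and the two concept categories are all hypothetical and are not taken from the paper; in practice the keys would be segmented Chinese clinical terms and the vectors would come from a trained model.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_gap(embeddings, concept_groups):
    """Mean within-category cosine minus mean cross-category cosine.

    Hypothetical scoring rule (an assumption, not the paper's definition):
    a larger gap means same-category terms cluster more tightly.
    """
    # Pair every known term with its category label.
    labelled = [
        (term, group)
        for group, terms in concept_groups.items()
        for term in terms
        if term in embeddings
    ]
    within, across = [], []
    for (t1, g1), (t2, g2) in combinations(labelled, 2):
        sim = cosine(embeddings[t1], embeddings[t2])
        (within if g1 == g2 else across).append(sim)
    if not within or not across:
        raise ValueError("need both within- and cross-category pairs")
    return sum(within) / len(within) - sum(across) / len(across)


# Toy 2-dimensional vectors, invented purely for illustration.
toy_embeddings = {
    "cough": [0.9, 0.1],
    "fever": [0.8, 0.2],
    "aspirin": [0.1, 0.9],
    "penicillin": [0.2, 0.8],
}
toy_groups = {
    "symptom": ["cough", "fever"],
    "drug": ["aspirin", "penicillin"],
}

gap = similarity_gap(toy_embeddings, toy_groups)
print(f"similarity gap: {gap:.3f}")
```

Under this assumed scoring rule, embeddings trained on in-domain data alone and on in-domain plus out-domain data could be compared by computing the gap for each model over the same concept groups.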