Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

A Method for Generating Synthetic Electronic Medical Record Text

Abstract

Machine learning (ML) and natural language processing (NLP) have achieved remarkable success in many fields and have brought new opportunities and high expectations to the analysis of medical data, of which the most common type is massive free-text electronic medical records (EMRs). However, free-text EMRs lack consistent standards, are rich in private information, and are limited in availability. It is also often hard to obtain a balanced number of samples for the types of diseases under study. These problems hinder the development of ML and NLP methods for EMR data analysis. To tackle them, we developed a model called Medical Text Generative Adversarial Network, or mtGAN, to generate synthetic EMR text. It is based on the GAN framework and is trained with the REINFORCE algorithm. It takes disease tags as inputs and generates synthetic texts as EMRs for the corresponding diseases. We evaluate the model at the micro, macro, and application levels on a Chinese EMR text dataset. The results show that the method has a good capacity to fit real data and can generate realistic and diverse EMR samples. This provides a novel way to avoid potential leakage of patient privacy while still supplying sufficient, well-controlled cohort data for developing downstream ML and NLP methods.
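The abstract gives no architectural details, so the following is only a toy sketch of the general idea it names: a generator conditioned on a disease tag, updated with the REINFORCE policy gradient. It is not the authors' mtGAN. The vocabulary, the tags, the tabular "generator", and the rule-based `toy_reward` (standing in for a learned discriminator score) are all assumptions made for illustration.

```python
import math
import random

# Toy vocabulary and disease tags: illustrative assumptions, not from the paper.
VOCAB = ["fever", "cough", "chest_pain", "ecg", "rash"]
TAGS = ["pneumonia", "angina"]
SEQ_LEN = 3


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_params():
    # One logit table per (tag, position); stands in for a neural generator.
    return {tag: [[0.0] * len(VOCAB) for _ in range(SEQ_LEN)] for tag in TAGS}


def sample(params, tag, rng):
    """Sample a sequence of token indices conditioned on a disease tag."""
    seq = []
    for t in range(SEQ_LEN):
        probs = softmax(params[tag][t])
        seq.append(rng.choices(range(len(VOCAB)), weights=probs)[0])
    return seq


def toy_reward(tag, seq):
    """Hypothetical stand-in for a discriminator score in [0, 1].

    In a GAN the reward would come from a learned discriminator; here it is
    simply the fraction of tokens deemed relevant to the tag.
    """
    relevant = {"pneumonia": {0, 1}, "angina": {2, 3}}[tag]
    return sum(tok in relevant for tok in seq) / len(seq)


def reinforce_step(params, tag, rng, baseline, lr=0.5):
    """One REINFORCE update: theta += lr * (R - b) * grad log pi(seq | tag)."""
    seq = sample(params, tag, rng)
    reward = toy_reward(tag, seq)
    advantage = reward - baseline
    for t, tok in enumerate(seq):
        probs = softmax(params[tag][t])
        for v in range(len(VOCAB)):
            # Gradient of log-softmax w.r.t. logit v: one_hot(tok)[v] - probs[v]
            grad_logp = (1.0 if v == tok else 0.0) - probs[v]
            params[tag][t][v] += lr * advantage * grad_logp
    return reward


def train(steps=2000, seed=0):
    rng = random.Random(seed)
    params = init_params()
    baseline = {tag: 0.0 for tag in TAGS}
    for _ in range(steps):
        tag = rng.choice(TAGS)
        r = reinforce_step(params, tag, rng, baseline[tag])
        # Moving-average baseline to reduce gradient variance.
        baseline[tag] = 0.9 * baseline[tag] + 0.1 * r
    return params


if __name__ == "__main__":
    trained = train()
    rng = random.Random(1)
    for tag in TAGS:
        tokens = [VOCAB[i] for i in sample(trained, tag, rng)]
        print(tag, "->", tokens)
```

In an actual adversarial setup, the reward would be produced by a discriminator trained alternately with the generator to distinguish real from synthetic records; the fixed rule here only keeps the sketch self-contained.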