Journal: JMIR Medical Informatics

Effective Information Extraction Framework for Heterogeneous Clinical Reports Using Online Machine Learning and Controlled Vocabularies



Abstract

Background: Extracting structured data from narrative medical reports is difficult because the reports vary widely in structure and vocabulary, and the task often requires significant manual effort. Traditional machine-based approaches cannot take user feedback to improve the extraction algorithm in real time.

Objective: Our goal was to provide a generic information extraction framework that can support diverse clinical reports and enable dynamic interaction between a human and a machine that produces highly accurate results.

Methods: A clinical information extraction system, IDEAL-X, has been built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedback to update the learning model in real time. The updated model is used to predict values for extraction in subsequent documents. Once prediction accuracy reaches a user-acceptable threshold, the remaining documents may be batch processed. A customizable controlled vocabulary may be used to support extraction.

Results: Three datasets, grouped by report style, were used for experiments: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports, each combining a history and physical report, discharge summary, outpatient clinic notes, outpatient clinic letter, and inpatient discharge medication report. Data extraction was performed by 3 methods: online machine learning, controlled vocabularies, and a combination of the two. The system delivers results with F1 scores greater than 95%.

Conclusions: IDEAL-X adopts a unique online machine learning-based approach combined with controlled vocabularies to support data extraction for clinical reports. The system can quickly learn and improve, and is therefore highly adaptable.
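The workflow described under Methods (one document at a time, user feedback updating the model, then batch processing once accuracy is acceptable) can be illustrated with a small sketch. This is not IDEAL-X's algorithm, which the abstract does not specify. The toy learner, the names `OnlineCueLearner`, `vocab_lookup`, `extract_all`, and `review`, the rolling-window accuracy check, the vocabulary fallback, and the sample report snippets are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


class OnlineCueLearner:
    """Toy stand-in for an online model: it remembers which token tends to
    precede a field's value (hypothetical, not the IDEAL-X learner)."""

    def __init__(self):
        self.cues = defaultdict(Counter)  # field -> counts of preceding tokens

    def predict(self, field, tokens):
        # Try cues from most to least frequently confirmed by the reviewer.
        for cue, _ in self.cues[field].most_common():
            if cue in tokens[:-1]:
                return tokens[tokens.index(cue) + 1]
        return None

    def update(self, field, tokens, value):
        # Learn from feedback: count the token just before the confirmed value.
        if value in tokens[1:]:
            position = tokens.index(value, 1)
            self.cues[field][tokens[position - 1]] += 1


def vocab_lookup(vocab, tokens):
    """Return the canonical value of the first token found in the vocabulary."""
    for token in tokens:
        canonical = vocab.get(token.lower())
        if canonical is not None:
            return canonical
    return None


def extract_all(documents, field, vocab, review, threshold=0.9, window=5):
    """Process documents one at a time with review, then switch to batch mode
    once accuracy over the last `window` reviewed documents reaches `threshold`."""
    learner = OnlineCueLearner()
    recent, results, reviewed, interactive = [], [], 0, True
    for doc in documents:
        tokens = doc.split()
        # Assumed combination rule: learner first, vocabulary as fallback.
        guess = learner.predict(field, tokens) or vocab_lookup(vocab, tokens)
        if not interactive:
            results.append(guess)  # batch mode: accept the prediction as-is
            continue
        truth = review(doc, guess)  # reviewer confirms or corrects the guess
        reviewed += 1
        learner.update(field, tokens, truth)  # model is updated immediately
        results.append(truth)
        recent = (recent + [guess == truth])[-window:]
        if len(recent) == window and sum(recent) / window >= threshold:
            interactive = False
    return results, reviewed


# Made-up report snippets; the reviewer is simulated with known answers.
docs = [
    "LAD stenosis severity: severe",
    "RCA severity: mild lesion",
    "severity: moderate in LCx",
    "Final severity: severe",
    "Overall severity: mild",
]
answers = dict(zip(docs, ["severe", "mild", "moderate", "severe", "mild"]))
vocab = {"mild": "mild", "moderate": "moderate", "severe": "severe"}

results, reviewed = extract_all(
    docs, "severity", vocab, review=lambda doc, guess: answers[doc],
    threshold=1.0, window=2,
)
print(results, reviewed)
```

In this sketch the vocabulary supplies the first guess before the learner has seen any feedback, and the learner takes over once a cue token has been confirmed. A real system would need a far richer model and a more careful stopping rule than a short rolling window.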
