Effective Information Extraction Framework for Heterogeneous Clinical Reports Using Online Machine Learning and Controlled Vocabularies

Shuai Zheng, PhD; James J Lu, PhD; Nima Ghasemzadeh, MD; Salim S Hayek, MD; Arshed A Quyyumi, MD; Fusheng Wang, PhD

Abstract

Background: Extracting structured data from narrative medical reports is complicated by heterogeneous document structures and vocabularies, and often requires significant manual effort. Traditional machine-based approaches cannot take user feedback to improve the extraction algorithm in real time.

Objective: Our goal was to provide a generic information extraction framework that supports diverse clinical reports and enables a dynamic interaction between a human and a machine that produces highly accurate results.

Methods: A clinical information extraction system, IDEAL-X, has been built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedback to update the learning model in real time. The updated model is used to predict values for extraction in subsequent documents. Once prediction accuracy reaches a user-acceptable threshold, the remaining documents may be batch processed. A customizable controlled vocabulary may be used to support extraction.

Results: Three datasets were used for experiments, grouped by report style: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports, each of which combines a history and physical report, discharge summary, outpatient clinic notes, outpatient clinic letter, and inpatient discharge medication report. Data extraction was performed by 3 methods: online machine learning, controlled vocabularies, and a combination of the two. The system delivers results with F1 scores greater than 95%.

Conclusions: IDEAL-X adopts a unique online machine learning-based approach combined with controlled vocabularies to support data extraction for clinical reports. The system can quickly learn and improve, and is therefore highly adaptable.
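
The workflow described under Methods (process one document, record the user's feedback, update the model, and switch to batch processing once accuracy is acceptable) can be illustrated with a minimal Python sketch. This is not IDEAL-X's implementation: the class OnlineCueExtractor, the function extract_all, the cue-counting "model", the sliding-window accuracy check, and the sample reports are all hypothetical, chosen only to make the loop concrete. The optional vocabulary argument stands in, very loosely, for a controlled vocabulary that seeds the learner before any feedback arrives.

```python
from collections import Counter


class OnlineCueExtractor:
    """Toy online learner (hypothetical): counts which token precedes the value."""

    def __init__(self, vocabulary=()):
        # Assumption: vocabulary terms act as cues known before any feedback.
        self.cue_counts = Counter({term.lower(): 1 for term in vocabulary})

    def predict(self, tokens):
        # Return the token that follows the strongest known cue, if any.
        best_value, best_score = None, 0
        for cue, candidate in zip(tokens, tokens[1:]):
            score = self.cue_counts[cue.lower()]
            if score > best_score:
                best_value, best_score = candidate, score
        return best_value

    def update(self, tokens, confirmed_value):
        # Feedback step: reinforce the cue found just before the confirmed value.
        for cue, candidate in zip(tokens, tokens[1:]):
            if candidate == confirmed_value:
                self.cue_counts[cue.lower()] += 1


def extract_all(reports, get_feedback, threshold=0.9, window=3, vocabulary=()):
    """Process reports one at a time; batch the rest once accuracy is acceptable."""
    model = OnlineCueExtractor(vocabulary)
    values, recent, feedback_requests = [], [], 0
    batch_mode = False
    for report in reports:
        tokens = report.split()
        predicted = model.predict(tokens)
        if batch_mode:
            values.append(predicted)  # no user interaction in batch mode
            continue
        confirmed = get_feedback(report, predicted)  # user confirms or corrects
        feedback_requests += 1
        model.update(tokens, confirmed)  # model is updated immediately
        values.append(confirmed)
        recent = (recent + [predicted == confirmed])[-window:]
        if len(recent) == window and sum(recent) / window >= threshold:
            batch_mode = True
    return values, feedback_requests


# Made-up sample data; the reviewer function simulates a user's corrections.
reports = [
    "LVEF 55 noted",
    "measured LVEF 60 today",
    "LVEF 45",
    "patient LVEF 50 stable",
    "LVEF 65",
]
gold = ["55", "60", "45", "50", "65"]
answers = dict(zip(reports, gold))


def reviewer(report, predicted):
    return answers[report]


values, asked = extract_all(reports, reviewer, threshold=1.0, window=2)
seeded_values, seeded_asked = extract_all(
    reports, reviewer, threshold=1.0, window=2, vocabulary={"LVEF"}
)
```

In this toy run, seeding the learner with a vocabulary term lets it reach the batch-processing threshold after fewer feedback requests than learning from feedback alone, which mirrors the motivation for combining the two methods. A real system would use a far richer model and feature set.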

Bibliographic details

Source: JMIR Medical Informatics, 2017, Issue 2
Authors: Shuai Zheng, PhD; James J Lu, PhD; Nima Ghasemzadeh, MD; Salim S Hayek, MD; Arshed A Quyyumi, MD; Fusheng Wang, PhD
Original format: PDF
Chinese Library Classification: Medicine and health
