A clinical text classification paradigm using weak supervision and deep representation

Yanshan Wang; Sunghwan Sohn; Sijia Liu; Feichen Shen; Liwei Wang; Elizabeth J. Atkinson; Shreyasee Amin; Hongfang Liu

Abstract

Automatic clinical text classification is a natural language processing (NLP) technology that unlocks information embedded in clinical narratives. Machine learning approaches have been shown to be effective for clinical text classification tasks. However, a successful machine learning model usually requires extensive human effort to create labeled training data and to conduct feature engineering. In this study, we propose a clinical text classification paradigm that uses weak supervision and deep representation to reduce this human effort. We develop a rule-based NLP algorithm to automatically generate labels for the training data, and then use pre-trained word embeddings as deep representation features for training machine learning models. Because the machine learning models are trained on labels generated by the automatic NLP algorithm, this training process is called weak supervision. We evaluate the effectiveness of the paradigm on two institutional case studies at Mayo Clinic, smoking status classification and proximal femur (hip) fracture classification, and on one case study using a public dataset, the i2b2 2006 smoking status classification shared task. We test four widely used machine learning models under this paradigm: Support Vector Machine (SVM), Random Forest (RF), Multilayer Perceptron Neural Networks (MLPNN), and Convolutional Neural Networks (CNN). Precision, recall, and F1 score are used as performance metrics. CNN achieves the best performance in both institutional tasks (F1 score: 0.92 for Mayo Clinic smoking status classification and 0.97 for fracture classification). We show that word embeddings significantly outperform tf-idf and topic modeling features in the paradigm, and that CNN captures additional patterns from the weak supervision compared to the rule-based NLP algorithms.
We also observe two drawbacks of the proposed paradigm: CNN is more sensitive to the size of the training data, and the paradigm might not be effective for complex multiclass classification tasks. By leveraging weak supervision and deep representation, the proposed clinical text classification paradigm could reduce the human effort of creating labeled training data and engineering features when applying machine learning to clinical text classification. The experiments have validated the effectiveness of the paradigm on two institutional clinical text classification tasks and one shared task.
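The weak-supervision workflow described in the abstract has four stages: rule-generated training labels, word-embedding features, a classifier trained on those labels, and precision/recall/F1 evaluation. The sketch below illustrates those stages under stated assumptions and is not the authors' implementation. The keyword rules, the tiny 2-dimensional `EMBEDDINGS` table, and all example notes are hypothetical. Averaging token vectors into a document vector is this sketch's own choice. A simple nearest-centroid classifier stands in for the SVM, RF, MLPNN, and CNN models evaluated in the paper.

```python
import math
import re
from typing import Dict, List, Sequence, Tuple

# Hypothetical smoking-status rules, for illustration only (not the paper's rules).
NEGATIVE_PATTERNS = [r"\bnever smok", r"\bnon-?smoker\b", r"\bdenies (smoking|tobacco)\b"]
POSITIVE_PATTERNS = [r"\bsmok(es|er|ing)\b", r"\bcigarettes?\b", r"\bpacks? per day\b"]

# Toy 2-d stand-in for pre-trained word embeddings (hypothetical values).
EMBEDDINGS: Dict[str, List[float]] = {
    "smokes": [0.9, 0.1], "smoker": [0.9, 0.1], "cigarettes": [0.8, 0.2],
    "pack": [0.7, 0.1], "never": [-0.8, 0.1], "denies": [-0.7, 0.2],
    "non": [-0.9, 0.0], "no": [-0.5, 0.0],
}
DIM = 2


def rule_label(note: str) -> int:
    """Weak label from keyword rules: 1 = smoker, 0 = non-smoker or unknown."""
    text = note.lower()
    if any(re.search(p, text) for p in NEGATIVE_PATTERNS):
        return 0  # negation rules take precedence over positive mentions
    return 1 if any(re.search(p, text) for p in POSITIVE_PATTERNS) else 0


def embed(note: str) -> List[float]:
    """Document vector = mean of in-vocabulary token vectors (an assumption of this sketch)."""
    vecs = [EMBEDDINGS[t] for t in re.findall(r"[a-z]+", note.lower()) if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM  # no known tokens: fall back to the zero vector
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def train_centroids(X: Sequence[List[float]], y: Sequence[int]) -> Dict[int, List[float]]:
    """Nearest-centroid 'training': one mean feature vector per weak label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(r[i] for r in rows) / len(rows) for i in range(DIM)]
    return centroids


def predict(centroids: Dict[int, List[float]], x: List[float]) -> int:
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda c: math.dist(centroids[c], x))


def prf1(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(g == 1 and p == 1 for g, p in zip(gold, pred))
    fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))
    fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Unlabeled training notes receive weak labels from the rules; no manual annotation is used.
unlabeled = [
    "Patient smokes one pack per day.",
    "Never smoked; denies tobacco use.",
    "Current smoker, 20 cigarettes daily.",
    "Non-smoker. No alcohol use.",
]
weak_y = [rule_label(n) for n in unlabeled]
model = train_centroids([embed(n) for n in unlabeled], weak_y)

# A small gold-standard set is used only for evaluation.
test_notes = ["He smokes cigarettes.", "She has never smoked."]
gold = [1, 0]
pred = [predict(model, embed(n)) for n in test_notes]
print(weak_y, pred, prf1(gold, pred))
```

In this sketch the classifier never sees a manually assigned training label. `gold` is used only to compute precision, recall, and F1 on held-out notes. Replacing the nearest-centroid step with any other classifier leaves the rest of the workflow unchanged.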

Bibliographic details

Source
BMC Medical Informatics and Decision Making | 2019, Issue 1 | 13 pages
Authors
Yanshan Wang; Sunghwan Sohn; Sijia Liu; Feichen Shen; Liwei Wang; Elizabeth J. Atkinson; Shreyasee Amin; Hongfang Liu
Indexing information
Original format: PDF
Chinese Library Classification: Medicine, health
Keywords
Clinical text classification; Natural language processing; Electronic health records; Machine learning; Weak supervision

Similar documents

1. Deep text classification of Instagram data using word embeddings and weak supervision [J]. Hammar Kim, Jaradat Shatha, Dokoohaki Nima. Web Intelligence, 2020, Issue 1.

2. Combining supervised term-weighting metrics for SVM text classification with extended term representation [J]. Haddoud Mounia, Mokhtari Aicha, Lecroq Thierry. Knowledge and Information Systems, 2016, Issue 3.

3. Improving Recognition of Complex Aerial Scenes Using a Deep Weakly Supervised Learning Paradigm [J]. Praveer Singh, Nikos Komodakis. IEEE Geoscience and Remote Sensing Letters, 2018, Issue 12.

4. Deep Text Prior: Weakly Supervised Learning for Assertion Classification [C]. Vadim Liventsev, Irina Fedulova, Dmitry Dylov. International Conference on Artificial Neural Networks, 2019.

5. Contextualized, Metadata-Empowered, Coarse-To-Fine Weakly-Supervised Text Classification [D]. Mekala, Dheeraj. 2021.

6. A clinical text classification paradigm using weak supervision and deep representation [O]. Yanshan Wang, Sunghwan Sohn, Sijia Liu. 2019.

7. Improving Recognition of Complex Aerial Scenes Using a Deep Weakly Supervised Learning Paradigm [O]. Praveer Singh, Nikos Komodakis. 2018.
