BMC Medical Informatics and Decision Making

Transformers-sklearn: a toolkit for medical language understanding with transformer-based models


Abstract

The Transformer is an attention-based architecture that has proven to be the state-of-the-art model in natural language processing (NLP). To reduce the difficulty of beginning to use transformer-based models in medical language understanding and to extend the capability of the scikit-learn toolkit to deep learning, we proposed an easy-to-learn Python toolkit named transformers-sklearn. By wrapping the interfaces of transformers in only three functions (i.e., fit, score, and predict), transformers-sklearn combines the advantages of the transformers and scikit-learn toolkits.

In transformers-sklearn, three Python classes were implemented: BERTologyClassifier for the classification task, BERTologyNERClassifier for the named entity recognition (NER) task, and BERTologyRegressor for the regression task. Each class provides three methods: fit for fine-tuning a transformer-based model on the training dataset, score for evaluating the performance of the fine-tuned model, and predict for predicting the labels of the test dataset. transformers-sklearn is a user-friendly toolkit that (1) is customizable via a few parameters (e.g., model_name_or_path and model_type), (2) supports multilingual NLP tasks, and (3) requires less coding. The input data format is generated automatically by transformers-sklearn from the annotated corpus, so newcomers only need to prepare the dataset; the model framework and training methods are predefined in transformers-sklearn.

We collected four open-source medical language datasets: TrialClassification for Chinese medical trial text multi-label classification, BC5CDR for English biomedical text named entity recognition, DiabetesNER for Chinese diabetes entity recognition, and BIOSSES for English biomedical sentence similarity estimation. Across the four medical NLP tasks, the average size of our scripts is 45 lines per task, one-sixth the size of the corresponding transformers scripts. The experimental results show that transformers-sklearn based on pretrained BERT models achieved macro F1 scores of 0.8225, 0.8703, and 0.6908 on the TrialClassification, BC5CDR, and DiabetesNER tasks, respectively, and a Pearson correlation of 0.8260 on the BIOSSES task, which is consistent with the results of transformers.

The proposed toolkit can help newcomers easily address medical language understanding tasks in the scikit-learn coding style. The code and tutorials of transformers-sklearn are available at https://doi.org/10.5281/zenodo.4453803. In the future, more medical language understanding tasks will be supported to broaden the applications of transformers-sklearn.
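To make the workflow concrete, here is a minimal usage sketch of the fit/score/predict pattern described above. The class name BERTologyClassifier and the parameters model_name_or_path and model_type are taken from the abstract; the import path and the assumption that fit and predict accept plain Python lists of texts and labels in scikit-learn style are illustrative guesses rather than confirmed details of the released toolkit.

    # Hedged sketch: the import path and the (texts, labels) input
    # format are assumptions, not confirmed by the abstract.
    from transformers_sklearn import BERTologyClassifier

    # Toy training and test data in scikit-learn style.
    X_train = ["This trial evaluates a new insulin regimen.",
               "Patients reported mild headache after dosing."]
    y_train = ["trial_design", "adverse_event"]
    X_test = ["Subjects reported nausea during the study."]
    y_test = ["adverse_event"]

    # The two parameters named in the abstract select the backbone model.
    clf = BERTologyClassifier(model_type="bert",
                              model_name_or_path="bert-base-cased")

    clf.fit(X_train, y_train)          # fine-tune the pretrained model
    print(clf.score(X_test, y_test))   # evaluate the fine-tuned model
    print(clf.predict(X_test))         # predict labels for new texts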
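The NER task follows the same three-method pattern. Below is a hedged sketch for BERTologyNERClassifier, assuming the annotated corpus is passed as token sequences paired with BIO-style tag sequences; since the abstract says the input format is generated automatically from the annotated corpus, the exact shape shown here is an assumption.

    # Hedged sketch: BIO tags and list-of-token-lists input are assumptions.
    from transformers_sklearn import BERTologyNERClassifier

    # BC5CDR-style toy examples: chemical and disease mentions.
    X_train = [["Naloxone", "reverses", "the", "antihypertensive", "effect"],
               ["Lidocaine", "caused", "cardiac", "asystole"]]
    y_train = [["B-Chemical", "O", "O", "O", "O"],
               ["B-Chemical", "O", "B-Disease", "I-Disease"]]

    ner = BERTologyNERClassifier(model_type="bert",
                                 model_name_or_path="bert-base-cased")
    ner.fit(X_train, y_train)                         # fine-tune on tagged tokens
    print(ner.predict([["Aspirin", "induced", "asthma"]]))  # tag new tokens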
