PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts


Abstract

We present PubMed 200k RCT, a new dataset based on PubMed for sequential sentence classification. The dataset consists of approximately 200,000 abstracts of randomized controlled trials, totaling 2.3 million sentences. Each sentence of each abstract is labeled with its role in the abstract using one of the following classes: background, objective, method, result, or conclusion. The purpose of releasing this dataset is twofold. First, the majority of datasets for sequential short-text classification (i.e., classification of short texts that appear in sequences) are small: we hope that releasing a new large dataset will help develop more accurate algorithms for this task. Second, from an application perspective, researchers need better tools to efficiently skim through the literature. Automatically classifying each sentence in an abstract would help researchers read abstracts more efficiently, especially in fields where abstracts may be long, such as the medical field.
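The labeling scheme in the abstract (every sentence of an abstract carries one of five role labels, and the sentences keep their order) can be pictured with a small Python sketch. The file layout parsed here is an assumption made for illustration: a `###`-prefixed ID line per abstract, followed by `LABEL<TAB>sentence` lines. The abstract itself does not specify a file format, the uppercase label spellings are likewise assumed, and the sample sentences are invented placeholders rather than dataset content.

```python
from collections import Counter
from typing import Dict, List, Tuple

# The five classes named in the abstract; the uppercase spelling is an assumption.
LABELS = ("BACKGROUND", "OBJECTIVE", "METHOD", "RESULT", "CONCLUSION")

# One abstract is an ordered list of (label, sentence) pairs.
Abstract = List[Tuple[str, str]]


def parse_abstracts(text: str) -> Dict[str, Abstract]:
    """Parse the hypothetical layout into {abstract_id: [(label, sentence), ...]}."""
    abstracts: Dict[str, Abstract] = {}
    current_id = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate abstracts
        if line.startswith("###"):
            current_id = line[3:].strip()
            abstracts[current_id] = []
            continue
        if current_id is None:
            raise ValueError("sentence line found before any abstract ID")
        label, sep, sentence = line.partition("\t")
        if not sep or label not in LABELS:
            raise ValueError(f"malformed line: {line!r}")
        abstracts[current_id].append((label, sentence.strip()))
    return abstracts


def label_counts(abstracts: Dict[str, Abstract]) -> Counter:
    """Count how many sentences carry each label across all abstracts."""
    return Counter(label for sents in abstracts.values() for label, _ in sents)


# Invented placeholder text, not taken from the dataset.
SAMPLE = (
    "###0001\n"
    "BACKGROUND\tPlaceholder background sentence.\n"
    "OBJECTIVE\tPlaceholder objective sentence.\n"
    "METHOD\tPlaceholder method sentence.\n"
    "RESULT\tPlaceholder result sentence.\n"
    "CONCLUSION\tPlaceholder conclusion sentence.\n"
    "\n"
    "###0002\n"
    "OBJECTIVE\tSecond placeholder objective.\n"
    "METHOD\tSecond placeholder method.\n"
    "RESULT\tSecond placeholder result, part one.\n"
    "RESULT\tSecond placeholder result, part two.\n"
    "CONCLUSION\tSecond placeholder conclusion.\n"
)

if __name__ == "__main__":
    parsed = parse_abstracts(SAMPLE)
    for abstract_id, sentences in parsed.items():
        print(abstract_id, [label for label, _ in sentences])
    print(label_counts(parsed))
```

Keeping each abstract as an ordered list, rather than pooling all sentences, preserves the sequence information that sequential sentence classification relies on.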

Bibliographic details

Source: International Joint Conference on Natural Language Processing | 2017 | pp. 308-313 | 6 pages
Authors: Franck Dernoncourt; Ji Young Lee
Original format: PDF
