Multi-dimensional fragment classification in biomedical text.

Abstract

Automated text categorization is the task of automatically assigning input text to a set of categories. With the increasing availability of large collections of scientific literature, text categorization plays a critical role in managing information and knowledge, and biomedical text categorization is becoming an important area of research. The work presented here is motivated by the possibility of using automated text categorization to identify and characterize information-bearing text within biomedical literature. Under a recently suggested classification scheme [ShWR06], we examine the feasibility of using machine learning methods to automatically classify biomedical sentence fragments into a set of categories, which were defined to characterize and accommodate certain types of information needs. The categories are grouped into five dimensions: Focus, Polarity, Certainty, Evidence, and Trend. We conduct experiments using a set of manually annotated sentences that were sampled from different sections of biomedical journal articles. A classification model based on Maximum Entropy, designed specifically for this purpose, as well as two other popular algorithms in the area of text categorization, Naive Bayes and Support Vector Machine (SVM), are trained and evaluated on the manually annotated dataset. The preliminary results show that machine learning methods can classify biomedical text along certain dimensions with good accuracy.

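Bibliographic details

  • Author

    Pan, Fengxia

  • Author affiliation

    Queen's University (Canada)

  • Degree-granting institution: Queen's University (Canada)
  • Subject: Computer Science
  • Degree: M.Sc.
  • Year: 2006
  • Pages: 172
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Automation technology, computer technology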