A Part-Of-Speech Term Weighting Scheme for Biomedical Information Retrieval

Abstract

In the era of digitalization, information retrieval (IR), which retrieves documents from large collections and ranks them according to users' search queries, has been widely applied in the biomedical domain. Building patient cohorts from electronic health records (EHRs) and searching the literature for topics of interest are examples of IR use cases. Meanwhile, natural language processing (NLP) techniques, such as tokenization and Part-of-Speech (POS) tagging, have been developed for processing clinical documents and biomedical literature. We hypothesize that NLP can be incorporated into IR to strengthen conventional IR models. In this study, we propose two NLP-empowered IR models, POS-BoW and POS-MRF, which incorporate automatic POS-based term weighting schemes into the bag-of-words (BoW) and Markov Random Field (MRF) IR models, respectively. In the proposed models, the POS-based term weights are calculated iteratively with a cyclic coordinate method, in which a golden-section line search is applied along each coordinate to optimize an objective function defined by mean average precision (MAP). In the experiments, we used data sets from the Medical Records track of the Text REtrieval Conference (TREC) 2011 and 2012 and from the Genomics track of TREC 2004. On the TREC 2011 and 2012 Medical Records tracks, the POS-BoW models achieve mean improvement rates of 10.88%, 4.54%, and 3.82% in MAP, bpref, and P@10, respectively, over the BoW models; the POS-MRF models achieve 13.59%, 8.20%, and 8.78% on the same metrics over the MRF models. Additionally, we experimentally verify that the proposed weighting approach outperforms simple heuristic and frequency-based weighting approaches, and we validate our choice of POS categories.
Using the optimal weights calculated in this experiment, we tested the proposed models on the TREC 2004 Genomics track and obtained average improvement rates of 8.63% and 10.04% for POS-BoW and POS-MRF, respectively. These significant improvements verify the effectiveness of leveraging POS tagging for biomedical IR tasks.
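
The abstract says the POS category weights are tuned by a cyclic coordinate method. A golden-section line search is run along each coordinate, and MAP is the objective. The Python sketch below illustrates that loop under stated assumptions. The following are all hypothetical illustrations, not the authors' POS-BoW or POS-MRF models or TREC data:

- the scoring function (`w[POS] * log(1 + tf) * idf`)
- the POS tags `NN` and `VB`
- the search interval [0, 1]
- the helper names
- the toy documents and queries

```python
import math

# 1/phi, the golden-section shrink factor (about 0.618)
INV_PHI = (math.sqrt(5) - 1) / 2


def golden_section_max(f, lo, hi, tol=1e-3):
    """Golden-section search for a maximizer of f on [lo, hi].

    Exact only when f is unimodal; MAP is piecewise constant in the
    weights, so here it serves as a heuristic line search.
    """
    a, b = lo, hi
    c, d = b - INV_PHI * (b - a), a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc >= fd:  # keep [a, d]; the old c becomes the new d
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:  # keep [c, b]; the old d becomes the new c
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2


def score(query, doc, weights, idf):
    """Hypothetical POS-weighted BoW score: sum of w[POS] * log(1 + tf) * idf."""
    return sum(weights[pos] * math.log1p(doc.get(term, 0)) * idf.get(term, 0.0)
               for term, pos in query)


def average_precision(ranking, relevant):
    """Mean of precision@k over the ranks k at which relevant documents appear."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(weights, queries, docs, idf):
    aps = []
    for query, relevant in queries:
        # Rank by descending score; break ties by document id.
        ranking = sorted(docs, key=lambda d: (-score(query, docs[d], weights, idf), d))
        aps.append(average_precision(ranking, relevant))
    return sum(aps) / len(aps)


def cyclic_coordinate_ascent(objective, weights, lo=0.0, hi=1.0, sweeps=3, tol=1e-3):
    """Line-search one POS weight at a time while holding the others fixed."""
    w = dict(weights)
    for _ in range(sweeps):
        for pos in list(w):
            def along(x, pos=pos):
                trial = dict(w)
                trial[pos] = x
                return objective(trial)
            candidate = golden_section_max(along, lo, hi, tol)
            if along(candidate) > objective(w):  # accept strict improvements only
                w[pos] = candidate
    return w


# Hypothetical toy collection: term counts per document, POS-tagged queries.
docs = {
    "d1": {"chest": 1, "pain": 1},
    "d2": {"reported": 6},
    "d3": {"chest": 1, "reported": 2},
}
queries = [
    ([("chest", "NN"), ("pain", "NN"), ("reported", "VB")], {"d1"}),
    ([("pain", "NN"), ("reported", "VB")], {"d1"}),
]
df = {}
for doc in docs.values():
    for term in doc:
        df[term] = df.get(term, 0) + 1
idf = {term: math.log(1 + len(docs) / n) for term, n in df.items()}


def objective(weights):
    return mean_average_precision(weights, queries, docs, idf)


start = {"NN": 1.0, "VB": 1.0}
best = cyclic_coordinate_ascent(objective, start)
print(round(objective(start), 3), objective(best))  # → 0.333 1.0
```

On this toy data, lowering the verb weight moves the relevant document to the top of both rankings, so MAP rises from about 0.333 to 1.0. Because MAP is piecewise constant in the weights, the sketch accepts a new coordinate value only when it strictly raises MAP. This keeps a sweep from drifting across flat regions of the objective.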

Bibliographic Record

Journal: other
Authors: Yanshan Wang; Stephen Wu; Dingcheng Li; Saeed Mehrabi; Hongfang Liu
Volume: 63
Pages: 379–389
Total pages: 31
Original format: PDF
Keywords: biomedical information retrieval; natural language processing; part-of-speech; bag-of-words; Markov random field

Similar Literature

1. Term frequency - function of document frequency: a new term weighting scheme for enterprise information retrieval [J]. Hui Zhang, Deqing Wang, Wenjun Wu. Enterprise Information Systems, 2012, no. 4.

2. A novel term weighting scheme based on discrimination power obtained from past retrieval results [J]. Sa-kwang Song, Sung Hyon Myaeng. Information Processing & Management, 2012, no. 5.

3. Term Weighting Schemes Experiment Based on SVD for Malay Text Retrieval [J]. Nordianah Ab Samat, Masrah Azrifah Azmi Murad, Muhamad Taufik Abdullah. International Journal of Computer Science and Network Security, 2008, no. 10.

4. Improving Information Retrieval Through a Global Term Weighting Scheme [C]. Daniel Cuellar, Elva Diaz, Eunice Ponce-de-Leon-Senti. Mexican Conference on Pattern Recognition, 2015.

5. A single document-based term weighting scheme by supporting terms [D]. Juan Cheng. 2006.

6. A New Biomedical Passage Retrieval Framework for Laboratory Medicine: Leveraging Domain-specific Ontology, Multilevel PRF, and Negation Differential Weighting [O]. Keejun Han, Hyoeun Shim, Mun Y. Yi. 2018.

7. A Part-Of-Speech term weighting scheme for biomedical information retrieval [O]. Yanshan Wang, Stephen Wu, Dingcheng Li, et al. 2016.

8. Improve Precategorized Collection Retrieval by Using Supervised Term Weighting Schemes [R]. Y. Zhao, G. Karypis. 2001.
