Journal of Bioinformatics and Computational Biology

THE VALUE OF AN IN-DOMAIN LEXICON IN GENOMICS QA


Abstract

This paper demonstrates that a large-scale lexicon tailored for the biology domain is effective in improving question analysis for genomics Question Answering (QA). We use the TREC Genomics Track data to evaluate the performance of different question analysis methods. Textual information in biology, especially in molecular biology, is hard to process because it contains a huge number of technical terms that rarely appear in general English documents and dictionaries. To support biological Text Mining, we have developed a domain-specific resource, the BioLexicon. Started from scratch in 2006, this lexicon currently includes more than four million biomedical terms, consisting of newly curated terms and terms collected from existing biomedical databases. While conventional genomics QA systems provide query expansion based on thesauri and dictionaries, it is not clear to what extent a biology-oriented lexical resource is effective for question pre-processing in genomics QA. Experiments on the genomics QA data set show that question analysis using the BioLexicon performs slightly better than question analysis using n-grams or the UMLS Specialist Lexicon.
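To make the contrast between lexicon-based and n-gram question analysis concrete, here is a minimal Python sketch. It is not the authors' method: the tiny `LEXICON` set is a hypothetical stand-in for a resource such as the BioLexicon, and greedy longest-match lookup is assumed as one common way to apply a term lexicon to a question. The n-gram baseline simply emits every contiguous word sequence up to a fixed length.

```python
# Hypothetical mini-lexicon standing in for a resource like the BioLexicon
# (the real lexicon has millions of entries; these are illustrative only).
LEXICON = {
    ("p53",),
    ("tumor", "suppressor"),
    ("tumor", "suppressor", "gene"),
    ("cell", "cycle"),
}
MAX_TERM_LEN = max(len(term) for term in LEXICON)


def tokenize(question: str) -> list[str]:
    """Lowercase and replace punctuation with spaces (a real system needs a biomedical tokenizer)."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in question.lower())
    return cleaned.split()


def lexicon_terms(tokens: list[str]) -> list[str]:
    """Greedy longest match: at each position, take the longest lexicon entry that matches."""
    terms, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in LEXICON:
                terms.append(" ".join(candidate))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at this token; move on
    return terms


def ngram_terms(tokens: list[str], n_max: int = 2) -> list[str]:
    """Baseline: every contiguous n-gram up to n_max words, with no lexical knowledge."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


tokens = tokenize("What is the role of the p53 tumor suppressor gene in the cell cycle?")
print(lexicon_terms(tokens))     # ['p53', 'tumor suppressor gene', 'cell cycle']
print(len(ngram_terms(tokens)))  # 27
```

In this toy example, the lexicon pass yields three domain terms and prefers the more specific "tumor suppressor gene" over "tumor suppressor". The bigram baseline produces 27 candidates, including uninformative ones such as "in the", and cannot capture the three-word term at all.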
