Journal: BMC Bioinformatics

The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools


Abstract

Background: Computing sequence similarity results is becoming a limiting factor in metagenome analysis. Encoding sequence similarity search results in an open, exchangeable format could reduce the need to recompute them for these data sets. A prerequisite for sharing similarity results is a common reference database.

Description: We introduce a mechanism for automatically maintaining a comprehensive, non-redundant protein database and for creating quarterly releases of this resource. In addition, we present tools for translating similarity search results into many annotation namespaces, e.g. KEGG or NCBI's GenBank.

Conclusions: The data and tools we present allow multiple result sets to be created from a single computation, so that computational results for large sequence data sets can be shared between groups.
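The following is a minimal sketch of the idea in the Description, not the M5nr tools themselves. It assumes (as the name M5nr suggests) that each distinct protein sequence is keyed by the MD5 checksum of its sequence string, and that annotations from every source are attached to that key. The record layout, the helpers `md5_id`, `build_nonredundant` and `translate_hits`, the sequence normalization, and the toy accessions are all hypothetical. A similarity search is run once against the non-redundant sequences, and its hits are then projected into whichever annotation namespace is wanted.

```python
import hashlib
from collections import defaultdict

# Hypothetical record layout: (source, accession, function, sequence).
Record = tuple[str, str, str, str]


def md5_id(sequence: str) -> str:
    """Key a protein sequence by the MD5 digest of its normalized form.

    Normalization (trimming whitespace, upper-casing) is an assumption of this sketch.
    """
    return hashlib.md5(sequence.strip().upper().encode("ascii")).hexdigest()


def build_nonredundant(records: list[Record]):
    """Collapse records from several sources into one entry per distinct sequence.

    Returns (sequences, annotations):
      sequences:   md5 -> sequence, stored once however many sources carry it
      annotations: md5 -> every (source, accession, function) seen for that sequence
    """
    sequences: dict[str, str] = {}
    annotations: dict[str, list[tuple[str, str, str]]] = defaultdict(list)
    for source, accession, function, seq in records:
        key = md5_id(seq)
        sequences.setdefault(key, seq.strip().upper())
        annotations[key].append((source, accession, function))
    return sequences, dict(annotations)


def translate_hits(hits, annotations, namespace):
    """Project (query, md5) hits from one search into a single annotation namespace."""
    result: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for query, key in hits:
        for source, accession, function in annotations.get(key, []):
            if source == namespace:
                result[query].append((accession, function))
    return dict(result)


# Toy usage with made-up accessions: two sources carry the same sequence.
records = [
    ("GenBank", "gb_0001", "kinase A", "MKTAYIAKQR"),
    ("KEGG", "keg_0001", "protein kinase", "mktayiakqr"),
    ("GenBank", "gb_0002", "transporter B", "MSSHLLKQ"),
]
sequences, annotations = build_nonredundant(records)

# Pretend a single similarity search of read_1 against `sequences` hit the kinase.
hits = [("read_1", md5_id("MKTAYIAKQR"))]
kegg_view = translate_hits(hits, annotations, "KEGG")
genbank_view = translate_hits(hits, annotations, "GenBank")
print(len(sequences), kegg_view, genbank_view)
```

Because annotations hang off the sequence key rather than a source-specific accession, reporting results in another namespace only changes the lookup, not the search. This is one way to obtain several result sets from a single computation.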