Journal: BMC Bioinformatics

EST2uni: an open, parallel tool for automated EST analysis and database creation, with a data mining web interface and microarray expression data integration

Abstract

Background: Expressed sequence tag (EST) collections are composed of a high number of single-pass, redundant, partial sequences, which need to be processed, clustered, and annotated to remove low-quality and vector regions, eliminate redundancy and sequencing errors, and provide biologically relevant information. In order to provide a suitable way of performing the different steps in the analysis of the ESTs, flexible computation pipelines adapted to the local needs of specific EST projects have to be developed. Furthermore, EST collections must be stored in highly structured relational databases available to researchers through user-friendly interfaces which allow efficient and complex data mining, thus offering maximum capabilities for their full exploitation.

Results: We have created EST2uni, an integrated, highly-configurable EST analysis pipeline and data mining software package that automates the pre-processing, clustering, annotation, database creation, and data mining of EST collections. The pipeline uses standard EST analysis tools and the software has a modular design to facilitate the addition of new analytical methods and their configuration. Currently implemented analyses include functional and structural annotation, SNP and microsatellite discovery, integration of previously known genetic marker data and gene expression results, and assistance in cDNA microarray design. It can be run in parallel in a PC cluster in order to reduce the time necessary for the analysis. It also creates a web site linked to the database, showing collection statistics, with complex query capabilities and tools for data mining and retrieval.

Conclusion: The software package presented here provides an efficient and complete bioinformatics tool for the management of EST collections which is very easy to adapt to the local needs of different EST projects. The code is freely available under the GPL license and can be obtained at http://bioinf.comav.upv.es/est2uni. This site also provides detailed instructions for installation and configuration of the software package. The code is under active development to incorporate new analyses, methods, and algorithms as they are released by the bioinformatics community.
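
To make the idea of a modular pipeline more concrete, the following is a minimal sketch in Python of how pre-processing and microsatellite discovery steps might be chained. It is not EST2uni's code and does not reflect its actual interfaces: the record layout, the function names (`trim_low_quality`, `find_microsatellites`, `run_pipeline`), the quality threshold of 20, and the repeat definition (motifs of 2 to 6 bases repeated at least four times) are all assumptions chosen for illustration.

```python
import re
from typing import Callable, Dict, List

# Hypothetical convention: every step takes a record dict and returns it updated.
Step = Callable[[Dict], Dict]


def trim_low_quality(record: Dict, min_qual: int = 20) -> Dict:
    """Trim bases below min_qual from both ends (illustrative threshold)."""
    quals = record["qual"]
    start = 0
    while start < len(quals) and quals[start] < min_qual:
        start += 1
    end = len(quals)
    while end > start and quals[end - 1] < min_qual:
        end -= 1
    record["seq"] = record["seq"][start:end]
    record["qual"] = quals[start:end]
    return record


# Assumed repeat definition: a 2-6 base motif occurring at least 4 times in a row.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{3,}")


def find_microsatellites(record: Dict) -> Dict:
    """Record (start, motif, repeat_count) for each simple sequence repeat."""
    hits = []
    for match in SSR_PATTERN.finditer(record["seq"]):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    record["ssrs"] = hits
    return record


def run_pipeline(record: Dict, steps: List[Step]) -> Dict:
    """Apply each step in order; adding an analysis means adding a function."""
    for step in steps:
        record = step(record)
    return record


if __name__ == "__main__":
    est = {
        "id": "est1",
        "seq": "GG" + "TT" + "AC" * 5 + "GG" + "AA",
        "qual": [5, 5] + [30] * 14 + [5, 5],
    }
    result = run_pipeline(est, [trim_low_quality, find_microsatellites])
    print(result["seq"], result["ssrs"])
```

Keeping each analysis as an independent function over a shared record is one simple way to get the kind of extensibility the abstract describes; a production pipeline would instead call established tools for trimming, clustering, and annotation.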
