Journal: Studies in Health Technology and Informatics

High Performance GRID Based Implementation for Genomics and Protein Analysis

Abstract

Starting from genomic and proteomic sequence data, a complex computational infrastructure has been established with the objective of developing a GRID-based system to automate the analysis, prediction and annotation of genomic DNA. To support this type of analysis, several algorithms have been used to recognize the biological signals involved in identifying genes and proteins. The implemented system can be used to analyse the content of a large number of genomic sequences. For this reason, it is able to use a computational architecture specifically designed for intensive computing, based on the GRID technologies developed within the BIOINFOGRID European project. We developed a GRID-based workflow that correlates different kinds of bioinformatics data, going from the genomic nucleotide sequence to the protein sequence. The first step in the workflow consists of submitting a nucleotide sequence, which is processed by dedicated gene-prediction software. In particular, this tool searches the nucleotide sequence for the key components of a gene. The predicted gene is then translated into the corresponding protein sequence. From the protein sequence it is then possible to identify, using dedicated domain-prediction tools, the domains that characterize the protein's function. Protein domain classification is very important in the analysis of macromolecular function. Analysing a whole protein family across the large genomes of various organisms means processing a large amount of data, which requires huge computational resources. To analyse all these data, we propose the use of a high-performance platform based on grid technology. We have implemented our applications on a wide-area grid platform for scientific applications (http://www.grid.it and http://grid-it.cnaf.infn.it) composed of about 1000 CPUs.
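The first two workflow steps (gene prediction, then translation into protein) can be illustrated with a minimal Python sketch. The abstract does not name the gene-prediction software, so `longest_orf` below is a hypothetical and deliberately naive stand-in: it returns the longest ATG-to-stop open reading frame on the forward strand, whereas real gene predictors also model splice sites, exon structure and other signals. The translation step uses the standard genetic code; all function names are illustrative assumptions.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(dna):
    """Hypothetical, naive stand-in for gene prediction:
    longest ATG...stop open reading frame on the forward strand."""
    dna = dna.upper().replace("U", "T")
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the start codon
            elif start is not None and codon in STOPS:
                orf = dna[start:i + 3]  # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


def translate(cds):
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # "X" for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate(longest_orf("CCATGAAATTTGGGTAGCC"))` first extracts `ATGAAATTTGGGTAG` and then yields the peptide `MKFG`.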
The grid infrastructure consists of a collection of computing elements and storage elements that together form a platform for high-performance processing. In this study, a grid-based application is presented that performs protein domain analysis in a distributed way. This approach achieves high performance because the protein domains are checked with different software tools in parallel at different grid sites.
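The distributed domain analysis can be sketched as a fan-out/merge pattern: each domain-prediction tool runs as an independent job on the same protein, and the hits are merged afterwards. In this minimal sketch, a local thread pool stands in for the grid sites, and the two "tools" (`tool_a`, `tool_b`) are hypothetical regex motif scanners, not the domain-prediction software used in the study. On a real grid, each `run_tool` call would instead be a job submitted to a computing element.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for domain-prediction tools: each maps labels to regex motifs.
# Real tools (e.g. profile-based searches) work very differently.
TOOLS = {
    "tool_a": {"walker_a_like": r"[AG].{4}GK[ST]"},  # P-loop-style pattern, illustrative only
    "tool_b": {"cys_pair": r"C.{2}C"},               # toy cysteine-spacing pattern
}


def run_tool(tool_name, motifs, protein):
    """Simulate one grid job: scan the protein with one tool's motifs."""
    hits = []
    for label, pattern in motifs.items():
        for m in re.finditer(pattern, protein):
            hits.append((tool_name, label, m.start(), m.end()))
    return hits


def analyse_domains(protein, tools=TOOLS, max_workers=4):
    """Run every tool in parallel (one worker per hypothetical site) and merge the hits."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_tool, name, motifs, protein)
                   for name, motifs in tools.items()]
        merged = []
        for f in futures:
            merged.extend(f.result())
    return sorted(merged)  # deterministic order regardless of which job finished first
```

Because each tool reads the same input and shares no state with the others, the jobs can run at different sites in any order, and only the final merge needs to wait for all of them.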