BMC Bioinformatics
A Grid-based solution for management and analysis of microarrays in distributed experiments

Abstract

Several systems have been presented in recent years to manage the complexity of large microarray experiments. Although good results have been achieved, most systems fall short in one or more areas. A Grid-based approach may provide a shared, standardized and reliable solution for the storage and analysis of biological data, in order to maximize the results of experimental efforts. A Grid framework has therefore been adopted, both because large amounts of distributed data must be accessed remotely and because computational performance must scale to terabyte datasets. Two different biological studies have been planned to highlight the benefits that can emerge from our Grid-based platform. The described environment relies on storage and computational services provided by the gLite Grid middleware. The Grid environment can also exploit the added value of metadata to let users better classify and search experiments. A state-of-the-art Grid portal has been implemented to hide the complexity of the framework from end users and to give them easy access to the available services and data. The functional architecture of the portal is described. As a first test of system performance, a gene expression analysis has been performed on a dataset of Affymetrix GeneChip® Rat Expression Array RAE230A from the ArrayExpress database. The analysis comprises three steps: (i) group opening and image set uploading, (ii) normalization, and (iii) model-based gene expression (based on the PM/MM difference model). Two Linux versions of the dChip software, one sequential and one parallel, have been developed to implement the analysis and have been tested on a cluster. The results show that parallelizing the analysis and executing parallel jobs on distributed computational resources improve performance. Moreover, the Grid environment has been tested both for uploading and accessing distributed datasets through the Grid middleware and for its ability to manage the execution of jobs on distributed computational resources. Results from the Grid test will be discussed in a further paper.