Stepwise Distributed Open Innovation Contests for Software Development: Acceleration of Genome-Wide Association Analysis

Abstract

Background: The association of differing genotypes with disease-related phenotypic traits offers great potential to both help identify new therapeutic targets and support stratification of patients who would gain the greatest benefit from specific drug classes. Development of low-cost genotyping and sequencing has made collecting large-scale genotyping data routine in population and therapeutic intervention studies. In addition, a range of new technologies is being used to capture numerous new and complex phenotypic descriptors. As a result, genotype and phenotype datasets have grown exponentially. Genome-wide association studies associate genotypes and phenotypes using methods such as logistic regression. As existing tools for association analysis limit the efficiency by which value can be extracted from increasing volumes of data, there is a pressing need for new software tools that can accelerate association analyses on large genotype-phenotype datasets.

Results: Using open innovation (OI) and contest-based crowdsourcing, the logistic regression analysis in a leading, community-standard genetics software package (PLINK 1.07) was substantially accelerated. OI allowed us to do this in <6 months by providing rapid access to highly skilled programmers with specialized, difficult-to-find skill sets. Through a crowd-based contest, a combination of computational, numeric, and algorithmic approaches was identified that accelerated the logistic regression in PLINK 1.07 by 18- to 45-fold. Combining contest-derived logistic regression code with coarse-grained parallelization, multithreading, and associated changes to data initialization code further developed through distributed innovation, we achieved an end-to-end speedup of 591-fold for a data set size of 6678 subjects by 645 863 variants, compared to PLINK 1.07's logistic regression. This represents a reduction in run time from 4.8 hours to 29 seconds. Accelerated logistic regression code developed in this project has been incorporated into the PLINK2 project.

Conclusions: Using iterative competition-based OI, we have developed a new, faster implementation of logistic regression for genome-wide association studies analysis. We present lessons learned and recommendations on running a successful OI process for bioinformatics.
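
For readers unfamiliar with the computation being accelerated, the following is a minimal illustrative sketch of a per-variant logistic regression scan. It is not the PLINK 1.07 or PLINK2 code, and the function names (fit_logistic, scan_variants) and the single-predictor model without covariates are assumptions made for illustration. It fits logit P(case) = b0 + b1 * genotype by Newton-Raphson for each variant independently.

```python
import math

def fit_logistic(genotypes, phenotypes, max_iter=25, tol=1e-8):
    """Fit logit P(y=1) = b0 + b1*g by Newton-Raphson.

    Hypothetical helper for illustration only.
    Returns (b0, b1, standard error of b1).
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Score vector (g0, g1) and 2x2 information matrix (h00, h01, h11).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(genotypes, phenotypes):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            raise ValueError("singular information matrix (e.g. monomorphic variant)")
        # Newton step: solve the 2x2 system analytically.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    # Variance of b1 is the (1,1) entry of the inverse information matrix,
    # evaluated at the coefficients used for the final step.
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1

def scan_variants(variant_rows, phenotypes):
    """Run one independent fit per variant; return (b1, se, z) per variant."""
    results = []
    for genotypes in variant_rows:
        _, b1, se = fit_logistic(genotypes, phenotypes)
        results.append((b1, se, b1 / se))
    return results
```

Because each variant is fitted independently of the others, the loop in scan_variants is the natural place for the kind of coarse-grained parallelization the abstract mentions: blocks of variants could be handed to separate worker processes or threads without changing the per-variant arithmetic.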
