
Semantic workflows for benchmark challenges: Enhancing comparability, reusability and reproducibility

Abstract

Benchmark challenges, such as the Critical Assessment of Structure Prediction (CASP) and Dialogue for Reverse Engineering Assessments and Methods (DREAM), have been instrumental in driving the development of bioinformatics methods. Typically, challenges are posted, and competitors then make predictions based on blinded test data. Challengers submit their answers to a central server, where they are scored. Recent efforts to automate these challenges have been enabled by systems in which challengers submit Docker containers, units of software that package code and all of its dependencies, to be run on the cloud. Despite their considerable value in providing an unbiased test-bed for the bioinformatics community, there remain opportunities to further enhance the potential impact of benchmark challenges. Specifically, current approaches evaluate only end-to-end performance; it is nearly impossible to directly compare methodologies or parameters. Furthermore, the scientific community cannot easily reuse challengers' approaches because of missing specifics, ambiguity in tools and parameters, and problems in sharing and maintenance. Lastly, the intuition behind why particular steps are used is not captured, because the proposed workflows are not explicitly defined, making it cumbersome to understand the flow and use of data. Here we introduce an approach to overcome these limitations based upon the WINGS semantic workflow system. Specifically, WINGS enables researchers to submit complete semantic workflows as challenge submissions. By submitting entries as workflows, it becomes possible to compare not just the results and performance of a challenger, but also the methodology employed. This is particularly important when dozens of challenge entries may use nearly identical tools with only subtle changes in parameters (and radical differences in results).
WINGS uses a component-driven workflow design and offers intelligent parameter and data selection by reasoning about data characteristics. This proves especially critical in bioinformatics workflows, where default or incorrect parameter values can drastically alter results. Different challenge entries may be readily compared through the use of abstract workflows, which also facilitate reuse. WINGS is housed on a cloud-based setup, which stores data, dependencies and workflows for easy sharing and use. It can also scale workflow executions using distributed computing through the Pegasus workflow execution system. We demonstrate the application of this architecture to the DREAM proteogenomic challenge.
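
The idea of comparing entries through abstract workflows can be illustrated with a minimal sketch. It is not the WINGS API or data model: the `Step` class, the component names (`Normalize`, `Impute`, `Regress`), the tool names, and the parameters are all hypothetical, chosen only to show how two entries that share an abstract form can be compared step by step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One workflow step (hypothetical model, not a WINGS type)."""
    component: str      # abstract component type, e.g. "Impute"
    tool: str           # concrete tool chosen by the challenger
    params: tuple = ()  # (name, value) pairs for that tool


def abstract_form(workflow):
    """Drop tools and parameters, keeping only the ordered component types."""
    return tuple(step.component for step in workflow)


def diff_parameters(wf_a, wf_b):
    """List per-step tool and parameter differences between two workflows
    that share the same abstract form."""
    if abstract_form(wf_a) != abstract_form(wf_b):
        raise ValueError("workflows do not share an abstract form")
    diffs = []
    for a, b in zip(wf_a, wf_b):
        if a.tool != b.tool:
            diffs.append((a.component, "tool", a.tool, b.tool))
        pa, pb = dict(a.params), dict(b.params)
        for name in sorted(set(pa) | set(pb)):
            if pa.get(name) != pb.get(name):
                diffs.append((a.component, name, pa.get(name), pb.get(name)))
    return diffs


# Two hypothetical challenge entries: same steps, one parameter changed.
entry_a = [
    Step("Normalize", "quantile_norm"),
    Step("Impute", "knn_impute", (("k", 5),)),
    Step("Regress", "ridge", (("alpha", 1.0),)),
]
entry_b = [
    Step("Normalize", "quantile_norm"),
    Step("Impute", "knn_impute", (("k", 10),)),
    Step("Regress", "ridge", (("alpha", 1.0),)),
]

print(abstract_form(entry_a) == abstract_form(entry_b))
print(diff_parameters(entry_a, entry_b))
```

In this sketch the two entries are identical at the abstract level and differ only in one imputation parameter, which is the kind of subtle difference that end-to-end scoring alone would not reveal.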