Journal: Computational Biology and Bioinformatics, IEEE/ACM Transactions on

Explaining a Weighted DAG with Few Paths for Solving Genome-Guided Multi-Assembly

Abstract

RNA-Seq technology offers new high-throughput ways for transcript identification and quantification based on short reads, and has recently attracted great interest. This is achieved by constructing a weighted DAG whose vertices stand for exons, and whose arcs stand for split alignments of the RNA-Seq reads to the exons. The task consists of finding a number of paths, together with their expression levels, which optimally explain the weights of the graph under various fitting functions, such as least sum of squared residuals. In (Tomescu et al., BMC Bioinformatics, 2013) we studied this problem when the number of allowed solution paths was linear in the number of arcs. In this paper, we further refine this problem by asking for a bounded number of solution paths, which is the setting of most practical interest. We formulate this problem in very broad terms, and show that for many choices of the fitting function it becomes NP-hard. Nevertheless, we identify a natural graph parameter of a DAG, which we call arc-width, and give a dynamic programming algorithm whose running time is stated in terms of the number of vertices and the maximum weight of the DAG. This implies that the problem is fixed-parameter tractable (FPT) in three parameters, the arc-width among them. We also show that the arc-width of DAGs constructed from simulated and real RNA-Seq reads is small in practice. Finally, we study the approximability of this problem, and, in particular, give a fully polynomial-time approximation scheme (FPTAS) for the case when the fitting function penalizes the maximum ratio between the weights of the arcs and their predicted coverage.
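
As a rough illustration of the fitting objective mentioned in the abstract, the sketch below scores one candidate solution (a set of paths with expression levels) against the arc weights of a small DAG, using the sum of squared residuals. The dictionary encoding of the graph, the function name `squared_residual_cost`, and the toy weights are all assumptions made for this example; none of them come from the paper, and the sketch only evaluates a given solution rather than searching for an optimal one.

```python
from collections import defaultdict


def squared_residual_cost(arc_weights, paths, levels):
    """Sum over all arcs of (observed weight - predicted coverage)^2.

    arc_weights: dict mapping an arc (u, v) to its observed weight.
    paths: list of vertex lists, each a path in the DAG.
    levels: one expression level per path (hypothetical encoding).
    """
    if len(paths) != len(levels):
        raise ValueError("need exactly one expression level per path")

    # Predicted coverage of an arc = sum of the levels of the paths using it.
    coverage = defaultdict(float)
    for path, level in zip(paths, levels):
        for arc in zip(path, path[1:]):
            if arc not in arc_weights:
                raise ValueError(f"path uses an arc not in the DAG: {arc}")
            coverage[arc] += level

    # Arcs used by no path have predicted coverage 0.
    return sum((w - coverage[arc]) ** 2 for arc, w in arc_weights.items())


# Toy DAG (made-up numbers): vertices play the role of exons,
# arc weights play the role of read support for split alignments.
arc_weights = {("a", "b"): 5, ("b", "d"): 5, ("a", "c"): 3, ("c", "d"): 4}
paths = [["a", "b", "d"], ["a", "c", "d"]]
levels = [5, 3]

print(squared_residual_cost(arc_weights, paths, levels))
```

Here only the arc `("c", "d")` is imperfectly explained (weight 4 against predicted coverage 3), so it is the sole contributor to the cost. Bounding the number of solution paths, as the abstract describes, corresponds to capping `len(paths)` while minimizing this kind of cost.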