Journal: Statistics and Its Interface

Statistical issues in binding site identification through CLIP-seq


Abstract

With the advent and development of CLIP-seq technologies, a growing number of CLIP-seq experiments are being performed to identify the targets of RNA-binding proteins and to understand the regulatory mechanisms of these proteins. Although CLIP-seq and ChIP-seq are broadly similar, statistical methods developed to identify binding sites from ChIP-seq data are not directly applicable to CLIP-seq data because of differences between the two technologies. First, transcript abundance has a large impact on CLIP-seq results and needs to be accounted for when analyzing CLIP-seq data. Second, mutations near the binding sites in CLIP-seq data offer valuable information that can be incorporated into the analysis. Other differences arise from the ability of RNA to form complex secondary structures and from many other technical aspects of the two purification protocols. To date, no systematic study has investigated the general statistical properties of CLIP-seq data, the merits of including RNA-seq as a matching control, or the performance of different binding site identification methods for CLIP-seq data. In this study, we performed a comprehensive evaluation of various statistical issues in using CLIP-seq data to identify RNA-protein binding sites. We demonstrate the value of RNA-seq data in background estimation and peak calling. We show that the large dispersion in CLIP-seq data compared to ChIP-seq data is the main reason for the difficulty of peak calling in the former. Using both real and simulated data, we also show the importance of biological/technical replicates and of combining mutation and peak analysis to accurately identify binding sites from CLIP-seq data.
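The point about transcript abundance can be made concrete with a small sketch. The code below is not the method evaluated in the paper; it is a minimal illustration, assuming a per-window Poisson background whose rate is the RNA-seq count rescaled to the CLIP-seq library size. The function names (`poisson_sf`, `call_peaks`), the pseudocount, and the significance threshold are all hypothetical choices made for this example.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam) by summing the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    # Guard against tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - cdf)


def call_peaks(clip_counts, rna_counts, alpha=0.01, pseudocount=1.0):
    """Return indices of windows whose CLIP count exceeds the RNA-seq background.

    Assumption for this sketch: the expected CLIP count in a window is
    proportional to its RNA-seq count (plus a pseudocount), scaled so that
    the two libraries have the same total.
    """
    background = [r + pseudocount for r in rna_counts]
    scale = sum(clip_counts) / sum(background)
    peaks = []
    for i, (c, b) in enumerate(zip(clip_counts, background)):
        p_value = poisson_sf(c, scale * b)
        if p_value < alpha:
            peaks.append(i)
    return peaks


# Windows 0 and 1 have the same CLIP count, but window 0 lies in a highly
# expressed transcript, so only window 1 stands out from its background.
clip = [50, 50, 5, 5]
rna = [100, 10, 10, 10]
peaks = call_peaks(clip, rna)
```

In this toy input, two windows carry identical CLIP-seq counts, yet only the one in the lowly expressed transcript is flagged, which is the effect a ChIP-seq style caller without an expression control would miss. Because the abstract reports large dispersion in CLIP-seq data, a Poisson model such as this one likely understates the variance; an overdispersed model (for example, a negative binomial) would be a more realistic choice in practice.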
