BMC Genomics

Assisted transcriptome reconstruction and splicing orthology


Abstract

Background
Transcriptome reconstruction, defined as the identification of all protein isoforms that may be expressed by a gene, is a notably difficult computational task. On real data, the best methods based on RNA-seq identify barely 21% of the expressed transcripts. While waiting for algorithms and sequencing techniques to improve, as has been strongly suggested in the literature, it is important to evaluate assisted transcriptome prediction: that is, how well alternative transcription in one species predicts the protein isoforms of another, relatively close species. Most evidence-based gene predictors use transcripts from other species to annotate a genome, but the predictive power of procedures that rely exclusively on transcripts from external species has never been quantified. The cornerstone of such an evaluation is the correct identification of pairs of transcripts with the same splicing patterns, called splicing orthologs.

Results
We propose a rigorous procedural definition of splicing orthologs, based on the identification of all ortholog pairs of splicing sites in the nucleotide sequences and on alignments at the protein level. Using this definition, we compared 24,382 human transcripts and 17,909 mouse transcripts from the highly curated CCDS database, and identified 11,122 splicing orthologs. In prediction mode, we show that human transcripts can be used to infer over 62% of mouse protein isoforms. When the predictions are restricted to transcripts already known eight years ago, the percentage rises to 74%. Using timestamped CCDS releases, we also analyze how the number of splicing orthologs has evolved over the last decade.

Conclusions
Alternative splicing is now recognized as playing a major role in the protein diversity of eukaryotic organisms, but definitions of spliced isoform orthologs are still approximate. Here we propose a definition adapted to the subtle variations of conserved alternative splicing sites, and use it to validate numerous accurate orthologous isoform predictions.
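The minimal Python sketch below illustrates only the splice-site half of the splicing-ortholog definition, under simplifying assumptions. It is not the authors' implementation.

- **Assumed input.** For each pair of orthologous genes, a mapping from the splice sites of the first gene to their orthologous splice sites in the second gene has already been computed, for example from a nucleotide alignment.
- **Omitted step.** The protein-level alignment step is left out entirely.
- **Hypothetical names and data.** `Transcript`, `are_splicing_orthologs` and `coverage` are illustrative names, not from the paper, and the coordinates are toy data.
- **What `coverage` approximates.** It loosely mirrors "prediction mode": the fraction of target-species transcripts that have a splicing ortholog among the source-species transcripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical model: a transcript reduced to its ordered splice sites."""
    name: str
    gene: str
    splice_sites: tuple[int, ...]  # positions in the gene's own coordinates, 5' to 3'


def are_splicing_orthologs(a: Transcript, b: Transcript,
                           site_orthology: dict[int, int]) -> bool:
    """True if the splice sites of `a` map one-to-one, in order, onto those of `b`.

    `site_orthology` (assumed precomputed) maps a splice site of a's gene to its
    orthologous splice site in b's gene; sites missing from it have no ortholog.
    """
    if len(a.splice_sites) != len(b.splice_sites):
        return False
    return all(site_orthology.get(sa) == sb
               for sa, sb in zip(a.splice_sites, b.splice_sites))


def coverage(sources: list[Transcript], targets: list[Transcript],
             orthology: dict[tuple[str, str], dict[int, int]]) -> float:
    """Fraction of target transcripts with at least one splicing ortholog in `sources`.

    Only transcripts from gene pairs listed in `orthology` are compared.
    """
    if not targets:
        return 0.0
    predicted = 0
    for t in targets:
        for s in sources:
            pair = (s.gene, t.gene)
            if pair in orthology and are_splicing_orthologs(s, t, orthology[pair]):
                predicted += 1  # count each target at most once
                break
    return predicted / len(targets)


# Toy example: one human gene (hG) and its mouse ortholog (mG), two isoforms each.
h1 = Transcript("H1", "hG", (100, 200, 300, 400))
h2 = Transcript("H2", "hG", (100, 200))
m1 = Transcript("M1", "mG", (90, 185, 290, 380))
m2 = Transcript("M2", "mG", (90, 250))  # 250 has no human counterpart in the map
site_map = {("hG", "mG"): {100: 90, 200: 185, 300: 290, 400: 380}}

print(coverage([h1, h2], [m1, m2], site_map))  # 0.5: M1 is matched by H1, M2 by nothing
```

Because the protein-level check is omitted, this sketch treats any two transcripts without splice sites from an orthologous gene pair as matching. A fuller version would add the protein-alignment criterion described in the Results.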
