Conference: IEEE International Conference on Computational Advances in Bio and Medical Sciences

Genome-wide identification and evolutionary analysis of long non-coding RNAs in cereals

Abstract

We used computational methods to identify lncRNA candidates in four economically important cereals (Poaceae): 7,196 in Zea mays, 1,974 in Sorghum bicolor, 4,236 in Setaria italica and 2,542 in Oryza sativa. We then compared these RNAs across the species. Our approach involved screening a reference-guided transcriptome assembly of RNA-Seq data for RNAs that were at least 200 bases long, had open reading frames encoding at most 70 amino acids, and lacked homology to entries in the UniProt database. A sequence composition analysis of the lncRNA candidates, in comparison to protein-coding transcripts, highlighted distinctive features, including a low GC content, a paucity of introns and a hexamer usage bias, consistent with what has been found for mammalian lncRNAs. RepeatMasker identified from 1% (rice) to 19% (maize) of the candidate lncRNAs as being transcribed from transposable elements, based on a dataset of 3,853 transposable elements. We compared the candidate lncRNAs with 25,141 miRNAs from miRBase and found that fewer than 1% of them could be potential miRNA precursors. The cross-species comparisons, which included a sequence- and structure-based lncRNA homology search, synteny analysis, and lncRNA secondary structure prediction, uncovered only limited sequence similarity. In sub-regions, we predicted conserved secondary structures using covariation analysis. We used the comparative sequence and synteny analyses to predict the existence of lncRNAs in S. italica; experimental tests confirmed the presence of these RNAs. Our results are consistent with a model of very rapid evolution of lncRNAs.
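
The length and open-reading-frame criteria described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names are hypothetical, the ORF definition (ATG to the next in-frame stop codon, or to the end of the sequence, scanned over all six frames) is an assumption, and the UniProt homology screen is not covered. A small GC-content helper is included because the abstract reports low GC content as a distinguishing feature.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_orf_aa(seq):
    """Longest ORF length in amino acids over all six reading frames.

    Assumption: an ORF starts at ATG and ends at the next in-frame stop
    codon (not counted) or at the end of the sequence.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, (i - start) // 3)
                    start = None
            if start is not None:
                # Open-ended ORF running off the end of the transcript.
                best = max(best, (len(strand) - start) // 3)
    return best


def gc_content(seq):
    """Fraction of G and C bases; 0.0 for an empty sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_length_orf_filter(seq, min_len=200, max_orf_aa=70):
    """Apply the two thresholds quoted in the abstract (>= 200 bases, ORF <= 70 aa)."""
    return len(seq) >= min_len and longest_orf_aa(seq) <= max_orf_aa
```

Transcripts that pass this filter would still need the homology screen and the other analyses described in the abstract before being treated as lncRNA candidates.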