Listed under: U.S. National Institutes of Health literature, PLoS Clinical Trials

Closing target trimming and CTTdocker programs for discovering hidden superfamily loci in genomes

Abstract

Current capacity for genome sequence analysis lags well behind rapidly evolving sequencing technologies. Retrieving biologically meaningful information from an ever-increasing amount of genome data would greatly benefit functional genomic studies. For example, the duplication, organization, evolution, and function of superfamily genes are arguably important in many aspects of life. However, incomplete annotation in many sequenced genomes often biases the conclusions of comparative genomic studies of superfamilies. Here, we present a Perl program, Closing Target Trimming (CTT), for automatically identifying most, if not all, members of a gene family in any sequenced genome on the CentOS 7 platform. To support use on other operating systems, we also created a Docker application package, CTTdocker. In tests on the F-box gene superfamily, CTT showed gene-finding accuracies of 78.2% and 79% in two well-annotated plant genomes, Arabidopsis thaliana and rice, respectively. To further demonstrate the effectiveness of the program, we ran it on 18 plant genomes and five non-plant genomes to compare the expansion of the F-box and BTB superfamilies. On average, 12.7% of the total F-box members and 9.3% of the total BTB members in plant genomes were new loci discovered by the program, whereas it found only a small number of new members in vertebrate genomes. Therefore, different evolutionary and regulatory mechanisms of Cullin-RING ubiquitin ligases may be present in plants and animals. We also annotated and compared Pkinase family members across a wide range of organisms, including 10 fungi, 10 metazoa, 10 vertebrates, and 10 additional plants, randomly selected from the Ensembl database. CTT annotation recovered on average 14% more loci of the Pkinase superfamily, including pseudogenes, in these 40 genomes, demonstrating its robust replicability and scalability in annotating superfamily members in any genome.
