Identification of the human DPR core promoter element using machine learning

Abstract

The RNA polymerase II (Pol II) core promoter is the strategic site of convergence of the signals that lead to the initiation of DNA transcription(1-5), but the downstream core promoter in humans has been difficult to understand(1-3). Here we analyse the human Pol II core promoter and use machine learning to generate predictive models for the downstream core promoter region (DPR) and the TATA box. We developed a method termed HARPE (high-throughput analysis of randomized promoter elements) to create hundreds of thousands of DPR (or TATA box) variants, each with known transcriptional strength. We then analysed the HARPE data by support vector regression (SVR) to provide comprehensive models for the sequence motifs, and found that the SVR-based approach is more effective than a consensus-based method for predicting transcriptional activity. These results show that the DPR is a functionally important core promoter element that is widely used in human promoters. Notably, there appears to be a duality between the DPR and the TATA box, as many promoters contain one or the other element. More broadly, these findings show that functional DNA motifs can be identified by machine learning analysis of a comprehensive set of sequence variants.

A machine learning approach shows that the downstream core promoter region (DPR) is widely used in human gene promoters, and that many promoters contain either a DPR or a TATA box, but not both.
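The abstract contrasts a consensus-based predictor with a model learned from sequence variants. As a rough illustration only, the sketch below shows the two kinds of input such a comparison might start from: a count of matches to a consensus string, and a one-hot feature vector of the sort a regression model such as SVR could be trained on. The function names and the toy consensus `TATAAA` are assumptions made for this example; they are not taken from the paper, and no model is trained here.

```python
from typing import List

BASES = "ACGT"


def one_hot(seq: str) -> List[int]:
    """Encode a DNA sequence as a flat one-hot vector (4 entries per base)."""
    vec: List[int] = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        # One slot per possible base, in the fixed order A, C, G, T.
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def consensus_matches(seq: str, consensus: str) -> int:
    """Count positions where seq agrees with a same-length consensus string."""
    if len(seq) != len(consensus):
        raise ValueError("sequence and consensus must have the same length")
    return sum(a == b for a, b in zip(seq.upper(), consensus.upper()))


if __name__ == "__main__":
    toy_consensus = "TATAAA"  # hypothetical placeholder, not the paper's motif
    for variant in ("TATAAA", "TTTAAA", "GCGCGC"):
        print(variant, consensus_matches(variant, toy_consensus), one_hot(variant)[:8])
```

A consensus score collapses each variant to a single number, whereas the one-hot vector keeps every position separate, which is what would let a learned model weight positions and bases individually.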

Bibliographic details

  • Source
    Nature | 2020, issue 7825 | pp. 459-463 | 5 pages
  • Author affiliations

    Univ Calif San Diego Sect Mol Biol La Jolla CA 92093 USA;

    Univ Calif San Diego Sect Mol Biol La Jolla CA 92093 USA;

    Univ Calif San Diego Sect Mol Biol La Jolla CA 92093 USA;

    Univ Calif San Diego Sect Mol Biol La Jolla CA 92093 USA;

    Univ Calif San Diego Sect Mol Biol La Jolla CA 92093 USA;

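  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA); MEDLINE (USA); Chemical Abstracts (CA, USA);
  • Original format: PDF
  • Language of full text: eng
  • Chinese Library Classification
  • Keywords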
