Journal of Molecular Biology

Computational Identification of Transcription Factor Binding Sites via a Transcription-factor-centric Clustering (TFCC) Algorithm.



Abstract

While microarray-based expression profiling has facilitated the use of computational methods to find potential cis-regulatory promoter elements, few current in silico approaches explicitly link regulatory motifs with the transcription factors that bind them. We have thus developed a TF-centric clustering (TFCC) algorithm that may provide such missing information through incorporation of biological knowledge about TFs. TFCC is a semi-supervised clustering algorithm which relies on the assumption that the expression profiles of some TFs may be related to those of the genes under their control. We examined this premise and found the vicinities of TFs in expression space are often enriched with the genes they regulate. So, instead of clustering genes based on the mutual similarity of their expression profiles to each other, we used TFs as seeds to group together genes whose expression patterns correlate with that of a particular TF. Then a Gibbs sampling algorithm was applied to search for shared cis-regulatory elements in promoters of clustered genes. Our working hypothesis was that if a TF-centric cluster indeed contains many targets of the seeding TF, at least one of the discovered motifs would be the site bound by the very same TF. We tested the TFCC approach on eight cell cycle and sporulation regulating TFs whose binding sites have been previously characterized in Saccharomyces cerevisiae, and correctly identified binding site motifs for half of them. In addition, we also made de novo predictions for some unknown TF binding sites. (c) 2002 Elsevier Science Ltd.
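The seeding step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure or cutoff TFCC uses, so the Pearson correlation, the `min_corr` threshold, the function names, and the toy profiles below are all assumptions made for illustration, not the authors' implementation. The sketch covers only the grouping of genes around a seed TF; the subsequent Gibbs sampling motif search is not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / sqrt(var_x * var_y)


def tf_centric_cluster(tf_name, profiles, min_corr=0.8):
    """Group genes around a seed TF (hypothetical helper).

    Returns (gene, correlation) pairs for every gene whose profile
    correlates with the seed TF's profile at or above min_corr,
    sorted from most to least correlated. The cutoff is an assumed
    parameter, not a value taken from the paper.
    """
    seed = profiles[tf_name]
    scored = [
        (gene, pearson(seed, profile))
        for gene, profile in profiles.items()
        if gene != tf_name
    ]
    members = [(gene, r) for gene, r in scored if r >= min_corr]
    return sorted(members, key=lambda pair: pair[1], reverse=True)


# Toy data: names and values are invented for illustration only.
profiles = {
    "TF_A":  [1.0, 2.0, 3.0, 2.0, 1.0],
    "gene1": [2.0, 4.0, 6.0, 4.0, 2.0],  # scaled copy of the TF profile
    "gene2": [3.0, 2.0, 1.0, 2.0, 3.0],  # mirror image (anti-correlated)
    "gene3": [1.0, 1.0, 1.0, 1.0, 1.0],  # flat profile
    "gene4": [1.1, 2.1, 2.9, 2.2, 0.9],  # noisy copy of the TF profile
}

cluster = tf_centric_cluster("TF_A", profiles)
cluster_genes = [gene for gene, _ in cluster]
```

Here `gene1` and `gene4` join the `TF_A` cluster, while the anti-correlated and flat profiles are excluded. In the workflow the abstract describes, the promoters of the clustered genes would then be passed to a motif finder.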
