Journal: BMC Veterinary Research

SPIC: A novel similarity metric for comparing transcription factor binding site motifs based on information contents



Abstract

Background: Discovering transcription factor binding sites (TFBS) is one of the primary challenges in deciphering the complex gene regulatory networks encoded in a genome. A set of short DNA sequences recognized by a transcription factor (TF) is known as a motif, which can be represented in matrix form, such as a position-specific scoring matrix (PSSM) or a position frequency matrix (PFM). Very frequently, we need to query a motif against a database of motifs to find similar motifs, merge similar TFBS motifs possibly recognized by the same TF, separate irrelevant motifs, or filter out spurious motifs. All of these applications therefore require a metric that captures the slight differences between irrelevant motifs and highlights the similarity between motifs of the same group. Although several motif similarity metrics have already been proposed, their performance is still far from satisfactory for these applications.

Methods: This paper proposes a novel metric, SPIC (Similarity with Position Information Contents), for measuring the similarity between a column of one motif and a column of another motif. To define this similarity score, we consider the likelihood that the column of the first motif's PFM can be produced by the column of the second motif's PSSM, multiply that likelihood by the information content of the column of the second motif's PSSM, and vice versa. We evaluated the performance of SPIC combined with a local or a global alignment method with an affine gap penalty function for computing the similarity between two motifs. We also compared SPIC with seven existing state-of-the-art metrics on three datasets, assessing their ability to cluster motifs from the same group and to retrieve motifs from a database.

Results: When used jointly with the Smith-Waterman local alignment method with an affine gap penalty function (gap open penalty of 1, gap extension penalty of 0.5), SPIC outperforms the seven existing state-of-the-art motif similarity metrics, each combined with its best alignment method, at matching motifs in database searches, clustering sub-motifs of the same TF, and distinguishing relevant motifs from a miscellaneous group of motifs.

Conclusions: We have developed a novel motif similarity metric that matches motifs in database searches more accurately, and clusters similar motifs and differentiates irrelevant motifs more effectively, than the other seven metrics we are aware of.
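
The abstract gives no formula for the column score, so the following is only a minimal Python sketch of the general idea in the Methods and Results, not the authors' implementation. Every specific in it is an assumption made for illustration: the function names, the uniform background of 0.25, the pseudocount of 0.01, the use of an expected log-odds value as a stand-in for the "likelihood", and the convention that a gap of length k costs gap_open + (k - 1) * gap_extend. Each motif is assumed to be a list of columns, and each column a list of four counts in A, C, G, T order.

```python
import math

BACKGROUND = 0.25  # assumption: uniform background frequency for A, C, G, T


def to_probs(counts, pseudo=0.01):
    """Turn one PFM column of counts into probabilities (pseudocount is an assumption)."""
    total = sum(counts) + 4 * pseudo
    return [(c + pseudo) / total for c in counts]


def info_content(probs):
    """Information content of a column in bits (between 0 and 2 for DNA)."""
    return 2.0 + sum(p * math.log2(p) for p in probs if p > 0)


def directed_score(p_from, p_to):
    """Stand-in for 'likelihood of one column under the other column's PSSM',
    weighted by the information content of the scoring column."""
    log_odds = sum(f * math.log2(t / BACKGROUND) for f, t in zip(p_from, p_to))
    return log_odds * info_content(p_to)


def column_score(col_a, col_b):
    """Symmetric column similarity: score both directions and add them."""
    pa, pb = to_probs(col_a), to_probs(col_b)
    return directed_score(pa, pb) + directed_score(pb, pa)


def local_align_score(motif_a, motif_b, gap_open=1.0, gap_extend=0.5):
    """Best Smith-Waterman local alignment score with affine gaps (Gotoh recurrences)."""
    n, m = len(motif_a), len(motif_b)
    neg_inf = float("-inf")
    H = [[0.0] * (m + 1) for _ in range(n + 1)]      # best score ending at (i, j)
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in motif_a
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in motif_b
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(E[i][j - 1] - gap_extend, H[i][j - 1] - gap_open)
            F[i][j] = max(F[i - 1][j] - gap_extend, H[i - 1][j] - gap_open)
            diag = H[i - 1][j - 1] + column_score(motif_a[i - 1], motif_b[j - 1])
            H[i][j] = max(0.0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


# Toy motifs (made-up counts): each column is [A, C, G, T].
A, C, G, T = [20, 0, 0, 0], [0, 20, 0, 0], [0, 0, 20, 0], [0, 0, 0, 20]
FLAT = [5, 5, 5, 5]
motif = [A, C, G, T]
shifted = [FLAT, A, C, G, T, FLAT]   # same core with uninformative flanks
unrelated = [C, A, T, G]

print(local_align_score(motif, motif))
print(local_align_score(motif, shifted))
print(local_align_score(motif, unrelated))
```

In this sketch a local alignment lets the shared core of `motif` and `shifted` score as highly as the motif against itself, because the uninformative flanking columns are simply left out of the alignment. That is one plausible reason a local method suits motifs of different lengths, though the abstract itself does not give this explanation.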
