
Ortholog Clustering on a Multipartite Graph


Abstract

We present a method for automatically extracting groups of orthologous genes from a large set of genomes using a new clustering algorithm on a weighted multipartite graph. The method assigns a score to an arbitrary subset of genes drawn from multiple genomes to assess the orthologous relationships among the genes in that subset. The score is computed from the sequence similarities between member genes and the phylogenetic relationships between the corresponding genomes. An ortholog cluster is the subset with the highest score, so ortholog clustering is formulated as a combinatorial optimization problem. The algorithm for finding an ortholog cluster runs in O(|E| + |V| log |V|) time, where V and E are the sets of vertices and edges of the graph, respectively. If the similarity scores are discretized into a constant number of bins, the running time improves to O(|E| + |V|). The method was applied to the seven complete eukaryotic genomes on which KOG, the manually curated database of eukaryotic ortholog clusters, is built. A comparison with the manually curated clusters shows that our clusters correlate well with the existing ones.
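
The abstract does not spell out the score function, so the following is only a rough sketch under stated assumptions, not the authors' implementation. It assumes the score of a subset is the smallest "linkage" of any member, where a gene's linkage is the sum of its similarity weights to the other genes in the subset; the phylogenetic weighting mentioned above is left out. With a score of that form, repeatedly peeling off the weakest gene and remembering the best intermediate subset finds a highest-scoring subset. The function name `best_min_linkage_subset` and the toy gene names are hypothetical.

```python
import heapq
from collections import defaultdict


def best_min_linkage_subset(genome_of, edges):
    """Return (subset, score) maximising the ASSUMED score
    F(H) = min over g in H of (sum of edge weights from g into H).

    genome_of: dict mapping gene -> genome label (one part per genome).
    edges: iterable of (gene_u, gene_v, non-negative similarity weight).
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        if genome_of[u] == genome_of[v]:
            # Multipartite graph: no edges inside a single genome.
            raise ValueError("edges must join genes from different genomes")
        adj[u][v] = w
        adj[v][u] = w

    # Linkage of every gene with respect to the full gene set.
    linkage = {g: sum(adj[g].values()) for g in genome_of}
    heap = [(value, g) for g, value in linkage.items()]
    heapq.heapify(heap)

    alive = set(genome_of)
    removal_order = []
    best_score, best_start = float("-inf"), 0

    while heap:
        value, g = heapq.heappop(heap)
        if g not in alive or value != linkage[g]:
            continue  # stale heap entry left over from an earlier update
        # `value` is the score of the current surviving subset.
        if value > best_score:
            best_score, best_start = value, len(removal_order)
        alive.remove(g)
        removal_order.append(g)
        for nb, w in adj[g].items():
            if nb in alive:
                linkage[nb] -= w
                heapq.heappush(heap, (linkage[nb], nb))

    # The best subset is everything still present at the best step.
    return set(removal_order[best_start:]), best_score


# Toy input: three genomes A, B, C with made-up similarity weights.
genome_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C", "c2": "C"}
edges = [("a1", "b1", 9), ("a1", "c1", 8), ("b1", "c1", 7),
         ("b1", "c2", 2), ("a2", "c2", 1)]
cluster, score = best_min_linkage_subset(genome_of, edges)
# cluster == {"a1", "b1", "c1"}, score == 15
```

This sketch uses a binary heap with lazy deletion, so it runs in O(|E| log |V|) rather than the O(|E| + |V| log |V|) quoted in the abstract; reaching that bound would need a priority queue with constant-time decrease-key. If the weights were discretized into a constant number of integer bins, the heap could in principle be replaced by an array of buckets, which is one plausible reading of how the linear bound arises.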
