...
IEEE/ACM Transactions on Computational Biology and Bioinformatics

Attribute clustering for grouping, selection, and classification of gene expression data

Abstract

This paper presents an attribute clustering method that groups genes based on their interdependence so as to mine meaningful patterns from gene expression data. It can be used for gene grouping, selection, and classification. Partitioning a relational table into attribute subgroups allows a small number of attributes within or across the groups to be selected for analysis. By clustering attributes, the search dimension of a data mining algorithm is reduced. This reduction is especially important for mining gene expression data, because such data typically consist of a huge number of genes (attributes) and a small number of gene expression profiles (tuples). Most data mining algorithms are developed and optimized to scale with the number of tuples rather than the number of attributes. The situation becomes even worse when the number of attributes far exceeds the number of tuples; in that case, the likelihood of reporting patterns that arise by chance and are actually irrelevant becomes high. For these reasons, gene grouping and selection are important preprocessing steps if many data mining algorithms are to be effective on gene expression data. This paper defines the problem of attribute clustering and introduces a methodology for solving it. Our proposed method groups interdependent attributes into clusters by optimizing a criterion function derived from an information measure that reflects the interdependence between attributes. By applying our algorithm to gene expression data, we discover meaningful clusters of genes. Grouping genes by the interdependence of the attributes within each group helps capture different aspects of gene association patterns in each group. Significant genes selected from each group then carry useful information for gene expression classification and identification.
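The abstract does not spell out which information measure or optimization procedure the method uses. As a minimal sketch under stated assumptions, the code below measures the interdependence of two discretized attributes as their mutual information divided by their joint entropy, and groups attributes with a k-modes-style loop: each attribute joins the cluster whose representative ("mode") attribute it is most interdependent with, and each mode is then replaced by the member most interdependent with the rest of its cluster. The normalized measure, the alternating loop, the first-k initialization, and all function names are assumptions for illustration, not the paper's exact algorithm. Expression values are assumed to be already discretized into integer bins.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in nats) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def interdependence(xs: Sequence[int], ys: Sequence[int]) -> float:
    """Assumed measure: mutual information I(X;Y) normalized by joint entropy H(X,Y)."""
    h_xy = entropy(list(zip(xs, ys)))
    if h_xy == 0.0:
        return 0.0  # both attributes are constant, so there is nothing to share
    return (entropy(xs) + entropy(ys) - h_xy) / h_xy


def cluster_attributes(
    columns: List[Sequence[int]], k: int, max_iter: int = 100
) -> Tuple[List[int], List[List[int]], float]:
    """k-modes-style attribute clustering (a hypothetical sketch).

    Each column is one discretized attribute (gene) across all tuples (samples).
    Returns the mode attribute of each cluster, the cluster memberships, and the
    criterion value: the summed interdependence of every attribute with its mode.
    """
    m = len(columns)
    R = [[interdependence(columns[i], columns[j]) for j in range(m)] for i in range(m)]
    modes = list(range(k))  # hypothetical initialization: the first k attributes

    def assign(modes: List[int]) -> List[List[int]]:
        clusters: List[List[int]] = [[] for _ in modes]
        for a in range(m):
            # join the cluster whose mode this attribute is most interdependent with
            best = max(range(len(modes)), key=lambda c: R[a][modes[c]])
            clusters[best].append(a)
        return clusters

    for _ in range(max_iter):
        clusters = assign(modes)
        # new mode = member with the highest summed interdependence with the other members
        new_modes = [
            max(members, key=lambda a: sum(R[a][b] for b in members if b != a))
            if members else modes[c]  # keep the old mode if a cluster is empty
            for c, members in enumerate(clusters)
        ]
        if new_modes == modes:
            break
        modes = new_modes

    clusters = assign(modes)
    score = sum(R[a][modes[c]] for c, members in enumerate(clusters) for a in members)
    return modes, clusters, score


if __name__ == "__main__":
    # Toy data: x and y are independent binary attributes; columns 2 and 3 duplicate them.
    x = [0, 0, 1, 1, 0, 0, 1, 1]
    y = [0, 1, 0, 1, 0, 1, 0, 1]
    modes, clusters, score = cluster_attributes([x, y, x, y], k=2)
    print(modes, clusters, score)  # [0, 1] [[0, 2], [1, 3]] 4.0
```

The sketch precomputes the full m-by-m matrix R once, so each iteration only performs table lookups; for tens of thousands of genes that matrix becomes the main memory cost.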
To evaluate the performance of the proposed approach, we applied it to two well-known gene expression data sets and compared our results with those obtained by other methods. Our experiments show that the proposed method is able to find meaningful clusters of genes. By selecting a subset of genes that have high multiple interdependence with the other genes in their clusters, we obtain significant classification information. Thus, a small pool of selected genes can be used to build classifiers with a very high classification rate. From this pool, gene expressions of different categories can be identified.
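The abstract states that genes with high multiple interdependence within their clusters carry significant classification information. As a minimal sketch, assuming multiple interdependence is the sum of a gene's pairwise interdependence with the other members of its cluster, and assuming a pairwise matrix R has already been computed from discretized data, a selection step could pool the top-ranked genes of each cluster. The helper names and the per-cluster cutoff are hypothetical; the paper's own ranking rule and classifier are not reproduced here.

```python
from typing import List, Sequence


def multiple_interdependence(R: Sequence[Sequence[float]], gene: int, members: Sequence[int]) -> float:
    """Summed pairwise interdependence of `gene` with the other members of its cluster."""
    return sum(R[gene][other] for other in members if other != gene)


def select_genes(R: Sequence[Sequence[float]], clusters: List[List[int]], per_cluster: int) -> List[int]:
    """Pool the `per_cluster` most interdependent genes from each cluster (hypothetical helper)."""
    pool: List[int] = []
    for members in clusters:
        # sorted() is stable even with reverse=True, so ties keep their original order
        ranked = sorted(members, key=lambda g: multiple_interdependence(R, g, members), reverse=True)
        pool.extend(ranked[:per_cluster])
    return pool


if __name__ == "__main__":
    # Toy symmetric interdependence matrix for four genes (made-up values for illustration)
    R = [
        [1.0, 0.9, 0.2, 0.1],
        [0.9, 1.0, 0.3, 0.0],
        [0.2, 0.3, 1.0, 0.8],
        [0.1, 0.0, 0.8, 1.0],
    ]
    print(select_genes(R, [[0, 1, 2], [3]], per_cluster=1))  # [1, 3]
```

The resulting pool would then be passed to a classifier of choice; taking a fixed number of genes from every cluster is one simple way to keep each cluster's association pattern represented.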