BMC Genomics

Comparison and evaluation of pathway-level aggregation methods of gene expression data


Abstract

Background: Microarray experiments produce expression measurements on a genomic scale. One way to derive a functional understanding of the data is to focus on functional sets of genes, such as pathways, instead of individual genes. Pathway-level analysis has commonly relied on functional enrichment analysis, such as over-representation analysis and gene set enrichment analysis, but an alternative approach has also been explored. In this approach, gene expression data are first aggregated at the pathway level, transforming the original data into a compact representation in which each row corresponds to a pathway instead of a gene. The pathway expression data can then be used for differential expression and classification analyses in pathway space, leveraging existing algorithms usually applied to gene expression data. Although several studies have proposed pathway-level aggregation methods, it remains unclear how they compare with one another, because earlier evaluations were limited in scope. This study therefore presents a comprehensive evaluation of six of the most prominent aggregation methods.

Results: The compared methods include five existing methods: mean of all member genes (Mean all), mean of condition-responsive genes (Mean CORGs), analysis of sample set enrichment scores (ASSESS), principal component analysis (PCA), and partial least squares (PLS). The sixth is a variant of an existing method (Mean top 50%, which averages the top half of member genes). Comprehensive and stringent benchmarking was performed on seven pairs of related but independent datasets encompassing various phenotypes. Aggregation was done in the space of KEGG pathways. Performance was assessed by classification accuracy, validated both internally and externally, and by examining how strongly pathway signatures correlate between the paired datasets. The assessment revealed that (i) the best accuracy and correlation were obtained from ASSESS and Mean top 50%, (ii) Mean all showed the lowest accuracy, and (iii) Mean CORGs and PLS gave rise to the greatest discordance in the pathway signature correlation.

Conclusions: The two best-performing methods (ASSESS and Mean top 50%) are suggested as the preferred choices. The benchmarking analysis also suggests that there is both room and need for a new pathway-level aggregation method.
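To make the aggregation step concrete, the following is a minimal sketch of the two mean-based methods named above, written with NumPy. It is an illustration, not the authors' implementation. The function names, the genes-by-samples matrix layout, and the toy data are all hypothetical. The abstract does not say how the "top half" of member genes is chosen, so ranking by the absolute Welch t-statistic between two phenotype classes is an assumption made here.

```python
import numpy as np

def mean_all(expr, members):
    """Mean all: average every member gene of one pathway, per sample.

    expr is a genes x samples matrix; members lists the row indices of
    the pathway's genes.
    """
    return expr[members].mean(axis=0)

def mean_top_half(expr, members, labels):
    """Mean top 50%: average the top half of member genes, per sample.

    ASSUMPTION: genes are ranked by |Welch t| between the two classes in
    labels (0/1); the abstract does not state the ranking criterion.
    """
    labels = np.asarray(labels)
    sub = expr[members]
    a, b = sub[:, labels == 0], sub[:, labels == 1]
    # Welch standard error of the difference in class means, per gene
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    t = (a.mean(axis=1) - b.mean(axis=1)) / se
    k = max(1, int(np.ceil(len(members) / 2)))  # size of the top half
    top = np.argsort(-np.abs(t))[:k]            # strongest genes first
    return sub[top].mean(axis=0)

def aggregate(expr, pathways, labels=None, top_half=False):
    """Build a pathways x samples matrix from a genes x samples matrix."""
    names = list(pathways)
    rows = [
        mean_top_half(expr, pathways[n], labels) if top_half
        else mean_all(expr, pathways[n])
        for n in names
    ]
    return names, np.vstack(rows)

# Toy data: 4 genes x 4 samples, two samples per class (hypothetical)
expr = np.array([
    [1.0, 2.0, 5.0, 6.0],
    [3.0, 1.0, 2.0, 4.0],
    [2.0, 3.0, 8.0, 9.0],
    [4.0, 6.0, 5.0, 5.5],
])
pathways = {"P1": [0, 1], "P2": [2, 3]}
labels = [0, 0, 1, 1]

names, by_mean_all = aggregate(expr, pathways)
_, by_top_half = aggregate(expr, pathways, labels, top_half=True)
```

Either resulting matrix has one row per pathway and one column per sample, so it can be passed to the same classifiers or differential expression tests that would otherwise take the gene-level matrix. Because the top-half selection uses the class labels, any accuracy estimate would need the selection to be repeated inside each training fold.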
