Journal of Bioinformatics and Computational Biology

Developing optimal prediction models for cancer classification using gene expression data


Abstract

Microarrays can provide genome-wide expression patterns for various cancers, especially for tumor sub-types that may exhibit substantially different patient prognoses. Using such gene expression data, several approaches have been proposed to classify tumor sub-types accurately. These classification methods are not robust and often depend on a particular training sample for modelling, which raises issues in using them to administer proper treatment for a future patient. We propose to construct an optimal, robust prediction model for classifying cancer sub-types using gene expression data. Our model is constructed in a step-wise fashion using cross-validated quadratic discriminant analysis. At each step, all identified models are validated on an independent sample of patients to develop a model that is robust for future data. We apply the proposed methods to two cancer microarray data sets: the acute leukemia data of Golub et al. (3) and the colon cancer data of Alon et al. (12). We found that the dimensionality of our optimal prediction models is relatively small in these cases, and that our prediction models with one or two gene factors outperform, or perform competitively with, other methods based on 50 or more predictive gene factors, especially on independent samples. The methodology is implemented in R and Splus. The source code can be obtained at http://hesweb1.med.virginia.edu/bioinformatics.
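To make the step-wise procedure described in the abstract more concrete, the following is a minimal sketch in Python rather than the authors' R/Splus code. It assumes a greedy forward search in which each step adds the gene with the lowest leave-one-out cross-validated QDA error, and the search stops once the error on an independent validation sample no longer improves. The function names, the stopping rule, the small ridge term and the synthetic data (with a hypothetical informative gene at index 3) are all illustrative assumptions, not details taken from the paper.

```python
import math
import random

def _mean_cov(rows):
    """Sample mean vector and covariance matrix of a list of feature vectors."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
            for b in range(d)] for a in range(d)]
    for a in range(d):
        cov[a][a] += 1e-6  # small ridge (assumption) to keep the matrix invertible
    return mu, cov

def _logdet_and_quad(cov, diff):
    """Return (log|cov|, diff' cov^-1 diff) using Gaussian elimination."""
    d = len(diff)
    m = [row[:] + [diff[i]] for i, row in enumerate(cov)]
    logdet = 0.0
    for i in range(d):
        p = max(range(i, d), key=lambda r: abs(m[r][i]))  # partial pivoting
        m[i], m[p] = m[p], m[i]
        logdet += math.log(abs(m[i][i]))
        for r in range(i + 1, d):
            f = m[r][i] / m[i][i]
            for c in range(i, d + 1):
                m[r][c] -= f * m[i][c]
    x = [0.0] * d
    for i in reversed(range(d)):  # back substitution solves cov x = diff
        x[i] = (m[i][d] - sum(m[i][c] * x[c] for c in range(i + 1, d))) / m[i][i]
    return logdet, sum(diff[i] * x[i] for i in range(d))

def qda_fit(X, y):
    """Estimate a mean, covariance and log prior for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        mu, cov = _mean_cov(rows)
        model[label] = (mu, cov, math.log(len(rows) / len(X)))
    return model

def qda_predict(model, x):
    """Return the class with the largest quadratic discriminant score."""
    def score(params):
        mu, cov, log_prior = params
        logdet, quad = _logdet_and_quad(cov, [a - b for a, b in zip(x, mu)])
        return log_prior - 0.5 * logdet - 0.5 * quad
    return max(model, key=lambda label: score(model[label]))

def loo_error(X, y, genes):
    """Leave-one-out cross-validated error of QDA restricted to `genes`."""
    Z = [[row[g] for g in genes] for row in X]
    wrong = 0
    for i in range(len(Z)):
        model = qda_fit(Z[:i] + Z[i + 1:], y[:i] + y[i + 1:])
        wrong += qda_predict(model, Z[i]) != y[i]
    return wrong / len(Z)

def holdout_error(X_tr, y_tr, X_va, y_va, genes):
    """Error on the independent validation sample for a model fit on training data."""
    pick = lambda X: [[row[g] for g in genes] for row in X]
    model = qda_fit(pick(X_tr), y_tr)
    return sum(qda_predict(model, z) != t for z, t in zip(pick(X_va), y_va)) / len(y_va)

def stepwise_qda(X_tr, y_tr, X_va, y_va, max_genes=2):
    """Greedy forward selection; stop when validation error stops improving."""
    selected, best_valid = [], float("inf")
    for _ in range(max_genes):
        candidates = [g for g in range(len(X_tr[0])) if g not in selected]
        gene = min(candidates, key=lambda g: loo_error(X_tr, y_tr, selected + [g]))
        valid = holdout_error(X_tr, y_tr, X_va, y_va, selected + [gene])
        if valid >= best_valid:
            break
        selected, best_valid = selected + [gene], valid
    return selected, best_valid

def make_data(rng, n, n_genes=6):
    """Synthetic two-class data; gene 3 is the only informative one (hypothetical)."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_genes)]
        row[3] += 6.0 * label
        X.append(row)
        y.append(label)
    return X, y

rng = random.Random(0)
X_tr, y_tr = make_data(rng, 30)
X_va, y_va = make_data(rng, 20)
selected, valid_err = stepwise_qda(X_tr, y_tr, X_va, y_va)
print("selected genes:", selected, "validation error:", valid_err)
```

Keeping the validation sample out of the gene search is the point of the design: cross-validation ranks candidate genes on the training sample, while the independent sample only decides whether a larger model is worth keeping, which tends to favor the low-dimensional models the abstract reports.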
