Computational and Structural Biotechnology Journal

A Review of Matched-pairs Feature Selection Methods for Gene Expression Data Analysis

Abstract

With the rapid accumulation of gene expression data from technologies such as microarray, RNA-sequencing (RNA-seq), and single-cell RNA-seq, dimensionality reduction and feature (signature gene) selection are necessary to make sense of such high-dimensional data. These computational methods greatly facilitate downstream analysis and interpretation, such as gene function enrichment analysis, cancer biomarker detection, and drug target identification in precision medicine. Although numerous feature selection methods have been developed in bioinformatics, choosing the appropriate method for a specific problem and obtaining the most reasonable feature ranking remain challenging. Meanwhile, paired gene expression data collected under a matched case-control design (MCCD) are becoming increasingly popular; such data have often been used in multi-omics integration studies and may increase feature selection efficiency by offsetting the similar distributions of confounding features. However, feature selection methods designed specifically for paired data, termed matched-pairs feature selection (MPFS), have not matured in parallel. In this review, we compare the performance of 10 feature selection methods (eight MPFS methods and two traditional unpaired methods) on two real datasets by applying three classification methods, and we analyze the algorithmic complexity of these methods by running their programs. This review aims to summarize and comprehensively present MPFS so that readers can easily understand its characteristics and find guidance in selecting appropriate methods for their analyses.
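To make the contrast between paired and unpaired feature selection concrete, the following is a minimal sketch rather than any method taken from the review. The abstract does not name the eight MPFS methods, so the paired t-statistic is used here purely as an illustrative assumption, with Welch's t-statistic standing in for a traditional unpaired method. The gene names, expression values, and function names (`paired_t`, `welch_t`, `rank_genes`) are all hypothetical.

```python
import math
import statistics


def paired_t(case, control):
    """Paired t-statistic: mean within-pair difference over its standard error."""
    diffs = [a - b for a, b in zip(case, control)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


def welch_t(case, control):
    """Unpaired (Welch) t-statistic; ignores which case matches which control."""
    se = math.sqrt(statistics.variance(case) / len(case)
                   + statistics.variance(control) / len(control))
    return (statistics.mean(case) - statistics.mean(control)) / se


def rank_genes(expr_case, expr_control, stat):
    """Rank genes by the absolute value of the chosen statistic, largest first."""
    scores = {g: abs(stat(expr_case[g], expr_control[g])) for g in expr_case}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up toy data: five matched pairs, index i in both lists is the same pair.
# geneA varies strongly between pairs (a confounder-like effect) but shows a
# consistent shift of about +1 within every pair; geneB shows no systematic shift.
expr_control = {
    "geneA": [2.0, 5.0, 8.0, 11.0, 14.0],
    "geneB": [5.0, 5.1, 4.9, 5.0, 5.2],
}
expr_case = {
    "geneA": [3.0, 6.1, 8.9, 12.2, 14.8],
    "geneB": [5.1, 4.9, 5.0, 5.2, 5.0],
}

t_paired = paired_t(expr_case["geneA"], expr_control["geneA"])
t_unpaired = welch_t(expr_case["geneA"], expr_control["geneA"])
top_paired = rank_genes(expr_case, expr_control, paired_t)

print(f"geneA paired t = {t_paired:.2f}, unpaired t = {t_unpaired:.2f}")
print("paired ranking:", top_paired)
```

In this toy setting the between-pair variation in geneA cancels out in the within-pair differences, so the paired statistic is far larger than the unpaired one. That is one way to read the abstract's claim that a matched design may increase feature selection efficiency.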