Classifying Incomplete Gene-Expression Data: Ensemble Learning with Non-Pre-Imputation Feature Filtering and Best-First Search Technique

Abstract

(1) Background: Gene-expression data usually contain missing values (MVs). Numerous methods for estimating MVs have been proposed in the past few years. Recent studies show that the choice of imputation algorithm makes little difference to classification. Thus, some scholars believe that selecting informative genes for downstream classification is more important than how MVs are imputed. However, most feature-selection (FS) algorithms require imputation beforehand, and the impact of that prior MV imputation on downstream FS performance is seldom considered. (2) Method: A modified chi-square test-based FS is introduced for gene-expression data. To deal with the small sample size of gene-expression data, a heuristic method called recursive element aggregation is proposed in this study. Our approach handles incomplete data directly, without any imputation method or missing-data assumption. The most informative genes are selected through a threshold. After that, a best-first search strategy is used to find optimal feature subsets for classification. (3) Results: We compare our method with several FS algorithms. Evaluation is performed on twelve original incomplete cancer gene-expression datasets. We demonstrate that MV imputation on an incomplete dataset affects subsequent FS in terms of classification tasks. By conducting FS directly on incomplete data, our method avoids the potential disturbance that MV imputation causes to subsequent FS procedures. An experiment on the small round blue cell tumor (SRBCT) dataset showed that our method found additional genes beyond the many genes it shares with the two existing methods it was compared against.
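
The abstract does not spell out the modified chi-square test or recursive element aggregation, so the following is only a minimal sketch of the general idea of filtering features on incomplete data without imputation. It assumes each gene has already been discretized into categories, marks missing entries with `None`, and computes an ordinary Pearson chi-square statistic over the observed entries only. The function names `chi_square_observed` and `filter_features`, the toy data, and the threshold 3.84 (the 5% critical value for one degree of freedom) are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def chi_square_observed(values, labels):
    """Pearson chi-square statistic between one discretized feature and the
    class labels, using only samples where the feature is observed.

    Missing entries are marked with None and are skipped, not imputed.
    This is a plain chi-square test, not the paper's modified version.
    """
    pairs = [(v, y) for v, y in zip(values, labels) if v is not None]
    n = len(pairs)
    if n == 0:
        return 0.0  # nothing observed, so no evidence of association
    row_totals = Counter(v for v, _ in pairs)   # counts per feature category
    col_totals = Counter(y for _, y in pairs)   # counts per class label
    cell_counts = Counter(pairs)                # joint counts
    stat = 0.0
    for v, row_n in row_totals.items():
        for y, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = cell_counts.get((v, y), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def filter_features(columns, labels, threshold):
    """Return indices of features whose score reaches the threshold."""
    return [
        j for j, column in enumerate(columns)
        if chi_square_observed(column, labels) >= threshold
    ]


# Toy data (hypothetical): six samples, two discretized genes with gaps.
labels = ["A", "A", "B", "B", "A", "B"]
gene_0 = ["hi", "hi", "lo", "lo", None, "lo"]   # tracks the class closely
gene_1 = ["hi", "lo", "hi", "lo", "hi", None]   # nearly independent of class

selected = filter_features([gene_0, gene_1], labels, threshold=3.84)
print(selected)
```

Because every gene is scored on its own observed samples, genes with different missing patterns are compared on different sample counts. With samples as few as in gene-expression studies, many contingency cells become sparse, which is presumably the difficulty that the recursive element aggregation heuristic targets.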

Bibliographic details

Journal: International Journal of Molecular Sciences
Authors: Yuanting Yan; Tao Dai; Meili Yang; Xiuquan Du; Yiwen Zhang; Yanping Zhang
Year (volume), issue: 2018 (19), 11
Page: 3398
Total pages: 22
Original format: PDF
Chinese Library Classification: Molecular biology
Keywords: gene-expression data; feature selection; best-first search; classification


