
Creating fast and accurate machine learning ensembles through training dataset preprocessing.

Abstract

Machine learning algorithms make it possible to process large amounts of information faster and more accurately than ever before. Classification and regression algorithms build high-level mathematical models which can be used to approximate functions that map complex, high-dimensional input features to certain output classes or real-valued states.

Single machine learning models can be accurate and effective, but combining independent component machine learning models into groups, called ensembles, has been shown to increase overall classification accuracy for many problems. Ensemble techniques increase classification accuracy with the trade-off of increasing computation time, especially classifier training time.

In this dissertation, we will investigate the creation of highly efficient machine learning ensembles that have fewer component models than existing ensemble algorithms and that can be trained in a much shorter period of time. We use several forms of training dataset preprocessing in order to prepare the data to be used to create accurate ensembles. In particular, we use data clustering and dimensionality reduction using singular value decomposition to create small ensembles that retain high accuracies, but require fewer components than other ensemble methods. We also show how these algorithms can be used to work with a variety of data mining datasets to achieve a high classification accuracy that would be acceptable for use in practical applications.
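The abstract names two preprocessing steps, data clustering and dimensionality reduction by singular value decomposition, but does not give the dissertation's algorithms. The sketch below is only one plausible way such steps could be combined, not the author's method. It assumes NumPy, a plain k-means clustering, a nearest-centroid rule as a stand-in component classifier, and a routing rule that sends each query to the component of its nearest cluster. All function and class names (`kmeans`, `fit_svd`, `NearestCentroid`, `train_ensemble`, `predict_ensemble`) are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (assumed clustering step); returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels


def fit_svd(X, n_components):
    """Return (mean, basis) projecting onto the top right singular vectors."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T


class NearestCentroid:
    """Deliberately simple stand-in for a component classifier."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]


def train_ensemble(X, y, n_clusters=3, n_components=2):
    """Reduce with SVD, cluster the reduced data, fit one component per cluster."""
    mean, basis = fit_svd(X, n_components)
    Z = (X - mean) @ basis
    centers, labels = kmeans(Z, n_clusters)
    keep = [j for j in range(n_clusters) if np.any(labels == j)]  # skip empty clusters
    models = [NearestCentroid().fit(Z[labels == j], y[labels == j]) for j in keep]
    return mean, basis, centers[keep], models


def predict_ensemble(ensemble, X):
    """Route each query to the component trained on its nearest cluster."""
    mean, basis, centers, models = ensemble
    Z = (X - mean) @ basis
    route = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    out = np.empty(len(X), dtype=models[0].classes_.dtype)
    for j, model in enumerate(models):
        mask = route == j
        if mask.any():
            out[mask] = model.predict(Z[mask])
    return out


# Synthetic two-class data, used only to exercise the sketch.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0.0, 1.0, (100, 5)), rng.normal(4.0, 1.0, (100, 5))])
y_demo = np.array([0] * 100 + [1] * 100)
ensemble = train_ensemble(X_demo, y_demo)
print("training accuracy:", (predict_ensemble(ensemble, X_demo) == y_demo).mean())
```

In this sketch each component sees only one cluster of the SVD-reduced data, so each fit works on fewer rows and fewer columns than a model trained on the full dataset, and the ensemble has as many components as there are non-empty clusters. Whether this matches the trade-offs reported in the dissertation cannot be determined from the abstract alone.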


Bibliographic record

Author: Whitehead, Matthew E. N.
Author affiliation: Indiana University
Degree-granting institution: Indiana University
Subject: Artificial Intelligence; Computer Science
Degree: Ph.D.
Year: 2010
Pages: 175
Original format: PDF
Language: English

