
Creating fast and accurate machine learning ensembles through training dataset preprocessing.


Abstract

Machine learning algorithms make it possible to process large amounts of information faster and more accurately than ever before. Classification and regression algorithms build high-level mathematical models which can be used to approximate functions that map complex, high-dimensional input features to certain output classes or real-valued states.

Single machine learning models can be accurate and effective, but combining independent component machine learning models into groups, called ensembles, has been shown to increase overall classification accuracy for many problems. Ensemble techniques increase classification accuracy with the trade-off of increasing computation time, especially classifier training time.

In this dissertation, we will investigate the creation of highly efficient machine learning ensembles that have fewer component models than existing ensemble algorithms and that can be trained in a much shorter period of time. We use several forms of training dataset preprocessing in order to prepare the data to be used to create accurate ensembles. In particular, we use data clustering and dimensionality reduction using singular value decomposition to create small ensembles that retain high accuracies, but require fewer components than other ensemble methods. We also show how these algorithms can be used to work with a variety of data mining datasets to achieve a high classification accuracy that would be acceptable for use in practical applications.
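
The abstract does not spell out the dissertation's algorithms, so the following is only a minimal sketch of the general idea of building a small ensemble from a clustered training set. Everything in it is an assumption made for illustration: the toy data, the choice of k-means for clustering, the nearest-class-centroid component model, the distance-weighted vote, and all function names. The singular value decomposition step mentioned in the abstract is left out to keep the sketch free of external dependencies.

```python
import math
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, k, iters=20):
    """Plain k-means; centres start at the first k points (an assumption
    made so that the sketch is deterministic)."""
    centres = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centres[i]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous centre.
        centres = [mean(groups[i]) if groups[i] else centres[i] for i in range(k)]
    return centres


def train_ensemble(X, y, k):
    """Train one small component model per cluster of the training set.
    Each component is a nearest-class-centroid classifier (hypothetical
    stand-in for whatever component model is actually used)."""
    centres = kmeans(X, k)
    per_cluster = defaultdict(lambda: defaultdict(list))
    for p, label in zip(X, y):
        c = min(range(k), key=lambda i: dist(p, centres[i]))
        per_cluster[c][label].append(p)
    components = []
    for c, by_label in per_cluster.items():
        class_centroids = {label: mean(pts) for label, pts in by_label.items()}
        components.append((centres[c], class_centroids))
    return components


def predict(components, x):
    """Each component votes for a label; votes are weighted by how close
    x lies to the cluster the component was trained on."""
    votes = defaultdict(float)
    for centre, class_centroids in components:
        label = min(class_centroids, key=lambda l: dist(x, class_centroids[l]))
        votes[label] += 1.0 / (1e-9 + dist(x, centre))
    return max(votes, key=votes.get)


# Toy data: two regions, each containing both classes side by side.
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
     (5.0, 5.0), (5.2, 5.1), (5.1, 4.8), (6.0, 6.0), (6.1, 6.2), (5.9, 6.1)]
y = ["a", "a", "a", "b", "b", "b", "a", "a", "a", "b", "b", "b"]

components = train_ensemble(X, y, k=2)
for query in [(0.1, 0.1), (1.1, 1.0), (5.1, 5.0), (6.0, 6.1)]:
    print(query, predict(components, query))
```

On this toy data the ensemble has only two components, each trained on roughly half of the points. A single nearest-class-centroid model fitted to all the data would struggle here, because the overall centroids of the two classes lie close together, whereas each per-cluster component only has to separate the classes locally.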

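Bibliographic record

  • Author: Whitehead, Matthew E. N.
  • Author affiliation: Indiana University
  • Degree-granting institution: Indiana University
  • Subject: Artificial Intelligence; Computer Science
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 175 p.
  • Total pages: 175
  • Original format: PDF
  • Language: English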
