Journal of Near Infrared Spectroscopy

The importance of balanced data sets for partial least squares discriminant analysis: classification problems using hyperspectral imaging data

Abstract

This study investigates the effect of imbalanced spectral data in the training set when developing partial least squares discriminant analysis (PLS-DA) classification models for future predictions. The experiments used a real hyperspectral short-wavelength infrared image data set of bakery products (buns) containing contaminants (flies); similar applications involving other insects, paper and plastic were also tested. The contaminants make up a very small proportion of the images relative to the bun. Because the PLS-DA model must accurately detect and classify these contaminants, the calibration data set has to be modified. The paper addresses the problems caused by imbalanced calibration data sets and how to remedy them. In the example, balancing the calibration data from 58,476 bun pixels + 279 fly pixels to 279 bun pixels + 279 fly pixels increased the number of true predictions while requiring fewer PLS components in the model. For flies, true predictions rose from 65% with ten PLS components to more than 99% with five to six PLS components. For bun, true predictions fell from 100% to 99.5% with six PLS components, which is an acceptable reduction. Theoretical explanations are included.
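The balancing step can be sketched in a few lines. In the calibration set described above, flies are 279 of 58,755 pixels (about 0.47%), roughly 210 bun pixels per fly pixel. The Python/NumPy example below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes random undersampling of the bun class (the abstract does not say how the 279 bun pixels were chosen), a single 0/1 class indicator fitted with NIPALS PLS1, and a 0.5 decision threshold. The function names `balance_by_undersampling`, `pls1_fit` and `plsda_predict`, the 50-band spectra and the spectral signature are all hypothetical; only the pixel counts come from the abstract.

```python
import numpy as np


def balance_by_undersampling(y, rng):
    """Return sorted indices of a class-balanced subset of 0/1 labels y.

    Every minority-class sample is kept; the same number of majority-class
    samples is drawn at random without replacement (assumed strategy).
    """
    y = np.asarray(y)
    idx0 = np.flatnonzero(y == 0)
    idx1 = np.flatnonzero(y == 1)
    minority, majority = (idx0, idx1) if idx0.size < idx1.size else (idx1, idx0)
    kept = rng.choice(majority, size=minority.size, replace=False)
    return np.sort(np.concatenate([minority, kept]))


def pls1_fit(X, y, n_components):
    """NIPALS PLS1 regression of a single 0/1 class indicator y on spectra X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)        # unit weight vector
        t = Xk @ w                    # scores
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        q = yk @ t / tt               # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - q * t               # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)  # regression coefficients
    return x_mean, y_mean, B


def plsda_predict(model, X, threshold=0.5):
    """Classify as 1 (fly) when the predicted indicator exceeds the threshold."""
    x_mean, y_mean, B = model
    return ((X - x_mean) @ B + y_mean > threshold).astype(int)


# Hypothetical synthetic spectra: only the class sizes follow the abstract.
rng = np.random.default_rng(0)
n_bun, n_fly, n_bands = 58_476, 279, 50
signature = np.linspace(0.0, 1.0, n_bands)  # made-up fly-vs-bun difference
X = rng.normal(0.0, 0.05, size=(n_bun + n_fly, n_bands))
y = np.r_[np.zeros(n_bun), np.ones(n_fly)].astype(int)
X[y == 1] += signature

# Calibrate on the balanced subset (279 + 279), then predict every pixel.
idx = balance_by_undersampling(y, rng)
model = pls1_fit(X[idx], y[idx].astype(float), n_components=2)
pred = plsda_predict(model, X)

fly_rate = np.mean(pred[y == 1] == 1)  # share of fly pixels predicted as fly
bun_rate = np.mean(pred[y == 0] == 0)  # share of bun pixels predicted as bun
print(f"fly: {fly_rate:.3f}, bun: {bun_rate:.3f}")
```

Because the synthetic classes are cleanly separated, this toy data does not reproduce the paper's reported accuracies or its component counts; it only shows where balancing enters the workflow: the calibration set is reduced to equal class sizes, while predictions are still made on every pixel of the image.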