An optimization-based undeflated PLS (OUPLS) method to handle missing data in the training set

Eranda Harinath Puwakkatiya-Kankanamage; Salvador García-Muñoz; Lorenz T. Biegler


Abstract

Advances in sensory systems have led to many industrial applications with large amounts of highly correlated data, particularly in chemical and pharmaceutical processes. With these correlated data sets, it becomes important to consider advanced modeling approaches built to deal with correlated inputs in order to understand the underlying sources of variability and how this variability will affect the final quality of the product. In addition to the correlated nature of the data sets, it is also common to find missing elements and noise in these data matrices. Latent variable regression methods such as partial least squares or projection to latent structures (PLS) have gained much attention in industry for their ability to handle ill-conditioned matrices with missing elements. This feature of the PLS method is accomplished through the nonlinear iterative PLS (NIPALS) algorithm, with a simple modification to account for the missing data. Moreover, in expectation maximization PLS (EM-PLS), imputed values are provided for missing data elements as initial estimates, conventional PLS is then applied to update these elements, and the process iterates to convergence. This study is an extension of previous work on principal component analysis (PCA), where we introduced nonlinear programming (NLP) as a means to estimate the parameters of the PCA model. Here, we focus on the parameters of a PLS model. As an alternative to modified NIPALS and EM-PLS, this paper presents an efficient NLP-based technique to find model parameters for PLS, where the desired properties of the parameters can be explicitly posed as constraints in the optimization problem of the proposed algorithm. We also present a number of simulation studies, where we compare the effectiveness of the proposed algorithm with that of competing algorithms.
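The "simple modification" of NIPALS mentioned in the abstract is commonly implemented by running each regression step over the observed entries only. The following is a minimal sketch of that idea for a single PLS1 component (one response column), written in plain Python with missing entries of X marked as `None`. It is not the OUPLS method of the paper and not the authors' code. The function name `nipals_pls1_component` and the toy data are hypothetical, mean-centering and deflation are omitted, and every row and column of X is assumed to contain at least one observed entry.

```python
import math

def nipals_pls1_component(X, y):
    """One PLS1 component via NIPALS-style regressions that skip missing X entries.

    X is a list of rows; a missing entry is None. y is a complete list.
    Illustrative sketch only: no centering, no deflation, no zero-division guards.
    """
    n, m = len(X), len(X[0])

    def observed_rows(j):
        return [i for i in range(n) if X[i][j] is not None]

    def observed_cols(i):
        return [j for j in range(m) if X[i][j] is not None]

    # Weights: regress each column of X on y, using observed rows only.
    w = []
    for j in range(m):
        rows = observed_rows(j)
        w.append(sum(X[i][j] * y[i] for i in rows) / sum(y[i] ** 2 for i in rows))
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]

    # Scores: regress each row of X on w, using observed columns only.
    t = []
    for i in range(n):
        cols = observed_cols(i)
        t.append(sum(X[i][j] * w[j] for j in cols) / sum(w[j] ** 2 for j in cols))

    # X loadings: regress each column of X on t, using observed rows only.
    p = []
    for j in range(m):
        rows = observed_rows(j)
        p.append(sum(X[i][j] * t[i] for i in rows) / sum(t[i] ** 2 for i in rows))

    # y loading: y is complete, so this is an ordinary regression on t.
    q = sum(y[i] * t[i] for i in range(n)) / sum(v * v for v in t)
    return w, t, p, q

# Toy rank-one data (hypothetical): X = t0 * p0^T and y proportional to t0.
t0 = [1.0, -2.0, 0.5, 3.0]
p0 = [2.0, 1.0, -1.0]
X = [[ti * pj for pj in p0] for ti in t0]
y = [0.5 * ti for ti in t0]
X[1][2] = None  # true value was 2.0
X[3][0] = None  # true value was 6.0

w, t, p, q = nipals_pls1_component(X, y)
x_hat_12 = t[1] * p[2]  # model estimate of the first deleted entry
x_hat_30 = t[3] * p[0]  # model estimate of the second deleted entry
```

On noise-free rank-one data like this, the product of score and loading recovers the deleted entries. With noisy or higher-rank data the skipped-entry regressions are only approximate, which is the kind of situation where EM-PLS or an NLP formulation with explicit constraints is offered as an alternative.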


Bibliographic details

Source
Journal of Chemometrics, 2014, Issue 7, 10 pages
Authors
Eranda Harinath Puwakkatiya-Kankanamage; Salvador García-Muñoz; Lorenz T. Biegler
Original format: PDF
Language: English
CLC classification: Chemistry
Keywords
latent variables; missing data; PLS; NIPALS; nonlinear programming; IPOPT;

