Journal: The American Statistician

Archetypal Analysis With Missing Data: See All Samples by Looking at a Few Based on Extreme Profiles


Abstract

In this article, we propose several methodologies for handling missing or incomplete data in archetype analysis (AA) and archetypoid analysis (ADA). AA seeks to find archetypes, which are convex combinations of data points, and to approximate the samples as mixtures of those archetypes. In ADA, the representative archetypal data belong to the sample, that is, they are actual data points. With the proposed procedures, missing data are neither discarded nor filled in beforehand by imputation, and the theoretical properties regarding the location of archetypes are guaranteed, unlike in previous approaches. The new procedures adapt the AA algorithm either by considering the missing values in the computation of the solution or by skipping them. In the first case, the solutions of previous approaches are modified to fulfill the theory, and a new procedure is proposed in which the missing values are updated by the fitted values. In the second case, the procedure is based on estimating dissimilarities between samples and projecting these dissimilarities into a new space, where AA or ADA is applied; those results are then used to provide a solution in the original space. A comparative analysis is carried out in a simulation study, with favorable results. The methodology is also applied to two real datasets: a well-known climate dataset and a global development dataset. We illustrate how these unsupervised methodologies allow complex data to be understood, even by nonexperts. Supplementary materials for this article are available online.
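The idea of approximating samples as mixtures of archetypes while leaving missing entries out of the fit can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names `masked_rss` and `fill_with_fitted`, the toy matrices, and the use of `NaN` to mark missing values are all assumptions made for illustration. In full AA the archetypes `Z` would themselves be convex combinations of data points; here they are simply given.

```python
import numpy as np

def masked_rss(X, alpha, Z):
    """Residual sum of squares of X against alpha @ Z, over observed entries only.

    X     : (n, m) data matrix, NaN marks a missing value (assumed convention)
    alpha : (n, k) mixture weights, each row nonnegative and summing to 1
    Z     : (k, m) archetypes
    """
    observed = ~np.isnan(X)
    fitted = alpha @ Z
    # Missing entries contribute zero to the objective.
    resid = np.where(observed, X - fitted, 0.0)
    return float(np.sum(resid ** 2))

def fill_with_fitted(X, alpha, Z):
    """Replace missing entries of X by the current fitted values alpha @ Z."""
    fitted = alpha @ Z
    return np.where(np.isnan(X), fitted, X)

# Toy example: three archetypes in the plane and three samples that are
# exact mixtures of them (hypothetical values).
Z = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
alpha = np.array([[1.0, 0.0, 0.0],
                  [0.5, 0.5, 0.0],
                  [0.2, 0.3, 0.5]])
X = alpha @ Z
X[2, 1] = np.nan  # pretend one value was not recorded

print(masked_rss(X, alpha, Z))
print(fill_with_fitted(X, alpha, Z))
```

An iterative procedure could alternate between updating `alpha` and `Z` under this masked objective and, if desired, refreshing the missing cells with `fill_with_fitted`; how the paper actually performs these updates is not specified in the abstract.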
