Journal: Advances in Adaptive Data Analysis

DATA FUSION IN SEVERAL ALGORITHMS

Abstract

Data fusion is the process of integrating several datasets that share some common variables, while other variables are available in only some of the datasets. The main problem of data fusion can be stated as follows. One source provides the datasets X_0 and Y_0 (N_0 observations on n x-variables and m y-variables, respectively), and another source provides the dataset X_1 (N_1 observations on the same n x-variables). We need to estimate the missing Y_1 data (N_1 observations by m variables) so that all the data can be combined into one set. Several algorithms are considered in this work. One estimates weights proportional to the distances from each ith observation in the "recipients" dataset X_1 to all observations in the "donors" dataset X_0. Alternatively, a sample balancing technique with the maximum effective base can be used: ridge regression is applied to the Gifi system of binary variables obtained from the x-variables, to find the best fit of the "donors" X_0 data to the margins defined by each respondent in the "recipients" X_1 dataset. Then weighted regressions of each y in the Y_0 dataset on all variables in X_0 are constructed. For each ith observation in the dataset X_0, these regressions are used to predict the y-variables in the "recipients" dataset Y_1. If X and Y are the same n variables from different sources, the dual partial least squares technique and a special regression model with dummy variables defining each of the three available sets are used to predict the Y_1 data.
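
The distance-weighting approach described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The abstract does not give the weighting formula, so the sketch assumes Gaussian-kernel weights that decay with the distance between a recipient and each donor. It also assumes one weighted least-squares regression per recipient observation and uses synthetic data. The function name `fuse_by_weighted_regression` and the `bandwidth` parameter are hypothetical.

```python
import numpy as np


def fuse_by_weighted_regression(X0, Y0, X1, bandwidth=1.0):
    """Estimate the missing Y1 for recipients X1 from donors (X0, Y0).

    Assumption of this sketch: donor weights follow a Gaussian kernel of the
    Euclidean distance to the recipient; the source does not specify this.
    """
    X0 = np.asarray(X0, dtype=float)
    Y0 = np.asarray(Y0, dtype=float)
    X1 = np.asarray(X1, dtype=float)

    # Design matrices with an intercept column.
    A0 = np.hstack([np.ones((X0.shape[0], 1)), X0])
    A1 = np.hstack([np.ones((X1.shape[0], 1)), X1])

    Y1 = np.empty((X1.shape[0], Y0.shape[1]))
    for i, x in enumerate(X1):
        # Squared distances from recipient i to every donor.
        d2 = np.sum((X0 - x) ** 2, axis=1)
        # Shift by the minimum so the nearest donor always has weight 1.
        w = np.exp(-(d2 - d2.min()) / (2.0 * bandwidth ** 2))
        # Weighted least squares via row scaling by sqrt(w).
        sw = np.sqrt(w)[:, None]
        coef, *_ = np.linalg.lstsq(A0 * sw, Y0 * sw, rcond=None)
        # Predict all m y-variables for recipient i.
        Y1[i] = A1[i] @ coef
    return Y1


# Synthetic example: Y depends exactly linearly on X (illustrative data only).
rng = np.random.default_rng(0)
true_coef = np.array([[1.0, -0.5], [2.0, 0.3], [-1.0, 1.5]])  # intercept + n=2
X0 = rng.normal(size=(50, 2))                  # donors: N0=50, n=2
Y0 = np.hstack([np.ones((50, 1)), X0]) @ true_coef
X1 = rng.uniform(-1.0, 1.0, size=(5, 2))       # recipients: N1=5
Y1_true = np.hstack([np.ones((5, 1)), X1]) @ true_coef

Y1_hat = fuse_by_weighted_regression(X0, Y0, X1)
```

Because the synthetic y-variables are exact linear functions of the x-variables, every local regression recovers the same coefficients and `Y1_hat` matches `Y1_true` up to floating-point error. With real data, the choice of kernel and `bandwidth` would control how strongly each recipient relies on its nearest donors.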
