Journal: IEEE Transactions on Knowledge and Data Engineering

Transfer across Completely Different Feature Spaces via Spectral Embedding


Abstract

In many applications, obtaining many labeled examples is expensive or time consuming. One practically important problem is whether labeled data from other related sources can help predict the target task even when the sources have 1) different feature spaces (e.g., image versus text data), 2) different data distributions, and 3) different output spaces. This paper proposes a solution and discusses the conditions under which it is highly likely to produce better results. First, it unifies the feature spaces of the target and source data sets by spectral embedding, even when those feature spaces are completely different. The principle is to devise an optimization objective that preserves the original structure of each data set while maximizing the similarity between the two. Both a linear projection model and a nonlinear approach are derived in closed form from this principle. Second, a sample selection strategy keeps only those source examples that are related to the target. Finally, a Bayesian approach models the relationship between the different output spaces. Together, the three steps bridge related heterogeneous sources in order to learn the target task. Across the 20 experimental data sets, for example, images with wavelet-transform-based features are used to predict another set of images whose features are constructed from a color-histogram space, and documents are used to help image classification. By using the examples extracted from heterogeneous sources, the models reduce the error rate by as much as 50 percent compared with methods that use only examples from the target task.
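The first step, mapping two data sets with unrelated feature spaces into one shared space, can be illustrated with a deliberately simplified sketch. This is not the paper's formulation: it assumes the target and source matrices have the same number of rows and that row i of one corresponds to row i of the other (a strong assumption the abstract does not state), and it swaps in a plain truncated SVD of the column-stacked data for the paper's spectral objective. The function name `shared_embedding` and all variable names are hypothetical.

```python
import numpy as np


def shared_embedding(target, source, k):
    """Return a shared k-dim embedding B and linear maps P_t, P_s.

    Simplifying assumption (not from the paper): target and source have
    the same number of rows, paired row by row. With one orthonormal B
    shared by both data sets, minimizing
        ||T - B P_t||^2 + ||S - B P_s||^2
    is solved by the top-k left singular vectors of [T, S].
    """
    if target.shape[0] != source.shape[0]:
        raise ValueError("this sketch needs equally sized, paired data sets")
    # Center each feature so the embedding captures structure, not offsets.
    t = target - target.mean(axis=0)
    s = source - source.mean(axis=0)
    # Stack the two feature spaces side by side: shape (n, d_t + d_s).
    stacked = np.hstack([t, s])
    u, _, _ = np.linalg.svd(stacked, full_matrices=False)
    b = u[:, :k]      # shared embedding, orthonormal columns
    p_t = b.T @ t     # maps the embedding back to target features
    p_s = b.T @ s     # maps the embedding back to source features
    return b, p_t, p_s


# Synthetic data: both views are generated from the same 3-dim latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 3))
target = latent @ rng.normal(size=(3, 8))   # e.g. an 8-dim "image" view
source = latent @ rng.normal(size=(3, 5))   # e.g. a 5-dim "text" view

b, p_t, p_s = shared_embedding(target, source, k=3)
err_t = np.linalg.norm((target - target.mean(axis=0)) - b @ p_t)
err_s = np.linalg.norm((source - source.mean(axis=0)) - b @ p_s)
print("target reconstruction error:", err_t)
print("source reconstruction error:", err_s)
```

Because both synthetic views share the same three latent factors, a three-dimensional shared embedding reconstructs each view almost exactly. Real heterogeneous data would not be paired this way, which is why the paper balances structure preservation against similarity between the two projected data sets rather than forcing a single embedding.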