Learning from partially labeled data: Unsupervised and semi-supervised learning on graphs and learning with distribution shifting.

Abstract

This thesis focuses on two fundamental machine learning problems: unsupervised learning, where no label information is available, and semi-supervised learning, where a small number of labels is given in addition to unlabeled data. These problems arise in many real-world applications, such as Web analysis and bioinformatics, where a large amount of data is available but little or no labeled data exists. Obtaining classification labels in these domains is usually quite difficult because it involves either manual labeling or physical experimentation. This thesis approaches these problems from two perspectives: graph based and distribution based.

First, I investigate a series of graph-based learning algorithms that are able to exploit information embedded in different types of graph structures. These algorithms allow label information to be shared between nodes in the graph, ultimately communicating information globally to yield effective unsupervised and semi-supervised learning. In particular, I extend existing graph-based learning algorithms, currently based on undirected graphs, to more general graph types, including directed graphs, hypergraphs and complex networks. These richer graph representations allow one to capture more naturally the intrinsic data relationships that exist, for example, in Web data, relational data, bioinformatics and social networks. For each of these generalized graph structures I show how information propagation can be characterized by a distinct random walk model, and then use this characterization to develop new unsupervised and semi-supervised learning algorithms.

Second, I investigate a more statistically oriented approach that explicitly models a learning scenario in which the training and test examples come from different distributions. This is a difficult situation for standard statistical learning approaches, since they typically assume that the training and test distributions are similar, if not identical. To achieve good performance in this scenario, I utilize unlabeled data to correct the bias between the training and test distributions. A key idea is to produce resampling weights for bias correction by working directly in a feature space, bypassing the problem of explicit density estimation. The technique can easily be applied to many different supervised learning algorithms, automatically adapting their behavior to cope with distribution shift between training and test data.
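As an illustration of the graph-based theme, the sketch below shows a standard label-propagation scheme on an undirected graph, in which label information spreads between neighbouring nodes through a normalized random-walk operator until the predictions stabilize. This is a minimal sketch of the general idea only, not the thesis's algorithms for directed graphs or hypergraphs; the symmetric normalization, the mixing parameter alpha, and the function names are assumptions made for illustration.

import numpy as np

def propagate_labels(W, Y, labeled_mask, alpha=0.9, n_iter=100):
    """W: (n, n) symmetric affinity matrix of an undirected graph.
    Y: (n, k) one-hot label matrix; rows for unlabeled nodes are zero.
    labeled_mask: boolean array of shape (n,) marking the labeled nodes."""
    d = W.sum(axis=1)
    d[d == 0] = 1.0                       # guard against isolated nodes
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ W @ D_inv_sqrt       # normalized propagation operator
    Y0 = np.where(labeled_mask[:, None], Y, 0.0)
    F = Y0.copy()
    for _ in range(n_iter):
        # spread labels along edges while keeping the labeled anchors fixed
        F = alpha * (S @ F) + (1 - alpha) * Y0
    return F.argmax(axis=1)               # predicted class index per node

# toy usage: a 4-node chain where only the two end nodes are labeled
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Y = np.zeros((4, 2))
Y[0, 0] = 1.0                             # node 0 labeled as class 0
Y[3, 1] = 1.0                             # node 3 labeled as class 1
mask = np.array([True, False, False, True])
print(propagate_labels(W, Y, mask))       # expected: [0 0 1 1]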
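For the distribution-based theme, the following is a minimal sketch of the idea of correcting training/test bias by reweighting training points directly in a feature space, so that their weighted feature mean matches the feature mean of the test set, without explicit density estimation. The explicit quadratic feature map, the ridge-regularized least-squares solve, and the clipping step are simplifying assumptions for illustration; a kernelized, constrained formulation would normally be solved instead.

import numpy as np

def feature_map(X):
    # explicit feature map: raw inputs and their squares (an assumption
    # standing in for a kernel-induced feature space)
    return np.hstack([X, X ** 2])

def covariate_shift_weights(X_train, X_test, ridge=1e-3):
    """Return one nonnegative resampling weight per training example."""
    Phi = feature_map(X_train)                    # (n_tr, d)
    mu_test = feature_map(X_test).mean(axis=0)    # test-set feature mean
    n_tr = Phi.shape[0]
    A = Phi / n_tr                                # want A.T @ beta ≈ mu_test
    # ridge-regularized least-squares solve for the weights beta
    beta = np.linalg.solve(A @ A.T + ridge * np.eye(n_tr), A @ mu_test)
    beta = np.clip(beta, 0.0, None)               # weights must stay nonnegative
    return beta * n_tr / max(beta.sum(), 1e-12)   # rescale so the mean weight is 1

The resulting weights can then be passed as per-example weights to any supervised learner that accepts them (for instance, a sample_weight argument, which many scikit-learn estimators support), so the same learning algorithm is reused unchanged while adapting to the shifted test distribution.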

Bibliographic details

  • Author              Huang, Jiayuan
  • Affiliation         University of Waterloo (Canada)
  • Degree grantor      University of Waterloo (Canada)
  • Subject             Computer Science
  • Degree              Ph.D.
  • Year                2007
  • Pages               169 p.
  • Total pages         169
  • Original format     PDF
  • Language            eng
  • CLC classification  Automation technology, computer technology
  • Keywords
