
Detecting important subgroups and rare classes in numerical data sets.


Abstract

This thesis focuses on two problems in the area of machine learning. The first is the well-known subgroup discovery problem where the goal is to identify statistically interesting subgroups related to target attributes. Attributes must be discretized during the subgroup discovery process. We describe an algorithm for the discretization of continuous target attributes. The algorithm identifies patterns in the target data and uses them to select the discretization cutpoints. We use the algorithm in a new subgroup discovery method that utilizes a novel quality function to evaluate the interestingness of subgroups. Tests show that the discretization method leads to improved insight. We also define a new data mining problem that identifies members of a rare class of data using one given instance of the rare class. We call this the needles-in-haystack problem. Members of a rare class of data, the needles, have been hidden in a set of records, the haystack. The only information regarding the characterization of the rare class is a single instance of a needle. It is assumed that members of the needle class are similar to each other according to an unknown needle characterization. The goal is to find the needle records hidden in the haystack. This thesis describes an effective algorithm for that task.
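The abstract states the needles-in-haystack problem but does not spell out the thesis's algorithm, so the following is only a naive baseline that illustrates the problem setup, not the author's method. It assumes purely numerical records and, as a stand-in for the unknown needle characterization, ranks haystack records by Euclidean distance to the single known needle after standardizing each attribute. The function name `rank_by_similarity` and the toy data are hypothetical.

```python
import math
import statistics


def rank_by_similarity(haystack, needle):
    """Return haystack row indices ordered from most to least similar to the needle.

    Assumption (not from the thesis): similarity is plain Euclidean distance
    over z-scored attributes, with every attribute weighted equally.
    """
    columns = list(zip(*haystack))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]

    def standardize(record):
        return [(v - m) / s for v, m, s in zip(record, means, stds)]

    z_needle = standardize(needle)
    distances = [math.dist(standardize(row), z_needle) for row in haystack]
    return sorted(range(len(haystack)), key=lambda i: distances[i])


# Toy haystack: rows 1 and 3 are the hidden needles, the rest is hay.
haystack = [[0.0, 0.0], [10.0, 10.0], [0.2, -0.1], [9.8, 10.1], [5.0, 5.0]]
known_needle = [10.0, 10.2]
ranking = rank_by_similarity(haystack, known_needle)
```

Weighting all attributes equally is the weak point of this baseline: if the needle class is characterized by only a few attributes, the irrelevant ones can swamp the distance, which is presumably part of what makes the problem as stated hard.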

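Bibliographic record

  • Author

    Moreland, Katherine

  • Author affiliation

    The University of Texas at Dallas

  • Degree-granting institution The University of Texas at Dallas
  • Subject Artificial Intelligence; Computer Science
  • Degree Ph.D.
  • Year 2009
  • Pages 67 p.
  • Total pages 67
  • Original format PDF
  • Language of text eng
  • Chinese Library Classification Rehabilitation medicine
  • Keywords

  • Date indexed 2022-08-17 11:38:19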
