
Iterative validation and sampling-based clustering using error-tolerant frequent item sets

Abstract

Iterative validation for efficiently determining error-tolerant frequent itemsets is disclosed. A description of the application of error-tolerant frequent itemsets to efficiently determining clusters, as well as to initializing clustering algorithms, is also given. In one embodiment, a method determines a sample set of error-tolerant frequent itemsets (ETFs) within a uniform random sample of the data in a database. This sample set of ETFs is independently validated so that, for example, spurious ETFs and spurious dimensions within the ETFs can be removed. The validated sample set of ETFs is added to the set of ETFs for the database. This process is repeated with additional uniform samples that are mutually exclusive with prior uniform samples, continuing to build the database's set of ETFs until no new sample sets can be found. The method is significantly more efficient than prior-art disk-based methods, and it often finds data clusters that traditional prior-art clustering algorithms do not discover.
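
The abstract describes the procedure only at a high level. The Python sketch below illustrates the sample, validate, and accumulate loop under stated assumptions, and is not the patented method. It assumes a commonly used "strong" ETF definition (a row supports an itemset if it contains at least (1 - eps) of the itemset's items, and the itemset is an ETF if the supporting fraction reaches min_sup). It uses a brute-force miner per sample and a hypothetical validation rule that checks candidates against a held-out sample and prunes items that rarely appear in the covering rows. The names `support`, `mine_sample`, `validate`, and `iterative_etf`, and the parameters `eps`, `min_sup`, `max_size`, and `sample_size`, are all illustrative.

```python
import random
from itertools import combinations
from typing import FrozenSet, List, Optional, Set

Row = FrozenSet[str]


def support(itemset: Row, rows: List[Row], eps: float) -> float:
    """Fraction of rows containing at least (1 - eps) * |itemset| of the itemset's items."""
    if not rows:
        return 0.0
    need = (1.0 - eps) * len(itemset)
    return sum(1 for r in rows if len(itemset & r) >= need) / len(rows)


def mine_sample(rows: List[Row], eps: float, min_sup: float, max_size: int) -> Set[Row]:
    """Brute-force ETF search on one in-memory sample; only viable for few items."""
    items = sorted(set().union(*rows))
    found: Set[Row] = set()
    for k in range(2, max_size + 1):
        for combo in combinations(items, k):
            cand = frozenset(combo)
            if support(cand, rows, eps) >= min_sup:
                found.add(cand)
    return found


def validate(itemset: Row, rows: List[Row], eps: float, min_sup: float) -> Optional[Row]:
    """Hypothetical independent check on held-out rows: prune weak items, re-test support."""
    need = (1.0 - eps) * len(itemset)
    covering = [r for r in rows if len(itemset & r) >= need]
    if not covering:
        return None  # spurious ETF: nothing in the held-out rows supports it
    # Items present in too few covering rows are treated as spurious dimensions.
    kept = frozenset(
        i for i in itemset
        if sum(i in r for r in covering) / len(covering) >= 1.0 - eps
    )
    if len(kept) < 2 or support(kept, rows, eps) < min_sup:
        return None  # spurious ETF after pruning
    return kept


def iterative_etf(data: List[Row], sample_size: int, eps: float = 0.2,
                  min_sup: float = 0.1, max_size: int = 3, seed: int = 0) -> Set[Row]:
    """Grow the database's ETF set from disjoint uniform samples until one adds nothing new."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    # Assumption: the first slice is held out for validation only.
    holdout, rest = shuffled[:sample_size], shuffled[sample_size:]
    result: Set[Row] = set()
    # Consecutive slices of a shuffled list are mutually exclusive uniform random samples.
    for start in range(0, len(rest), sample_size):
        sample = rest[start:start + sample_size]
        new: Set[Row] = set()
        for cand in mine_sample(sample, eps, min_sup, max_size):
            checked = validate(cand, holdout, eps, min_sup)
            if checked is not None and checked not in result:
                new.add(checked)
        if not new:
            break  # stop once a fresh sample contributes no new ETFs
        result |= new
    return result
```

For example, on 500 rows of `{a, b, c}` and 500 rows of `{x, y}`, `iterative_etf(data, sample_size=200)` returns `{a, b}`, `{a, c}`, `{b, c}`, `{a, b, c}`, and `{x, y}`. In this sketch the second sample finds only already-known itemsets, so the loop stops. Validating against one held-out sample keeps all work in memory; whether the patent validates this way is not stated in the abstract.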

Bibliographic data

Publication number: US6490582B1

Publication date: 2002-12-03

Original format: PDF
Applicant/assignee: MICROSOFT CORPORATION

Application/patent number: US20000500172
Inventors: PAUL S. BRADLEY; CHENG YANG; USAMA M. FAYYAD

Filing date: 2000-02-08
Classification: G06F173/00
Country: US