Iterative validation and sampling-based clustering using error-tolerant frequent item sets
Abstract
Iterative validation for efficiently determining error-tolerant frequent itemsets is disclosed. A description of the application of error-tolerant frequent itemsets to efficiently determining clusters, as well as to initializing clustering algorithms, is also given. In one embodiment, a method determines a sample set of error-tolerant frequent itemsets (ETFs) within a uniform random sample of data within a database. This sample set of ETFs is independently validated so that, for example, spurious ETFs and spurious dimensions within the ETFs can be removed. The validated sample set of ETFs is added to the set of ETFs for the database. This process is repeated with additional uniform samples that are mutually exclusive of prior uniform samples, to continue building the database's set of ETFs, until no new sample sets can be found. The method is significantly more efficient than disk-based methods in the prior art, and the data clusters found are often not discovered by traditional clustering algorithms in the prior art.
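The sample, validate, and accumulate loop described in the abstract can be illustrated with a toy sketch. It is not the patented method. The abstract does not define an ETF precisely, so the sketch assumes that a row supports an itemset when it is missing at most a fraction `eps` of the items, and that an itemset qualifies when at least a fraction `kappa` of the rows support it. The function names, the brute-force candidate search, and the choice to validate against the rows outside the current sample are all hypothetical. Removal of spurious dimensions is omitted.

```python
import random
from itertools import combinations


def support(rows, itemset, eps):
    """Fraction of rows missing at most eps * len(itemset) of the items (assumed definition)."""
    if not rows:
        return 0.0
    allowed_missing = eps * len(itemset)
    hits = sum(1 for row in rows if len(itemset - row) <= allowed_missing)
    return hits / len(rows)


def candidate_etfs(sample, items, size, eps, kappa):
    """Brute-force search over itemsets of one fixed size; only practical at toy scale."""
    return {
        frozenset(combo)
        for combo in combinations(sorted(items), size)
        if support(sample, frozenset(combo), eps) >= kappa
    }


def iterative_etfs(db, size, eps, kappa, sample_size, seed=0):
    """Accumulate validated ETFs over disjoint uniform samples until none are new."""
    rng = random.Random(seed)
    order = list(range(len(db)))
    rng.shuffle(order)  # consecutive chunks of a shuffled index list are disjoint uniform samples
    items = set().union(*db)
    found = set()
    for start in range(0, len(order), sample_size):
        chunk = set(order[start:start + sample_size])
        sample = [db[i] for i in chunk]
        # Assumption: "independent validation" is done on the rows outside the sample.
        holdout = [db[i] for i in range(len(db)) if i not in chunk]
        candidates = candidate_etfs(sample, items, size, eps, kappa)
        validated = {c for c in candidates if support(holdout, c, eps) >= kappa}
        new = validated - found
        if not new:  # stop once a sample contributes nothing new
            break
        found |= new
    return found


# Every row lacks exactly one of the four items, so no row contains the full itemset.
db = [{"b", "c", "d"}, {"a", "c", "d"}, {"a", "b", "d"}, {"a", "b", "c"}] * 3
tolerant = iterative_etfs(db, size=4, eps=0.25, kappa=0.5, sample_size=4)
exact = iterative_etfs(db, size=4, eps=0.0, kappa=0.5, sample_size=4)
```

In this toy database the itemset {a, b, c, d} is recovered only when one missing item per row is tolerated. With `eps=0.0` the search degenerates to exact frequent itemsets and finds nothing, which is the kind of pattern the abstract says traditional methods tend to miss.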