Journal: Knowledge-Based Systems

Finding multiple global linear correlations in sparse and noisy data sets

Abstract

Finding linear correlations is an important research problem with numerous real-world applications. In real-world data sets, a linear correlation may not hold across the entire data set; some linear correlations are visible only in certain data subsets. On the one hand, many local correlation clustering algorithms assume that the data points of a linear correlation are locally dense, so they may miss global correlations when the data points are sparsely distributed. On the other hand, existing global correlation clustering methods may fail when the data set contains a large number of non-correlated points or when the actual correlations are coarse. This paper proposes DCSearch, a simple and fast algorithm for finding multiple global linear correlations in a data set, including coarse ones in noisy and sparse data. Following the classical divide-and-conquer strategy, DCSearch first divides the data set into subsets to reduce the search space, and then recursively searches for and prunes candidate correlations from the subsets. Empirical studies show that DCSearch efficiently reduces the number of candidate correlations in each iteration. Experimental results on both synthetic and real data sets demonstrate that DCSearch is effective and efficient at finding global linear correlations in sparse and noisy data sets.
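The abstract describes DCSearch only at a high level, so the following is a rough, hypothetical illustration of the general divide-and-conquer pattern it names (split the data, generate candidate correlations inside small subsets, prune candidates on the way back up), not the paper's algorithm. It works in 2-D only. The choices below are all assumptions made for illustration: candidates are lines through point pairs, a line counts as a "global" correlation when at least `min_support` points of the whole data set lie within distance `eps` of it, the split is a simple halving, and the names `line_through`, `support`, `prune` and `dc_search` are invented here.

```python
# Hypothetical sketch of divide-and-conquer search for global linear
# correlations in 2-D. This is NOT the DCSearch algorithm from the paper.
import itertools
import math
import random
from typing import List, Optional, Tuple

# A line in normal form (a, b, c): a*x + b*y = c with a^2 + b^2 = 1, so
# |a*x + b*y - c| is the perpendicular distance from (x, y) to the line.
Line = Tuple[float, float, float]
Point = Tuple[float, float]


def line_through(p: Point, q: Point) -> Optional[Line]:
    """Normal-form line through two points, or None if they coincide."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return None
    a, b = -dy / norm, dx / norm  # unit normal to the direction (dx, dy)
    if a < 0 or (a == 0 and b < 0):  # fix the sign so equal lines compare equal
        a, b = -a, -b
    return (a, b, a * p[0] + b * p[1])


def support(line: Line, points: List[Point], eps: float) -> int:
    """Count points within perpendicular distance eps of the line."""
    a, b, c = line
    return sum(1 for x, y in points if abs(a * x + b * y - c) <= eps)


def prune(cands: List[Line], all_points: List[Point], eps: float,
          min_support: int, tol: float) -> List[Line]:
    """Keep candidates with enough support in the WHOLE data set (strongest
    first), dropping any whose coefficients are within tol of a kept one."""
    scored = [(support(ln, all_points, eps), ln) for ln in cands]
    scored = sorted((item for item in scored if item[0] >= min_support),
                    key=lambda item: -item[0])
    kept: List[Line] = []
    for _, ln in scored:
        if all(max(abs(u - v) for u, v in zip(ln, k)) > tol for k in kept):
            kept.append(ln)
    return kept


def dc_search(subset: List[Point], all_points: List[Point], eps: float,
              min_support: int, leaf_size: int = 8,
              tol: float = 0.05) -> List[Line]:
    """Halve the subset until it is small, form candidate lines from point
    pairs in each leaf, and prune the merged candidates at every level."""
    if len(subset) <= leaf_size:
        pairs = itertools.combinations(subset, 2)
        cands = [ln for ln in (line_through(p, q) for p, q in pairs)
                 if ln is not None]
    else:
        mid = len(subset) // 2
        cands = (dc_search(subset[:mid], all_points, eps, min_support, leaf_size, tol)
                 + dc_search(subset[mid:], all_points, eps, min_support, leaf_size, tol))
    return prune(cands, all_points, eps, min_support, tol)


# Demo on synthetic data: two planted lines plus uniform noise kept at
# least 1 unit away from both lines.
rng = random.Random(0)
line_a = [(float(x), 2.0 * x + 1.0) for x in range(20)]  # y = 2x + 1
line_b = [(float(x), 5.0 - x) for x in range(20)]        # y = 5 - x
noise: List[Point] = []
while len(noise) < 20:
    x, y = rng.uniform(0, 20), rng.uniform(-15, 40)
    if abs(2 * x - y + 1) / math.sqrt(5) > 1 and abs(x + y - 5) / math.sqrt(2) > 1:
        noise.append((x, y))
data = line_a + line_b + noise
rng.shuffle(data)

found = dc_search(data, data, eps=0.05, min_support=10)
for a, b, c in found:
    print(f"{a:.3f}*x + {b:.3f}*y = {c:.3f}")
```

On this demo data the sketch should report the two planted correlations, y = 2x + 1 and y = 5 - x, in normal form, while lines through pairs of unrelated points are discarded because they fall below `min_support`. Because pruning counts support over the whole data set rather than relying on local density, a thinly spread correlation can still survive; the sketch's own limitation is that a correlation only becomes a candidate if at least two of its points land in the same leaf.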