首页> 外文学位 >Probabilistic approximate algorithms for distributed data mining in peer-to-peer networks.
【24h】

Probabilistic approximate algorithms for distributed data mining in peer-to-peer networks.

机译:对等网络中分布式数据挖掘的概率近似算法。

获取原文
获取原文并翻译 | 示例

摘要

Peer-to-peer(P2P) computing is emerging as a new distributed computing paradigm for novel applications that involves exchange of information among peers with little centralized coordination. Analyzing data distributed in P2P networks requires peer-to-peer data mining algorithms that can mine the data without data centralization. However, replicating result of centralized data mining in an exact fashion is often communication-wise expensive. Approximate algorithms can be a realistic and communication-efficient alternative in this case. This dissertation concentrates on developing approximate data mining algorithms suitable for P2P networks, that closely estimates the result of centralized data mining algorithm with probabilistic guarantee using minimal communication.;The dissertation introduces the concept of approximate local algorithms that can estimate data mining result within desired accuracy boundary with user-specified probabilistic guarantee by operating within a spatial locality of the executioner-node. As a foundation of probabilistic approximation in P2P network, a random-walk based uniform data sampling approach is proposed, that removes the bias and dependence in sampling caused by varying degrees of connectivity and sizes of data shared. Then the sampling technique is applied to develop approximate local algorithms for solving the specific data mining problem of K-means clustering and frequent itemset mining in the context of P2P network. Two K-means clustering algorithms are developed, one of which extends the concept of centralized K-means algorithm to distributed dynamic peer-to-peer environment, while the other provides probabilistic guarantee on accuracy of clustering result in a static P2P network. A frequent itemset mining algorithm is developed as a direct application of the uniform data sampling technique that discovers most of the frequent itemsets with high probability using bounded communication.;The main contribution of this research work is to introduce the concept of approximate local algorithms for data mining in P2P network that provides probabilistic guarantee of accuracy. It builds a basic tool for approximate data analysis in P2P network, a uniform data sampling technique, and develops communication efficient approximate local algorithms for mining data distributed in such network. The algorithms developed here provide data mining results within desired accuracy level and probabilistic guarantee, and shows good scalability with low communication overhead.
机译:对等(P2P)计算正在作为一种新颖的分布式计算范例出现,该范例适用于新颖的应用程序,该应用程序涉及在同位体之间进行信息交换而几乎不需要集中的协调。分析P2P网络中分布的数据需要点对点数据挖掘算法,该算法可以在不集中数据的情况下挖掘数据。但是,以精确的方式复制集中数据挖掘的结果通常在通信方面很昂贵。在这种情况下,近似算法可能是一种现实且具有通信效率的替代方法。本文主要研究开发适用于P2P网络的近似数据挖掘算法,以最小的通信量以概率保证紧密估计集中式数据挖掘算法的结果。通过在执行者节点的空间位置内进行操作,可以为用户指定的概率保证提供边界。作为P2P网络中概率逼近的基础,提出了一种基于随机游走的统一数据采样方法,该方法消除了因连接程度和共享数据大小不同而导致的采样偏差和依赖性。然后采用采样技术开发近似的局部算法,以解决P2P网络环境下的K均值聚类和频繁项集挖掘的特定数据挖掘问题。开发了两种K-means聚类算法,其中一种将集中式K-means算法的概念扩展到分布式动态对等环境,而另一种则为静态P2P网络中的聚类结果的准确性提供了概率保证。频繁项目集挖掘算法是作为统一数据采样技术的直接应用而开发的,该技术使用有界通信以高概率发现大多数频繁项目集。;本研究工作的主要贡献是引入了数据的近似局部算法的概念在P2P网络中进行挖掘以提供准确性的概率保证。它构建了用于P2P网络中近似数据分析的基本工具,一种统一的数据采样技术,并开发了用于挖掘此类网络中分布的数据的通信有效的近似局部算法。此处开发的算法可在所需的准确度水平和概率保证范围内提供数据挖掘结果,并显示出良好的可伸缩性和较低的通信开销。

著录项

  • 作者

    Datta, Souptik.;

  • 作者单位

    University of Maryland, Baltimore County.;

  • 授予单位 University of Maryland, Baltimore County.;
  • 学科 Computer Science.
  • 学位 Ph.D.
  • 年度 2008
  • 页码 220 p.
  • 总页数 220
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号