首页> 外文学位 >Probabilistic approximate algorithms for distributed data mining in peer-to-peer networks.

【24h】

Probabilistic approximate algorithms for distributed data mining in peer-to-peer networks.

机译：对等网络中分布式数据挖掘的概率近似算法。

获取原文

获取原文并翻译 | 示例

页面导航

摘要
著录项
相似文献
相关主题

摘要

Peer-to-peer(P2P) computing is emerging as a new distributed computing paradigm for novel applications that involves exchange of information among peers with little centralized coordination. Analyzing data distributed in P2P networks requires peer-to-peer data mining algorithms that can mine the data without data centralization. However, replicating result of centralized data mining in an exact fashion is often communication-wise expensive. Approximate algorithms can be a realistic and communication-efficient alternative in this case. This dissertation concentrates on developing approximate data mining algorithms suitable for P2P networks, that closely estimates the result of centralized data mining algorithm with probabilistic guarantee using minimal communication.;The dissertation introduces the concept of approximate local algorithms that can estimate data mining result within desired accuracy boundary with user-specified probabilistic guarantee by operating within a spatial locality of the executioner-node. As a foundation of probabilistic approximation in P2P network, a random-walk based uniform data sampling approach is proposed, that removes the bias and dependence in sampling caused by varying degrees of connectivity and sizes of data shared. Then the sampling technique is applied to develop approximate local algorithms for solving the specific data mining problem of K-means clustering and frequent itemset mining in the context of P2P network. Two K-means clustering algorithms are developed, one of which extends the concept of centralized K-means algorithm to distributed dynamic peer-to-peer environment, while the other provides probabilistic guarantee on accuracy of clustering result in a static P2P network. A frequent itemset mining algorithm is developed as a direct application of the uniform data sampling technique that discovers most of the frequent itemsets with high probability using bounded communication.;The main contribution of this research work is to introduce the concept of approximate local algorithms for data mining in P2P network that provides probabilistic guarantee of accuracy. It builds a basic tool for approximate data analysis in P2P network, a uniform data sampling technique, and develops communication efficient approximate local algorithms for mining data distributed in such network. The algorithms developed here provide data mining results within desired accuracy level and probabilistic guarantee, and shows good scalability with low communication overhead.

机译：对等（P2P）计算正在作为一种新颖的分布式计算范例出现，该范例适用于新颖的应用程序，该应用程序涉及在同位体之间进行信息交换而几乎不需要集中的协调。分析P2P网络中分布的数据需要点对点数据挖掘算法，该算法可以在不集中数据的情况下挖掘数据。但是，以精确的方式复制集中数据挖掘的结果通常在通信方面很昂贵。在这种情况下，近似算法可能是一种现实且具有通信效率的替代方法。本文主要研究开发适用于P2P网络的近似数据挖掘算法，以最小的通信量以概率保证紧密估计集中式数据挖掘算法的结果。通过在执行者节点的空间位置内进行操作，可以为用户指定的概率保证提供边界。作为P2P网络中概率逼近的基础，提出了一种基于随机游走的统一数据采样方法，该方法消除了因连接程度和共享数据大小不同而导致的采样偏差和依赖性。然后采用采样技术开发近似的局部算法，以解决P2P网络环境下的K均值聚类和频繁项集挖掘的特定数据挖掘问题。开发了两种K-means聚类算法，其中一种将集中式K-means算法的概念扩展到分布式动态对等环境，而另一种则为静态P2P网络中的聚类结果的准确性提供了概率保证。频繁项目集挖掘算法是作为统一数据采样技术的直接应用而开发的，该技术使用有界通信以高概率发现大多数频繁项目集。；本研究工作的主要贡献是引入了数据的近似局部算法的概念在P2P网络中进行挖掘以提供准确性的概率保证。它构建了用于P2P网络中近似数据分析的基本工具，一种统一的数据采样技术，并开发了用于挖掘此类网络中分布的数据的通信有效的近似局部算法。此处开发的算法可在所需的准确度水平和概率保证范围内提供数据挖掘结果，并显示出良好的可伸缩性和较低的通信开销。

著录项

作者
Datta, Souptik.;
展开▼
作者单位

University of Maryland, Baltimore County.;

展开▼
授予单位 University of Maryland, Baltimore County.;
学科 Computer Science.
学位 Ph.D.
年度 2008
页码 220 p.
总页数 220
原文格式 PDF
正文语种 eng
中图分类
关键词

相似文献

外文文献
中文文献
专利

1. DISTRIBUTED CONVEX OPTIMIZATION ALGORITHMS AND THEIR APPLICATION TO DISTRIBUTED CONTROL IN PEER-TO-PEER DATA STORAGE SYSTEM [J] . Iiduka Hideaki Journal of nonlinear and convex analysis . 2015,第11期

机译：点对点数据存储系统中的分布式凸优化算法及其在分布式控制中的应用
2. An Optimized Distributed Association Rule Mining Algorithm in Parallel and Distributed Data Mining with XML Data for Improved Response Time [J] . Sujni Paul International Journal of Computer Science & Information Technology (IJCSIT) . 2010,第2期

机译：XML数据并行和分布式数据挖掘中的优化分布式关联规则挖掘算法，可提高响应时间
3. A special issue of Journal of Parallel and Distributed Computing: Models and algorithms for high-performance distributed data mining [J] . Alfredo Cuzzocrea Journal of Parallel and Distributed Computing . 2011,第5期

机译：《并行与分布式计算杂志》特刊：高性能分布式数据挖掘的模型和算法
4. Approximate joint probabilistic data association algorithms, [C] . James A. Roecker, IBM Federal Systems Co., Boulder, Signal and Data Processing of Small Targets 1993 . 1993

机译：近似联合概率数据关联算法，
5. Approximate algorithms for data warehousing and data mining. [D] . Wu, Xintao. 2001

机译：数据仓库和数据挖掘的近似算法。
6. Type2 diabetes mellitus prediction using data mining algorithms based on the long-noncoding RNAs expression: a comparison of four data mining approaches [O] . Faranak Kazerouni, Azadeh Bayani, Farkhondeh Asadi, 2020

机译：基于长非编码RNA表达的数据挖掘算法类型2糖尿病预测：四种数据采矿方法的比较
7. A Local Distributed Peer-to-Peer Algorithm Using Multi-Party Optimization Based Privacy Preservation for Data Mining Primitive Computation [O] . Kamalika Das, Hillol Kargupta, Kanishka Bhaduri 2009

机译：基于多方优化的隐私保护的本地分布式对等算法，用于数据挖掘原始计算
8. Optimal Distributed Routing Algorithms for Datagram Communication Networks. [R] . Yee, J. R. 1989

机译：数据报通信网络的最优分布式路由算法。

Probabilistic approximate algorithms for distributed data mining in peer-to-peer networks.

摘要

著录项

相似文献

相关主题

期刊订阅