Peer-to-peer (P2P) databases are becoming prevalent on the Internet for distribution and sharing of documents, applications, and other digital media. The problem of answering large-scale ad hoc analysis queries, for example, aggregation queries, on these databases poses unique challenges. Exact solutions can be time consuming and difficult to implement, given the distributed and dynamic nature of P2P databases. In this paper, we present novel sampling-based techniques for approximate answering of ad hoc aggregation queries in such databases. Computing a high-quality random sample of the database efficiently in the P2P environment is complicated by several factors: the data is distributed (usually in uneven quantities) across many peers; within each peer, the data is often highly correlated; and, moreover, even collecting a random sample of the peers is difficult to accomplish. To counter these problems, the proposed approach uses random walks of the P2P graph, as well as block-level sampling techniques.
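As a rough illustration of how a random walk and block-level sampling might be combined to approximate a SUM query, consider the minimal sketch below. It is not the method of this paper. It assumes a connected, undirected, non-bipartite overlay given as an adjacency list (`graph`), numeric rows held locally at each peer (`data`), and a known total degree of the overlay; the names `random_walk_peers`, `estimate_sum`, `jump`, and `block_size` are all hypothetical. A simple random walk visits peers with probability roughly proportional to their degree, so each local estimate is reweighted by the inverse of that probability.

```python
import random


def random_walk_peers(graph, start, num_peers, jump, rng):
    """Walk the overlay and keep every `jump`-th visited peer.

    Skipping `jump` hops between selections reduces the correlation
    between consecutive picks (assumption: the walk mixes well enough).
    """
    selected = []
    current = start
    while len(selected) < num_peers:
        for _ in range(jump):
            current = rng.choice(graph[current])
        selected.append(current)
    return selected


def estimate_sum(graph, data, start, num_peers, block_size, jump=5, seed=0):
    """Approximate the sum of all rows across all peers."""
    rng = random.Random(seed)
    # Sum of all degrees (twice the edge count); assumed known here,
    # although a real P2P system would have to estimate it.
    total_degree = sum(len(neighbors) for neighbors in graph.values())

    estimates = []
    for peer in random_walk_peers(graph, start, num_peers, jump, rng):
        rows = data[peer]
        # Block-level sampling: split the local rows into fixed-size
        # blocks and read a single block chosen uniformly at random.
        blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
        if blocks:
            block = rng.choice(blocks)
            local_sum = sum(block) * len(blocks)  # scale up to the whole peer
        else:
            local_sum = 0.0
        # Stationary probability of a simple random walk landing on this peer.
        visit_prob = len(graph[peer]) / total_degree
        estimates.append(local_sum / visit_prob)

    # Average of the inverse-probability-weighted per-peer estimates.
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    # Hypothetical 4-peer overlay in which every peer is linked to every other.
    graph = {p: [q for q in range(4) if q != p] for p in range(4)}
    data = {p: [1, 1, 1, 1] for p in range(4)}
    print(estimate_sum(graph, data, start=0, num_peers=3, block_size=2))
```

Reading one block per peer keeps local I/O low, but when the rows within a block are highly correlated the per-peer estimate becomes noisier, which is the trade-off the abstract points to.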