Assessing the size of the World Wide Web is extremely difficult because it is not possible to sample directly from the Web. Several groups of researchers have invested considerable effort in developing sound sampling schemes that involve submitting a number of queries to several major search engines. In this paper we present a statistical approach to the analysis of datasets collected by query-based sampling, using a hierarchical Bayes formulation of the Rasch model for multiple-list population estimation. We show that our procedures accord with real-world constraints and consequently let us make credible inferences about the size of the World Wide Web.