As the number of independent data providers on the web increases, a growing number of web applications need to locate data sources distributed over the internet. Most current proposals in the literature focus on developing effective routing data synopses to answer simple XPath queries in structured or unstructured P2P networks. In this paper, we present an effective framework to support XPath queries extended with full-text search predicates over schema-less XML data distributed in a DHT-based P2P network. We construct two concise routing data synopses, termed the structural summary and the peer-document synopsis, to route a user query to the most relevant peers, that is, those owning documents that can satisfy the query. To evaluate the structural components of a query, we develop a general query footprint derivation algorithm that extracts the query footprint from the query and matches it against the structural summaries. To improve search performance, we adopt a lazy query evaluation strategy for evaluating the full-text search predicates in the query. Finally, we develop effective strategies to balance the data load distribution in the system. We conduct extensive experiments to show the scalability of our system, validate the efficiency and accuracy of our routing data synopses, and demonstrate the effectiveness of our load balancing schemes.