Semantics and result disambiguation for keyword search on tree data.

Abstract

Keyword search is a popular technique for searching tree-structured data (e.g., XML, JSON) on the web because it frees the user from learning a complex query language and the structure of the data sources. However, this convenience comes with drawbacks. Because keyword queries are imprecise, they usually return a very large number of results, of which only very few are relevant to the query. Multiple previous approaches have tried to address this problem. Some exploit structural and semantic properties of the tree data to filter out irrelevant results, while others use a scoring function to rank the candidate results. Neither task is easy, and in both cases relevant results might be missed and users might spend a significant amount of time searching for their intended result among a plethora of candidates. A second drawback, also due to the inability of keyword queries to express the user intent precisely, is that the query answer may contain different types of meaningful results even though the user is interested in only some of them.

Both problems of keyword search on tree data are addressed in this dissertation. First, an original approach for answering keyword queries is proposed. This approach extracts structural patterns of the query matches and reasons with them in order to return meaningful results ranked by their relevance to the query. The proposed semantics compares patterns of results by using different types of homomorphisms between the patterns. These comparisons are used to organize the patterns into a graph of patterns, which is leveraged to determine ranking and filtering semantics. The experimental results show that the approach produces query results of higher quality than previous approaches.

To address the second problem, an original approach for clustering the keyword search results on tree data is introduced. The clustered output allows the user to focus on a subset of the results and to save time and effort while looking for the relevant results. The approach performs clustering at different levels of granularity to group similar results together effectively. The similarity of results and of result clusters is decided using relations on structural patterns of the results, defined based on homomorphisms between path patterns. An original feature of the clustering approach is that the clusters are ranked at different levels of granularity to quickly guide the user to the relevant result patterns. An efficient stack-based algorithm is presented for generating result patterns and constructing the clustering hierarchy. Extensive experimentation with multiple real datasets shows that the algorithm is fast and scalable. It also shows that the clustering methodology allows users to effectively retrieve their intended results and outperforms a recent state-of-the-art clustering approach.

To tackle the second problem from a different angle, diversification of keyword search results is also addressed. Diversification aims to provide users with a ranked list of results that balances the relevance and redundancy of the results. Measures for quantifying the relevance and dissimilarity of result patterns are presented, and a heuristic for generating a diverse set of results using these metrics is introduced.
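
The abstract does not spell out the diversification heuristic, so the following is only a minimal sketch of one common way to balance relevance against redundancy: a greedy, maximal-marginal-relevance-style selection. Everything in it is an assumption made for illustration and is not taken from the dissertation: result patterns are modeled as root-to-leaf label paths, `jaccard_dissimilarity` stands in for the dissimilarity measure, and the relevance scores and the `trade_off` parameter are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a result pattern is modeled as a root-to-leaf path of node labels.
Pattern = Tuple[str, ...]


def jaccard_dissimilarity(a: Pattern, b: Pattern) -> float:
    """Hypothetical dissimilarity: 1 minus the Jaccard overlap of the label sets."""
    labels_a, labels_b = set(a), set(b)
    union = labels_a | labels_b
    if not union:
        return 0.0
    return 1.0 - len(labels_a & labels_b) / len(union)


def greedy_diversify(
    relevance: Dict[Pattern, float],
    k: int,
    trade_off: float = 0.5,
    dissimilarity: Callable[[Pattern, Pattern], float] = jaccard_dissimilarity,
) -> List[Pattern]:
    """Greedily pick up to k patterns, trading relevance against redundancy.

    trade_off = 1.0 ranks by relevance only; lower values favor patterns
    that differ from those already selected. This is a generic greedy
    scheme, not the heuristic proposed in the dissertation.
    """
    remaining = dict(relevance)
    selected: List[Pattern] = []

    while remaining and len(selected) < k:

        def gain(pattern: Pattern) -> float:
            if not selected:
                # Nothing chosen yet: only relevance matters.
                return remaining[pattern]
            # Novelty is the distance to the closest already-selected pattern.
            novelty = min(dissimilarity(pattern, s) for s in selected)
            return trade_off * remaining[pattern] + (1.0 - trade_off) * novelty

        best = max(remaining, key=gain)
        selected.append(best)
        del remaining[best]

    return selected


# Hypothetical relevance scores for four result patterns.
scores = {
    ("bib", "book", "title"): 0.9,
    ("bib", "book", "author"): 0.85,
    ("bib", "article", "title"): 0.6,
    ("store", "cd", "artist"): 0.5,
}

diverse_top2 = greedy_diversify(scores, k=2, trade_off=0.5)
relevance_top2 = greedy_diversify(scores, k=2, trade_off=1.0)
```

With these made-up scores, the relevance-only setting keeps the two `book` patterns, while the balanced setting replaces the second one with the structurally unrelated `cd` pattern, which is the kind of redundancy reduction diversification is meant to achieve.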

Bibliographic record

  • Author

    Aksoy, Cem

  • Author affiliation

    New Jersey Institute of Technology

  • Degree-granting institution New Jersey Institute of Technology
  • Subject Computer science; Web studies
  • Degree Ph.D.
  • Year 2016
  • Pages 144 p.
  • Total pages 144
  • Original format PDF
  • Language English