
Places, networks, and crowds: Scalable data management and analysis for emerging online applications.



Abstract

The amount of information that is currently generated, gathered, and stored has reached unprecedented levels. Various sources such as websites, local business catalogs, social networks, and Question and Answer (Q/A) sites contain vast amounts of data that can potentially be very valuable for both web users and companies. The large size of these datasets poses an obstacle to their effective utilization, since answering queries over the data and extracting its useful parts become more demanding. This dissertation focuses on emerging online applications that need to manage and analyze such large datasets; it comprises three parts, each of which studies a different problem and a different type of data.

The first part studies the problem of efficient spatio-textual query processing. Location-based search services, such as Google Maps, allow users to issue text queries constrained to a specific geographic location. In order to process these queries efficiently, previous work focused on optimizations regarding the spatial aspect. We provide a solution that gives higher priority to the textual aspect while using only a coarse-grained spatial structure. Our experiments show that this solution outperforms existing approaches by up to two orders of magnitude.

The second part focuses on efficient pairwise distance estimation in large graphs. Point-to-point distance estimation is a fundamental and well-studied problem with numerous applications such as Social Search, but previous algorithms become intractable as the size of the graph grows. We take a fresh look at this setting and approach it as a learning problem, using structural properties of the graph as features in the learning process. Our experiments verify that this approach leads to lower prediction errors than the state-of-the-art solutions.

Finally, the third part proposes a system that utilizes content available in Q/A sites, such as Stack Overflow, in order to efficiently generate and evaluate test questions that assess the technical skills of job candidates. Upon extracting relevant threads from the Q/A sites, our system combines crowdsourcing and Item Response Theory so as to re-purpose this content to generate tests. Our experiments show that the quality of these tests is comparable to, or higher than, that of tests that are used in practice. At the same time, we achieve a cost per test question that is lower than that of licensing questions from existing test banks.
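As an illustration of the idea in the first part (text first, with only a coarse spatial structure), the following is a minimal sketch and not the dissertation's actual index. It assumes a plain inverted index whose postings are bucketed by a fixed grid cell; the class name `TextFirstIndex`, the helper `cell_of`, and the cell size `CELL` are hypothetical names chosen for this example.

```python
from collections import defaultdict

CELL = 1.0  # assumed grid cell size in degrees (hypothetical value)


def cell_of(lat, lon):
    """Map a coordinate to its coarse grid cell."""
    return (int(lat // CELL), int(lon // CELL))


class TextFirstIndex:
    """Inverted index keyed by term first, then by coarse grid cell."""

    def __init__(self):
        # term -> cell -> set of document ids
        self.postings = defaultdict(lambda: defaultdict(set))
        self.locations = {}  # document id -> (lat, lon)

    def add(self, doc_id, text, lat, lon):
        self.locations[doc_id] = (lat, lon)
        cell = cell_of(lat, lon)
        for term in set(text.lower().split()):
            self.postings[term][cell].add(doc_id)

    def query(self, terms, lat_min, lon_min, lat_max, lon_max):
        """Return ids of documents containing all terms inside the box."""
        per_term = []
        for term in terms:
            by_cell = self.postings.get(term.lower())
            if by_cell is None:
                return []  # a missing term means no document can match
            per_term.append(by_cell)
        if not per_term:
            return []

        lo_cell = cell_of(lat_min, lon_min)
        hi_cell = cell_of(lat_max, lon_max)
        hits = set()
        for i in range(lo_cell[0], hi_cell[0] + 1):
            for j in range(lo_cell[1], hi_cell[1] + 1):
                cell = (i, j)
                # Textual step: intersect the postings of all query terms.
                candidates = set.intersection(
                    *(by_cell.get(cell, set()) for by_cell in per_term)
                )
                # Spatial step: exact check only on the few survivors.
                for doc_id in candidates:
                    lat, lon = self.locations[doc_id]
                    if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
                        hits.add(doc_id)
        return sorted(hits)
```

In this sketch the grid only narrows which postings are read; the term intersection does most of the filtering, and the exact coordinate check runs on the remaining candidates.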

Bibliographic record

  • Author: Christoforaki, Maria
  • Author affiliation: Polytechnic Institute of New York University
  • Degree-granting institution: Polytechnic Institute of New York University
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2015
  • Pages: 122 p.
  • Total pages: 122
  • Original format: PDF
  • Language of text: English

