Published in: 2011 7th International Conference on Emerging Technologies

Implementing MapReduce over language and literature data over the UK National Grid Service


Abstract

Humanities researchers are producing large and heterogeneous collections of language and literature data in digital format. These collections include dictionaries, thesauri, corpora, images, audio and video resources. The increased availability of these datasets, brought about by advances in and adaptations of the Internet and by increased digitisation of humanities data resources, poses new challenges for humanities researchers. Many of these challenges relate to data access and usage and include security, integrity, interoperability, information retrieval, sharing, licensing and copyright. The JISC-funded project Enhancing Repositories for Language and Literature Research (ENROLLER; https://www.enroller.org.uk) is addressing these issues through the development of a targeted e-Research environment. A key component of this effort is support for large-scale analysis of diverse language and literature datasets. To this end, this paper presents an application of the MapReduce algorithm that supports information retrieval and linguistic analysis on those datasets. In particular, we describe how MapReduce is used to provide advanced bulk search capabilities that exploit a range of high performance computing resources, including the UK National Grid Service (www.ngs.ac.uk) and ScotGrid (www.scotgrid.ac.uk), to offer a step change in the kinds of research that this community can undertake. We also present performance analysis results based on the application of these systems.
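
The abstract does not give implementation details, so the following is only a minimal single-process sketch of how a bulk term search could be phrased as map and reduce steps. The toy corpus, the whitespace tokenisation and the function names (map_search, shuffle, reduce_search, bulk_search) are all assumptions made for illustration; they are not taken from the ENROLLER project, and a real deployment would distribute the map and reduce tasks across grid nodes.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical toy corpus (document id -> text); not data from the paper.
CORPUS = {
    "doc1": "The thesaurus lists old words and new words",
    "doc2": "A corpus of old texts",
    "doc3": "Images audio and video",
}


def map_search(doc_id, text, terms):
    """Map step: emit (term, (doc_id, 1)) for each occurrence of a query term."""
    wanted = set(terms)
    for token in text.lower().split():
        if token in wanted:
            yield token, (doc_id, 1)


def shuffle(pairs):
    """Group the emitted values by key, as a framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_search(term, values):
    """Reduce step: sum the hits per document for a single term."""
    counts = defaultdict(int)
    for doc_id, n in values:
        counts[doc_id] += n
    return term, dict(counts)


def bulk_search(corpus, terms):
    """Run map, shuffle and reduce in sequence over the whole corpus."""
    mapped = chain.from_iterable(
        map_search(doc_id, text, terms) for doc_id, text in corpus.items()
    )
    return dict(reduce_search(t, v) for t, v in shuffle(mapped).items())


result = bulk_search(CORPUS, ["old", "words", "video"])
print(result)
```

Because each map call reads only one document and each reduce call reads only one term's values, both phases could in principle be run in parallel on separate workers, which is the property that makes the approach attractive for bulk search over large collections.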
