Journal: BioData Mining

Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends



Abstract

The emergence of massive datasets in clinical settings presents both challenges and opportunities for data storage and analysis. This so-called “big data” challenges traditional analytic tools and will increasingly require novel solutions adapted from other fields. Advances in information and communication technology present the most viable solutions to big data analysis in terms of efficiency and scalability. It is vital that these big data solutions be multithreaded and that data access approaches be precisely tailored to large volumes of semi-structured and unstructured data. The MapReduce programming framework uses two tasks common in functional programming: Map and Reduce. MapReduce is a new parallel processing framework, and Hadoop is its open-source implementation, which runs on a single computing node or on clusters. Compared with existing parallel processing paradigms (e.g., grid computing and graphics processing units (GPUs)), MapReduce and Hadoop have two advantages: 1) fault-tolerant storage, which yields reliable data processing by replicating computing tasks and cloning data chunks on different computing nodes across the cluster; and 2) high-throughput data processing via a batch processing framework and the Hadoop Distributed File System (HDFS). Data are stored in HDFS and made available to the slave nodes for computation. In this paper, we review existing applications of the MapReduce programming framework and its implementation platform, Hadoop, in clinical big data and related medical health informatics fields. The use of MapReduce and Hadoop on a distributed system represents a significant advance in clinical big data processing and utilization, and opens up new opportunities in the emerging era of big data analytics. The objective of this paper is to summarize state-of-the-art efforts in clinical big data analytics and to highlight what might be needed to improve the outcomes of clinical big data analytics tools.
The paper concludes by summarizing the potential use of the MapReduce programming framework and the Hadoop platform to process huge volumes of clinical data in medical health informatics and related fields.
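
The Map and Reduce tasks named in the abstract can be illustrated with a minimal single-process sketch in Python. It is not Hadoop code and is not taken from the paper. The record layout, the diagnosis codes, and the function names (`map_record`, `shuffle`, `reduce_counts`, `run_job`) are all assumptions made for illustration. On a real cluster, the map and reduce calls would run in parallel on different nodes, and the framework would handle the grouping step.

```python
from collections import defaultdict

# Hypothetical clinical records: (patient_id, diagnosis codes).
# The data are invented purely for illustration.
RECORDS = [
    ("p1", ["I10", "E11"]),
    ("p2", ["I10"]),
    ("p3", ["E11", "J45", "I10"]),
]


def map_record(record):
    """Map: emit one (code, 1) pair per diagnosis code in a record."""
    _patient_id, codes = record
    for code in codes:
        yield code, 1


def shuffle(pairs):
    """Shuffle: group emitted values by key between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce sequentially in one process."""
    pairs = (pair for record in records for pair in map_record(record))
    grouped = shuffle(pairs)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(RECORDS))  # {'I10': 3, 'E11': 2, 'J45': 1}
```

Because each `map_record` call depends only on its own record and each `reduce_counts` call only on its own key, both phases can be split across machines without coordination. That independence is what lets a framework rerun a failed task on another node.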
