SPIE Medical Imaging Conference

Scalable Storage of Whole Slide Images and Fast Retrieval of Tiles Using Apache Spark

Abstract

Whole slide images (WSIs) can greatly improve the workflow of pathologists by enabling software for the automatic detection and analysis of cellular and morphological features. However, the gigabyte size of a WSI poses a serious challenge for scalable storage and fast retrieval, both of which are essential for next-generation image analytics. In this paper, we propose a system for scalable storage of WSIs and fast retrieval of image tiles using Apache Spark, a space-filling curve, and popular data storage formats. We investigate two schemes for storing the tiles of WSIs. In the first scheme, all the WSIs are stored in a single table (partitioned by certain table attributes for fast retrieval). In the second scheme, each WSI is stored in a separate table. The records in each table are sorted using the index values assigned by the space-filling curve. We also study two data storage formats for storing WSIs: Parquet and ORC (Optimized Row Columnar). Through performance evaluation on a 16-node cluster in CloudLab, we observed that ORC enables faster retrieval of tiles than Parquet and requires 6 times less storage space than Parquet. We also observed that the two storage schemes achieved comparable performance. On average, our system took 2 seconds to retrieve a single tile and less than 6 seconds to retrieve 8 tiles on up to 80 WSIs. We also report the tile retrieval performance of our system on Microsoft Azure to gain insight into how the underlying computing platform can affect its performance.
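The abstract does not name the space-filling curve or describe how the sorted records are laid out on disk. As a minimal sketch, assuming a Hilbert curve over a power-of-two tile grid, the Python below assigns each tile (x, y) a curve index, sorts the tiles by that index, and cuts the sorted records into chunks that keep min/max key statistics, loosely imitating ORC stripes and Parquet row groups. `hilbert_index`, `build_table`, `get_tile`, `Chunk`, and `CHUNK_SIZE` are hypothetical names, not the paper's code, and plain Python lists stand in for Spark tables.

```python
from bisect import bisect_left
from dataclasses import dataclass

CHUNK_SIZE = 4  # hypothetical number of records per chunk (stripe/row-group stand-in)


def hilbert_index(n: int, x: int, y: int) -> int:
    """Map tile (x, y) on an n-by-n grid (n a power of two) to its Hilbert index."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


@dataclass
class Chunk:
    min_key: int
    max_key: int
    keys: list   # sorted Hilbert indices
    tiles: list  # tile payloads, aligned with keys


def build_table(n, tiles):
    """Sort tiles {(x, y): payload} by Hilbert index and cut them into chunks."""
    records = sorted((hilbert_index(n, x, y), p) for (x, y), p in tiles.items())
    chunks = []
    for i in range(0, len(records), CHUNK_SIZE):
        part = records[i:i + CHUNK_SIZE]
        keys = [k for k, _ in part]
        chunks.append(Chunk(keys[0], keys[-1], keys, [p for _, p in part]))
    return chunks


def get_tile(chunks, n, x, y):
    """Return (payload, chunks_scanned); chunks whose [min, max] excludes the key are skipped."""
    key = hilbert_index(n, x, y)
    scanned = 0
    for chunk in chunks:
        if not (chunk.min_key <= key <= chunk.max_key):
            continue  # pruned from statistics alone, no records read
        scanned += 1
        i = bisect_left(chunk.keys, key)
        if i < len(chunk.keys) and chunk.keys[i] == key:
            return chunk.tiles[i], scanned
    return None, scanned


if __name__ == "__main__":
    n = 8  # hypothetical 8x8 tile grid for one WSI
    tiles = {(x, y): f"tile-{x}-{y}" for x in range(n) for y in range(n)}
    table = build_table(n, tiles)
    payload, scanned = get_tile(table, n, 5, 2)
    print(payload, scanned)  # tile-5-2 1
```

Because the sorted chunks cover disjoint key ranges, a single-tile lookup reads one chunk and skips the rest using statistics alone, and neighbouring tiles tend to land in the same chunk. In a Spark deployment, a comparable effect would come from filtering on the index column so the ORC or Parquet reader can prune stripes or row groups; the abstract does not say whether the system relies on this mechanism.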
