Journal Article Details
Journal of Big Data
Performance-efficient distributed transfer and transformation of big spatial histopathology datasets in the cloud
Esma Yildirim [1]
[1] Department of Mathematics and Computer Science, Queensborough Community College of CUNY, New York, USA;
Keywords: Big data applications; Content-based image retrieval; Cloud networks; Distributed transfer algorithms; Digital microscopy; Whole slide images
DOI: 10.1186/s40537-021-00546-3
Source: Springer
[Abstract]

Whole Slide Image (WSI) datasets are gigapixel-resolution, unstructured histopathology datasets that consist of extremely large files (each can be as large as multiple GBs in compressed format). These datasets have utility in a wide range of diagnostic and investigative pathology applications. However, the datasets present unique challenges: the size of the files, proprietary data formats, and the lack of efficient parallel data access libraries limit the scalability of these applications. Commercial clouds provide dynamic, cost-effective, scalable infrastructure to process these datasets; however, we lack the tools and algorithms that will transfer and transform them onto the cloud seamlessly, providing faster speeds and scalable formats. In this study, we present novel algorithms that transfer these datasets onto the cloud while at the same time transforming them into symmetric scalable formats. Our algorithms use intelligent file size distribution and pipeline the transfer and transformation tasks without introducing extra overhead to the underlying system. The algorithms, tested in the Amazon Web Services (AWS) cloud, outperform the widely used transfer tools and algorithms, and also outperform our previous work. Data access to the transformed datasets provides better performance compared to the related work. The transformed symmetric datasets are fed into three different analytics applications: a distributed implementation of a content-based image retrieval (CBIR) application for prostate carcinoma datasets; a deep convolutional neural network application for classification of breast cancer datasets; and, to show that the algorithms can work with any spatial dataset, a Canny edge detection application on satellite image datasets. Although different in nature, all of the applications can easily work with our new symmetric data format, and performance results show near-linear speed-ups as the number of processors increases.

[License]

CC BY   
