An Introduction to DosNa

Distributed NumPy Arrays for High-Performance Cloud Computing

Abstract

The cloud primarily handles data through object stores such as S3, whereas HPC data processing is typically done with file formats such as HDF5 stored on filesystems, which can make offloading data to the cloud difficult. DosNa is a Python wrapper that distributes N-dimensional arrays over an object store server. Its main goal is to provide an easy, seamless interface for storing and managing N-dimensional datasets on a remote cloud. It supports S3 and Ceph backends and allows parallelised data access through its MPI engine. Features currently under development include converting HDF5 files to DosNa objects, an API for visualising data, object locking, BLOSC compression, and checksums. This talk introduces DosNa and showcases its current features and what’s to come.
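To make the idea of distributing an N-dimensional array over an object store concrete, here is a minimal sketch that assumes a plain Python dict as a stand-in for an S3 bucket or Ceph pool. The functions `put_array` and `get_array`, the `name/i.j.k` key scheme and the `name/meta` metadata object are all hypothetical and are not DosNa's API. The per-chunk SHA-256 digest only illustrates what a chunk checksum could look like; it is not how DosNa implements checksums.

```python
import hashlib
import itertools

import numpy as np


def _regions(shape, chunk_shape):
    """Yield (grid_index, slices) for every chunk covering an array of `shape`."""
    ranges = [range(0, dim, step) for dim, step in zip(shape, chunk_shape)]
    for starts in itertools.product(*ranges):
        index = tuple(s // c for s, c in zip(starts, chunk_shape))
        # Slices past the array edge are truncated by NumPy, so edge chunks are smaller.
        slices = tuple(slice(s, s + c) for s, c in zip(starts, chunk_shape))
        yield index, slices


def put_array(store, name, array, chunk_shape):
    """Split an N-D array into chunks and write each chunk as its own object (hypothetical)."""
    store[f"{name}/meta"] = {
        "shape": array.shape,
        "dtype": array.dtype.str,
        "chunk_shape": tuple(chunk_shape),
    }
    for index, slices in _regions(array.shape, chunk_shape):
        payload = np.ascontiguousarray(array[slices]).tobytes()
        key = f"{name}/" + ".".join(map(str, index))
        # Keep a SHA-256 digest next to each chunk so reads can detect corruption.
        store[key] = (hashlib.sha256(payload).hexdigest(), payload)


def get_array(store, name):
    """Reassemble an array from its chunk objects, verifying each checksum (hypothetical)."""
    meta = store[f"{name}/meta"]
    dtype = np.dtype(meta["dtype"])
    out = np.empty(meta["shape"], dtype=dtype)
    for index, slices in _regions(meta["shape"], meta["chunk_shape"]):
        digest, payload = store[f"{name}/" + ".".join(map(str, index))]
        if hashlib.sha256(payload).hexdigest() != digest:
            raise OSError(f"checksum mismatch in chunk {index}")
        target = out[slices]  # basic slicing returns a view into `out`
        target[...] = np.frombuffer(payload, dtype=dtype).reshape(target.shape)
    return out


# Usage: a dict stands in for a remote bucket or pool.
store = {}
data = np.arange(5 * 6 * 7, dtype=np.float32).reshape(5, 6, 7)
put_array(store, "volume", data, chunk_shape=(2, 4, 4))
restored = get_array(store, "volume")
```

Because each chunk is an independent object, separate workers (for example, separate MPI ranks) could in principle read or write disjoint sets of chunk keys without coordinating on a shared file, which is the kind of parallel access the abstract refers to.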

Date
Mar 29, 2022 2:15 PM
Location
Francis Crick Institute
1 Midland Rd, London, NW1 1AT
Gabryel Mason-Williams
PhD Student in Computer Science and Research Software Associate