Technical Blog Post - Experiments with Kerchunk
Posted on August 2, 2023 (Last modified on October 19, 2023) • 1 min read • 208 wordsCheck out our technical blog post all about Kerchunk!
It covers how we have been experimenting with a library of python tools called Kerchunk to represent CEDA archive data for easier cloud access without the need to convert or duplicate data in cloud-optimised formats.
Kerchunk provides a single uniform method of representing chunked compressed data formats for cloud access without requiring format conversion. This is useful because converting archived data is impractical on a large scale and duplicating data in multiple formats requires a significant increase to the storage requirements for the archive.
This technical blog post would be interesting to anyone looking into cloud accessible analysis ready data. It may also be useful to anyone currently converting or looking to convert data into Zarr format, as Kerchunk provides an alternative with considerable advantages, like reduced computation time and storage requirements. Kerchunk files can also be updated due to changes in NetCDF files more quickly than reconverting to Zarr.
Experiments with different optimisation methods for kerchunk explored here include representing chunk data formulaically using tools from python libraries and custom syntax extending the usage of these tools to cover less uniform NetCDF file structure.
Find out more about our work with Kerchunk and our other technical projects here.