intake-virtual-icechunk#
An intake plugin for building and reading Icechunk stores via VirtualiZarr and intake-esm.
Concept#
The goal is a pipeline that takes a pre-built intake-esm datastore and produces a single virtual Icechunk store that mirrors its structure:
Open a pre-built intake-esm datastore with intake-esm.
For each dataset in the catalog, open the constituent files with VirtualiZarr to create virtual references — no data is copied.
Write each dataset as a named Zarr group inside one Icechunk store, using the catalog’s groupby_attrs to derive the group name.
Expose the result through an intake driver (virtual_icechunk) that hides all Icechunk-specific complexity (sessions, stores, branches) behind an interface that feels like a hybrid of an esm-datastore and an xarray.Dataset, defaulting to Xarray semantics wherever possible and falling back to esm-datastore conventions only where necessary (e.g. catalog search and group selection).
The end result is one Icechunk store, one group per dataset, fully virtual (no data
duplication), and accessible via intake.open_virtual_icechunk().
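The "one group per dataset" mapping can be illustrated without any of the real libraries. The sketch below is not this package's implementation: the catalog rows, column names, and the plan_groups helper are hypothetical, and joining the groupby_attrs values with "." follows intake-esm's default key separator, which this package may or may not reuse for group names.

```python
from collections import defaultdict

# Hypothetical catalog rows. A real intake-esm datastore exposes its rows
# as a pandas DataFrame; plain dicts keep this sketch dependency-free.
rows = [
    {"source_id": "model-a", "experiment_id": "historical", "table_id": "Amon", "path": "/data/a_hist_1.nc"},
    {"source_id": "model-a", "experiment_id": "historical", "table_id": "Amon", "path": "/data/a_hist_2.nc"},
    {"source_id": "model-b", "experiment_id": "historical", "table_id": "Amon", "path": "/data/b_hist_1.nc"},
]

# Assumed aggregation setting; in a real catalog this comes from the
# datastore's groupby_attrs.
groupby_attrs = ["source_id", "experiment_id", "table_id"]


def plan_groups(rows, groupby_attrs, sep="."):
    """Map each derived group name to the files that make up that dataset."""
    groups = defaultdict(list)
    for row in rows:
        # Join the grouping column values into one name per dataset.
        name = sep.join(str(row[attr]) for attr in groupby_attrs)
        groups[name].append(row["path"])
    return dict(groups)


plan = plan_groups(rows, groupby_attrs)
for name, paths in plan.items():
    print(name, paths)
```

Each key in plan corresponds to one Zarr group in the store, and each value lists the files that would be opened with VirtualiZarr to build that group's virtual references.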
This package provides two things#
Building (IcechunkStoreBuilder): given a pre-built intake-esm catalog, creates virtual references with VirtualiZarr and writes each dataset as a named Zarr group inside a single Icechunk store.
Reading (IcechunkSource): an intake driver for opening a group from an Icechunk store as an xarray.Dataset via intake.open_virtual_icechunk().
Installation#
pip install intake-virtual-icechunk
License#
Apache-2.0. See LICENSE for details.
Get in touch#
If you encounter any issues with intake-virtual-icechunk or you’d like to request any new features, please open an issue here.