Skip to content

ksharonin/kerchunkC

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

68 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

KerchunkC - Draft Extension of Kerchunk into C/C++ functionality

Summary

C++ extensions of driver to read virtual Zarr datasets described in JSON metadata format. Optimize s3 reading performance and mult-dimensional reconstruction of array datasets.

See issue here: Unidata/netcdf-c#2777

Latest List of TODOs

  • Enable absolute indexing (abstract chunks such that the index for h5 is automatically mapped)
  • Enable local file reading (vs. just AWS remote bucket reading for stream of bytes)
  • H5Coro Integration (see SlideRule repository)

C++ Files (../code/c++)

  • Call chain overview:
    • main.cpp -> json_parse.h -> kerchunk_read.h -> prin_helpers.h -> mult_dim_form.h -> Finish

src

  • main.cpp: main entry point for program; key calls include: json_parse() and kerchunk_read()

Include

  • config.h: inputs and settings for program
    • e.g. HARDCODED_CHUNK_INDEX, HARDCODED_JSON_PATH
  • custom_structs.h: hold custom structs shared across program (excluding layer_t)
  • json_parse.h: given json path, parse out metdata relevant for all chunks and for index specific chunks
  • json.hpp: nholmann json processing library
  • iter_chunk.h: coordinate metadata extraction and runs for multiple chunk indexes
  • kerchunk_read.h: given JSON metadata, read s3 stream and perform decompression, shuffling, etc until original array obtained. Calls on mult_dim_form.h to regain full dimensions
  • mult_dim_form.h: given flat array, reconstruct the full dimensions as originally stored (bytes read as single flat dimensions from s3 stream)
  • print_helpers.h: debug printer functions, controlled by the constant DEBUG_PRINT_ON in config.h

Jupyter Files (../code/jupyter)

  • make_kerchunk_refs.ipynb: ipynb to generate JSON metadata from select s3 object
  • range_req_dynamic.ipynb: python edition of kerchunk process; use as verification and testing of c++ addition. Includes s3 byte stream, zlib decompression, unshuffle, dtype processing, and xarray comparison

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors