Re: [rasterio] multi-dimensional support

scott
 

Hi Norman and Sean,

Thanks for discussing this! Just wanted to chime in based on Norman's suggestion. I'm currently involved in a NASA-funded project to facilitate analysis of cloud-based data archives, and as part of the Pangeo project we are really pushing for contributions to established Python packages. We've been using rasterio extensively to load single image files and multiband VRTs. xarray.open_rasterio() has been a great example of Python tools working very well together.

Currently it is a bit awkward to work with multidimensional data with subdatasets in GDAL/rasterio. Alternatively, there are format-specific libraries and readers out there (xarray.open_dataset(), h5py, satpy), but I agree there would be a lot of value in a standard access pattern through rasterio, which other libraries could then inherit for I/O tasks. Xarray, for example, does not currently account for CRS, which has been the topic of a lot of discussion (https://github.com/pydata/xarray/issues/2288 ), and writing is currently limited to netCDF and Zarr. I think the current state of things illustrates that the extensibility of Python is both an advantage and a disadvantage for users, because people (especially newcomers) are confused about which packages to use and often end up with hodgepodge solutions.

One argument for incorporating the multidimensional API is that there is a tremendous amount of netCDF and HDF data out there (in fact all NASA data is archived in these formats) and people are interested in translating to more cloud-friendly formats (see for example https://github.com/pangeo-data/pangeo/issues/686 , https://github.com/pangeo-data/pangeo/issues/120 ). So, one timely use-case of multidimensional support would be transposing existing archives of HDF files for time series analysis and storing them as TileDB on S3 or GCS. Or, as Norman mentioned, building multidimensional VRTs of COGs to sample in time in addition to X and Y. Another common use-case is 1) open a large multidimensional netCDF file, 2) run some dimensionality-reducing analysis with your favorite Python library, 3) save the resulting 2D GeoTIFF.
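To make that last three-step use-case concrete, here is a rough sketch of how it tends to look today with xarray for reading and rasterio for writing. It is only an illustration under several assumptions: the function names time_mean and netcdf_to_geotiff are made up for this example, the variable is assumed to have dimensions named "time", "y" and "x" with regularly spaced 1-D coordinates at pixel centers and y decreasing (north-up), and the CRS has to be passed in by hand because it is not picked up from the file.

```python
import numpy as np


def time_mean(cube):
    """Reduce a (time, y, x) array to (y, x) by averaging over time, skipping NaNs."""
    return np.nanmean(cube, axis=0)


def netcdf_to_geotiff(nc_path, var_name, tif_path, crs):
    """Hypothetical helper: open a netCDF variable, average over time,
    and save the result as a single-band GeoTIFF.

    Assumes dims named "time", "y", "x", regularly spaced pixel-center
    coordinates, and y decreasing (north-up). The CRS is supplied by the caller.
    """
    # Imported here so time_mean() stays usable without the I/O libraries.
    import rasterio
    import xarray as xr
    from rasterio.transform import from_origin

    # 1) open the multidimensional file and pull out one variable
    with xr.open_dataset(nc_path) as ds:
        da = ds[var_name].transpose("time", "y", "x")
        x = da["x"].values
        y = da["y"].values
        # 2) dimensionality-reducing analysis: collapse the time axis
        reduced = time_mean(da.values)

    # Build an affine transform from the pixel-center coordinates.
    res_x = float(x[1] - x[0])
    res_y = float(y[0] - y[1])
    transform = from_origin(x[0] - res_x / 2, y[0] + res_y / 2, res_x, res_y)

    # 3) save the resulting 2D array as a GeoTIFF
    with rasterio.open(
        tif_path,
        "w",
        driver="GTiff",
        height=reduced.shape[0],
        width=reduced.shape[1],
        count=1,
        dtype=str(reduced.dtype),
        crs=crs,
        transform=transform,
    ) as dst:
        dst.write(reduced, 1)
```

A call would look something like netcdf_to_geotiff("input.nc", "temperature", "mean.tif", "EPSG:4326"), where the file and variable names are placeholders. The manual transform and CRS handling in the middle is the kind of glue code that a standard multidimensional access pattern could make unnecessary.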

My go-to place for any raster format conversion is GDAL, and if Python code is involved I first turn to rasterio. So if there is motivation to try to bring this new GDAL feature into rasterio, I'm very interested to see where it leads!

--Scott
