Re: multi-dimensional support

Norman Barker

I was one of the stakeholders for subdataset support in GDAL with netCDF, and it worked well for what we were trying to achieve back then: serving regularly gridded time-series netCDF data through a WCS. I believe others have used subdataset support in the same way. It was possible to make this work by using external indexes and subdatasets.

I also agree with your comment that Rasterio is a relatively small project and the code needs to have active users.

The main benefit is a common API for multi-dimensional data access within GDAL. Currently, using gdalinfo against HDF, netCDF, or TileDB requires reading the output to understand the available data, or writing a parser for each of these format drivers' metadata. These drivers have no common way to advertise, through an API, the dimensions and attributes they support. Because subdataset support has been implemented in a somewhat ad hoc way, the access patterns differ slightly across drivers; the new API enforces a convention.
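To make that concrete, here is a rough sketch of what driver-independent discovery could look like from Python. It assumes the GDAL Python bindings from a release that includes the multidimensional API (GDAL 3.1 or later, as far as I know); the helper names `open_root_group` and `describe_arrays` are my own and not part of GDAL, and the same code is meant to work regardless of which driver opened the file.

```python
def open_root_group(path):
    """Open `path` in multidimensional mode and return (dataset, root group)."""
    # Imported lazily so describe_arrays() below has no hard dependency on GDAL.
    from osgeo import gdal

    ds = gdal.OpenEx(path, gdal.OF_MULTIDIM_RASTER)
    if ds is None:
        raise RuntimeError(f"could not open {path!r} as a multidimensional dataset")
    # Return the dataset as well so the caller controls its lifetime.
    return ds, ds.GetRootGroup()


def describe_arrays(group):
    """Map each array name in `group` to its (dimension name, size) pairs."""
    summary = {}
    for name in group.GetMDArrayNames() or []:
        array = group.OpenMDArray(name)
        summary[name] = [(d.GetName(), d.GetSize()) for d in array.GetDimensions()]
    return summary
```

Usage would be along the lines of `ds, root = open_root_group("some_file.nc")` followed by `describe_arrays(root)`, where the file name is a placeholder. The point is that no per-driver parsing of gdalinfo output is involved.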

Killer features? A couple come to mind: accessing data cubes through a common API to retrieve data along a z dimension, or sliced by time. These use cases would benefit from being supported in Rasterio, with xarray/dask used to process the multi-dimensional data.
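As an illustration of the slicing use case, the sketch below fixes one named dimension (time, z, or anything else) at a given index and reads the remaining sub-array. It assumes a GDAL multidimensional array object exposing `GetDimensions()`, `GetView()` and `ReadAsArray()`, as the Python bindings do in releases with the multidimensional API; `index_view_spec` and `read_slice` are hypothetical helper names, and this says nothing about how Rasterio would eventually expose the same thing.

```python
def index_view_spec(dim_names, dim_name, index):
    """Build a NumPy-style view expression fixing one dimension at `index`."""
    if dim_name not in dim_names:
        raise KeyError(f"no dimension named {dim_name!r} in {dim_names}")
    # Keep every dimension whole (":") except the one being indexed.
    parts = [str(index) if name == dim_name else ":" for name in dim_names]
    return "[" + ",".join(parts) + "]"


def read_slice(array, dim_name, index):
    """Read the sub-array of `array` at position `index` along `dim_name`."""
    dim_names = [d.GetName() for d in array.GetDimensions()]
    view = array.GetView(index_view_spec(dim_names, dim_name, index))
    return view.ReadAsArray()  # needs NumPy installed alongside the GDAL bindings
```

For a cube with dimensions (time, y, x), `read_slice(array, "time", 3)` would request the view `[3,:,:]`, i.e. the full y/x grid at the fourth time step.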

I will create a strawman for the API changes, and if you and the community are interested, I can start on the code.


On Fri, Aug 23, 2019 at 7:51 AM Howard Butler <howard@...> wrote:

On Aug 23, 2019, at 9:29 AM, Sean Gillies <sean.gillies@...> wrote:

I'm also a bit concerned about the small number of stakeholders for the new GDAL API. It appears to be only the HDF Group (yes?) with only three GDAL TC members voting to adopt it. The rest of the GDAL community seemed ambivalent.

Most folks are ambivalent about multi-dimensional support in GDAL, and they were ambivalent about subdatasets before that (which were deficient in a number of ways, and that is what precipitated the RFC). The RFC moved things forward in a positive direction, and it wasn't just about giving HDFLand a clean mapping to GDAL. It was about giving GDALLand the ability to more easily speak to an additional family of raster-like data.

GDAL drivers that speak zarr, TileDB, Arrow, and HDF can now be adapted without the miserable compromises that subdatasets required in usability and data fidelity. That will allow people to bring the GDAL geo goodness to their data without reformatting simply to push it through the tool. I think these generic data structures are seeing much more action because they allow data-level interop without special purpose drivers across multiple software runtimes. The winds are blowing the same direction in point cloud land too.

Rasterio is a pretty small project and, in my opinion, can't afford to develop code that isn't going to be widely used.

A completely reasonable position.

