Handle to underlying GDAL dataset


Sean Gillies
 

Hi Søren,

It's not possible for a GDAL dataset handle to be shared between rasterio and GDAL's Python bindings. I discourage people from combining these modules. Unless one is extremely careful and understands the internals of each, it's a great way to create perplexing bugs.

Rasterio's dataset.read() works at a higher level than GDAL's DatasetReadAsArray(). The latter is effectively the C function that rasterio uses internally. Rasterio has data type and nodata checking overhead that GDAL's function lacks and perhaps some logic that could be more optimized. Is the overhead really 30%? Would you be willing to use Python's timeit to race the last two lines of your code against each other? I would expect the difference to diminish as the size of the window grows.


On Tue, May 4, 2021 at 7:15 AM soren.rasmussen via groups.io <soren.rasmussen=alexandra.dk@groups.io> wrote:
Hi!

I would like to get a handle for the underlying GDAL dataset behind a RasterIO dataset.
Is this possible somehow?
 
The reason I want to do this is that, in some cases, GDAL operations are significantly faster. I have a case, where reading a subset of channels in a window with the just-released GDAL 3.3.0 is approximately 25% faster than RasterIO.
It would be nice to be able to use GDAL operations without having to open the dataset with both RasterIO and GDAL


from osgeo import gdal, gdal_array

rio_ds = rasterio.open(path)

gdal_ds = gdal.Open(path)
bands = [1,2,3]
resampling = rasterio.enums.Resampling.bilinear
win = ds.window(*bbox)
xoff, yoff, xsize, ysize = map(lambda x: int(round(x)), win.col_off, win.row_off, win.width, win.height)
rio_img = rio_ds.read(window=win, indexes=bands, out_shape=out_shape, resampling=resampling)
gdal_img = gdal_array.DatasetReadAsArray(gdal_ds, xoff=xoff, yoff=yoff, win_xsize=xsize, win_ysize=ysize, buf_xsize=out_shape[1], buf_ysize=out_shape[0], resample_alg=resampling, band_list=bands)
(Note that band_list is new in GDAL 3.3.0)

Best, Søren



--
Sean Gillies


soren.rasmussen@...
 

Hi!

I would like to get a handle for the underlying GDAL dataset behind a RasterIO dataset.
Is this possible somehow?
 
The reason I want to do this is that, in some cases, GDAL operations are significantly faster. I have a case, where reading a subset of channels in a window with the just-released GDAL 3.3.0 is approximately 25% faster than RasterIO.
It would be nice to be able to use GDAL operations without having to open the dataset with both RasterIO and GDAL


from osgeo import gdal, gdal_array

rio_ds = rasterio.open(path)

gdal_ds = gdal.Open(path)
bands = [1,2,3]
resampling = rasterio.enums.Resampling.bilinear
win = ds.window(*bbox)
xoff, yoff, xsize, ysize = map(lambda x: int(round(x)), win.col_off, win.row_off, win.width, win.height)
rio_img = rio_ds.read(window=win, indexes=bands, out_shape=out_shape, resampling=resampling)
gdal_img = gdal_array.DatasetReadAsArray(gdal_ds, xoff=xoff, yoff=yoff, win_xsize=xsize, win_ysize=ysize, buf_xsize=out_shape[1], buf_ysize=out_shape[0], resample_alg=resampling, band_list=bands)
(Note that band_list is new in GDAL 3.3.0)

Best, Søren