Re: rio stack with compression

Sean Gillies

Hi Luke,

I don't understand the issue entirely, but as Even Rouault explains, a "naive" use of the write() method of a Rasterio dataset (and rio-stack does use it naively) can result in redundant blocks of data being written to the GeoTIFF file. The workarounds are to specify band interleaving (as you discovered) or to increase GDAL_CACHEMAX so that the entire output file can be kept in memory until it is written. Remember: a dataset's write() method writes data to the GDAL block cache, and a block is written from that cache to the GeoTIFF file when it is bumped out by new data coming in or when the dataset object is closed.
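To make the mechanism a bit more concrete, here is a toy model in Python. It is only a sketch and does not use GDAL or Rasterio: `make_bands` and `simulate_stack` are hypothetical helpers, the "file" is just a byte counter, and the rule that a rewritten block is appended when its compressed size has grown (leaving the old copy as dead space) is my simplifying assumption about how the redundant blocks arise. Bands are written one at a time through a small LRU block cache, roughly the way a naive stacking loop would do it.

```python
import random
import zlib
from collections import OrderedDict


def make_bands(n_bands, n_pixels, seed=0):
    """Build hypothetical 8-bit bands of moderately compressible noise."""
    rng = random.Random(seed)
    return [bytes(rng.randrange(16) for _ in range(n_pixels))
            for _ in range(n_bands)]


def simulate_stack(bands, block_len, interleave, cache_blocks):
    """Write bands one at a time through a toy LRU block cache.

    Returns the size in bytes of the toy output file, dead space included.
    Assumes len(band) is a multiple of block_len and cache_blocks >= 1.
    """
    n_bands = len(bands)
    n_blocks = len(bands[0]) // block_len
    stored = {}            # block key -> bytes as last flushed to the "file"
    slot_size = {}         # block key -> compressed bytes reserved for it
    cache = OrderedDict()  # block key -> bytearray, least recently used first
    file_size = 0

    def flush(key, data):
        nonlocal file_size
        packed = len(zlib.compress(bytes(data)))
        # Assumption: a block that no longer fits in its old slot is
        # appended to the file, and the old copy stays behind as dead space.
        if key not in slot_size or packed > slot_size[key]:
            file_size += packed
            slot_size[key] = packed
        stored[key] = bytes(data)

    for b, band in enumerate(bands):
        for i in range(n_blocks):
            chunk = band[i * block_len:(i + 1) * block_len]
            if interleave == "pixel":
                key, size = i, block_len * n_bands  # one block holds all bands
            else:
                key, size = (b, i), block_len       # one block per band
            if key in cache:
                cache.move_to_end(key)
            else:
                # Reload a previously flushed block, or start zero-filled.
                cache[key] = bytearray(stored.get(key, bytes(size)))
                if len(cache) > cache_blocks:
                    old_key, old_data = cache.popitem(last=False)
                    flush(old_key, old_data)
            if interleave == "pixel":
                cache[key][b::n_bands] = chunk      # fill this band's samples
            else:
                cache[key][:] = chunk

    # Closing the dataset flushes whatever is still cached.
    for key, data in cache.items():
        flush(key, data)
    return file_size


bands = make_bands(n_bands=3, n_pixels=16 * 4096)
for interleave in ("pixel", "band"):
    for cache_blocks in (1, 1000):
        size = simulate_stack(bands, 4096, interleave, cache_blocks)
        print(interleave, cache_blocks, size)
```

In this model a pixel-interleaved block is evicted while only some of its bands are filled and has to be written again later, so a small cache inflates the file. With band interleaving each block is touched once, so the cache size makes no difference. That matches the two workarounds above, but the numbers it prints say nothing about real GeoTIFF sizes.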

These are the issues that I need to bring up in Rasterio's documentation.

Hope this helps!

On Thu, Sep 6, 2018 at 5:37 AM Luke Pinner <lukepinnerau@...> wrote:
I stacked some singleband rasters into a multiband raster using `rio stack` with a compress=deflate creation option. The compressed output filesize was approx 1.4x the sum of the uncompressed input filesizes. Specifying --co interleave=band when running `rio stack` got the output filesize down to 0.25x the uncompressed input (per issue #70). Running `gdalbuildvrt` to stack them, then `rio convert` or `gdal_translate` on the VRT with the same creation options as the original rio stack command also results in an output raster of around 0.25x the uncompressed input, but with default pixel interleaving.

I realise you're planning to document issues with compression (#77) but in the meantime, do you have any idea why compression is so poor with default pixel interleaving using `rio stack` but quite ok when using `rio convert` (with default pixel interleaving) on an already stacked VRT?


Sean Gillies
