Does rasterio load data into memory while reprojecting?

Denis Rykov
 

Hi there!

I have a general question about reprojection. Consider the following example:

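import rasterio
import rasterio.warp

# fname_src, fname_dst, kwargs, dest_affine, dst_crs and resampling are defined elsewhere
with rasterio.open(fname_src) as src:
    with rasterio.open(fname_dst, 'w', **kwargs) as dst:
        # Warp one band at a time, from the source file into the destination file
        for i in range(1, src.count + 1):
            rasterio.warp.reproject(
                source=rasterio.band(src, i),
                destination=rasterio.band(dst, i),
                src_transform=src.transform,
                src_crs=src.crs,
                dst_transform=dest_affine,
                dst_crs=dst_crs,
                resampling=resampling)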

As far as I can see from the source code, rasterio reads each band as an N-d array.
Does that mean rasterio loads each band into memory?

My main question: can I use this code to warp rasters of arbitrary size, or am I limited by available memory?

Thanks.

Sean Gillies
 

Hi Denis,

On Wed, Jul 4, 2018 at 4:09 AM, Denis Rykov <rykovd@...> wrote:
> Hi there!
>
> I have a general question about reprojection. Consider the following example:
>
> with rasterio.open(fname_src) as src:
>     with rasterio.open(fname_dst, 'w', **kwargs) as dst:
>         for i in range(1, src.count + 1):
>             rasterio.warp.reproject(
>                 source=rasterio.band(src, i),
>                 destination=rasterio.band(dst, i),
>                 src_transform=src.transform,
>                 src_crs=src.crs,
>                 dst_transform=dest_affine,
>                 dst_crs=dst_crs,
>                 resampling=resampling)
>
> As I can see from source code rasterio reads each band as an N-d array.
> Does it mean that rasterio loads each band into memory?
>
> My main question: can I use this code for warping rasters of arbitrary size (or I'm limited by memory size)?
>
> Thanks.

Yes, you can warp arbitrarily large rasters when you use rasterio.band(). The warper chunks the work so that its working arrays are no more than 64 MB in size. You can increase this limit with reproject's warp_mem_limit keyword argument.

The source dataset's bands are incrementally loaded into memory as the warper works, but only up to the limit of GDAL's block cache (Rasterio uses the GDAL C library). If you profile your program and see memory allocation increase, that is GDAL's caching at work. See also the notes at http://trac.osgeo.org/gdal/wiki/UserDocs/GdalWarp#WarpandCacheMemory:TechnicalDetails, which also apply to Rasterio.

Please note that you can warp multiple imagery bands in one reproject() call by passing sequences of band indexes.

  reproject(source=rasterio.band(src, src.indexes), destination=rasterio.band(dst, dst.indexes), ...)

This is faster than looping over the bands.

--
Sean Gillies

Denis Rykov
 

Thank you very much, Sean, for such a detailed reply!