Creating 12GB Create very large empty geotif with Rasterio without using large amounts of memory


Hobart, Geordie (NRCan/RNCan)
 

Hi there.

I was wondering if there was a way to create a multiband 12 GB geotif file with rasterio without consuming a large amount of memory.

I’m working on a shared system where resources are limited and requesting a large amount of memory for this job will take a long time to be scheduled.

I’ve come across some older solutions using other python libraries but would prefer to stick with rasterio if possible.

Thanks so much.

Geordie Hobart

CFS, NRCan  /  RNCan

 

From: main@rasterio.groups.io <main@rasterio.groups.io> On Behalf Of Sean Gillies via groups.io
Sent: June 21, 2021 12:42
To: main@rasterio.groups.io
Subject: Re: [rasterio] Shape of resulting merged image is not what I expect

 

Hi,

 

I'm happy report that you won't have this problem with version 1.2.5, which is coming soon to PyPI.

 

As a part of the work to fix the bug noted at https://github.com/mapbox/rasterio/blob/maint-1.2/CHANGES.txt#L10, we replaced a call to math.ceil() to a call of round() in a key place and now the output shape of merge is what you expect.

 

 

On Thu, Jun 10, 2021 at 9:02 AM <juanjo.gomeznavarro@...> wrote:

[Edited Message Follows]
[Reason: typos]

I'm not sure if this is a bug or a subtle unexpected behaviour of the function rasterio.merge.merge.

I'm creating a mosaic with several images for the whole globe. The resolution is such that the image should be 16000x8001 (width x height). The calculation of the resolution is done through these variables:

pixels = 16000

res = 360 / pixels

xmin = -180 - res / 2

xmax = 180 - res / 2

ymin = -90 - res / 2

ymax = 90 + res / 2

Which result in xmin, xmax, ymin, ymax = (-180.01125, 179.98875, -90.01125, 90.01125). OK, the problem is that when I use merge as:  
mosaic, out_trans = merge(files, bounds=(xmin, ymin, xmax, ymax), res=res,  method='max')

The shape os mosaic is NOT (4, 16000, 8001), but (4, 16000, 8002) (note the extra row).

I think the problem comes from a rounding issue. In my python terminal (ymax-ymin)/res = 8001.000000000001 (whereas (xmax-xmin)/res = 16000.0). This rounding error of 1E-12 seems to be forcing the number of rows to be 8002 instead of 8001. This might seem like a tiny approximation error, but the problem is that controlling the shape of the image is very important for further calculations downstream of the algorithm.


Is this what it is supposed to happen? Is there a way to "force" merge to produce an image of the expected shape 16000x8001?

Thanks.

  

--

Sean Gillies


yon.davies@...
 

Hey Hobart,

I've had success using Rasterio's windowed reading and writing to create a huge raster in chunks. You could do something like this:

# Create blank raster that will hold the data. Make sure the dst_profile has compression to make this small
with rio.open(r1, "w", **dst_profile) as dst:
       dst.write(np.zeros(shape=(x, y, z), height, width), dtype=dst_dtype))

# Get block windows from this blank raster
with rio.open(r1, "r") as src:
      block_windows = list(src.block_windows())
      for j, window in enumerate(block_windows):
            dst.write(stacked, window=window[1])

The block_windows are automatically created and usually, but you can play around and create rasterio.windows.Window() with a size of your choosing.

Creating the initial blank raster will require that you have enough memory to hold the np.zeros, which may not be true. In that case, you can create the initial blank raster on the disk using gdal_create in command line or in Python:

import gdal
import osr
driver = gdal.GetDriverByName('GTiff')
ds = driver.Create(outfn, xsize, ysize, nbands, dtype, options=['COMPRESS=LZW', 'TILED=YES'])

      


Alan Snow
 
Edited

Reading & writing with windows can be done with rioxarray (rasterio + xarray):

import rioxarray

rds = rioxarray.open_rasterio("input.tif", lock=False)
rds.rio.to_raster("output.tif", windowed=True)

You can also use dask to read/write in parallel:

Hope this helps,
Alan


Hobart, Geordie (NRCan/RNCan)
 

Thank you Alan. Good to know. Kind regards. Geordie

 

From: main@rasterio.groups.io <main@rasterio.groups.io> On Behalf Of Alan Snow
Sent: June 25, 2021 09:06
To: main@rasterio.groups.io
Subject: Re: [rasterio] Creating 12GB Create very large empty geotif with Rasterio without using large amounts of memory

 

Reading & writing with windows can be done with rioxarray (rasterio + xarray):

import rioxarray

rds = rioxarray.open_rasterio("input.tif")
rds.rio.to_raster("output.tif", windowed=True)

You can also use dask to read/write in parallel:


Hope this helps,
Alan


Hobart, Geordie (NRCan/RNCan)
 

Thanks Yon.

I want to avoid making the 12 GB numpy zero array and the associated memory demand so solution two is the best bet.

Greatly appreciated.

Geordie

 

From: main@rasterio.groups.io <main@rasterio.groups.io> On Behalf Of yon.davies@...
Sent: June 25, 2021 08:44
To: main@rasterio.groups.io
Subject: Re: [rasterio] Creating 12GB Create very large empty geotif with Rasterio without using large amounts of memory

 

Hey Hobart,

I've had success using Rasterio's windowed reading and writing to create a huge raster in chunks. You could do something like this:

# Create blank raster that will hold the data. Make sure the dst_profile has compression to make this small

with rio.open(r1, "w", **dst_profile) as dst:

       dst.write(np.zeros(shape=(x, y, z), height, width), dtype=dst_dtype))

# Get block windows from this blank raster

with rio.open(r1, "r") as src:
      block_windows = list(src.block_windows())

      for j, window in enumerate(block_windows):
            dst.write(stacked, window=window[1])

The block_windows are automatically created and usually, but you can play around and create rasterio.windows.Window() with a size of your choosing.

Creating the initial blank raster will require that you have enough memory to hold the np.zeros, which may not be true. In that case, you can create the initial blank raster on the disk using gdal_create in command line or in Python:

import gdal

import osr

driver = gdal.GetDriverByName('GTiff')
ds = driver.Create(outfn, xsize, ysize, nbands, dtype, options=['COMPRESS=LZW', 'TILED=YES'])


      


Luke
 

You don't need to pre-populate a dataset with a massive empty array, you can just open it in write mode and write chunks of data via windowed writing:

profile = {
    'driver': 'GTiff',
    'tiled': True,
    'height': huge_number,
    'width': huge_number,
    'count': 1,
    'dtype': 'float32',
    'blockxsize': 256,
    'blockysize': 256,
    'compress': 'LZW'
}
 
with rasterio.open('massive.tif', 'w', **profile) as dst:
    for window in dst.block_windows():
        dst.write(small_data_chunk, window=window, indexes=1)



Hobart, Geordie (NRCan/RNCan)
 

Thanks Luke. And everyone else for all great the suggestions. Much appreciated.

 

From: main@rasterio.groups.io <main@rasterio.groups.io> On Behalf Of Luke
Sent: June 25, 2021 18:10
To: main@rasterio.groups.io
Subject: Re: [rasterio] Creating 12GB Create very large empty geotif with Rasterio without using large amounts of memory

 

You don't need to pre-populate a dataset with a massive empty array, you can just open it in write mode and write chunks of data via windowed writing:

profile = {

    'driver': 'GTiff',

    'tiled': True,

    'height': huge_number,

    'width': huge_number,

    'count': 1,

    'dtype': 'float32',

    'blockxsize': 256,

    'blockysize': 256,

    'compress': 'LZW'

}

 

with rasterio.open('massive.tif', 'w', **profile) as dst:

    for window in dst.block_windows():

        dst.write(small_data_chunk, window=window, indexes=1)