Re: Create a Dataset from an array


Sean Gillies
 

Hi,

What you're asking for is possible, though not well abstracted by rasterio. It takes some extra GDAL configuration which I will try to explain.

GDAL's MEM format driver can read from an array that exists in memory. See https://gdal.org/drivers/raster/mem.html#dataset-name-format. So reading an existing numpy array at a very low level in GDAL is a matter of describing the numpy array's address and layout to GDAL. That information can be had from an array's "array interface". For example:

>>> import numpy
>>> arr = numpy.array(range(9), dtype="uint8").reshape(3,3)
>>> arr.__array_interface__
{'data': (22236512, False), 'strides': None, 'descr': [('', '|u1')], 'typestr': '|u1', 'shape': (3, 3), 'version': 3}

The GDAL dataset connection name for this is

>>> descr = "MEM:::DATAPOINTER=22236512,PIXELS=3,LINES=3,BANDS=1,DATATYPE=Byte" 

You can open this with rasterio in "r+" mode and attach a CRS and geo transform.

>>> with rasterio.open(descr, "r+") as dataset:
...     dataset.crs = "EPSG:4326"
...     dataset.transform = Affine.scale(2, -2) * Affine.identity()
...     print(dataset.profile)
...     print(dataset.read())
...
{'driver': 'MEM', 'dtype': 'uint8', 'nodata': None, 'width': 3, 'height': 3, 'count': 1, 'crs': CRS.from_epsg(4326), 'transform': Affine(2.0, 0.0, 0.0,
       0.0, -2.0, 0.0), 'tiled': False}
[[[0 1 2]
  [3 4 5]
  [6 7 8]]]

I don't have any such code in production so can't say anything about the performance or reliability. Obviously you have to be careful with the data pointer to avoid crashes.

On Thu, Aug 5, 2021 at 3:39 PM <ayr035@...> wrote:
Hi all,

I'm struggling to find a good way to create a dataset object from a numpy array. Does such a workflow exist in rasterio? I'm trying to avoid any unnecessary memory copies as this numpy array could be quite large and I definitely need to avoid writing to disk if at all possible.

I'm trying to do some pre-processing on rasters before a merge and need to wrap an array in a dataset-like object that supports resampling, windowed reading, etc for rasterio.merge. I saw an InMemoryRaster class which seems like it could work, but unfortunately, the read method of InMemoryRaster lacks API compatibility with Dataset.read. Is there a solution that is escaping me? In my ideal world, something like DatasetReader.from_array(arr, **arr_profile) where a dataset could be created directly from an array would be incredibly useful. Are there existing solutions to accomplish this


--
Sean Gillies

Join main@rasterio.groups.io to automatically receive all group messages.