Loading large ENVI rasters into a MemoryFile

nickubels@...

Hello,

In my project I’m dealing with 217 hyperspectral raster files in ENVI format. Each raster contains 420 bands, so the files are 30 GB or more in size. To keep things manageable, I’m working on a high-performance cluster where I can process these rasters in parallel by splitting the work into several tasks. The main operation is masking these rasters with building polygons (I’m only interested in the parts of the rasters that fall inside buildings). Because I/O is a bottleneck on the cluster (reading from the network storage isn’t fast enough), I was considering loading each raster into RAM to speed up processing. I also tried copying the file to the node’s local disk, but when several subtasks are assigned to the same node, things quickly grind to a halt.
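The size figure follows from the ENVI header: the binary data file holds samples × lines × bands values of a fixed width, plus any header offset. Below is a minimal stdlib sketch that reads those fields from the text of a `.hdr` file and computes the expected data size. The parser is a simplified assumption (one `key = value` per line, multi-line `{ ... }` values skipped), not a complete ENVI reader, and the 6000 × 6000 uint16 dimensions in the example are hypothetical, chosen only to show how 420 bands add up to roughly 30 GB.

```python
# Minimal sketch: estimate the size of an ENVI data file from its .hdr text.
# Simplified parser (assumption): one "key = value" per line; multi-line
# brace-delimited values such as "wavelength = { ... }" are skipped.

# Bytes per sample for the standard ENVI "data type" codes.
ENVI_DTYPE_BYTES = {
    1: 1,    # uint8
    2: 2,    # int16
    3: 4,    # int32
    4: 4,    # float32
    5: 8,    # float64
    6: 8,    # complex64
    9: 16,   # complex128
    12: 2,   # uint16
    13: 4,   # uint32
    14: 8,   # int64
    15: 8,   # uint64
}


def parse_envi_header(text):
    """Return the single-line key/value pairs from an ENVI header string."""
    fields = {}
    in_block = False
    for line in text.splitlines():
        line = line.strip()
        if in_block:
            # Skip continuation lines of a multi-line {...} value.
            if "}" in line:
                in_block = False
            continue
        if "=" not in line:
            continue  # e.g. the leading "ENVI" line or blank lines
        key, value = (part.strip() for part in line.split("=", 1))
        if value.startswith("{") and "}" not in value:
            in_block = True
            continue
        fields[key.lower()] = value
    return fields


def envi_data_size(fields):
    """header offset + samples * lines * bands * bytes per sample."""
    samples = int(fields["samples"])
    lines = int(fields["lines"])
    bands = int(fields["bands"])
    nbytes = ENVI_DTYPE_BYTES[int(fields["data type"])]
    offset = int(fields.get("header offset", 0))
    return offset + samples * lines * bands * nbytes


# Hypothetical header: 6000 x 6000 pixels, 420 bands, uint16 (data type 12).
header = """ENVI
samples = 6000
lines = 6000
bands = 420
header offset = 0
data type = 12
interleave = bsq
wavelength = {
 400.0, 401.5,
 402.9}
"""

size = envi_data_size(parse_envi_header(header))
print(f"{size / 1e9:.1f} GB")  # 30.2 GB
```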

However, I can’t really get MemoryFile to work. I tried two approaches:

from rasterio.io import MemoryFile

with open(rasterpath, 'rb') as f:  # raw bytes of the ENVI data file
    data = f.read()
with MemoryFile(data) as memfile:
    with memfile.open() as raster:
        print(raster.profile)


And

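import rasterio
from rasterio.io import MemoryFile

with rasterio.open(rasterpath) as src:
    data = src.read()  # NumPy array of pixel values, shape (bands, rows, cols)
with MemoryFile(data) as memfile:
    with memfile.open() as raster:
        print(raster.profile)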


In the first case, I’m getting the following stack trace:

Traceback (most recent call last):
  File "rasterio/_base.pyx", line 199, in rasterio._base.DatasetBase.__init__
  File "rasterio/_shim.pyx", line 64, in rasterio._shim.open_dataset
  File "rasterio/_err.pyx", line 188, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError: '/vsimem/9ce098cf-1d79-4a73-88e4-5ada1bd35b1f.' not recognized as a supported file format.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/s2912279/bachelorproject/Code/generate_trainingset.py", line 118, in <module>
    with memfile.open() as raster:
  File "/home/s2912279/bachelorproject/Code/venv/lib/python3.6/site-packages/rasterio/env.py", line 366, in wrapper
    return f(*args, **kwds)
  File "/home/s2912279/bachelorproject/Code/venv/lib/python3.6/site-packages/rasterio/io.py", line 130, in open
    return DatasetReader(vsi_path, driver=driver, **kwargs)
  File "rasterio/_base.pyx", line 201, in rasterio._base.DatasetBase.__init__
rasterio.errors.RasterioIOError: '/vsimem/9ce098cf-1d79-4a73-88e4-5ada1bd35b1f.' not recognized as a supported file format.
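A possible factor in this error, offered tentatively: ENVI keeps its image description (samples, lines, bands, data type, interleave) in a separate plain-text `.hdr` sidecar file, while the data file itself is headerless raw binary. A `MemoryFile` built from the data file's bytes alone therefore carries nothing that identifies it as ENVI, which may explain the "not recognized as a supported file format" message. The stdlib sketch below illustrates the point with a tiny hypothetical 3 × 2 × 2 uint16 raster stored band-sequential: the same bytes decode to different values depending on which interleave is assumed, so the raw bytes cannot be interpreted without the header. The function `pixel_offset` is illustrative only, not part of rasterio or GDAL.

```python
import struct


def pixel_offset(row, col, band, samples, lines, bands, nbytes, interleave,
                 header_offset=0):
    """Byte offset of one sample in a raw ENVI data file.

    samples, lines, bands, nbytes and interleave all come from the .hdr
    sidecar; none of them can be recovered from the raw bytes alone.
    """
    interleave = interleave.lower()
    if interleave == "bsq":    # band sequential: one whole band after another
        index = (band * lines + row) * samples + col
    elif interleave == "bil":  # band interleaved by line
        index = (row * bands + band) * samples + col
    elif interleave == "bip":  # band interleaved by pixel
        index = (row * samples + col) * bands + band
    else:
        raise ValueError(f"unknown interleave: {interleave!r}")
    return header_offset + index * nbytes


# Hypothetical raster: 3 columns x 2 rows x 2 bands, little-endian uint16,
# each value encoding its own position as band*100 + row*10 + col (BSQ order).
samples, lines, bands = 3, 2, 2
values = [b * 100 + r * 10 + c
          for b in range(bands) for r in range(lines) for c in range(samples)]
raw = struct.pack(f"<{len(values)}H", *values)

# Decode band 1, row 0, col 1 under each interleave assumption.
decoded = {}
for guess in ("bsq", "bil", "bip"):
    off = pixel_offset(0, 1, 1, samples, lines, bands, 2, guess)
    decoded[guess] = struct.unpack_from("<H", raw, off)[0]
print(decoded)  # {'bsq': 101, 'bil': 11, 'bip': 10}
```

Only the BSQ guess returns 101 (band 1, row 0, col 1); the other two silently read unrelated samples.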

In the second case, the following stack trace is generated, and performance is very poor: it takes a good 25 minutes to load 30 GB, whereas the first approach takes about 3 minutes to load all the data into RAM.

Traceback (most recent call last):
  File "rasterio/_base.pyx", line 199, in rasterio._base.DatasetBase.__init__
  File "rasterio/_shim.pyx", line 64, in rasterio._shim.open_dataset
  File "rasterio/_err.pyx", line 188, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError: '/vsimem/9ce098cf-1d79-4a73-88e4-5ada1bd35b1f.' not recognized as a supported file format.
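For scale, the timings above imply roughly an eightfold difference in effective read throughput. A small sketch of the arithmetic, assuming "30 GB" means 30 × 10^9 bytes and taking the approximate 3- and 25-minute figures at face value:

```python
# Effective throughput implied by the reported timings.
# Assumption: 30 GB = 30 * 10**9 bytes; timings are approximate.
size_bytes = 30 * 10**9
timings_s = {
    "first approach": 3 * 60,    # open(...).read(): about 3 minutes
    "second approach": 25 * 60,  # rasterio.open(...).read(): about 25 minutes
}

throughput_mb_s = {name: size_bytes / t / 1e6 for name, t in timings_s.items()}
for name, mbps in throughput_mb_s.items():
    print(f"{name}: {mbps:.0f} MB/s")
# first approach: 167 MB/s
# second approach: 20 MB/s

ratio = timings_s["second approach"] / timings_s["first approach"]
print(f"ratio: {ratio:.1f}x")  # ratio: 8.3x
```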

I doubt whether I’m using the MemoryFile functionality correctly. Is there something I’m doing wrong, or am I missing something?

Kind regards,

Nick

Join main@rasterio.groups.io to automatically receive all group messages.