Read using multithreading


Carlos García Rodríguez
 

Hello, I have an application that calls rasterio to open rasters in its main process. The application can parallelize this work, so the function that contains the rasterio.open call is invoked many times in parallel.
If I open and close the image on every read, like this:

with rasterio.open(self.file) as src:
    raster = src.read(window=window)

the process works, but the file has to be opened and closed for every window that is read, which costs some time. I then tried keeping the image open from the start of the process, but that is not stable. With a small number of workers the application may work, but when I increase the number of workers it crashes with the following error:

RasterioIOError: Caught RasterioIOError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "rasterio/_io.pyx", line 707, in rasterio._io.DatasetReaderBase._read
File "rasterio/shim_rasterioex.pxi", line 133, in rasterio._shim.io_multi_band
File "rasterio/_err.pyx", line 182, in rasterio._err.exc_wrap_int
rasterio._err.CPLE_AppDefinedError: ./sentinel2_tiled.tif, band 1: IReadBlock failed at X offset 58, Y offset 92: TIFFReadEncodedTile() failed.

What do you recommend I do?

Thank you.


Sean Gillies
 

Hi,

On Wed, Apr 29, 2020 at 2:54 AM Carlos García Rodríguez <carlogarro@...> wrote:

[...]

Dataset files can be accessed from multiple threads, but the dataset objects returned by rasterio.open can only be used by a single thread at a time. This is a constraint that we get from GDAL. You could try creating a set of dataset objects and then distribute them to the worker threads, making sure not to share them.

--
Sean Gillies
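
The suggestion above (one dataset object per worker thread, never shared) could be sketched roughly as follows. This is only an illustration, not code from the thread: the names PerThreadDataset and read_windows are hypothetical, and the sketch assumes a thread pool rather than the DataLoader worker processes shown in the traceback. Each thread opens its own handle the first time it needs one and then reuses it, so the file is opened once per thread instead of once per read.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PerThreadDataset:
    """Lazily open one dataset handle per thread and reuse it (hypothetical helper)."""

    def __init__(self, opener):
        # opener: zero-argument callable that returns a NEW dataset object,
        # for example `lambda: rasterio.open(path)`.
        self._opener = opener
        self._local = threading.local()
        self._lock = threading.Lock()
        self._handles = []  # every handle opened so far, kept for close_all()

    def get(self):
        # Return the calling thread's own handle, opening it on first use.
        ds = getattr(self._local, "ds", None)
        if ds is None:
            ds = self._opener()
            self._local.ds = ds
            with self._lock:
                self._handles.append(ds)
        return ds

    def close_all(self):
        # Call only after all worker threads have finished reading.
        with self._lock:
            for ds in self._handles:
                ds.close()
            self._handles.clear()


def read_windows(path, windows, max_workers=4):
    """Read each window in a thread pool, with one rasterio dataset per thread."""
    import rasterio  # imported here so the helper class has no hard dependency

    datasets = PerThreadDataset(lambda: rasterio.open(path))

    def read_one(window):
        return datasets.get().read(window=window)

    try:
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            return list(executor.map(read_one, windows))
    finally:
        datasets.close_all()
```

With this layout, a call such as read_windows("./sentinel2_tiled.tif", windows) opens at most max_workers handles in total. Whether this removes the TIFFReadEncodedTile() failure in a given setup is not something the sketch can guarantee; it only avoids sharing one dataset object between threads.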