Can't read in a Landsat geotiff and use multiple threads to write out different windows at the same time


Ryan Avery
 

I moved this GitHub issue over here since it seems like more of a usage/help question. The issue linked below includes a reproducible example, if you have a Landsat tif.
https://github.com/mapbox/rasterio/issues/1681

When I write out windows using multiple threads, the program fails at different windows. This seems related to the following issue: https://github.com/mapnik/node-mapnik/issues/437#issuecomment-103806098

I tried setting VRT_SHARED_SOURCE=0 with the rasterio.Env() context manager so that the threads would not step on each other, but it had no effect.

Have others encountered this issue and found a solution?


Sean Gillies
 

The VRT_SHARED_SOURCE option only affects the connection pool within the VRT driver code and that's not relevant to your code as far as I can see.

A single opened rasterio dataset cannot be used safely by multiple threads concurrently. There is more about this limitation at https://trac.osgeo.org/gdal/wiki/FAQMiscellaneous#IstheGDALlibrarythread-safe

Each of your threads will need exclusive use of a dataset object. The simplest way to achieve this is to call rasterio.open(..., sharing=False) in each new thread to get a new dataset, and then close it before the thread is joined. Another way is to create a suitably sized pool of dataset objects (also using the sharing=False option) within your application and then assign these to your threads on an as-needed basis.
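The pool approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: `DatasetPool`, `read_windows`, and `example_with_rasterio` are hypothetical names, and the window sizes are made up. The only pieces taken from the reply are `rasterio.open(..., sharing=False)` and the idea that each thread gets exclusive use of one dataset object at a time. The sketch covers reading windows only.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager


class DatasetPool:
    """Hypothetical helper: a fixed-size pool of independently opened datasets."""

    def __init__(self, opener, size):
        # `opener` is any zero-argument callable that returns a new dataset,
        # e.g. lambda: rasterio.open(path, sharing=False)
        self._datasets = queue.Queue()
        for _ in range(size):
            self._datasets.put(opener())

    @contextmanager
    def acquire(self):
        # Blocks until a dataset is free, so no two threads share one handle.
        dataset = self._datasets.get()
        try:
            yield dataset
        finally:
            self._datasets.put(dataset)

    def close(self):
        # Call only after all workers have finished and returned their datasets.
        while True:
            try:
                self._datasets.get_nowait().close()
            except queue.Empty:
                break


def read_windows(pool, windows, max_workers=4):
    """Read band 1 of each window, one pooled dataset per in-flight read."""

    def read_one(window):
        with pool.acquire() as src:
            return src.read(1, window=window)

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map preserves the order of `windows` in its results.
        return list(executor.map(read_one, windows))


def example_with_rasterio(path):
    """Usage sketch; assumes rasterio is installed and `path` is a GeoTIFF
    at least 512 columns wide and 2048 rows tall (an assumption)."""
    import rasterio
    from rasterio.windows import Window

    pool = DatasetPool(lambda: rasterio.open(path, sharing=False), size=4)
    try:
        windows = [Window(0, row_off, 512, 512) for row_off in range(0, 2048, 512)]
        return read_windows(pool, windows, max_workers=4)
    finally:
        pool.close()
```

The queue does the bookkeeping here: a thread that cannot get a dataset simply waits until another thread hands one back, so the pool size, not the number of threads, bounds how many datasets are open.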

I hope this helps,

On Mon, May 6, 2019 at 7:02 PM <ravery@...> wrote:

--
Sean Gillies


Ryan Avery
 

Thanks Sean, very helpful and makes sense.


On Tue, May 7, 2019 at 9:30 AM Sean Gillies <sean.gillies@...> wrote:
--
Ryan Avery
Graduate Student, WAVES Lab
Department of Geography
University of California, Santa Barbara