Re: Issue when using rasterio dataset inside Class with multiprocessing

Matthew Perry
 

An open Rasterio dataset object should not be passed between processes or threads; the underlying GDALDataset is not thread-safe. Additionally, the dataset's lifecycle should be made explicit, either by calling `.close()` or by opening it as a context manager (recommended).

It's not clear what your intention is with the `worker` function, but I can see two ways to approach it, depending on your goal.

If each process simply needs access to the array of data, I would read all of the data in `__init__` and close the dataset before invoking any parallel workers. Then you're just sharing a numpy array instead of a stateful dataset object.

    def __init__(self):
        # The dataset is closed when the with-block exits; only the array is kept.
        with rasterio.open('/Users/mperry/work/rasterio/tests/data/RGB.byte.tif') as src:
            self.data = src.read()
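
To make the first approach concrete, here is a minimal sketch, not a drop-in fix for your class. The names `load_bands`, `band_mean` and `band_means` are made up for illustration, and computing a per-band mean is just an assumed stand-in for whatever your `worker` actually does. The point is only that the dataset is opened and closed in one place, and the workers receive plain arrays.

```python
from multiprocessing import Pool

import numpy as np


def load_bands(path):
    """Read every band into memory and close the dataset before returning."""
    # Imported here so that worker processes, which only ever see arrays,
    # do not need rasterio themselves.
    import rasterio

    with rasterio.open(path) as src:
        return src.read()  # array of shape (bands, rows, cols)


def band_mean(band):
    """Hypothetical worker: operates on a plain 2D array, not a dataset."""
    return float(np.mean(band))


def band_means(data, processes=2):
    """Send one band to each task; only array data crosses the process boundary."""
    with Pool(processes) as pool:
        return pool.map(band_mean, list(data))
```

You would call `load_bands(path)` once and then `band_means(data)`, ideally under an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module.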
 

If you need to read different parts of the dataset from each process, you should pass the dataset path and open and close a new dataset within each process or thread. You can't share a dataset object between threads or processes, but you can create multiple datasets pointing to the same resource.
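
A rough sketch of this second approach, under some assumptions: the raster is split into horizontal strips, each task reads band 1 only, and the "work" is a simple pixel sum. `row_blocks`, `block_sum` and `parallel_sum` are hypothetical names of my own, so adapt them to your case. What matters is that each task carries only the path and window offsets, and every worker opens and closes its own dataset.

```python
from multiprocessing import Pool


def row_blocks(height, width, block_rows):
    """Split a raster into (col_off, row_off, width, height) strips."""
    return [
        (0, row_off, width, min(block_rows, height - row_off))
        for row_off in range(0, height, block_rows)
    ]


def block_sum(args):
    """Hypothetical worker: opens its own dataset, reads one window, closes it."""
    path, block = args
    # Imported here so the helper above stays usable without rasterio.
    import rasterio
    from rasterio.windows import Window

    with rasterio.open(path) as src:
        data = src.read(1, window=Window(*block))
    return int(data.sum())


def parallel_sum(path, height, width, block_rows=256, processes=2):
    """Pass only the path and window offsets to each process."""
    tasks = [(path, block) for block in row_blocks(height, width, block_rows)]
    with Pool(processes) as pool:
        return sum(pool.map(block_sum, tasks))
```

The `height` and `width` values can be taken from `src.height` and `src.width` by opening the dataset briefly in the parent process and closing it again before the pool starts.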

Hope this helps.
