
Issue when using rasterio dataset inside Class with multiprocessing

luoyntech@...
 

I have a simple class with a member variable that holds a rasterio dataset. The class also has a method that is run through a Python multiprocessing call. See below.



This code is supposed to print 'Worker 0', 'Worker 1' and 'Worker 2'. However, when I actually ran this code, it printed nothing but exited normally.




I then commented out the line that reads the TIFF image with rasterio, so the code looks like this:


This time it printed out text as I expected.




What could be causing this? Thanks!

Matthew Perry
 

An open Rasterio dataset object should not be passed between processes or threads; the underlying GDALDataset is not thread-safe. The dataset's lifecycle should also be made explicit, either by calling `.close()` or (recommended) by opening it as a context manager.
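The original snippet isn't shown, so this is only a guess at one possible mechanism. When a bound method such as `self.worker` is handed to a `multiprocessing.Pool` (or to `Process` under the "spawn" start method), the instance it belongs to, including any dataset stored on it, has to be pickled to reach the child. Objects wrapping a C-level handle generally can't be pickled, and we assume rasterio datasets behave the same way. With `Pool.apply_async`, an error raised while sending the task is stored on the returned result and only re-raised by `.get()`, so code that never calls `.get()` can print nothing and still exit normally. The sketch below uses a hypothetical `Holder` class, with a `threading.Lock` standing in for the open dataset, and a hypothetical helper `can_send_to_child` that simply tries `pickle.dumps`:

```python
import pickle
import threading


class Holder:
    """Hypothetical stand-in for the class in the question."""

    def __init__(self):
        # A lock stands in for the open dataset: both wrap a C-level handle
        # that pickle cannot serialize (assumption for the rasterio case).
        self.handle = threading.Lock()

    def worker(self, num):
        print("Worker", num)


def can_send_to_child(func):
    """Return True if `func` can be pickled, which Pool tasks require."""
    try:
        # Pickling a bound method pickles the instance it is bound to.
        pickle.dumps(func)
    except (TypeError, pickle.PicklingError):
        return False
    return True
```

Here `can_send_to_child(Holder().worker)` returns `False`, while a plain module-level function passes.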

It's not clear what you intend the `worker` function to do, but I can see two ways to approach it, depending on your goal:

If each process simply needs access to the array of data, I would read all of the data in `__init__` and close the dataset before starting any parallel workers. Then the workers receive a plain numpy array instead of a stateful dataset object.

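    def __init__(self):
        # Read everything up front; the with-block closes the dataset on exit
        with rasterio.open('/Users/mperry/work/rasterio/tests/data/RGB.byte.tif') as src:
            self.data = src.read()  # keep the array on the instance for the workers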
 

If you need to read different parts of the dataset in each process, pass the dataset path instead and open (and close) a new dataset within each worker. You can't share a dataset object between threads or processes, but you can create multiple datasets pointing to the same resource.

Hope this helps.