Issue when using rasterio dataset inside Class with multiprocessing
I have a simple class that holds a rasterio dataset as a member variable. The class also has a method that is run through a Python multiprocessing call. See below.
This code is supposed to print 'Worker 0', 'Worker 1' and 'Worker 2'. However, when I actually ran this code, it printed nothing but exited normally.
I then tried to comment out the line reading the tif image using rasterio, which looks like this
This time it printed out text as I expected.
Is there any possible reason that causes this issue? Thanks!
An open Rasterio dataset object should not be passed between multiple processes or threads; the underlying GDALDataset is not thread safe. Additionally, the dataset's lifecycle should be made explicit, either by calling `.close()` yourself or by opening the dataset as a context manager (recommended).
It's not clear what your intention is with the `worker` function, but I can see two ways to approach it, depending on your goal:
If each process simply needs access to the array of data, I would read all of the data in `__init__` and close the dataset before invoking any parallel workers. Then you're just sharing a numpy array instead of a stateful dataset object.
    with rasterio.open('/Users/mperry/work/rasterio/tests/data/RGB.byte.tif') as src:
        data = src.read()
If you need to read different parts of the dataset from each process, you should pass the dataset path and open/close a new dataset within each process or thread. You can't share a dataset object between threads/procs, but you can create multiple datasets pointing to the same resource.
Hope this helps.