Define invalid regions of VRT file in a way that allows windowing


rasterio@...
 

Hi all,

I'm using a VRT file to represent a bunch of individual map tiles as one big image (following the help of Dion Haefner via this list - thanks again!). I noticed that some of the data is invalid but not marked as such (via the invalid data value). I'd like to define these regions as invalid using a file (e.g. a shapefile) in a way that would allow me to call `dataset.read(..., masked=True)` and get a masked array that masks out both data that has the dataset's invalid data value, and data that I manually define to be invalid in my file.

I found the examples on the site using Fiona to load a shapefile (https://rasterio.readthedocs.io/en/stable/topics/masking-by-shapefile.html) and use it to mask some data. I figured that would be useful, so I built a shapefile with my invalid regions as polygons. I'm now having trouble reading from my VRT file with these shapes masked out. I've got the following code:

import rasterio
import fiona

with rasterio.open(VRT_FILE) as dataset, fiona.open(SHAPE_FILE, "r") as shapefile:
shapes = [feature["geometry"] for feature in shapefile]
masked_data = rasterio.mask.mask(dataset, shapes, invert=True)
...

but this throws the error `MemoryError: Unable to allocate array with shape (216640, 282250) and data type bool`.

It seems the `rasterio.mask.mask` method tries to read the WHOLE dataset, and apparently has no way to define a window. Until now (before playing with shapefiles) I've been reading the dataset with `dataset.read(... window=Window(...))`, which lets me define a window to avoid reading the whole (huge) dataset at once. I'd like to do this for my shape-masked data too.

I have two questions:

- Is my approach to defining these invalid regions - i.e. with a shapefile - reasonable? Or is there a better way, ideally one that doesn't involve loading, manually masking and then writing a huge file back to disk?
- If my approach is indeed reasonable, then is there a way to create a `DatasetReader`-like object that supports `.read(...)` with a window parameter, using data masked from my shapefile?

Best wishes,


Sean

Join main@rasterio.groups.io to automatically receive all group messages.