Topics

Define invalid regions of VRT file in a way that allows windowing


rasterio@...
 

Hi all,

I'm using a VRT file to represent a bunch of individual map tiles as one big image (following the help of Dion Haefner via this list - thanks again!). I noticed that some of the data is invalid but not marked as such (via the invalid data value). I'd like to define these regions as invalid using a file (e.g. a shapefile) in a way that would allow me to call `dataset.read(..., masked=True)` and get a masked array that masks out both data that has the dataset's invalid data value, and data that I manually define to be invalid in my file.

I found the examples on the site using Fiona to load a shapefile (https://rasterio.readthedocs.io/en/stable/topics/masking-by-shapefile.html) and use it to mask some data. I figured that would be useful, so I built a shapefile with my invalid regions as polygons. I'm now having trouble reading from my VRT file with these shapes masked out. I've got the following code:

import rasterio
import fiona

with rasterio.open(VRT_FILE) as dataset, fiona.open(SHAPE_FILE, "r") as shapefile:
shapes = [feature["geometry"] for feature in shapefile]
masked_data = rasterio.mask.mask(dataset, shapes, invert=True)
...

but this throws the error `MemoryError: Unable to allocate array with shape (216640, 282250) and data type bool`.

It seems the `rasterio.mask.mask` method tries to read the WHOLE dataset, and apparently has no way to define a window. Until now (before playing with shapefiles) I've been reading the dataset with `dataset.read(... window=Window(...))`, which lets me define a window to avoid reading the whole (huge) dataset at once. I'd like to do this for my shape-masked data too.

I have two questions:

- Is my approach to defining these invalid regions - i.e. with a shapefile - reasonable? Or is there a better way, ideally one that doesn't involve loading, manually masking and then writing a huge file back to disk?
- If my approach is indeed reasonable, then is there a way to create a `DatasetReader`-like object that supports `.read(...)` with a window parameter, using data masked from my shapefile?

Best wishes,


Sean


Sean Gillies
 

Hi Sean,

You're correct, mask() does require a dataset object (no numpy array) and will read the entire dataset. Can you use multiple, smaller, VRTs? Using a shapefile is fine. If you want to remove the dependence on fiona, you could serialize the shapes to JSON and use Python's json module to decode them.

On Tue, Dec 10, 2019 at 10:21 AM rasterio via Groups.Io <rasterio=attackllama.com@groups.io> wrote:
Hi all,

I'm using a VRT file to represent a bunch of individual map tiles as one
big image (following the help of Dion Haefner via this list - thanks
again!). I noticed that some of the data is invalid but not marked as
such (via the invalid data value). I'd like to define these regions as
invalid using a file (e.g. a shapefile) in a way that would allow me to
call `dataset.read(..., masked=True)` and get a masked array that masks
out both data that has the dataset's invalid data value, and data that I
manually define to be invalid in my file.

I found the examples on the site using Fiona to load a shapefile
(https://rasterio.readthedocs.io/en/stable/topics/masking-by-shapefile.html)
and use it to mask some data. I figured that would be useful, so I built
a shapefile with my invalid regions as polygons. I'm now having trouble
reading from my VRT file with these shapes masked out. I've got the
following code:

import rasterio
import fiona

with rasterio.open(VRT_FILE) as dataset, fiona.open(SHAPE_FILE, "r") as
shapefile:
     shapes = [feature["geometry"] for feature in shapefile]
     masked_data = rasterio.mask.mask(dataset, shapes, invert=True)
     ...

but this throws the error `MemoryError: Unable to allocate array with
shape (216640, 282250) and data type bool`.

It seems the `rasterio.mask.mask` method tries to read the WHOLE
dataset, and apparently has no way to define a window. Until now (before
playing with shapefiles) I've been reading the dataset with
`dataset.read(... window=Window(...))`, which lets me define a window to
avoid reading the whole (huge) dataset at once. I'd like to do this for
my shape-masked data too.

I have two questions:

  - Is my approach to defining these invalid regions - i.e. with a
shapefile - reasonable? Or is there a better way, ideally one that
doesn't involve loading, manually masking and then writing a huge file
back to disk?
  - If my approach is indeed reasonable, then is there a way to create a
`DatasetReader`-like object that supports `.read(...)` with a window
parameter, using data masked from my shapefile?

Best wishes,


Sean


--
Sean Gillies