Creating rasterio dataset without IO

Nic Annau
 

Hello,

The `rasterio.mask.mask` function is great for masking a raster with a polygon. However, I'd like to eliminate unnecessary IO and pass `rasterio.mask.mask` a modified raster instead of loading it from a file.

Specifically, I am loading a NetCDF file as an `xarray.Dataset` and re-gridding it before masking it. I would like to avoid writing that re-gridded `xarray` object to a file, only to reload it and mask it at its new resolution.

My ultimate question is: is there a way to convert a `numpy.ndarray`, `xarray.Dataset`, or other convenient object into a rasterio dataset (the kind of object `rasterio.open` returns) without loading that data from a file?

I think there might be a solution using `MemoryFile`, but how to implement it isn't clear to me.

My workflow is currently something like this:
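```
import rasterio
import xarray as xr
from rasterio import mask

ds = xr.open_dataset('path/to/file.nc')
rds = regrid_ds(ds)  # my own re-gridding function

# my goal is to eliminate this step (writing the re-gridded data to disk)
rds.to_netcdf('path/to/regridded_file.nc')

# ... and this step (re-opening it with rasterio),
# ideally creating the dataset from another object instead
dataset = rasterio.open('path/to/regridded_file.nc')

# mask with the polygon; mask.mask expects an iterable of shapes
# and returns the masked array plus its affine transform
out_image, out_transform = mask.mask(dataset, [polygon])

# continue with analysis
```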

Thanks for your time!
Nic

Alan Snow
 

Hi Nic,

I think you are looking for rioxarray. An example of what you want to do is here:
https://corteva.github.io/rioxarray/stable/examples/clip_geom.html

Best,
Alan

Brendan Ward
 

Nic,

If your goal is to mask the data, you can replicate some of the processing steps in the `mask` chain:
https://github.com/mapbox/rasterio/blob/master/rasterio/mask.py

Specifically, you'd need to work out the transform and output shape for your mask from your source data and polygon; then it's just a matter of calling `geometry_mask` with those parameters:
https://github.com/mapbox/rasterio/blob/1.1.3/rasterio/mask.py#L108-L109

Then apply that mask to your data, which you've already prepared in a prior step.

Having the data as a rasterio dataset gives us access to many of the properties we need to calculate those parameters. Creating the mask doesn't require reading any data from the dataset; the data are only needed when the mask is applied. This use case seems outside the intended purpose of `MemoryFile`.

I hope that helps.

Nic Annau
 

Thanks for the suggestion.

Those functions are exactly what I'd like to use, but I'm having trouble with them. I think it's a rasterio issue.

I have tried using rioxarray and XGeo, but the problem is the custom projection CRS I have to provide when clipping. rasterio isn't handling it well, and I think it's worth opening a GitHub issue. These clipping functions seem to fail on custom projections and only work for projections with an EPSG code.

The error is: `CRSError: The PROJ4 dict could not be understood. OGR Error code 5`. The PROJ4 dict looks fine as far as I can tell.

I'm deliberately keeping this post brief and will link the GitHub issue (with more details) here.

As for `geometry_mask` from `rasterio.features`: it is working well! The only remaining issue is defining the affine transform without IO, and I think I can do that with `rasterio.transform.from_bounds()`.

Thanks again for your time,
Nic