Memory error in rio calc for a calculation that the equivalent gdal_calc.py performs quickly without issue
lawr@...
Today I tried to replace a `gdal_calc.py` command with the equivalent `rio calc`, because I wanted a UInt8 output, and `gdal_calc.py` restricts me to UInt16.
The GDAL command is relatively quick on my 15m² raster of New Zealand, at about 2-3 minutes of processing time on my machine, but the equivalent `rio calc` that I came up with seemed to be trying to load everything into memory, and the command eventually crashed with a MemoryError after struggling for about ten minutes. Here's the command:

```
gdal_calc.py -A a.tif -B b.tif \
```

gdalinfo for a.tif:

a.tif is a classification (0/null, 1, 2) representing two land use classes of interest. b.tif contains small integer values (0-21) representing the intensity of some phenomenon. It's on the same grid as a.tif and was made with the same GeoTIFF driver creation options. The command essentially masks a.tif to pick class 1, then takes the value of b where the class is 1, nodata otherwise.

I believe the equivalent `rio calc` command is the following:

```
rio calc "(* (where (!= (take A 1) 1) 0 1) (take B 1))" \
  --dtype uint8 --name "A=a.tif" --name "B=b.tif" --masked --overwrite \
  --co TILED=YES --co COMPRESS=LZW --co TFW=YES \
  --co BLOCKXSIZE=128 --co BLOCKYSIZE=128 \
  a.tif b.tif out.tif
```

(Incidentally, the lisp-style syntax is a lot harder to read and write, IMO. Getting the brackets right was a challenge.)

I understand that both use numpy masked arrays internally, so I'm not sure why rio calc fails where gdal_calc.py does not. Can anyone offer any insight?
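For clarity, here's what I believe the expression works out to in plain numpy. The arrays `a` and `b` below are made-up stand-ins for band 1 of a.tif and b.tif, just to show the logic:

```python
import numpy as np

# Made-up stand-ins for band 1 of a.tif (classes) and b.tif (intensities).
a = np.array([[0, 1, 2],
              [1, 1, 0]], dtype="uint8")
b = np.array([[5, 7, 9],
              [2, 3, 4]], dtype="uint8")

# (* (where (!= (take A 1) 1) 0 1) (take B 1))
mask = np.where(a != 1, 0, 1)      # 1 where the class is 1, 0 elsewhere
out = (mask * b).astype("uint8")   # b where the class is 1, 0 otherwise
print(out)                         # [[0 7 0]
                                   #  [2 3 0]]
```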
Sean Gillies
rio calc does indeed read the entire file into memory. The requirement to use it on files too large to fit in memory hasn't come up before. Modifying it to work on only one block at a time (or several, using a thread or process pool) would not be complicated. I'll add an issue about this to the project tracker.
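In the meantime, something along these lines should keep memory use bounded for this kind of calculation. This is only a rough sketch using rasterio's windowed reads and writes, assuming a.tif and b.tif share the same grid; the file names, creation options, and nodata choice are illustrative, not a definitive implementation:

```python
import numpy as np
import rasterio

# Block-at-a-time version of the calculation: only one block of each
# input is held in memory at any moment.
with rasterio.open("a.tif") as src_a, rasterio.open("b.tif") as src_b:
    profile = src_a.profile
    profile.update(dtype="uint8", count=1, nodata=0,
                   tiled=True, compress="lzw")

    with rasterio.open("out.tif", "w", **profile) as dst:
        # Iterate over the internal blocks of band 1 of a.tif.
        for _, window in src_a.block_windows(1):
            a = src_a.read(1, window=window)
            b = src_b.read(1, window=window)
            out = np.where(a == 1, b, 0).astype("uint8")  # b where class == 1
            dst.write(out, 1, window=window)
```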
-- Sean Gillies