Topics

Memory error in rio calc that equivalent gdal_calc.py performs quickly without issue


lawr@...
 

Today I tried to replace a `gdal_calc.py` command with the equivalent `rio calc`, because I wanted a UInt8 output, and `gdal_calc.py` restricts me to UInt16.

 

The GDAL command is relatively quick on my 15m² raster of New Zealand, at about 2-3 minutes processing time on my machine, but the equivalent `rio calc` that I came up with seemed to be attempting to load everything into memory, and the command eventually crashed with a Memory Error after struggling for about ten minutes.

Here's the command:

```

gdal_calc.py -A a.tif -B b.tif< \
--outfile=out.tif --overwrite --calc="where(A != 1,0,1) * B" \
--format=GTiff --NoDataValue=0 --type=UInt16\
--co TILED=YES --co COMPRESS=LZW --co TFW=YES \
--co BLOCKXSIZE=128 --co BLOCKYSIZE=128
```

gdalinfo for a.tif:

```
Size is 66814, 96417
Coordinate System is:
PROJCS["NZGD_2000_New_Zealand_Transverse_Mercator",
    GEOGCS["GCS_NZGD_2000",
        DATUM["New_Zealand_Geodetic_Datum_2000",
            SPHEROID["GRS 1980",6378137,298.2572221010042,
                AUTHORITY["EPSG","7019"]],
            AUTHORITY["EPSG","6167"]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",173],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",1600000],
    PARAMETER["false_northing",10000000],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]]]
Origin = (1089805.593673222931102,6194224.566799569875002)
Pixel Size = (15.000000000000000,-15.000000000000000)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  ( 1089805.594, 6194224.567) (167d27'39.65"E, 34d16' 4.53"S)
Lower Left  ( 1089805.594, 4747969.567) (166d15'35.49"E, 47d13'23.08"S)
Upper Right ( 2092015.594, 6194224.567) (178d20'32.62"E, 34d16'36.05"S)
Lower Right ( 2092015.594, 4747969.567) (179d30' 5.76"E, 47d14'12.93"S)
Center      ( 1590910.594, 5471097.067) (172d53'31.45"E, 40d54'40.25"S)
Band 1 Block=128x128 Type=Byte, ColorInterp=Gray
  NoData Value=255
  Image Structure Metadata:
    NBITS=2
```

a.tif is a classification (0/null, 1, 2) representing two land use classes of interest.

b.tif contains small integer values (0-21) representing intensity of some phenomenon. It's on the same grid as a.tif, and was made with the same GeoTiff driver creation options.

The command essentially masks a.tif to pick class 1, then gets the value of b where the class is 1, nodata otherwise. I believe the equivalent `rio calc` command is the following:

```

rio calc "(* (where (!= (take A 1) 1) 0 1) (take B 1))" --dtype uint8 --name "A=a.tif" --name "B=b.tif" --masked --overwrite --co TILED=YES --co COMPRESS=LZW --co TFW=YES --co BLOCKXSIZE=128 --co BLOCKYSIZE=128 a.tif b.tif out.tif

```

(Incidentally, lisp syntax is a lot harder to read and write, IMO. Getting the brackets right was a challenge.)

I understand that both are using numpy masked arrays internally, so I'm not sure why rio calc fails but gdal_calc.py does not. Can anyone offer any insight?


Sean Gillies
 

rio-calc does indeed read the entire file. The requirement to use it on files too large to fit in memory hasn't come up before. Modifying it to work on only one (or several, using a thread or process pool) block at a time would not be complicated. I'll add an issue about this to the project tracker.

On Thu, Apr 4, 2019 at 7:40 PM <lawr@...> wrote:

Today I tried to replace a `gdal_calc.py` command with the equivalent `rio calc`, because I wanted a UInt8 output, and `gdal_calc.py` restricts me to UInt16.

 

The GDAL command is relatively quick on my 15m² raster of New Zealand, at about 2-3 minutes processing time on my machine, but the equivalent `rio calc` that I came up with seemed to be attempting to load everything into memory, and the command eventually crashed with a Memory Error after struggling for about ten minutes.

Here's the command:

```

gdal_calc.py -A a.tif -B b.tif< \
--outfile=out.tif --overwrite --calc="where(A != 1,0,1) * B" \
--format=GTiff --NoDataValue=0 --type=UInt16\
--co TILED=YES --co COMPRESS=LZW --co TFW=YES \
--co BLOCKXSIZE=128 --co BLOCKYSIZE=128
```

gdalinfo for a.tif:

```
Size is 66814, 96417
Coordinate System is:
PROJCS["NZGD_2000_New_Zealand_Transverse_Mercator",
    GEOGCS["GCS_NZGD_2000",
        DATUM["New_Zealand_Geodetic_Datum_2000",
            SPHEROID["GRS 1980",6378137,298.2572221010042,
                AUTHORITY["EPSG","7019"]],
            AUTHORITY["EPSG","6167"]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",173],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",1600000],
    PARAMETER["false_northing",10000000],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]]]
Origin = (1089805.593673222931102,6194224.566799569875002)
Pixel Size = (15.000000000000000,-15.000000000000000)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  ( 1089805.594, 6194224.567) (167d27'39.65"E, 34d16' 4.53"S)
Lower Left  ( 1089805.594, 4747969.567) (166d15'35.49"E, 47d13'23.08"S)
Upper Right ( 2092015.594, 6194224.567) (178d20'32.62"E, 34d16'36.05"S)
Lower Right ( 2092015.594, 4747969.567) (179d30' 5.76"E, 47d14'12.93"S)
Center      ( 1590910.594, 5471097.067) (172d53'31.45"E, 40d54'40.25"S)
Band 1 Block=128x128 Type=Byte, ColorInterp=Gray
  NoData Value=255
  Image Structure Metadata:
    NBITS=2
```

a.tif is a classification (0/null, 1, 2) representing two land use classes of interest.

b.tif contains small integer values (0-21) representing intensity of some phenomenon. It's on the same grid as a.tif, and was made with the same GeoTiff driver creation options.

The command essentially masks a.tif to pick class 1, then gets the value of b where the class is 1, nodata otherwise. I believe the equivalent `rio calc` command is the following:

```

rio calc "(* (where (!= (take A 1) 1) 0 1) (take B 1))" --dtype uint8 --name "A=a.tif" --name "B=b.tif" --masked --overwrite --co TILED=YES --co COMPRESS=LZW --co TFW=YES --co BLOCKXSIZE=128 --co BLOCKYSIZE=128 a.tif b.tif out.tif

```

(Incidentally, lisp syntax is a lot harder to read and write, IMO. Getting the brackets right was a challenge.)

I understand that both are using numpy masked arrays internally, so I'm not sure why rio calc fails but gdal_calc.py does not. Can anyone offer any insight?



--
Sean Gillies