Re: Memory error in rio calc that equivalent gdal_calc.py performs quickly without issue

Sean Gillies
 

rio-calc does indeed read the entire file. The requirement to use it on files too large to fit in memory hasn't come up before. Modifying it to work on only one (or several, using a thread or process pool) block at a time would not be complicated. I'll add an issue about this to the project tracker.
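
A block-at-a-time version of this particular calculation might look like the sketch below. It is not how rio-calc is implemented. It assumes both inputs share the same grid and block layout, it ignores nodata masks, it reuses the file names from the original post, and the function names `class1_times_b` and `calc_by_blocks` are made up for illustration.

```python
import numpy as np


def class1_times_b(a, b):
    """Return b where a == 1 and 0 elsewhere, as uint8.

    Same result as the gdal_calc.py expression where(A != 1, 0, 1) * B.
    """
    return np.where(a == 1, b, 0).astype("uint8")


def calc_by_blocks(a_path, b_path, out_path):
    """Apply class1_times_b one block at a time (hypothetical helper)."""
    # Imported here so the NumPy kernel above can be used without rasterio.
    import rasterio

    with rasterio.open(a_path) as a_src, rasterio.open(b_path) as b_src:
        profile = a_src.profile
        profile.update(dtype="uint8", count=1, nodata=0, compress="lzw")
        with rasterio.open(out_path, "w", **profile) as dst:
            # block_windows(1) yields one window per internal block of band 1,
            # so only one block of each input is held in memory at a time.
            for _, window in a_src.block_windows(1):
                a = a_src.read(1, window=window)
                b = b_src.read(1, window=window)
                dst.write(class1_times_b(a, b), 1, window=window)
```

Calling `calc_by_blocks("a.tif", "b.tif", "out.tif")` would then process the rasters one tile at a time instead of reading them whole.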

--
Sean Gillies

Memory error in rio calc that equivalent gdal_calc.py performs quickly without issue

lawr@...
 

Today I tried to replace a `gdal_calc.py` command with the equivalent `rio calc`, because I wanted a UInt8 output, and `gdal_calc.py` restricts me to UInt16.

 

The GDAL command is relatively quick on my 15 m resolution raster of New Zealand, taking about 2-3 minutes on my machine. The equivalent `rio calc` that I came up with seemed to be attempting to load everything into memory, and the command eventually crashed with a Memory Error after struggling for about ten minutes.

Here's the command:

```

gdal_calc.py -A a.tif -B b.tif \
--outfile=out.tif --overwrite --calc="where(A != 1,0,1) * B" \
--format=GTiff --NoDataValue=0 --type=UInt16 \
--co TILED=YES --co COMPRESS=LZW --co TFW=YES \
--co BLOCKXSIZE=128 --co BLOCKYSIZE=128
```

gdalinfo for a.tif:

```
Size is 66814, 96417
Coordinate System is:
PROJCS["NZGD_2000_New_Zealand_Transverse_Mercator",
    GEOGCS["GCS_NZGD_2000",
        DATUM["New_Zealand_Geodetic_Datum_2000",
            SPHEROID["GRS 1980",6378137,298.2572221010042,
                AUTHORITY["EPSG","7019"]],
            AUTHORITY["EPSG","6167"]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",173],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",1600000],
    PARAMETER["false_northing",10000000],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]]]
Origin = (1089805.593673222931102,6194224.566799569875002)
Pixel Size = (15.000000000000000,-15.000000000000000)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  ( 1089805.594, 6194224.567) (167d27'39.65"E, 34d16' 4.53"S)
Lower Left  ( 1089805.594, 4747969.567) (166d15'35.49"E, 47d13'23.08"S)
Upper Right ( 2092015.594, 6194224.567) (178d20'32.62"E, 34d16'36.05"S)
Lower Right ( 2092015.594, 4747969.567) (179d30' 5.76"E, 47d14'12.93"S)
Center      ( 1590910.594, 5471097.067) (172d53'31.45"E, 40d54'40.25"S)
Band 1 Block=128x128 Type=Byte, ColorInterp=Gray
  NoData Value=255
  Image Structure Metadata:
    NBITS=2
```

a.tif is a classification (0/null, 1, 2) representing two land use classes of interest.

b.tif contains small integer values (0-21) representing intensity of some phenomenon. It's on the same grid as a.tif, and was made with the same GeoTiff driver creation options.

The command essentially masks a.tif to pick class 1, then gets the value of b where the class is 1, nodata otherwise. I believe the equivalent `rio calc` command is the following:

```

rio calc "(* (where (!= (take A 1) 1) 0 1) (take B 1))" --dtype uint8 --name "A=a.tif" --name "B=b.tif" --masked --overwrite --co TILED=YES --co COMPRESS=LZW --co TFW=YES --co BLOCKXSIZE=128 --co BLOCKYSIZE=128 a.tif b.tif out.tif

```

(Incidentally, lisp syntax is a lot harder to read and write, IMO. Getting the brackets right was a challenge.)

I understand that both are using numpy masked arrays internally, so I'm not sure why rio calc fails but gdal_calc.py does not. Can anyone offer any insight?
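
For a sense of scale, here is a back-of-the-envelope estimate of what reading everything at once would require, using the raster size from the gdalinfo output above. The assumption that a mask costs one extra byte per pixel is an editorial one (NumPy boolean masks use one byte per element), and intermediate arrays created while evaluating the expression would add to these figures.

```python
WIDTH, HEIGHT = 66814, 96417      # "Size is 66814, 96417" from gdalinfo
BYTES_PER_PIXEL = 1               # Type=Byte
GIB = 2 ** 30

pixels = WIDTH * HEIGHT
one_band = pixels * BYTES_PER_PIXEL
# --masked reads each band as a masked array: data plus a boolean mask.
one_masked_band = one_band * 2
two_masked_inputs = one_masked_band * 2

print(f"pixels per band:        {pixels:,}")
print(f"one band, data only:    {one_band / GIB:.1f} GiB")
print(f"two masked input bands: {two_masked_inputs / GIB:.1f} GiB")
```

That is roughly 6 GiB for a single band of data and about 24 GiB for the two masked inputs, before any output or temporary arrays are counted.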

Re: Rewriting uint16 headers with rasterio / applying rio color makes them unreadable by Preview, Photoshop

Edward Boyda
 


Got it, thanks!!


Re: How to extract DATA_ENCODING creation option type from an input grib2 file for use in the output grib2 file

Shane Mill - NOAA Affiliate
 

In case anyone else needs to do this: I ended up using eccodes. You can retrieve and set 'packingType', which is the same as DATA_ENCODING, with codes_get and codes_set.

Thanks,
-Shane

Re: Rewriting uint16 headers with rasterio / applying rio color makes them unreadable by Preview, Photoshop

Sean Gillies
 

For GeoTIFF (at least), the photometric creation option is different from the color interpretation that we can get/set using the GDAL API. The file's layout and compression strategy are influenced by the photometric creation option, so it is needed up front and has a permanent impact on the file. Setting the color interpretation through the rasterio API (and thereby the GDAL API) will change file metadata but will not change the file's structure. It's confusing, to be sure. Even Rouault (the expert) provides a bit more about it in this email to gdal-dev:



Re: Rewriting uint16 headers with rasterio / applying rio color makes them unreadable by Preview, Photoshop

Edward Boyda
 


Thanks, Sean, that works for rio color.  I use the photometric creation option with gdal_translate but didn't connect the dots to rio color. 

I can see why, given all the possible creation options, this isn't mentioned explicitly in the docs. But maybe photometric interpretation makes a worthy special case - it's counterintuitive that an operation to adjust the color of an image doesn't by default preserve its header structure. 

Is there a similarly simple fix for setting colorinterp manually in update mode (my third example)? 

Much appreciated.
Ed



Re: How to extract DATA_ENCODING creation option type from an input grib2 file for use in the output grib2 file

Shane Mill - NOAA Affiliate
 

Hey Sean,

Thanks for the response, I can definitely see why that would be problematic, especially for converting between formats. That makes perfect sense to me. It's unfortunate that it isn't originally provided in the GDAL API, but I'll look at other ways of getting around this with a native GRIB API or within the application.

Again, thanks as always for the context and background. It's very much appreciated.

Shane

Re: How to extract DATA_ENCODING creation option type from an input grib2 file for use in the output grib2 file

Sean Gillies
 

Hi Shane,

Before Rasterio 1.0, the project did record creation options in a custom tag (metadata) namespace. We discovered problems when converting from GeoTIFF to other formats: creation options are very format specific and few of the GeoTIFF ones make any sense for other formats. It turned out that if you, for example, translated from GeoTIFF to something else and then back to GeoTIFF, you could end up with recorded creation options that didn't reflect what existed in the data. We decided to remove this feature in Rasterio 1.0 and leave it up to developers to do it for themselves if they need it.

I see what you mean about the GRIB driver's DATA_ENCODING option. I don't see a good way to get this from the GDAL API. You might have to use a native GRIB API, or begin to store the value of this creation option in your application or in a custom namespace in your dataset metadata.
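
One way to "store the value in your application" is a small sidecar file written next to each GRIB2 output. The sketch below is only an illustration: the `.creation.json` suffix and both function names are invented here, and nothing in GDAL or Rasterio reads this file on its own.

```python
import json
from pathlib import Path


def save_creation_options(dataset_path, **options):
    """Record the creation options used for dataset_path in a JSON sidecar."""
    sidecar = Path(str(dataset_path) + ".creation.json")
    sidecar.write_text(json.dumps(options, indent=2, sort_keys=True))
    return sidecar


def load_creation_options(dataset_path):
    """Return the recorded creation options, or {} if none were saved."""
    sidecar = Path(str(dataset_path) + ".creation.json")
    if not sidecar.exists():
        return {}
    return json.loads(sidecar.read_text())
```

The dictionary returned by `load_creation_options` could then be unpacked into the keyword arguments of `rasterio.open` when writing a derived file, so the encoding does not have to be hardcoded.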



How to extract DATA_ENCODING creation option type from an input grib2 file for use in the output grib2 file

Shane Mill - NOAA Affiliate
 

Hey everyone,

I know that grib2 isn't commonly used by this community so I apologize that I keep bringing it up... You can probably tell that I am super fond of it! Anyways, all kidding aside, I am very impressed with the abilities of Rasterio and how responsive this community has been.

In a previous topic, Sean and I discussed that creation options for a specific driver can be assigned when opening a grib file in writing mode.
i.e. "with rasterio.open('file.grb2', 'w', driver='GRIB', dtype='float64', ..., DATA_ENCODING='COMPLEX_PACKING', SPATIAL_DIFFERENCING_ORDER=2) as ds:"

The grib2 specific creation options can be found here:
https://www.gdal.org/frmt_grib.html

This works well in terms of creating a grib2 file encoded in a way that can be read by gdalinfo, toolsUI, and other grib2 readers. What I want/need to figure out now is how to be able to extract the DATA_ENCODING and SPATIAL_DIFFERENCING_ORDER from an input grib2 file so that these creation options don't need to be hardcoded when creating a new grib2 file. In other words, say that you have model data in an input grib2 file, and you create a new grib2 file with a new derived parameter that is derived from two bands in the input grib2 file. I would want the DATA_ENCODING to be the same for both the input and output grib2 file.

Many of the creation options are tracked in the metadata, but as far as I can tell, DATA_ENCODING is not. I'm wondering if it actually is somewhere and I'm just missing it. 

At this link (https://github.com/mapbox/rasterio/issues/405), Sean, you indicated that the GDAL API doesn't track creation options, but that you were able to develop this in Rasterio. Going back to https://www.gdal.org/frmt_grib.html, it looks like all of the Product Identification and Definition options are tracked in src.meta, src.profile, or src.tags(band#). Data Encoding doesn't appear to be.

I just wanted to see if you (or anyone else) had any feedback on this, or know of any way to extract the type of data encoding from an input file.

Thanks as always!
Shane



Re: Rewriting uint16 headers with rasterio / applying rio color makes them unreadable by Preview, Photoshop

Sean Gillies
 

Hi Ed,

Can you try the following variation on your first command?

$ rio color -j 1 uint16_image.tif uint16_brightened.tif gamma RGB 1.5 --co photometric=RGB

Note the addition of "--co photometric=RGB". GDAL automatically sets the photometric tag (which other apps need) to RGB when the created image data type is uint8 (see https://www.gdal.org/frmt_gtiff.html) but does not do the same for other data types including uint16.
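
The same creation option can be passed from Python when writing a file. The helper below is a hypothetical convenience, not part of rasterio: it copies a profile dictionary and adds photometric='RGB' when there are at least three bands and no photometric value has already been set.

```python
def with_rgb_photometric(profile):
    """Return a copy of a rasterio-style profile that requests an RGB TIFF."""
    options = dict(profile)
    # Only multi-band images can be RGB; leave any explicit choice untouched.
    if options.get("count", 0) >= 3:
        options.setdefault("photometric", "RGB")
    return options
```

With a source dataset `src` open, the result could be used as `rasterio.open("out.tif", "w", **with_rgb_photometric(src.profile))`, which passes the option at creation time as the `--co photometric=RGB` flag does on the command line.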


On Thu, Mar 28, 2019 at 5:48 PM Edward Boyda <boyda@...> wrote:

Hi all, I'm new at this, more of a computer vision person than a developer, so please bear with me....

I see the behavior I'm about to describe running pip-installed rasterio (1.0.22) on my Mac (OSX Mojave; homebrewed python and gdal) and also running rasterio (1.0.22) in a dockerized Ubuntu platform, on images from a variety of sources (DigitalGlobe, Planet, Landsat). 

Example 1: 
$ rio color -j 1 uint16_image.tif uint16_brightened.tif gamma RGB 1.5

When I try to open the output file, uint16_brightened.tif, in Preview or Photoshop, I get a message like "Could not complete your request because of a problem parsing the TIFF file." (That's from Photoshop; Preview is equivalent.) 

Example 2:  
$ rio color -j -1 uint16_image.tif uint16_brightened_v2.tif gamma RGB 1.5

The number of cores has changed from the first example. Now the output, uint16_brightened_v2.tif, is readable by Photoshop but has had its color interpretation changed to (reading from rasterio):

(<ColorInterp.gray: 1>, <ColorInterp.undefined: 0>, <ColorInterp.undefined: 0>)

When I open the file with Preview or view the thumbnail with Mac Finder, there are dark vertical lines interspersed with the actual pixels, and about a third of the original pixels have been pushed out of the frame. See screenshot attached.

Example 3:
I take a file that has a (gray, undefined, undefined) color interpretation and try to change that to RGB, now in the interpreter:
>>> import rasterio
>>> from rasterio.enums import ColorInterp
>>> with rasterio.open('uint16_noCI.tif', 'r+') as f:
...     f.colorinterp = (ColorInterp.red, ColorInterp.green, ColorInterp.blue)

Again the edited file is unreadable by Preview and Photoshop. 

A couple of caveats:

1) I can read the data from files output from any of the above examples with rasterio or skimage, and the resulting numpy array is uncorrupted. I can resave it with skimage, show it with matplotlib, etc., and the image looks as expected.
2) With any of the above outputs, I can read and then rewrite a new_image.tif, using rasterio, and the resulting files open as expected with Photoshop and Preview. This is my current (obviously inefficient) workaround:

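import rasterio

# Read the creation profile and pixel data from the problematic file.
with rasterio.open('uint16_brightened.tif') as f:
    prof = f.profile
    img = f.read()

# Write a fresh file, explicitly tagging it as RGB.
with rasterio.open('new_image.tif', 'w', photometric='rgb', **prof) as f:
    f.write(img)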

As far as I know these failures happen only with uint16 images (at least not with uint8), and it would seem to have to do with the way color interpretation is written into the headers via the different write mechanisms.  Has anyone come across similar behavior? I've been reproducing and beating my head against this for months and would really appreciate a sanity check.

Since the Docker container is likely cleaner than what I've installed on my Mac, I've run the tests for the attached output there. Here is the Dockerfile and some possibly relevant release details: 

FROM ubuntu:latest
RUN apt-get update && apt-get install -y software-properties-common
RUN apt-get install -y python3-pip python3-dev build-essential
RUN pip3 install --upgrade pip
RUN apt-get install -y gdal-bin libgdal-dev python3-gdal
RUN apt-get install -y libssl-dev libffi-dev libcurl4-openssl-dev
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get install -y python3-tk

ADD ./requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt

---

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
python 3.6.6
pip 18.1 from /usr/local/lib/python3.6/dist-packages/pip (python 3.6)

affine==2.2.2
gbdx-auth==0.4.0
gbdxtools==0.16.0
GDAL==2.2.2
rasterio==1.0.22
rio-color==1.0.0
rio-mucho==1.0.0
Shapely==1.6.4
tifffile==2018.11.28



Thanks everyone!


Ed



--
Sean Gillies

Adding internal API reference links to the narrative docs

Sean Gillies
 

Hi all,

In the Rasterio quickstart guide I've added some links to API documentation. For example, see the "open()" link under the 2nd code block in https://rasterio.readthedocs.io/en/latest/quickstart.html. I did this with the following ReST markup: :func:`~rasterio.open`. Sphinx has already indexed where rasterio.open is defined in the docs and completes the link. I hope you'll find this useful. Any such contributions that anybody would like to make would be gratefully received!
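The same cross-reference role also works inside docstrings that Sphinx picks up through autodoc. The following is only an illustrative sketch: `count_bands` and `FakeDataset` are made-up names, and the stand-in class exists so the snippet runs without rasterio installed.

```python
def count_bands(dataset):
    """Return the number of bands in an open dataset.

    The dataset is expected to come from :func:`~rasterio.open`. The
    leading tilde asks Sphinx to show only ``open`` as the link text.
    """
    return dataset.count


class FakeDataset:
    """Stand-in for a dataset object (hypothetical, for this sketch only)."""
    count = 3


band_total = count_bands(FakeDataset())
print(band_total)
```

Without the tilde, the link text would be the full dotted name, `rasterio.open`.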

Note that the changes won't appear in the stable docs until we have another release.

--
Sean Gillies

Re: Rasterio and PROJ.6 ?

Sean Gillies
 

Vincent,

We haven't been testing with PROJ 6. It might work.

This commit https://github.com/mapbox/rasterio/commit/c5b0d911994ade823a7b5a969071c2c1bd860a11 should address the error message printed to your terminal.


 



--
Sean Gillies

Re: Rasterio and PROJ.6 ?

vincent.sarago@...
 

I reinstalled rasterio and it fixed itself. I think it was caused by environment variables
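If environment variables are the suspect, a quick look at what the current process sees can help before reinstalling. This is a minimal sketch: the choice of PROJ_LIB and GDAL_DATA is an assumption (they are the data-directory variables read by PROJ 6 and GDAL), and `report_env` is a hypothetical helper name.

```python
import os


def report_env(names=("PROJ_LIB", "GDAL_DATA"), environ=None):
    """Return {name: value or None} for each variable of interest."""
    if environ is None:
        environ = os.environ
    return {name: environ.get(name) for name in names}


if __name__ == "__main__":
    for name, value in report_env().items():
        print(name, "=", value if value is not None else "(not set)")
```

A value pointing at the data directory of a different GDAL or PROJ installation would be one thing to look for.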

Rasterio and PROJ.6 ?

vincent.sarago@...
 

Dear Rasterio contributors ;-) 

I installed the latest version of GDAL (2.4.1) with PROJ version 6 and I'm seeing this message "proj_create: init=epsg:/init=IGNF: syntax not supported in non-PROJ4 emulation mode" whenever I use rasterio.
- Is this expected?
- Should I revert to GDAL 2.4.0?
- Does rasterio support PROJ version 6?

$ rio --version
proj_create: init=epsg:/init=IGNF: syntax not supported in non-PROJ4 emulation mode
1.0.22
$ gdalinfo --version
GDAL 2.4.1, released 2019/03/15

$ proj --version
Rel. 6.0.0, March 1st, 2019

Thanks 
 

Rewriting uint16 headers with rasterio / applying rio color makes them unreadable by Preview, Photoshop

Edward Boyda
 


Hi all, I'm new at this, more of a computer vision person than a developer, so please bear with me....

I see the behavior I'm about to describe running pip-installed rasterio (1.0.22) on my Mac (OSX Mojave; homebrewed python and gdal) and also running rasterio (1.0.22) in a dockerized Ubuntu platform, on images from a variety of sources (DigitalGlobe, Planet, Landsat). 

Example 1: 
$ rio color -j 1 uint16_image.tif uint16_brightened.tif gamma RGB 1.5

When I try to open the output file, uint16_brightened.tif, in Preview or Photoshop, I get a message like "Could not complete your request because of a problem parsing the TIFF file." (That's from Photoshop; Preview is equivalent.) 

Example 2:  
$ rio color -j -1 uint16_image.tif uint16_brightened_v2.tif gamma RGB 1.5

The number of cores has changed from the first example. Now the output, uint16_brightened_v2.tif, is readable by Photoshop but has had its color interpretation changed to (reading from rasterio):

(<ColorInterp.gray: 1>, <ColorInterp.undefined: 0>, <ColorInterp.undefined: 0>)

When I open the file with Preview or view the thumbnail with Mac Finder, there are dark vertical lines interspersed with the actual pixels, and about a third of the original pixels have been pushed out of the frame. See screenshot attached.

Example 3:
I take a file that has a (gray, undefined, undefined) color interpretation and try to change that to RGB, now in the interpreter:
>>> import rasterio
>>> from rasterio.enums import ColorInterp
>>> with rasterio.open('uint16_noCI.tif', 'r+') as f:
...     f.colorinterp = (ColorInterp.red, ColorInterp.green, ColorInterp.blue)

Again the edited file is unreadable by Preview and Photoshop. 

A couple of caveats:

1) I can read the data from files output from any of the above examples with rasterio or skimage, and the resulting numpy array is uncorrupted. I can resave it with skimage, show it with matplotlib, etc., and the image looks as expected.
2) With any of the above outputs, I can read and then rewrite a new_image.tif, using rasterio, and the resulting files open as expected with Photoshop and Preview. This is my current (obviously inefficient) workaround:

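import rasterio

# Read the creation profile and pixel data from the problematic file.
with rasterio.open('uint16_brightened.tif') as f:
    prof = f.profile
    img = f.read()

# Write a fresh file, explicitly tagging it as RGB.
with rasterio.open('new_image.tif', 'w', photometric='rgb', **prof) as f:
    f.write(img)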

As far as I know these failures happen only with uint16 images (at least not with uint8), and it would seem to have to do with the way color interpretation is written into the headers via the different write mechanisms.  Has anyone come across similar behavior? I've been reproducing and beating my head against this for months and would really appreciate a sanity check.
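One way to sanity-check the header theory is to look at the TIFF tags directly. The sketch below is an editorial illustration, not part of rasterio or rio-color: it reads the PhotometricInterpretation (262) and ExtraSamples (338) tags from the first directory of a classic (non-BigTIFF) file using only the standard library. The name `read_short_tags` is made up for this example. GDAL's GTiff driver documentation says PHOTOMETRIC defaults to MINISBLACK unless the image has 3 or 4 bands of Byte type, which may be related to uint8 files behaving and uint16 files not; treat that as a hypothesis to check, not a diagnosis.

```python
import io
import struct

PHOTOMETRIC_TAG = 262    # PhotometricInterpretation
EXTRA_SAMPLES_TAG = 338  # ExtraSamples
PHOTOMETRIC_NAMES = {0: "MINISWHITE", 1: "MINISBLACK", 2: "RGB", 3: "PALETTE"}


def read_short_tags(fileobj, wanted=(PHOTOMETRIC_TAG, EXTRA_SAMPLES_TAG)):
    """Return {tag: first SHORT value} from the first IFD of a classic TIFF."""
    header = fileobj.read(8)
    order = {b"II": "<", b"MM": ">"}.get(header[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", header[2:8])
    if magic != 42:
        raise ValueError("only classic TIFF (magic 42) is handled here")
    fileobj.seek(ifd_offset)
    (n_entries,) = struct.unpack(order + "H", fileobj.read(2))
    entries = fileobj.read(12 * n_entries)
    found = {}
    for i in range(n_entries):
        # Each entry: tag (2 bytes), type (2), count (4), value or offset (4).
        tag, typ, count = struct.unpack_from(order + "HHI", entries, 12 * i)
        # Type 3 is SHORT; up to two SHORTs are stored inline in the entry.
        if tag in wanted and typ == 3 and count <= 2:
            (value,) = struct.unpack_from(order + "H", entries, 12 * i + 8)
            found[tag] = value
    return found


# Demo on a hand-built, one-entry directory (not a real image).
demo = io.BytesIO(
    b"II" + struct.pack("<HI", 42, 8)          # header: little-endian, IFD at 8
    + struct.pack("<H", 1)                     # one directory entry
    + struct.pack("<HHIHH", 262, 3, 1, 2, 0)   # photometric = 2 (RGB)
    + struct.pack("<I", 0)                     # no further directories
)
demo_tags = read_short_tags(demo)
print(PHOTOMETRIC_NAMES[demo_tags[PHOTOMETRIC_TAG]])
```

For a file on disk, opening it with `open(path, 'rb')` and passing the file object to `read_short_tags` should show whether the photometric tag is 1 (MINISBLACK) or 2 (RGB) before and after the rewrite workaround.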

Since the Docker container is likely cleaner than what I've installed on my Mac, I've run the tests for the attached output there. Here is the Dockerfile and some possibly relevant release details: 

FROM ubuntu:latest
RUN apt-get update && apt-get install -y software-properties-common
RUN apt-get install -y python3-pip python3-dev build-essential
RUN pip3 install --upgrade pip
RUN apt-get install -y gdal-bin libgdal-dev python3-gdal
RUN apt-get install -y libssl-dev libffi-dev libcurl4-openssl-dev
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get install -y python3-tk

ADD ./requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt

---

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
python 3.6.6
pip 18.1 from /usr/local/lib/python3.6/dist-packages/pip (python 3.6)

affine==2.2.2
gbdx-auth==0.4.0
gbdxtools==0.16.0
GDAL==2.2.2
rasterio==1.0.22
rio-color==1.0.0
rio-mucho==1.0.0
Shapely==1.6.4
tifffile==2018.11.28



Thanks everyone!


Ed

Re: cannot find API reference anymore

Sean Gillies
 

This issue has been resolved. Check the project issue tracker's recently closed issues for details.

Re: issue with opening/closing datasets

Sean Gillies
 

Amine,

On Wed, Mar 27, 2019 at 11:24 AM Amine Aboufirass <amine.aboufirass@...> wrote:
Dear Sean,

I am slightly confused. You state the following:

The following will work if you are calling the two functions in the same module.
with open_raster('raster.tif') as dataset:  # this gives you an implicit Env around the contained statements.
    new_raster = do_stuff_with_raster(dataset)
    new_raster.close()

I have two issues with the above statement:
  • Why are you closing the file if it is inside the with block? I thought one of the advantages of using the with block is that files are closed implicitly?
  • This doesn't completely answer my question, since I would like to return the file read object and pass it from function to function so that I can pry it open in each function and read/write the necessary values to and from it.
To be more explicit, can I do something like this?

def open_dataset(filename):
    with rasterio.open(filename) as dataset:
        return dataset

def do_stuff_to_dataset(dataset):
    dataset.write()
    return modified_dataset

def get_info_from_dataset(dataset):
    dataset.information
    return information

If so, then what are the disadvantages of using the with block inside a function and returning the object to be used outside? Is this good practice? If not what is the recommended way to write functions which use the rasterio library? This also extends to with blocks containing rasterio.Env(). Should I nest the with statement inside the function as stated above?

I'm sorry about the confusion. I would rather not comment on the structure of your application. Your original question was about the warnings being printed in your shell, yes? I pointed out that in the absence of a custom error/warning handler, GDAL prints these directly to your shell. Rasterio does not register any custom handlers when you import it because I want to avoid import side effects that complicate testing of rasterio's modules.

The rasterio.env.Env class does register a custom error/warning handler when its __enter__() method is called. Within a `with Env():` block you should not see anything printed to the shell: messages will go to Python's logger instead, or be turned into Python exceptions.

I recommend ensuring that there is an activated Env within your function. You could do this by putting `with rasterio.env.Env():` at the top of the functions, or by using a decorator. There are examples of each in the rasterio code. The rasterio.open function itself is decorated this way, and you could reuse that decorator; it is part of the public API.
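The decorator approach can be sketched roughly as follows. This is an illustration of the pattern only, not rasterio's own decorator: `stand_in_env` is a hypothetical replacement for `rasterio.Env()` so the snippet runs anywhere, and `with_env` is a made-up helper name.

```python
import functools
from contextlib import contextmanager

events = []  # records when the stand-in environment is active


@contextmanager
def stand_in_env():
    """Hypothetical stand-in for rasterio.Env(); it only records enter/exit."""
    events.append("enter")
    try:
        yield
    finally:
        events.append("exit")


def with_env(func):
    """Run func inside an environment block (hypothetical helper)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with stand_in_env():  # in real code: with rasterio.Env():
            return func(*args, **kwargs)
    return wrapper


@with_env
def describe(path):
    events.append("work")  # the function body runs while the env is active
    return "described " + path


result = describe("raster.tif")
print(result, events)
```

The function body always runs between the environment's enter and exit steps, so callers do not need to remember to set one up.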
 

I ask because I would like to avoid writing a dataset to physical file until I am done modifying it.  For instance, geopandas uses the GeoDataFrame construct which is stored in memory, and not attached to any physical file. Memory files in rasterio come close, but they are still attached to a temporary file.

It seems that in rasterio defining a raster object must be via 3 disjoint entities (numpy array, affine transform and CRS). It would be nice to have one object which groups all these entities and is somehow detached from physical/temporary files. A sort of glorified numpy array with metadata (a "GeoNumpy" array), just like geopandas glorifies the pandas DataFrame with metadata.

Of course this is just a naive (but hopefully constructive) suggestion, and perhaps also due to the fact that I do not completely understand how the library works :). 

Regards,

Amine

There is a file in MemoryFile, yes, but it is a formatted file in memory, not on disk. See https://www.gdal.org/gdal_virtual_file_systems.html#gdal_virtual_file_systems_vsimem for a brief explanation.

Other programmers have shown interest in a "GeoNumpy" class, and I've seen at least one project like this on GitHub. Georaster, I think. Rasterio doesn't provide such a class and I'm not ready to add one at this time. I believe it's better for Rasterio to focus on reading and writing formatted datasets and leave application-specific classes up to application developers.
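For illustration, an application-level container of this kind could be as small as the sketch below. Everything here is hypothetical: `GeoArray` is not a rasterio class, nested lists stand in for a numpy array so the snippet has no dependencies, and the transform uses the six-coefficient (a, b, c, d, e, f) ordering of the affine package.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GeoArray:
    """Hypothetical container grouping pixel data, transform, and CRS."""
    data: List[List[int]]  # rows of pixel values (a numpy array in practice)
    transform: Tuple[float, float, float, float, float, float]  # a, b, c, d, e, f
    crs: str  # any CRS identifier string; the value below is illustrative

    def xy(self, row: int, col: int) -> Tuple[float, float]:
        """Map a pixel's upper-left corner to georeferenced coordinates."""
        a, b, c, d, e, f = self.transform
        return (a * col + b * row + c, d * col + e * row + f)


tile = GeoArray(
    data=[[1, 2, 3], [4, 5, 6]],
    transform=(15.0, 0.0, 100.0, 0.0, -15.0, 200.0),  # 15-unit pixels, origin (100, 200)
    crs="EPSG:2193",
)
corner = tile.xy(1, 2)
print(corner)
```

Such an object lives entirely in memory and is only turned into a formatted dataset when the application decides to write it.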

--
Sean Gillies

Re: issue with opening/closing datasets

Amine Aboufirass
 

Dear Sean,

I am slightly confused. You state the following:

The following will work if you are calling the two functions in the same module.
with open_raster('raster.tif') as dataset:  # this gives you an implicit Env around the contained statements.
    new_raster = do_stuff_with_raster(dataset)
    new_raster.close()

I have two issues with the above statement:
  • Why are you closing the file if it is inside the with block? I thought one of the advantages of using the with block is that files are closed implicitly?
  • This doesn't completely answer my question, since I would like to return the file read object and pass it from function to function so that I can pry it open in each function and read/write the necessary values to and from it.
To be more explicit, can I do something like this?

def open_dataset(filename):
    with rasterio.open(filename) as dataset:
        return dataset

def do_stuff_to_dataset(dataset):
    dataset.write()
    return modified_dataset

def get_info_from_dataset(dataset):
    dataset.information
    return information

If so, then what are the disadvantages of using the with block inside a function and returning the object to be used outside? Is this good practice? If not what is the recommended way to write functions which use the rasterio library? This also extends to with blocks containing rasterio.Env(). Should I nest the with statement inside the function as stated above?
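On the narrower question of returning an object from inside a with block: in Python the block's exit handler runs as the function returns, so the caller receives an object that is already closed. The sketch below shows this with io.StringIO as a stand-in for a rasterio dataset (an assumption, so that it runs without any raster file), followed by the alternative of opening in the caller and passing the open object to helper functions.

```python
import io


def open_inside_with():
    with io.StringIO("pixel data") as handle:
        return handle  # the with block closes handle as the function returns


def read_all(handle):
    """Helper that works on an object the caller keeps open."""
    return handle.read()


returned = open_inside_with()
closed_after_return = returned.closed  # the caller gets a closed object

# Alternative: open in the caller and pass the open object around.
with io.StringIO("pixel data") as handle:
    contents = read_all(handle)

print(closed_after_return, contents)
```

A dataset opened with rasterio.open is a context manager in the same way, so the first pattern hands back a closed dataset.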

I ask because I would like to avoid writing a dataset to physical file until I am done modifying it.  For instance, geopandas uses the GeoDataFrame construct which is stored in memory, and not attached to any physical file. Memory files in rasterio come close, but they are still attached to a temporary file.

It seems that in rasterio defining a raster object must be via 3 disjoint entities (numpy array, affine transform and CRS). It would be nice to have one object which groups all these entities and is somehow detached from physical/temporary files. A sort of glorified numpy array with metadata (a "GeoNumpy" array), just like geopandas glorifies the pandas DataFrame with metadata.

Of course this is just a naive (but hopefully constructive) suggestion, and perhaps also due to the fact that I do not completely understand how the library works :). 

Regards,

Amine





Re: issue with opening/closing datasets

Sean Gillies
 

Hi Amine,

On Tue, Mar 26, 2019 at 5:10 AM Amine Aboufirass <amine.aboufirass@...> wrote:
Hi Sean, the issue is that I am writing functions where the output is often a rasterio dataset. I don't know if this can be accomplished using a with statement:
import rasterio

def open_raster(filename):
    # Return an open dataset; the caller is responsible for closing it.
    rasterio_dataset_object = rasterio.open(filename)
    return rasterio_dataset_object

def do_stuff_with_raster(rasterio_dataset_object):
    # ...do stuff with raster...
    return rasterio_dataset_object

dataset = open_raster('raster.tif')
new_raster = do_stuff_with_raster(dataset)
new_raster.close()

Thanks,
Amine

The following will work if you are calling the two functions in the same module.

with open_raster('raster.tif') as dataset:  # this gives you an implicit Env around the contained statements.
    new_raster = do_stuff_with_raster(dataset)
    new_raster.close()

 
On Fri, Mar 22, 2019 at 5:25 PM Sean Gillies <sean.gillies@...> wrote:
Hi Amine,

I think you have made an error in pasting code into the GitHub issue. The code you've given will fail at dataset = memfile.open because you haven't assigned memfile yet.

The message you see printed comes straight from the GDAL library. You haven't configured any GDAL error or log message handler and so the messages go directly to your terminal. Message handlers are configured if you run your code within a `with rasterio.Env()` block.

    import rasterio

    with rasterio.Env():
        # your code here

Also if you do

    with memfile.open(...) as dataset:

you won't see this message.


On Fri, Mar 22, 2019 at 9:25 AM Amine Aboufirass <amine.aboufirass@...> wrote:
Hi All, 

I just listed an issue on the main github log. https://github.com/mapbox/rasterio/issues/1659

If anyone could take a look I would be very grateful.

Kind Regards,

Amine



--
Sean Gillies

cannot find API reference anymore

Amine Aboufirass
 

Dear All, 

I am quite sure that there used to be extensive online documentation in the following website:

https://rasterio.readthedocs.io/en/stable/api/rasterio.transform.html

What happened to it? Why has it been deleted and will it be brought back at some point? 

Regards,

Amine