Date   

Re: Occasional "not recognized as a supported file format" errors when reading from S3

Jonas
 

OK, sorry, forget what I said - the same map tile that earlier was failing is now served fine. :/


Re: Occasional "not recognized as a supported file format" errors when reading from S3

Jonas
 
Edited

Here is some more insight on the issue. Remember, we are seeing this when reading small chunks from a cloud-optimized GeoTIFF on S3 with our Terracotta tile server.

On our latest encounters of the issue, we are getting it consistently on the same map tile but not on adjacent tiles that are reading the exact same GeoTIFF.
So this challenges the assumption that this is a race condition, does it not?

Btw, we also tried the solution from https://github.com/OSGeo/gdal/issues/1244#issuecomment-487164897 (CPL_VSIL_CURL_NON_CACHED="/vsis3/") to no avail.


Rasterio 1.0.24

Sean Gillies
 

Hi all,

Rasterio 1.0.24 is on PyPI now: https://pypi.org/project/rasterio/1.0.24/#files. This release fixes a pretty major bug that potentially affected multi-threaded programs or programs that reopened datasets while changing opening options like overview level (see issue #1504). Upgrade when you can.

The only other change to note is that the wheels on PyPI now include GDAL 2.4.1.

--
Sean Gillies


Re: gdalinfo's and rasterio's reading problem in LUSTRE FS with NetCDF file (ubuntu:bionic)

Erick Palacios Moreno
 

Hi Sean,

Thank you for your answer.

Even answered that LUSTRE is causing this behaviour and my best option is building GDAL and libnetcdf against a libhdf5 version that plays well with LUSTRE.

In ubuntu xenial we didn't have this problems so we will discuss if it's an option using a different version of libhdf5.

Thank you,

Erick


----- Mensaje original -----
De: "Sean Gillies" <sean.gillies@gmail.com>
Para: main@rasterio.groups.io
Enviados: Miércoles, 5 de Junio 2019 13:25:55
Asunto: Re: [rasterio] gdalinfo's and rasterio's reading problem in LUSTRE FS with NetCDF file (ubuntu:bionic)

Hi Erick,

I'm not familiar with Lustre and only slightly familiar with the details of
GDAL's netCDF driver. I think, since the problem manifests with gdalinfo as
well as rasterio programs, that the best source of help will be the
gdal-dev list: https://lists.osgeo.org/mailman/listinfo/gdal-dev/. There
have been recent discussions related to HDF5 and netCDF4 there and GDAL's
developer, Even Rouault, will probably have some insights. I hate to
redirect you to another email list, but gdal-dev seems to be the best place
to ask in this case. When you get help there, I'll make sure to follow up
here.

On Wed, Jun 5, 2019 at 9:22 AM <epalacios@conabio.gob.mx> wrote:

Hi,

My team and I have been facing some problems when reading netcdf files in
`LUSTRE` filesystem with `ncdump` tool regarding an HDF5 1.10.1 issue in
`ubuntu:bionic` (see for example: https://github.com/ALPSCore/ALPSCore/issues/410)


We made a workaround by installing `netcdf-bin` using repositories of
`ubuntu:xenial` in `bionic`. Although `ncdump` have successfully read the
netcdf file, we couldn't fix it (sort of...) for `gdalinfo` and `rasterio`
for example:

this works:

ncdump /LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc|head
-n 20

gdalinfo /LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc
|head -n 20

gdalinfo -sd 1 /LUSTRE/MADMEX/dir_test_mount/
madmex_003_37_-32_1996-01-01.nc |head -n 20

but this neither of next lines work:

gdalinfo
netcdf:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:
blue_mean

gdalinfo
NetCDF:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:
blue_mean

```
ERROR 4:
NetCDF:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:blue_mean:
No such file or directory
gdalinfo failed - unable to open
'NetCDF:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:
blue_mean'.
```

This ultimately leads to error with `rio insp` in both next lines :

rio insp
netcdf:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:
blue_mean

rio insp
NetCDF:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:
blue_mean

```
ERROR:root:Exception caught during processing
Traceback (most recent call last):
File "rasterio/_base.pyx", line 214, in
rasterio._base.DatasetBase.__init__
File "rasterio/_shim.pyx", line 64, in rasterio._shim.open_dataset
File "rasterio/_err.pyx", line 205, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError:
netcdf:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:blue_mean:
No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/rasterio/rio/insp.py", line
77, in insp
with rasterio.open(input, mode) as src:
File "/usr/local/lib/python3.6/dist-packages/rasterio/env.py", line 423,
in wrapper
return f(*args, **kwds)
File "/usr/local/lib/python3.6/dist-packages/rasterio/__init__.py", line
216, in open
s = DatasetReader(path, driver=driver, **kwargs)
File "rasterio/_base.pyx", line 216, in
rasterio._base.DatasetBase.__init__
rasterio.errors.RasterioIOError:
netcdf:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:blue_mean:
No such file or directory
Aborted!

```

But this works for both `gdalinfo` and `rio insp` tools:

```
gdalinfo hdf5:/LUSTRE/MADMEX/dir_test_mount/
madmex_003_37_-32_1996-01-01.nc://blue_mean

Driver: HDF5Image/HDF5 Dataset
Files: /LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc
Size is 1667, 1667
Coordinate System is `'
Metadata:
Conventions=CF-1.6, ACDD-1.3
date_created=2019-04-26T21:07:47.595416
geospatial_bounds=POLYGON ((-98.8618335626467
19.8093029971201,-98.8722811078025 19.3587577175966,-98.3949028300917
19.3481841431337,-98.382861955109 19.7986884105649,-98.8618335626467
19.8093029971201))
geospatial_bounds_crs=EPSG:4326
...
```

```
rio insp hdf5:/LUSTRE/MADMEX/dir_test_mount/
madmex_003_37_-32_1996-01-01.nc://blue_mean

/usr/local/lib/python3.6/dist-packages/rasterio/__init__.py:216:
NotGeoreferencedWarning: Dataset has no geotransform set. The identity
matrix may be returned.
s = DatasetReader(path, driver=driver, **kwargs)
Rasterio 1.0.23 Interactive Inspector (Python 3.6.7)
Type "src.meta", "src.read(1)", or "help(src)" for more information.
src.meta
{'driver': 'HDF5Image', 'dtype': 'int16', 'nodata': None, 'width': 1667,
'height': 1667, 'count': 1, 'crs': None, 'transform': Affine(1.0, 0.0, 0.0,
0.0, 1.0, 0.0)}

a=src.read()
a
array([[[632, 603, 586, ..., 505, 483, 471],
[567, 538, 536, ..., 468, 478, 523],
[614, 580, 537, ..., 540, 633, 675],
...,
[828, 810, 804, ..., 275, 268, 299],
[857, 844, 823, ..., 310, 290, 307],
[840, 854, 836, ..., 320, 288, 294]]], dtype=int16)

```

This is related with LUSTRE FS because if we copy our NetCDF file to /root
location then:

```
this works:

gdalinfo netcdf:/root/madmex_003_37_-32_1996-01-01.nc:blue_mean

Driver: netCDF/Network Common Data Format
Files: /root/madmex_003_37_-32_1996-01-01.nc
Size is 1667, 1667
Coordinate System is:
PROJCS["unnamed",
GEOGCS["WGS 84",
...
```

```
rio insp netcdf:/root/madmex_003_37_-32_1996-01-01.nc:blue_mean
Rasterio 1.0.23 Interactive Inspector (Python 3.6.7)
Type "src.meta", "src.read(1)", or "help(src)" for more information.
src.meta
{'driver': 'netCDF', 'dtype': 'int16', 'nodata': -32767.0, 'width': 1667,
'height': 1667, 'count': 1, 'crs':
CRS.from_wkt('PROJCS["unnamed",GEOGCS["WGS
84",DATUM["unknown",SPHEROID["WGS84",6378137,6556752.3141]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic_2SP"],PARAMETER["standard_parallel_1",17.5],PARAMETER["standard_parallel_2",29.5],PARAMETER["latitude_of_origin",12],PARAMETER["central_meridian",-102],PARAMETER["false_easting",2500000],PARAMETER["false_northing",0]]'),
'transform': Affine(30.0, 0.0, 2827530.0,
0.0, -30.0, 876410.0)}
a=src.read()
a
array([[[632, 603, 586, ..., 505, 483, 471],
[567, 538, 536, ..., 468, 478, 523],
[614, 580, 537, ..., 540, 633, 675],
...,
[828, 810, 804, ..., 275, 268, 299],
[857, 844, 823, ..., 310, 290, 307],
[840, 854, 836, ..., 320, 288, 294]]], dtype=int16)
```

We are using `ubuntu:bionic` docker image so you can reproduce this error
using next lines:

```
sudo docker run -v /LUSTRE/:/LUSTRE/ \
-v /LUSTRE/MADMEX/dir_test_mount:/temporal \
--name mounting_volume_test_bionic \
--hostname antares3-datacube \
-dit ubuntu:bionic /bin/bash
```

enter to docker container:

```
sudo docker exec -it mounting_volume_test_bionic bash
```

and execute:

```
apt-get update

export DEBIAN_FRONTEND=noninteractive && echo "America/Mexico_City" >
/etc/timezone && apt-get install -y tzdata
apt-get update && apt-get install -y \
wget curl \
openssh-server \
openssl \
sudo \
nano \
software-properties-common \
git \
vim \
vim-gtk \
htop \
build-essential \
libssl-dev \
libffi-dev \
cmake \
python3-dev \
python3-pip \
python3-setuptools \
ca-certificates \
postgresql-client \
libudunits2-dev \
nodejs && pip3 install --upgrade pip

#install netcdf:

echo -e 'deb http://security.ubuntu.com/ubuntu/ xenial-security
universe\ndeb http://archive.ubuntu.com/ubuntu/ xenial universe' >>
/etc/apt/sources.list
apt update
apt install -t xenial-security libcurl3-gnutls
apt install -y -t xenial netcdf-bin
ncdump /LUSTRE/MADMEX/madmex_003_37_-32_1996-01-01.nc | head -n 30


#Install spatial libraries
add-apt-repository -y ppa:ubuntugis/ubuntugis-unstable && apt-get -qq
update

apt-get install -y \
ncview \
libproj-dev \
libgeos-dev \
gdal-bin \
libgdal-dev
pip install numpy && pip install GDAL==$(gdal-config --version)
--global-option=build_ext --global-option='-I/usr/include/gdal' && pip
install rasterio
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
export GDAL_DATA=/usr/share/gdal/
```

Hope you can help us

Cheers,

Erick


--
Sean Gillies


Re: gdalinfo's and rasterio's reading problem in LUSTRE FS with NetCDF file (ubuntu:bionic)

Sean Gillies
 

Hi Erick,

I'm not familiar with Lustre and only slightly familiar with the details of GDAL's netCDF driver. I think, since the problem manifests with gdalinfo as well as rasterio programs, that the best source of help will be the gdal-dev list: https://lists.osgeo.org/mailman/listinfo/gdal-dev/. There have been recent discussions related to HDF5 and netCDF4 there and GDAL's developer, Even Rouault, will probably have some insights. I hate to redirect you to another email list, but gdal-dev seems to be the best place to ask in this case. When you get help there, I'll make sure to follow up here.


On Wed, Jun 5, 2019 at 9:22 AM <epalacios@...> wrote:
Hi,

 My team and I have been facing some problems when reading netcdf files in `LUSTRE` filesystem with `ncdump` tool regarding an HDF5 1.10.1 issue in `ubuntu:bionic` (see for example: https://github.com/ALPSCore/ALPSCore/issues/410)

We made a workaround by installing `netcdf-bin` using repositories of `ubuntu:xenial` in `bionic`. Although `ncdump` have successfully read the netcdf file, we couldn't fix it (sort of...) for `gdalinfo` and  `rasterio` for example:

this works:

ncdump /LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc|head -n 20

gdalinfo /LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc |head -n 20

gdalinfo -sd 1 /LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc |head -n 20

but this neither of next lines work:

gdalinfo netcdf:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:blue_mean

gdalinfo NetCDF:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:blue_mean

```
ERROR 4: NetCDF:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:blue_mean: No such file or directory
gdalinfo failed - unable to open 'NetCDF:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:blue_mean'.
```

This ultimately leads to error with `rio insp`  in both next lines :

rio insp netcdf:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:blue_mean

rio insp NetCDF:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:blue_mean

```
ERROR:root:Exception caught during processing
Traceback (most recent call last):
  File "rasterio/_base.pyx", line 214, in rasterio._base.DatasetBase.__init__
  File "rasterio/_shim.pyx", line 64, in rasterio._shim.open_dataset
  File "rasterio/_err.pyx", line 205, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError: netcdf:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:blue_mean: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/rasterio/rio/insp.py", line 77, in insp
    with rasterio.open(input, mode) as src:
  File "/usr/local/lib/python3.6/dist-packages/rasterio/env.py", line 423, in wrapper
    return f(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/rasterio/__init__.py", line 216, in open
    s = DatasetReader(path, driver=driver, **kwargs)
  File "rasterio/_base.pyx", line 216, in rasterio._base.DatasetBase.__init__
rasterio.errors.RasterioIOError: netcdf:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:blue_mean: No such file or directory
Aborted!

```

But this works for both `gdalinfo` and `rio insp` tools:

```
gdalinfo hdf5:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc://blue_mean

Driver: HDF5Image/HDF5 Dataset
Files: /LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc
Size is 1667, 1667
Coordinate System is `'
Metadata:
  Conventions=CF-1.6, ACDD-1.3
  date_created=2019-04-26T21:07:47.595416
  geospatial_bounds=POLYGON ((-98.8618335626467 19.8093029971201,-98.8722811078025 19.3587577175966,-98.3949028300917 19.3481841431337,-98.382861955109 19.7986884105649,-98.8618335626467 19.8093029971201))
  geospatial_bounds_crs=EPSG:4326
...
```

```
rio insp hdf5:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc://blue_mean

/usr/local/lib/python3.6/dist-packages/rasterio/__init__.py:216: NotGeoreferencedWarning: Dataset has no geotransform set. The identity matrix may be returned.
  s = DatasetReader(path, driver=driver, **kwargs)
Rasterio 1.0.23 Interactive Inspector (Python 3.6.7)
Type "src.meta", "src.read(1)", or "help(src)" for more information.
>>> src.meta
{'driver': 'HDF5Image', 'dtype': 'int16', 'nodata': None, 'width': 1667, 'height': 1667, 'count': 1, 'crs': None, 'transform': Affine(1.0, 0.0, 0.0,
       0.0, 1.0, 0.0)}

>>> a=src.read()
>>> a
array([[[632, 603, 586, ..., 505, 483, 471],
        [567, 538, 536, ..., 468, 478, 523],
        [614, 580, 537, ..., 540, 633, 675],
        ...,
        [828, 810, 804, ..., 275, 268, 299],
        [857, 844, 823, ..., 310, 290, 307],
        [840, 854, 836, ..., 320, 288, 294]]], dtype=int16)

```

This is related with LUSTRE FS because if we copy our NetCDF file to /root location then:

```
this works:

gdalinfo netcdf:/root/madmex_003_37_-32_1996-01-01.nc:blue_mean

Driver: netCDF/Network Common Data Format
Files: /root/madmex_003_37_-32_1996-01-01.nc
Size is 1667, 1667
Coordinate System is:
PROJCS["unnamed",
    GEOGCS["WGS 84",
...
```

```
rio insp netcdf:/root/madmex_003_37_-32_1996-01-01.nc:blue_mean
Rasterio 1.0.23 Interactive Inspector (Python 3.6.7)
Type "src.meta", "src.read(1)", or "help(src)" for more information.
>>> src.meta
{'driver': 'netCDF', 'dtype': 'int16', 'nodata': -32767.0, 'width': 1667, 'height': 1667, 'count': 1, 'crs': CRS.from_wkt('PROJCS["unnamed",GEOGCS["WGS 84",DATUM["unknown",SPHEROID["WGS84",6378137,6556752.3141]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic_2SP"],PARAMETER["standard_parallel_1",17.5],PARAMETER["standard_parallel_2",29.5],PARAMETER["latitude_of_origin",12],PARAMETER["central_meridian",-102],PARAMETER["false_easting",2500000],PARAMETER["false_northing",0]]'), 'transform': Affine(30.0, 0.0, 2827530.0,
       0.0, -30.0, 876410.0)}
>>> a=src.read()
>>> a
array([[[632, 603, 586, ..., 505, 483, 471],
        [567, 538, 536, ..., 468, 478, 523],
        [614, 580, 537, ..., 540, 633, 675],
        ...,
        [828, 810, 804, ..., 275, 268, 299],
        [857, 844, 823, ..., 310, 290, 307],
        [840, 854, 836, ..., 320, 288, 294]]], dtype=int16)
```

We are using `ubuntu:bionic` docker image so you can reproduce this error using next lines:

```
sudo docker run -v /LUSTRE/:/LUSTRE/ \
-v /LUSTRE/MADMEX/dir_test_mount:/temporal \
--name mounting_volume_test_bionic \
--hostname antares3-datacube \
-dit ubuntu:bionic /bin/bash
```

enter to docker container:

```
sudo docker exec -it mounting_volume_test_bionic bash
```

and execute:

```
apt-get update

export DEBIAN_FRONTEND=noninteractive && echo "America/Mexico_City" > /etc/timezone && apt-get install -y tzdata
apt-get update && apt-get install -y \
            wget curl \
            openssh-server \
            openssl \
            sudo \
            nano \
            software-properties-common \
            git \
            vim \
            vim-gtk \
            htop \
            build-essential \
            libssl-dev \
            libffi-dev \
            cmake \
            python3-dev \
            python3-pip \
            python3-setuptools \
            ca-certificates \
            postgresql-client \
            libudunits2-dev \
            nodejs && pip3 install --upgrade pip

#install netcdf:

echo -e 'deb http://security.ubuntu.com/ubuntu/ xenial-security universe\ndeb http://archive.ubuntu.com/ubuntu/ xenial universe' >> /etc/apt/sources.list
apt update
apt install -t xenial-security libcurl3-gnutls
apt install -y -t xenial netcdf-bin
ncdump /LUSTRE/MADMEX/madmex_003_37_-32_1996-01-01.nc | head -n 30


#Install spatial libraries
add-apt-repository -y ppa:ubuntugis/ubuntugis-unstable && apt-get -qq update

apt-get install -y \
            ncview \
            libproj-dev \
            libgeos-dev \
            gdal-bin \
            libgdal-dev
pip install numpy && pip install GDAL==$(gdal-config --version) --global-option=build_ext --global-option='-I/usr/include/gdal' && pip install rasterio
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
export GDAL_DATA=/usr/share/gdal/
```

Hope you can help us

Cheers,

Erick



--
Sean Gillies


Re: Clipping a raster (grib2), end up with zeros on left edge when cropping with mask

Shane Mill - NOAA Affiliate
 

Hey Sean,

As always, thanks for the feedback. So to follow up, it turns out that I had to add "filled=False". In my particular situation, the user drags a bounding box over the original raster and is returned with the subsetted raster. With 'filled=False' in out_image, out_transform = mask(src, geoms, filled=False, crop=True), i get the desired result.

Thanks!
Shane Mill

On Fri, May 24, 2019 at 10:00 AM Sean Gillies <sean.gillies@...> wrote:
Hi Shane,

It looks like your feature geometries extend beyond the extent of your raster, yes? In that case, crop can create an extra row or column of pixels, which will be empty. This seems like a bug to me, or undefined behavior, at least. I suggest passing crop=False in your situation.

On Thu, May 23, 2019 at 8:52 AM Shane Mill - NOAA Affiliate via Groups.Io <shane.mill=noaa.gov@groups.io> wrote:
Hi everyone,

I am currently clipping a grib2 file, and am ending up with zeros on the left edge of the raster. I'm not an expert with GIS so maybe this is a common problem but was curious if anyone else has experienced this before?
I think the issue is where the mask function is used with the cropping. 

In the picture below, the blue edge shows the zeros.
And for context, here is the driving code that is performing the action:





--
Sean Gillies



--
Shane Mill

Meteorological Application Developer, AceInfo Solutions

Meteorological Development Laboratory (NWS)

Phone: 301-427-9452


Re: Occasional "not recognized as a supported file format" errors when reading from S3

Sean Gillies
 

I found the VRT multi-threading guidance at https://gdal.org/drivers/raster/vrt.html#multi-threading-issues. It moved during the site migration.

This smells like a GDAL bug to me, perhaps a manifestation of https://github.com/OSGeo/gdal/issues/1244 or https://github.com/OSGeo/gdal/issues/1031.


On Mon, May 27, 2019 at 2:50 AM Dion Häfner <dion.haefner@...> wrote:

Hey Sean,

thank you very much for the detailed assessment. I think this already sheds some light on some of the problems.

To answer your questions:

- The /vsis3/ identifiers are correct and work most of the time, we only see these errors from time to time.

- This is just a default read from a private S3 bucket. I don't know which protocol boto (or GDAL?) uses under the hood.

- In the first case, there is indeed multithreading involved, so this is a hot lead. I will have a look at the documentation as soon as I can find it somewhere :) It working most of the time would of course suggest some sort of race condition.

In the second case, the setup is actually more complicated. We experimented with "cloud-optimized" VRT files, that we created by

1. gdalbuildvrt on many small-ish COG

2. gdaladdo on the resulting VRT into an external .vrt.ovr file

We could then serve these COG mosaics up as a single dataset via Terracotta, but apparently the read from the VRT would fail when we zoomed in too far.

Thank you for your time!

Dion

On 24/05/2019 20.49, Sean Gillies via Groups.Io wrote:
Hi Dion,

That error comes from these lines in GDALOpenEx when the function fails to return successfully: https://github.com/OSGeo/gdal/blob/fe0b5dd644abb4ac0c9869db28e9cf977181fbce/gdal/gcore/gdaldataset.cpp#L3455. The way the errors are re-raised obscures the problem.

First, are the /vsis3/ identifiers for the datasets correct? Is Rasterio mangling them? Can you access the data successfully any time at all?

Are you using any advanced HTTP features? HTTP/2? The Rasterio wheels use a slightly old version of curl and that project is ever fixing bugs.

Is multithreading involved? I recommend that you consult https://gdal.org/gdal_vrttut.html#gdal_vrttut_mt as soon as it is back online. Multithreaded access to VRTs is complicated.

On Fri, May 24, 2019 at 8:13 AM Dion Häfner <dion.haefner@...> wrote:

We've only tried rasterio wheels.

On 23/05/2019 21.23, vincent.sarago via Groups.Io wrote:
This is not something I've been seen for now. does this happens with Rasterio wheels and rasterio build on gdal source ? 



--
Sean Gillies


Re: Occasional "not recognized as a supported file format" errors when reading from S3

Dion Häfner <dion.haefner@...>
 

Hey Sean,

thank you very much for the detailed assessment. I think this already sheds some light on some of the problems.

To answer your questions:

- The /vsis3/ identifiers are correct and work most of the time, we only see these errors from time to time.

- This is just a default read from a private S3 bucket. I don't know which protocol boto (or GDAL?) uses under the hood.

- In the first case, there is indeed multithreading involved, so this is a hot lead. I will have a look at the documentation as soon as I can find it somewhere :) It working most of the time would of course suggest some sort of race condition.

In the second case, the setup is actually more complicated. We experimented with "cloud-optimized" VRT files, that we created by

1. gdalbuildvrt on many small-ish COG

2. gdaladdo on the resulting VRT into an external .vrt.ovr file

We could then serve these COG mosaics up as a single dataset via Terracotta, but apparently the read from the VRT would fail when we zoomed in too far.

Thank you for your time!

Dion


On 24/05/2019 20.49, Sean Gillies via Groups.Io wrote:
Hi Dion,

That error comes from these lines in GDALOpenEx when the function fails to return successfully: https://github.com/OSGeo/gdal/blob/fe0b5dd644abb4ac0c9869db28e9cf977181fbce/gdal/gcore/gdaldataset.cpp#L3455. The way the errors are re-raised obscures the problem.

First, are the /vsis3/ identifiers for the datasets correct? Is Rasterio mangling them? Can you access the data successfully any time at all?

Are you using any advanced HTTP features? HTTP/2? The Rasterio wheels use a slightly old version of curl and that project is ever fixing bugs.

Is multithreading involved? I recommend that you consult https://gdal.org/gdal_vrttut.html#gdal_vrttut_mt as soon as it is back online. Multithreaded access to VRTs is complicated.





If the error occurs every time 

On Fri, May 24, 2019 at 8:13 AM Dion Häfner <dion.haefner@...> wrote:

We've only tried rasterio wheels.

On 23/05/2019 21.23, vincent.sarago via Groups.Io wrote:
This is not something I've been seen for now. does this happens with Rasterio wheels and rasterio build on gdal source ? 


--
Sean Gillies


Re: Occasional "not recognized as a supported file format" errors when reading from S3

Sean Gillies
 

Hi Dion,

That error comes from these lines in GDALOpenEx when the function fails to return successfully: https://github.com/OSGeo/gdal/blob/fe0b5dd644abb4ac0c9869db28e9cf977181fbce/gdal/gcore/gdaldataset.cpp#L3455. The way the errors are re-raised obscures the problem.

First, are the /vsis3/ identifiers for the datasets correct? Is Rasterio mangling them? Can you access the data successfully any time at all?

Are you using any advanced HTTP features? HTTP/2? The Rasterio wheels use a slightly old version of curl and that project is ever fixing bugs.

Is multithreading involved? I recommend that you consult https://gdal.org/gdal_vrttut.html#gdal_vrttut_mt as soon as it is back online. Multithreaded access to VRTs is complicated.





If the error occurs every time 


On Fri, May 24, 2019 at 8:13 AM Dion Häfner <dion.haefner@...> wrote:

We've only tried rasterio wheels.

On 23/05/2019 21.23, vincent.sarago via Groups.Io wrote:
This is not something I've been seen for now. does this happens with Rasterio wheels and rasterio build on gdal source ? 



--
Sean Gillies


Re: Occasional "not recognized as a supported file format" errors when reading from S3

Dion Häfner <dion.haefner@...>
 

We've only tried rasterio wheels.

On 23/05/2019 21.23, vincent.sarago via Groups.Io wrote:
This is not something I've been seen for now. does this happens with Rasterio wheels and rasterio build on gdal source ? 


Re: Clipping a raster (grib2), end up with zeros on left edge when cropping with mask

Sean Gillies
 

Hi Shane,

It looks like your feature geometries extend beyond the extent of your raster, yes? In that case, crop can create an extra row or column of pixels, which will be empty. This seems like a bug to me, or undefined behavior, at least. I suggest passing crop=False in your situation.


On Thu, May 23, 2019 at 8:52 AM Shane Mill - NOAA Affiliate via Groups.Io <shane.mill=noaa.gov@groups.io> wrote:
Hi everyone,

I am currently clipping a grib2 file, and am ending up with zeros on the left edge of the raster. I'm not an expert with GIS so maybe this is a common problem but was curious if anyone else has experienced this before?
I think the issue is where the mask function is used with the cropping. 

In the picture below, the blue edge shows the zeros.
And for context, here is the driving code that is performing the action:





--
Sean Gillies


Re: Occasional "not recognized as a supported file format" errors when reading from S3

vincent.sarago@...
 

This is not something I've been seen for now. does this happens with Rasterio wheels and rasterio build on gdal source ? 


Occasional "not recognized as a supported file format" errors when reading from S3

Dion Häfner <dion.haefner@...>
 

Dear rasterio group,

 

(I initially posted this at https://github.com/mapbox/rasterio/issues/1686)

Lately, we have encountered a strange bug in Terracotta. It basically always leads to errors like these:

(from DHI-GRAS/terracotta#139)

Traceback (most recent call last):
  File "rasterio/_base.pyx", line 213, in rasterio._base.DatasetBase.__init__
  File "rasterio/_shim.pyx", line 64, in rasterio._shim.open_dataset
  File "rasterio/_err.pyx", line 205, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError: '/vsis3/italy-composite/rasters/italy_2018_red.tif' not recognized as a supported file format.

or

(from DHI-GRAS/terracotta#10 (comment))

Traceback (most recent call last):
  File "rasterio/_io.pyx", line 698, in rasterio._io.DatasetReaderBase._read
  File "rasterio/shim_rasterioex.pxi", line 133, in rasterio._shim.io_multi_band
  File "rasterio/_err.pyx", line 182, in rasterio._err.exc_wrap_int
rasterio._err.CPLE_AppDefinedError: IReadBlock failed at X offset 0, Y offset 0: '/vsis3/bucket/prefix/tile330.tif' does not exist in the file system, and is not recognized as a supported dataset name.

The errors occur on different versions of rasterio, although anecdotally it wasn't a problem pre-1.0.15. It also seems to occur both during rasterio.open, and when actually reading tiles via WarpedVRT.read.

The problem is that we have only observed it with huge raster files, and we haven't been able to reproduce this reliably, or in a way where I could share it with you.

Does anyone have any intuition why this might be happening / what we could look at to debug this?

 

Cheers,
Dion

 


Clipping a raster (grib2), end up with zeros on left edge when cropping with mask

Shane Mill - NOAA Affiliate
 

Hi everyone,

I am currently clipping a grib2 file, and am ending up with zeros on the left edge of the raster. I'm not an expert with GIS so maybe this is a common problem but was curious if anyone else has experienced this before?
I think the issue is where the mask function is used with the cropping. 

In the picture below, the blue edge shows the zeros.
And for context, here is the driving code that is performing the action:




Re: problem with window sizes when parallelizing a funciton

javier lopatin
 

Hi Sean, thanks a lot. This was very helpfull indeed, and the performance improved a bit when changing the size to 512. 
Grate job on the library, congrats!
Cheers,
Javier

El mié., 22 may. 2019 a las 23:36, Alan Snow (<alansnow21@...>) escribió:
You are indeed correct Sean. Thanks for catching that and providing the correct answer! It seems I skimmed through the original message too fast.



--
Javier Lopatin
Institute of Geography and Geoecology
Karlsruhe Institute of Technology (KIT)
javier.lopatin@...


Re: problem with window sizes when parallelizing a funciton

Alan Snow
 

You are indeed correct Sean. Thanks for catching that and providing the correct answer! It seems I skimmed through the original message too fast.


Re: problem with window sizes when parallelizing a funciton

Sean Gillies
 

Alan: the function does indeed write to a new file with blocksizes set appropriately.

Javier: the TIFF spec states that tile width (and this goes for height, I presume) must be a multiple of 16. Neither GDAL's GeoTIFF format page nor Rasterio docs make this clear. I'll add a note. I think that if you try 512 instead of 128, you'll have success.


On Wed, May 22, 2019 at 7:29 AM Alan Snow <alansnow21@...> wrote:
Hi Javier,

The blocksize is not a dynamic attribute. It represents the blocksize of the file on disk. So, if you want to change the blocksize, you will need to write it to a  new file with a new blocksize.

Hopefully this helps,
Alan



--
Sean Gillies


Re: problem with window sizes when parallelizing a funciton

Alan Snow
 

Hi Javier,

The blocksize is not a dynamic attribute. It represents the blocksize of the file on disk. So, if you want to change the blocksize, you will need to write it to a  new file with a new blocksize.

Hopefully this helps,
Alan


problem with window sizes when parallelizing a funciton

javier lopatin
 

Hi all,
I'm currently using rasterio to process time series analysis of data. Because the raster tiles are too big, I'm trying to implement a windowed function with parallel processing. I tried your tutorial on concurrent processing (
https://github.com/mapbox/rasterio/blob/master/docs/topics/concurrency.rst) and it works beautify with my own functions. I'm just wondering why it is only working with window sizes of 128 (profile.update(blockxsize=128,blockysize=128tiled=True)​), just like in the example. If I change these values to e.g. 200 or 500 it does not work anymore. I'm currently trying with a 3,000 X 3,000 raster size. Because loops are slow, I assume that increasing a bit the window size could be helpful. The error message that I receive if I change blockxsize is:

Traceback (most recent call last):
  File "TSA.py", line 252, in <module>
    main(args.inputImage, args.outputImage, args.j)
  File "TSA.py", line 225, in main
    parallel_process(infile, outfile, n_jobs)
  File "TSA.py", line 187, in parallel_process
    with rasterio.open(outfile, "w", **profile) as dst:
  File "/home/javier/miniconda3/lib/python3.6/site-packages/rasterio/env.py", line 398, in wrapper
    return f(*args, **kwds)
  File "/home/javier/miniconda3/lib/python3.6/site-packages/rasterio/__init__.py", line 226, in open
    **kwargs)
  File "rasterio/_io.pyx", line 1129, in rasterio._io.DatasetWriterBase.__init__
  File "rasterio/_err.pyx", line 194, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_AppDefinedError: _TIFFVSetField:/home/javier/Documents/SF_delta/Sentinel/TSA/X-004_Y-001/2015-2019_001-365_LEVEL4_TSA_SEN2L_EVI_C0_S0_FAVG_TY_C95T_FBY_TSA.tif: Bad value 500 for "TileWidth" tag
 

The used function is below (see the whole script at 
https://github.com/JavierLopatin/Python-Remote-Sensing-Scripts/blob/master/TSA.py):

def parallel_process(infile, outfile, n_jobs):
    """
    Process infile block-by-block with parallel processing
    and write to a new file.
    """
    from tqdm import tqdm # progress bar
 
    with rasterio.Env():
 
        with rasterio.open(infile) as src:
 
            # Create a destination dataset based on source params. The
            # destination will be tiled, and we'll process the tiles
            # concurrently.
            profile = src.profile
            profile.update(blockxsize=128, blockysize=128,
                           count=6, dtype='float64', tiled=True)
 
            with rasterio.open(outfile, "w", **profile) as dst:
 
                # Materialize a list of destination block windows
                # that we will use in several statements below.
                windows = [window for ij, window in dst.block_windows()]
 
                # This generator comprehension gives us raster data
                # arrays for each window. Later we will zip a mapping
                # of it with the windows list to get (window, result)
                # pairs.
                data_gen = (src.read(window=window) for window in windows)
 
                with concurrent.futures.ProcessPoolExecutor(
                    max_workers=n_jobs
                ) as executor:
 
                    # We map the TSA() function over the raster
                    # data generator, zip the resulting iterator with
                    # the windows list, and as pairs come back we
                    # write data to the destination dataset.
                    for window, result in zip(
                        tqdm(windows), executor.map(TSA, data_gen)
                    ):
                        dst.write(result, window=window)

Hope you guys can help. 

Cheers, Javier


Re: Python rasterio for saveing GeoTIFF files and read in ArcGIS or QGIS

Gabriel Cotlier
 

Dear Sean,
Thank you very much for your help.
I appreciate it.
Regards
Gabriel


On Monday, May 20, 2019, Sean Gillies <sean.gillies@...> wrote:
Thank you for providing a complete example of your code and the ArcMap error message. Without those details, it can be hard to provide help.

When you open the files for writing, there is one thing missing: the *transform* keyword argument that determines the spatial extent of the dataset. I think that in your case, you will have success if you copy the value from the source datasets:

filename = "20180308_133037_1024_3B_AnalyticMS.tif"

# Copy the transform and crs objects from the source dataset.
with rasterio.open(filename) as src:
    dst_transform = src.transform
    dst_crs = src.crs

# Pass the copied values to rasterio.open().
new_dataset = rasterio.open(
       'ReflectanceB1.tif',
       'w',
       driver='GTiff',
       height=band_blue_reflectance.shape[0],
       width=band_blue_reflectance.shape[1],
       count=1,
       dtype=band_blue_reflectance.dtype,
       crs=dst_crs,
       transform=dst_transform
)    
    
new_dataset.write(band_blue_reflectance, 1)
new_dataset.close()


On Sat, May 18, 2019 at 12:39 PM <gabiklm01@...> wrote:
After following the steps in the tutorial of the Planet website (https://developers.planet.com/tutorials/convert-planetscope-imagery-from-radiance-to-reflectance/), I want to save each individual band of a RapidEye image Level 3B as a separate GeoTIFF layer file to be able to open it in any other GIS-software such as QGIS, ArcGIS other. I could not succeed in saving each band separately as GeoTIFF layer file. Here is the code:
 
```
#!/usr/bin/env python
# coding: utf-8
 
import rasterio
import numpy as np
 
filename = "20180308_133037_1024_3B_AnalyticMS.tif"
 
# Load red and NIR bands - note all PlanetScope 4-band images have band order BGRN
with rasterio.open(filename) as src:
    band_blue_radiance = src.read(1)
    
with rasterio.open(filename) as src:
    band_green_radiance = src.read(2)
 
with rasterio.open(filename) as src:
    band_red_radiance = src.read(3)
 
with rasterio.open(filename) as src:
    band_nir_radiance = src.read(4)
 
from xml.dom import minidom
 
xmldoc = minidom.parse("20180308_133037_1024_3B_AnalyticMS_metadata.xml")
nodes = xmldoc.getElementsByTagName("ps:bandSpecificMetadata")
 
# XML parser refers to bands by numbers 1-4
coeffs = {}
for node in nodes:
    bn = node.getElementsByTagName("ps:bandNumber")[0].firstChild.data
    if bn in ['1', '2', '3', '4']:
        i = int(bn)
        value = node.getElementsByTagName("ps:reflectanceCoefficient")[0].firstChild.data
        coeffs[i] = float(value)
 
print "Conversion coefficients:", coeffs
 
# Multiply the Digital Number (DN) values in each band by the TOA reflectance coefficients
band_blue_reflectance = band_blue_radiance * coeffs[1]
band_green_reflectance = band_green_radiance * coeffs[2]
band_red_reflectance = band_red_radiance * coeffs[3]
band_nir_reflectance = band_nir_radiance * coeffs[4]
 
import numpy as np
print "Red band radiance is from {} to {}".format(np.amin(band_red_radiance), np.amax(band_red_radiance))
print "Red band reflectance is from {} to {}".format(np.amin(band_red_reflectance), np.amax(band_red_reflectance))
 
print "Before Scaling, red band reflectance is from {} to {}".format(np.amin(band_red_reflectance), np.amax(band_red_reflectance))
 
# Here we include a fixed scaling factor. This is common practice.
scale = 10000
blue_ref_scaled = scale * band_blue_reflectance
green_ref_scaled = scale * band_green_reflectance
red_ref_scaled = scale * band_red_reflectance
nir_ref_scaled = scale * band_nir_reflectance
 
print "After Scaling, red band reflectance is from {} to {}".format(np.amin(red_ref_scaled), np.amax(red_ref_scaled))
```
Here I tried unsuccessfully to save each individual band as GeotiFF file.
 
I have tried, any idea how to do the save of each band correctly to further open in QGIS orArGIS for calculating NDVI there?
 
```
dst_crs='EPSG:32720'
 
##  saving bands individualy  
new_dataset = rasterio.open(
       'ReflectanceB1.tif',
       'w',
       driver='GTiff',
       height=band_blue_reflectance.shape[0],
       width=band_blue_reflectance.shape[1],
       count=1,
       dtype=band_blue_reflectance.dtype,
       crs=dst_crs,
)    
    
new_dataset.write(band_blue_reflectance, 1)
new_dataset.close()
   
   
new_dataset = rasterio.open(
       'ReflectanceB2.tif',
       'w',
       driver='GTiff',
       height=band_red_reflectance.shape[0],
       width=band_red_reflectance.shape[1],
       count=1,
       dtype=band_red_reflectance.dtype,
       crs=dst_crs,
)    
    
new_dataset.write(band_red_reflectance, 1)
new_dataset.close()
 
new_dataset = rasterio.open(
       'ReflectanceB3.tif',
       'w',
       driver='GTiff',
       height=band_red_reflectance.shape[0],
       width=band_red_reflectance.shape[1],
       count=1,
       dtype=band_red_reflectance.dtype,
       crs=dst_crs,
)    
    
new_dataset.write(band_red_reflectance, 1)
new_dataset.close()    
 
 
new_dataset = rasterio.open(
       'ReflectanceB4.tif',
       'w',
       driver='GTiff',
       height=band_red_reflectance.shape[0],
       width=band_red_reflectance.shape[1],
       count=1,
       dtype=band_red_reflectance.dtype,
       crs=dst_crs,
)    
    
new_dataset.write(band_red_reflectance, 1)
new_dataset.close()
```
I'm using as input image "20180308_133037_1024_3B_AnalyticMS.tif" instead of "20180308_133037_1024_3B_AnalyticMS_SR.tif" don't know if is the problem
 
I'm getting resulting GeoTIFF file but when I open it in ArcMap I get this error
 


The GeoTIFF band are there and can be seen at ArcMap screen...   but the error appear anyway... might be related to coordinate systems to be defined possibly while saving in rasterio? And need to have it correctly projected for clipping with shapefiles.



--
Sean Gillies

501 - 520 of 736