
Re: problem with window sizes when parallelizing a function

Dion Häfner
 

Hey Javier,

This is not really a rasterio problem; it's mostly a Python peculiarity.

In general, you cannot pass functions defined in a local closure to a different process. Just move your main function to global scope and you should be fine. You can use functools.partial to fix some optional arguments of your main function and achieve the same thing as with your closure.
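For illustration, a minimal sketch of that fix (the function and argument names here are made up, not Javier's actual code): define the worker at module level and bind the optional arguments with functools.partial, which yields a picklable callable:

```python
import pickle
from functools import partial

# Module-level worker: picklable, unlike a closure defined inside PhenoShape.
def process_chunk(dstack, nan_replace=None, roll_window=None):
    # ... do the real per-window work here ...
    return dstack

# Bind the optional arguments once; the result can be sent to worker processes.
main = partial(process_chunk, nan_replace=-32767, roll_window=3)

# A closure would fail here with "Can't pickle local object 'PhenoShape.<locals>.main'";
# a partial of a module-level function pickles fine.
pickle.dumps(main)
```

Because process_chunk lives at module level, executor.map(main, data_gen) can serialize it for the worker processes.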

Best,
Dion

On 13/06/2019 15.05, javier lopatin via Groups.Io wrote:
Hi Guys!

I'm still working on parallelizing raster processes with rasterio, and I came across a problem that you might know about. I have some raster stacks of time-series data from which I want to obtain an interpolation signature. In practice, I have a function that processes the raster chunks in parallel with rasterio (calling a function main()):

def _parallel_process(inData, outData, main, count,  n_jobs=4, chuckSize=256):
    with rasterio.Env():
        with rasterio.open(inData) as src:
            profile = src.profile
            profile.update(blockxsize=chuckSize, blockysize=chuckSize,
                           count=count, dtype='float64', tiled=True)
            with rasterio.open(outData, "w", **profile) as dst:
                windows = [window for ij, window in dst.block_windows()]
                data_gen = (src.read(window=window) for window in windows)
                with concurrent.futures.ProcessPoolExecutor(
                    max_workers=n_jobs
                ) as executor:
                    for window, result in zip(
                        tqdm(windows), executor.map(main, data_gen)
                    ):
                        dst.write(result, window=window)

And I have another function that calls _parallel_process:

def PhenoShape(inData, outData, dates=None, nan_replace=None, rollWindow=None, nGS=46, chuckSize=256, n_jobs=4):
    def main(dstack):
        # load raster as a xarray
        xarray = xr.DataArray(dstack)
        xarray.coords['dim_0'] = dates
        xarray.coords['doy'] = xarray.dim_0.dt.dayofyear
        # sort bands according to day-of-the-year
        xarray = xarray.sortby('doy')
        # rearrange time dimension for smoothing and interpolation
        xarray['time'] = xarray['doy']
        # turn a value to NaN
        if nan_replace is not None:
            xarray = xarray.where(xarray.values != nan_replace)
        # rolling average using moving window
        if rollWindow is not None:
            xarray = xarray.rolling(dim_0=rollWindow, center=True).mean()
        # prepare inputs to getPheno
        x = xarray.doy.values
        y = xarray.values
        # get phenology shape across the time axis
        return np.apply_along_axis(_getPheno, 0, y, x, nGS)
    # apply PhenoShape with parallel processing
    try:
        _parallel_process(inData, outData, main, nGS, n_jobs, chuckSize)
    except AttributeError:
        print('ERROR somehow.. :(')
 
This works really well! But when I import the script as a library (the functions live inside phenology.py):

import phenology as phen
phen.PhenoShape(inData=inData, outData=outData, dates=dates, rollWindow=3,
    nan_replace=-32767, nGS=46, chuckSize=256, n_jobs=4)

I get the following error:

Traceback (most recent call last):
  File "/home/javier/miniconda3/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/javier/miniconda3/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'PhenoShape.<locals>.main'

(the same traceback is repeated once per worker process)

Any ideas? The main idea here is to call these functions so I end up with a cleaner script.
Many thanks in advance.
Cheers,
Javier L.

On Thu, 23 May 2019 at 11:13, javier lopatin via Groups.Io (<javierlopatin=gmail.com@groups.io>) wrote:
Hi Sean, thanks a lot. This was very helpful indeed, and the performance improved a bit when changing the size to 512.
Great job on the library, congrats!
Cheers,
Javier

On Wed, 22 May 2019 at 23:36, Alan Snow (<alansnow21@...>) wrote:
You are indeed correct Sean. Thanks for catching that and providing the correct answer! It seems I skimmed through the original message too fast.


--
Javier Lopatin
Institute of Geography and Geoecology
Karlsruhe Institute of Technology (KIT)
javier.lopatin@...


--
Javier Lopatin
Institute of Geography and Geoecology
Karlsruhe Institute of Technology (KIT)
javier.lopatin@...



Re: Occasional "not recognized as a supported file format" errors when reading from S3

Jonas
 

OK, sorry, forget what I said - the same map tile that earlier was failing is now served fine. :/


Re: Occasional "not recognized as a supported file format" errors when reading from S3

Jonas
 
Edited

Here is some more insight on the issue. Remember, we are seeing this when reading small chunks from a cloud-optimized GeoTIFF on S3 with our Terracotta tile server.

On our latest encounters of the issue, we are getting it consistently on the same map tile but not on adjacent tiles that are reading the exact same GeoTIFF.
So this challenges the assumption that this is a race condition, does it not?

Btw, we also tried the solution from https://github.com/OSGeo/gdal/issues/1244#issuecomment-487164897 (CPL_VSIL_CURL_NON_CACHED="/vsis3/") to no avail.
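For reference, a minimal sketch of how that workaround is applied (GDAL picks configuration options up from the environment; the option name is the one from the linked issue):

```python
import os

# Must be set before GDAL first touches the path; disables the VSI curl
# cache for anything under /vsis3/. (As noted above, it did not help here.)
os.environ["CPL_VSIL_CURL_NON_CACHED"] = "/vsis3/"
```

With rasterio, the same option can also be scoped to a block via rasterio.Env(CPL_VSIL_CURL_NON_CACHED="/vsis3/").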


Rasterio 1.0.24

Sean Gillies
 

Hi all,

Rasterio 1.0.24 is on PyPI now: https://pypi.org/project/rasterio/1.0.24/#files. This release fixes a pretty major bug that potentially affected multi-threaded programs or programs that reopened datasets while changing opening options like overview level (see issue #1504). Upgrade when you can.

The only other change to note is that the wheels on PyPI now include GDAL 2.4.1.

--
Sean Gillies


Re: gdalinfo's and rasterio's reading problem in LUSTRE FS with NetCDF file (ubuntu:bionic)

Erick Palacios Moreno
 

Hi Sean,

Thank you for your answer.

Even Rouault answered that LUSTRE is causing this behaviour and that my best option is to build GDAL and libnetcdf against a libhdf5 version that plays well with LUSTRE.

In Ubuntu xenial we didn't have these problems, so we will discuss whether using a different version of libhdf5 is an option.

Thank you,

Erick


----- Mensaje original -----
De: "Sean Gillies" <sean.gillies@gmail.com>
Para: main@rasterio.groups.io
Enviados: Miércoles, 5 de Junio 2019 13:25:55
Asunto: Re: [rasterio] gdalinfo's and rasterio's reading problem in LUSTRE FS with NetCDF file (ubuntu:bionic)

Hi Erick,

I'm not familiar with Lustre and only slightly familiar with the details of
GDAL's netCDF driver. I think, since the problem manifests with gdalinfo as
well as rasterio programs, that the best source of help will be the
gdal-dev list: https://lists.osgeo.org/mailman/listinfo/gdal-dev/. There
have been recent discussions related to HDF5 and netCDF4 there and GDAL's
developer, Even Rouault, will probably have some insights. I hate to
redirect you to another email list, but gdal-dev seems to be the best place
to ask in this case. When you get help there, I'll make sure to follow up
here.

On Wed, Jun 5, 2019 at 9:22 AM <epalacios@conabio.gob.mx> wrote:

Hi,

My team and I have been facing some problems when reading netcdf files in
`LUSTRE` filesystem with `ncdump` tool regarding an HDF5 1.10.1 issue in
`ubuntu:bionic` (see for example: https://github.com/ALPSCore/ALPSCore/issues/410)


We made a workaround by installing `netcdf-bin` using repositories of
`ubuntu:xenial` in `bionic`. Although `ncdump` have successfully read the
netcdf file, we couldn't fix it (sort of...) for `gdalinfo` and `rasterio`
for example:

this works:

ncdump /LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc|head
-n 20

gdalinfo /LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc
|head -n 20

gdalinfo -sd 1 /LUSTRE/MADMEX/dir_test_mount/
madmex_003_37_-32_1996-01-01.nc |head -n 20

but this neither of next lines work:

gdalinfo
netcdf:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:
blue_mean

gdalinfo
NetCDF:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:
blue_mean

```
ERROR 4:
NetCDF:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:blue_mean:
No such file or directory
gdalinfo failed - unable to open
'NetCDF:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:
blue_mean'.
```

This ultimately leads to error with `rio insp` in both next lines :

rio insp
netcdf:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:
blue_mean

rio insp
NetCDF:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:
blue_mean

```
ERROR:root:Exception caught during processing
Traceback (most recent call last):
File "rasterio/_base.pyx", line 214, in
rasterio._base.DatasetBase.__init__
File "rasterio/_shim.pyx", line 64, in rasterio._shim.open_dataset
File "rasterio/_err.pyx", line 205, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError:
netcdf:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:blue_mean:
No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/rasterio/rio/insp.py", line
77, in insp
with rasterio.open(input, mode) as src:
File "/usr/local/lib/python3.6/dist-packages/rasterio/env.py", line 423,
in wrapper
return f(*args, **kwds)
File "/usr/local/lib/python3.6/dist-packages/rasterio/__init__.py", line
216, in open
s = DatasetReader(path, driver=driver, **kwargs)
File "rasterio/_base.pyx", line 216, in
rasterio._base.DatasetBase.__init__
rasterio.errors.RasterioIOError:
netcdf:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:blue_mean:
No such file or directory
Aborted!

```

But this works for both `gdalinfo` and `rio insp` tools:

```
gdalinfo hdf5:/LUSTRE/MADMEX/dir_test_mount/
madmex_003_37_-32_1996-01-01.nc://blue_mean

Driver: HDF5Image/HDF5 Dataset
Files: /LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc
Size is 1667, 1667
Coordinate System is `'
Metadata:
Conventions=CF-1.6, ACDD-1.3
date_created=2019-04-26T21:07:47.595416
geospatial_bounds=POLYGON ((-98.8618335626467
19.8093029971201,-98.8722811078025 19.3587577175966,-98.3949028300917
19.3481841431337,-98.382861955109 19.7986884105649,-98.8618335626467
19.8093029971201))
geospatial_bounds_crs=EPSG:4326
...
```

```
rio insp hdf5:/LUSTRE/MADMEX/dir_test_mount/
madmex_003_37_-32_1996-01-01.nc://blue_mean

/usr/local/lib/python3.6/dist-packages/rasterio/__init__.py:216:
NotGeoreferencedWarning: Dataset has no geotransform set. The identity
matrix may be returned.
s = DatasetReader(path, driver=driver, **kwargs)
Rasterio 1.0.23 Interactive Inspector (Python 3.6.7)
Type "src.meta", "src.read(1)", or "help(src)" for more information.
src.meta
{'driver': 'HDF5Image', 'dtype': 'int16', 'nodata': None, 'width': 1667,
'height': 1667, 'count': 1, 'crs': None, 'transform': Affine(1.0, 0.0, 0.0,
0.0, 1.0, 0.0)}

a=src.read()
a
array([[[632, 603, 586, ..., 505, 483, 471],
[567, 538, 536, ..., 468, 478, 523],
[614, 580, 537, ..., 540, 633, 675],
...,
[828, 810, 804, ..., 275, 268, 299],
[857, 844, 823, ..., 310, 290, 307],
[840, 854, 836, ..., 320, 288, 294]]], dtype=int16)

```

This is related with LUSTRE FS because if we copy our NetCDF file to /root
location then:

```
this works:

gdalinfo netcdf:/root/madmex_003_37_-32_1996-01-01.nc:blue_mean

Driver: netCDF/Network Common Data Format
Files: /root/madmex_003_37_-32_1996-01-01.nc
Size is 1667, 1667
Coordinate System is:
PROJCS["unnamed",
GEOGCS["WGS 84",
...
```

```
rio insp netcdf:/root/madmex_003_37_-32_1996-01-01.nc:blue_mean
Rasterio 1.0.23 Interactive Inspector (Python 3.6.7)
Type "src.meta", "src.read(1)", or "help(src)" for more information.
src.meta
{'driver': 'netCDF', 'dtype': 'int16', 'nodata': -32767.0, 'width': 1667,
'height': 1667, 'count': 1, 'crs':
CRS.from_wkt('PROJCS["unnamed",GEOGCS["WGS
84",DATUM["unknown",SPHEROID["WGS84",6378137,6556752.3141]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic_2SP"],PARAMETER["standard_parallel_1",17.5],PARAMETER["standard_parallel_2",29.5],PARAMETER["latitude_of_origin",12],PARAMETER["central_meridian",-102],PARAMETER["false_easting",2500000],PARAMETER["false_northing",0]]'),
'transform': Affine(30.0, 0.0, 2827530.0,
0.0, -30.0, 876410.0)}
a=src.read()
a
array([[[632, 603, 586, ..., 505, 483, 471],
[567, 538, 536, ..., 468, 478, 523],
[614, 580, 537, ..., 540, 633, 675],
...,
[828, 810, 804, ..., 275, 268, 299],
[857, 844, 823, ..., 310, 290, 307],
[840, 854, 836, ..., 320, 288, 294]]], dtype=int16)
```

We are using `ubuntu:bionic` docker image so you can reproduce this error
using next lines:

```
sudo docker run -v /LUSTRE/:/LUSTRE/ \
-v /LUSTRE/MADMEX/dir_test_mount:/temporal \
--name mounting_volume_test_bionic \
--hostname antares3-datacube \
-dit ubuntu:bionic /bin/bash
```

enter to docker container:

```
sudo docker exec -it mounting_volume_test_bionic bash
```

and execute:

```
apt-get update

export DEBIAN_FRONTEND=noninteractive && echo "America/Mexico_City" >
/etc/timezone && apt-get install -y tzdata
apt-get update && apt-get install -y \
wget curl \
openssh-server \
openssl \
sudo \
nano \
software-properties-common \
git \
vim \
vim-gtk \
htop \
build-essential \
libssl-dev \
libffi-dev \
cmake \
python3-dev \
python3-pip \
python3-setuptools \
ca-certificates \
postgresql-client \
libudunits2-dev \
nodejs && pip3 install --upgrade pip

#install netcdf:

echo -e 'deb http://security.ubuntu.com/ubuntu/ xenial-security
universe\ndeb http://archive.ubuntu.com/ubuntu/ xenial universe' >>
/etc/apt/sources.list
apt update
apt install -t xenial-security libcurl3-gnutls
apt install -y -t xenial netcdf-bin
ncdump /LUSTRE/MADMEX/madmex_003_37_-32_1996-01-01.nc | head -n 30


#Install spatial libraries
add-apt-repository -y ppa:ubuntugis/ubuntugis-unstable && apt-get -qq
update

apt-get install -y \
ncview \
libproj-dev \
libgeos-dev \
gdal-bin \
libgdal-dev
pip install numpy && pip install GDAL==$(gdal-config --version)
--global-option=build_ext --global-option='-I/usr/include/gdal' && pip
install rasterio
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
export GDAL_DATA=/usr/share/gdal/
```

Hope you can help us

Cheers,

Erick


--
Sean Gillies


Re: gdalinfo's and rasterio's reading problem in LUSTRE FS with NetCDF file (ubuntu:bionic)

Sean Gillies
 

Hi Erick,

I'm not familiar with Lustre and only slightly familiar with the details of GDAL's netCDF driver. I think, since the problem manifests with gdalinfo as well as rasterio programs, that the best source of help will be the gdal-dev list: https://lists.osgeo.org/mailman/listinfo/gdal-dev/. There have been recent discussions related to HDF5 and netCDF4 there and GDAL's developer, Even Rouault, will probably have some insights. I hate to redirect you to another email list, but gdal-dev seems to be the best place to ask in this case. When you get help there, I'll make sure to follow up here.


On Wed, Jun 5, 2019 at 9:22 AM <epalacios@...> wrote:
Hi,

 My team and I have been facing some problems when reading netcdf files in `LUSTRE` filesystem with `ncdump` tool regarding an HDF5 1.10.1 issue in `ubuntu:bionic` (see for example: https://github.com/ALPSCore/ALPSCore/issues/410)

We made a workaround by installing `netcdf-bin` using repositories of `ubuntu:xenial` in `bionic`. Although `ncdump` now reads the netcdf file successfully, we couldn't fix it (sort of...) for `gdalinfo` and `rasterio`. For example:

this works:

ncdump /LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc|head -n 20

gdalinfo /LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc |head -n 20

gdalinfo -sd 1 /LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc |head -n 20

but neither of the next lines works:

gdalinfo netcdf:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:blue_mean

gdalinfo NetCDF:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:blue_mean

```
ERROR 4: NetCDF:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:blue_mean: No such file or directory
gdalinfo failed - unable to open 'NetCDF:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:blue_mean'.
```

This ultimately leads to an error with `rio insp` for both of the next lines:

rio insp netcdf:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:blue_mean

rio insp NetCDF:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:blue_mean

```
ERROR:root:Exception caught during processing
Traceback (most recent call last):
  File "rasterio/_base.pyx", line 214, in rasterio._base.DatasetBase.__init__
  File "rasterio/_shim.pyx", line 64, in rasterio._shim.open_dataset
  File "rasterio/_err.pyx", line 205, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError: netcdf:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:blue_mean: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/rasterio/rio/insp.py", line 77, in insp
    with rasterio.open(input, mode) as src:
  File "/usr/local/lib/python3.6/dist-packages/rasterio/env.py", line 423, in wrapper
    return f(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/rasterio/__init__.py", line 216, in open
    s = DatasetReader(path, driver=driver, **kwargs)
  File "rasterio/_base.pyx", line 216, in rasterio._base.DatasetBase.__init__
rasterio.errors.RasterioIOError: netcdf:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc:blue_mean: No such file or directory
Aborted!

```

But this works for both `gdalinfo` and `rio insp` tools:

```
gdalinfo hdf5:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc://blue_mean

Driver: HDF5Image/HDF5 Dataset
Files: /LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc
Size is 1667, 1667
Coordinate System is `'
Metadata:
  Conventions=CF-1.6, ACDD-1.3
  date_created=2019-04-26T21:07:47.595416
  geospatial_bounds=POLYGON ((-98.8618335626467 19.8093029971201,-98.8722811078025 19.3587577175966,-98.3949028300917 19.3481841431337,-98.382861955109 19.7986884105649,-98.8618335626467 19.8093029971201))
  geospatial_bounds_crs=EPSG:4326
...
```

```
rio insp hdf5:/LUSTRE/MADMEX/dir_test_mount/madmex_003_37_-32_1996-01-01.nc://blue_mean

/usr/local/lib/python3.6/dist-packages/rasterio/__init__.py:216: NotGeoreferencedWarning: Dataset has no geotransform set. The identity matrix may be returned.
  s = DatasetReader(path, driver=driver, **kwargs)
Rasterio 1.0.23 Interactive Inspector (Python 3.6.7)
Type "src.meta", "src.read(1)", or "help(src)" for more information.
>>> src.meta
{'driver': 'HDF5Image', 'dtype': 'int16', 'nodata': None, 'width': 1667, 'height': 1667, 'count': 1, 'crs': None, 'transform': Affine(1.0, 0.0, 0.0,
       0.0, 1.0, 0.0)}

>>> a=src.read()
>>> a
array([[[632, 603, 586, ..., 505, 483, 471],
        [567, 538, 536, ..., 468, 478, 523],
        [614, 580, 537, ..., 540, 633, 675],
        ...,
        [828, 810, 804, ..., 275, 268, 299],
        [857, 844, 823, ..., 310, 290, 307],
        [840, 854, 836, ..., 320, 288, 294]]], dtype=int16)

```

This is related to the LUSTRE FS, because if we copy our NetCDF file to the /root location then:

```
this works:

gdalinfo netcdf:/root/madmex_003_37_-32_1996-01-01.nc:blue_mean

Driver: netCDF/Network Common Data Format
Files: /root/madmex_003_37_-32_1996-01-01.nc
Size is 1667, 1667
Coordinate System is:
PROJCS["unnamed",
    GEOGCS["WGS 84",
...
```

```
rio insp netcdf:/root/madmex_003_37_-32_1996-01-01.nc:blue_mean
Rasterio 1.0.23 Interactive Inspector (Python 3.6.7)
Type "src.meta", "src.read(1)", or "help(src)" for more information.
>>> src.meta
{'driver': 'netCDF', 'dtype': 'int16', 'nodata': -32767.0, 'width': 1667, 'height': 1667, 'count': 1, 'crs': CRS.from_wkt('PROJCS["unnamed",GEOGCS["WGS 84",DATUM["unknown",SPHEROID["WGS84",6378137,6556752.3141]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic_2SP"],PARAMETER["standard_parallel_1",17.5],PARAMETER["standard_parallel_2",29.5],PARAMETER["latitude_of_origin",12],PARAMETER["central_meridian",-102],PARAMETER["false_easting",2500000],PARAMETER["false_northing",0]]'), 'transform': Affine(30.0, 0.0, 2827530.0,
       0.0, -30.0, 876410.0)}
>>> a=src.read()
>>> a
array([[[632, 603, 586, ..., 505, 483, 471],
        [567, 538, 536, ..., 468, 478, 523],
        [614, 580, 537, ..., 540, 633, 675],
        ...,
        [828, 810, 804, ..., 275, 268, 299],
        [857, 844, 823, ..., 310, 290, 307],
        [840, 854, 836, ..., 320, 288, 294]]], dtype=int16)
```

We are using the `ubuntu:bionic` docker image, so you can reproduce this error with the following lines:

```
sudo docker run -v /LUSTRE/:/LUSTRE/ \
-v /LUSTRE/MADMEX/dir_test_mount:/temporal \
--name mounting_volume_test_bionic \
--hostname antares3-datacube \
-dit ubuntu:bionic /bin/bash
```

enter the docker container:

```
sudo docker exec -it mounting_volume_test_bionic bash
```

and execute:

```
apt-get update

export DEBIAN_FRONTEND=noninteractive && echo "America/Mexico_City" > /etc/timezone && apt-get install -y tzdata
apt-get update && apt-get install -y \
            wget curl \
            openssh-server \
            openssl \
            sudo \
            nano \
            software-properties-common \
            git \
            vim \
            vim-gtk \
            htop \
            build-essential \
            libssl-dev \
            libffi-dev \
            cmake \
            python3-dev \
            python3-pip \
            python3-setuptools \
            ca-certificates \
            postgresql-client \
            libudunits2-dev \
            nodejs && pip3 install --upgrade pip

#install netcdf:

echo -e 'deb http://security.ubuntu.com/ubuntu/ xenial-security universe\ndeb http://archive.ubuntu.com/ubuntu/ xenial universe' >> /etc/apt/sources.list
apt update
apt install -t xenial-security libcurl3-gnutls
apt install -y -t xenial netcdf-bin
ncdump /LUSTRE/MADMEX/madmex_003_37_-32_1996-01-01.nc | head -n 30


#Install spatial libraries
add-apt-repository -y ppa:ubuntugis/ubuntugis-unstable && apt-get -qq update

apt-get install -y \
            ncview \
            libproj-dev \
            libgeos-dev \
            gdal-bin \
            libgdal-dev
pip install numpy && pip install GDAL==$(gdal-config --version) --global-option=build_ext --global-option='-I/usr/include/gdal' && pip install rasterio
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
export GDAL_DATA=/usr/share/gdal/
```

Hope you can help us.

Cheers,

Erick



--
Sean Gillies


Re: Clipping a raster (grib2), end up with zeros on left edge when cropping with mask

Shane Mill - NOAA Affiliate
 

Hey Sean,

As always, thanks for the feedback. To follow up: it turns out that I had to add "filled=False". In my particular situation, the user drags a bounding box over the original raster and gets back the subsetted raster. With out_image, out_transform = mask(src, geoms, filled=False, crop=True), I get the desired result.
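As a toy illustration of why filled=False matters (a synthetic array, not Shane's actual data): with filling, pixels outside the clip geometry become a concrete fill value, which is what shows up as zeros along the edge, while a masked array keeps them flagged as "no data":

```python
import numpy as np

# Toy 1-band "raster"; the left column stands in for pixels outside the
# clip geometry after crop=True snaps the window to whole pixels.
data = np.array([[7, 7, 7],
                 [7, 7, 7]])
outside = np.array([[True, False, False],
                    [True, False, False]])

masked = np.ma.masked_array(data, mask=outside)

# Filling: masked pixels become a concrete value (zeros here),
# analogous to the blue left edge in the clipped grib2.
filled = masked.filled(0)

# With the mask preserved, downstream code can still distinguish
# "outside the geometry" from a real zero.
print(filled[:, 0])        # the fill values
print(masked.mask[:, 0])   # still flagged as outside
```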

Thanks!
Shane Mill

On Fri, May 24, 2019 at 10:00 AM Sean Gillies <sean.gillies@...> wrote:
Hi Shane,

It looks like your feature geometries extend beyond the extent of your raster, yes? In that case, crop can create an extra row or column of pixels, which will be empty. This seems like a bug to me, or undefined behavior, at least. I suggest passing crop=False in your situation.

On Thu, May 23, 2019 at 8:52 AM Shane Mill - NOAA Affiliate via Groups.Io <shane.mill=noaa.gov@groups.io> wrote:
Hi everyone,

I am currently clipping a grib2 file and am ending up with zeros on the left edge of the raster. I'm not an expert in GIS, so maybe this is a common problem, but I was curious whether anyone else has experienced it before?
I think the issue is where the mask function is used with the cropping.

In the picture below, the blue edge shows the zeros.
And for context, here is the driving code that is performing the action:

[screenshot and code snippet not reproduced in the digest]

--
Sean Gillies



--
Shane Mill

Meteorological Application Developer, AceInfo Solutions

Meteorological Development Laboratory (NWS)

Phone: 301-427-9452


Re: Occasional "not recognized as a supported file format" errors when reading from S3

Sean Gillies
 

I found the VRT multi-threading guidance at https://gdal.org/drivers/raster/vrt.html#multi-threading-issues. It moved during the site migration.

This smells like a GDAL bug to me, perhaps a manifestation of https://github.com/OSGeo/gdal/issues/1244 or https://github.com/OSGeo/gdal/issues/1031.


On Mon, May 27, 2019 at 2:50 AM Dion Häfner <dion.haefner@...> wrote:

Hey Sean,

thank you very much for the detailed assessment. I think this already sheds some light on some of the problems.

To answer your questions:

- The /vsis3/ identifiers are correct and work most of the time, we only see these errors from time to time.

- This is just a default read from a private S3 bucket. I don't know which protocol boto (or GDAL?) uses under the hood.

- In the first case, there is indeed multithreading involved, so this is a hot lead. I will have a look at the documentation as soon as I can find it somewhere :) The fact that it works most of the time would of course suggest some sort of race condition.

In the second case, the setup is actually more complicated. We experimented with "cloud-optimized" VRT files, which we created by

1. gdalbuildvrt on many small-ish COG

2. gdaladdo on the resulting VRT into an external .vrt.ovr file

We could then serve these COG mosaics as a single dataset via Terracotta, but apparently reads from the VRT would fail when we zoomed in too far.

Thank you for your time!

Dion

On 24/05/2019 20.49, Sean Gillies via Groups.Io wrote:
Hi Dion,

That error comes from these lines in GDALOpenEx when the function fails to return successfully: https://github.com/OSGeo/gdal/blob/fe0b5dd644abb4ac0c9869db28e9cf977181fbce/gdal/gcore/gdaldataset.cpp#L3455. The way the errors are re-raised obscures the problem.

First, are the /vsis3/ identifiers for the datasets correct? Is Rasterio mangling them? Can you access the data successfully any time at all?

Are you using any advanced HTTP features? HTTP/2? The Rasterio wheels use a slightly old version of curl and that project is ever fixing bugs.

Is multithreading involved? I recommend that you consult https://gdal.org/gdal_vrttut.html#gdal_vrttut_mt as soon as it is back online. Multithreaded access to VRTs is complicated.

On Fri, May 24, 2019 at 8:13 AM Dion Häfner <dion.haefner@...> wrote:

We've only tried rasterio wheels.

On 23/05/2019 21.23, vincent.sarago via Groups.Io wrote:
This is not something I've seen so far. Does this happen with Rasterio wheels and with rasterio built from GDAL source?



--
Sean Gillies


Re: Occasional "not recognized as a supported file format" errors when reading from S3

Dion Häfner
 

Hey Sean,

thank you very much for the detailed assessment. I think this already sheds some light on some of the problems.

To answer your questions:

- The /vsis3/ identifiers are correct and work most of the time, we only see these errors from time to time.

- This is just a default read from a private S3 bucket. I don't know which protocol boto (or GDAL?) uses under the hood.

- In the first case, there is indeed multithreading involved, so this is a hot lead. I will have a look at the documentation as soon as I can find it somewhere :) The fact that it works most of the time would of course suggest some sort of race condition.

In the second case, the setup is actually more complicated. We experimented with "cloud-optimized" VRT files, which we created by

1. gdalbuildvrt on many small-ish COG

2. gdaladdo on the resulting VRT into an external .vrt.ovr file

We could then serve these COG mosaics as a single dataset via Terracotta, but apparently reads from the VRT would fail when we zoomed in too far.

Thank you for your time!

Dion


On 24/05/2019 20.49, Sean Gillies via Groups.Io wrote:
Hi Dion,

That error comes from these lines in GDALOpenEx when the function fails to return successfully: https://github.com/OSGeo/gdal/blob/fe0b5dd644abb4ac0c9869db28e9cf977181fbce/gdal/gcore/gdaldataset.cpp#L3455. The way the errors are re-raised obscures the problem.

First, are the /vsis3/ identifiers for the datasets correct? Is Rasterio mangling them? Can you access the data successfully any time at all?

Are you using any advanced HTTP features? HTTP/2? The Rasterio wheels use a slightly old version of curl and that project is ever fixing bugs.

Is multithreading involved? I recommend that you consult https://gdal.org/gdal_vrttut.html#gdal_vrttut_mt as soon as it is back online. Multithreaded access to VRTs is complicated.





If the error occurs every time 

On Fri, May 24, 2019 at 8:13 AM Dion Häfner <dion.haefner@...> wrote:

We've only tried rasterio wheels.

On 23/05/2019 21.23, vincent.sarago via Groups.Io wrote:
This is not something I've seen so far. Does this happen with Rasterio wheels and with rasterio built from GDAL source?


--
Sean Gillies


Re: Occasional "not recognized as a supported file format" errors when reading from S3

Sean Gillies
 

Hi Dion,

That error comes from these lines in GDALOpenEx when the function fails to return successfully: https://github.com/OSGeo/gdal/blob/fe0b5dd644abb4ac0c9869db28e9cf977181fbce/gdal/gcore/gdaldataset.cpp#L3455. The way the errors are re-raised obscures the problem.

First, are the /vsis3/ identifiers for the datasets correct? Is Rasterio mangling them? Can you access the data successfully any time at all?

Are you using any advanced HTTP features? HTTP/2? The Rasterio wheels use a slightly old version of curl and that project is ever fixing bugs.

Is multithreading involved? I recommend that you consult https://gdal.org/gdal_vrttut.html#gdal_vrttut_mt as soon as it is back online. Multithreaded access to VRTs is complicated.





If the error occurs every time 


On Fri, May 24, 2019 at 8:13 AM Dion Häfner <dion.haefner@...> wrote:

We've only tried rasterio wheels.

On 23/05/2019 21.23, vincent.sarago via Groups.Io wrote:
This is not something I've seen so far. Does this happen with Rasterio wheels and with rasterio built from GDAL source?



--
Sean Gillies


Re: Occasional "not recognized as a supported file format" errors when reading from S3

Dion Häfner
 

We've only tried rasterio wheels.

On 23/05/2019 21.23, vincent.sarago via Groups.Io wrote:
This is not something I've seen so far. Does this happen with Rasterio wheels and with rasterio built from GDAL source?


Re: Clipping a raster (grib2), end up with zeros on left edge when cropping with mask

Sean Gillies
 

Hi Shane,

It looks like your feature geometries extend beyond the extent of your raster, yes? In that case, crop can create an extra row or column of pixels, which will be empty. This seems like a bug to me, or undefined behavior, at least. I suggest passing crop=False in your situation.
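A workaround on the caller's side is to clamp the crop geometry's bounds to the raster's bounds before masking, so the crop window never extends past the data. A minimal sketch (pure Python, bounds given as (left, bottom, right, top) tuples):

```python
def clamp_bounds(geom_bounds, raster_bounds):
    """Intersect geometry bounds with raster bounds so a crop window
    never reaches past the raster (avoiding empty edge pixels)."""
    gl, gb, gr, gt = geom_bounds
    rl, rb, rr, rt = raster_bounds
    return (max(gl, rl), max(gb, rb), min(gr, rr), min(gt, rt))

# geometry extends 10 units past the left and right raster edges
print(clamp_bounds((-10, 0, 50, 40), (0, 0, 40, 40)))  # (0, 0, 40, 40)
```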


On Thu, May 23, 2019 at 8:52 AM Shane Mill - NOAA Affiliate via Groups.Io <shane.mill=noaa.gov@groups.io> wrote:
Hi everyone,

I am currently clipping a grib2 file and am ending up with zeros along the left edge of the raster. I'm not an expert with GIS, so maybe this is a common problem, but I was curious whether anyone else has experienced this before?
I think the issue is in how the mask function is used with the cropping.

In the picture below, the blue edge shows the zeros.
And for context, here is the driving code that is performing the action:





--
Sean Gillies


Re: Occasional "not recognized as a supported file format" errors when reading from S3

vincent.sarago@...
 

This is not something I've seen so far. Does this happen with Rasterio wheels and with rasterio built from GDAL source?


Occasional "not recognized as a supported file format" errors when reading from S3

Dion Häfner
 

Dear rasterio group,

 

(I initially posted this at https://github.com/mapbox/rasterio/issues/1686)

Lately, we have encountered a strange bug in Terracotta. It basically always leads to errors like these:

(from DHI-GRAS/terracotta#139)

Traceback (most recent call last):
  File "rasterio/_base.pyx", line 213, in rasterio._base.DatasetBase.__init__
  File "rasterio/_shim.pyx", line 64, in rasterio._shim.open_dataset
  File "rasterio/_err.pyx", line 205, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError: '/vsis3/italy-composite/rasters/italy_2018_red.tif' not recognized as a supported file format.

or

(from DHI-GRAS/terracotta#10 (comment))

Traceback (most recent call last):
  File "rasterio/_io.pyx", line 698, in rasterio._io.DatasetReaderBase._read
  File "rasterio/shim_rasterioex.pxi", line 133, in rasterio._shim.io_multi_band
  File "rasterio/_err.pyx", line 182, in rasterio._err.exc_wrap_int
rasterio._err.CPLE_AppDefinedError: IReadBlock failed at X offset 0, Y offset 0: '/vsis3/bucket/prefix/tile330.tif' does not exist in the file system, and is not recognized as a supported dataset name.

The errors occur on different versions of rasterio, although anecdotally it wasn't a problem pre-1.0.15. It also seems to occur both during rasterio.open, and when actually reading tiles via WarpedVRT.read.

The problem is that we have only observed it with huge raster files, and we haven't been able to reproduce it reliably, or in a way we could share with you.

Does anyone have any intuition why this might be happening / what we could look at to debug this?
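Not an answer to the underlying bug, but a common mitigation for intermittent open failures like these is a short retry with backoff. A sketch, where open_fn is a hypothetical stand-in for rasterio.open (the flaky function below only simulates a transient failure):

```python
import time

def open_with_retry(open_fn, path, attempts=3, delay=0.05):
    """Retry transient open failures with exponential backoff.

    Any exception is treated as potentially transient until the
    attempts are exhausted, then the last one is re-raised.
    """
    last = None
    for i in range(attempts):
        try:
            return open_fn(path)
        except Exception as exc:  # rasterio would raise RasterioIOError here
            last = exc
            time.sleep(delay * (2 ** i))
    raise last

# demo with a flaky stand-in that fails once, then succeeds
calls = {"n": 0}

def flaky(path):
    calls["n"] += 1
    if calls["n"] < 2:
        raise IOError("not recognized as a supported file format")
    return "dataset"

print(open_with_retry(flaky, "/vsis3/bucket/key.tif"))  # dataset
```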

 

Cheers,
Dion

 


Clipping a raster (grib2), end up with zeros on left edge when cropping with mask

Shane Mill - NOAA Affiliate
 

Hi everyone,

I am currently clipping a grib2 file, and am ending up with zeros on the left edge of the raster. I'm not an expert with GIS so maybe this is a common problem but was curious if anyone else has experienced this before?
I think the issue is where the mask function is used with the cropping. 

In the picture below, the blue edge shows the zeros.
And for context, here is the driving code that is performing the action:




Re: problem with window sizes when parallelizing a function

javier lopatin
 

Hi Sean, thanks a lot. This was very helpful indeed, and the performance improved a bit when changing the size to 512.
Great job on the library, congrats!
Cheers,
Javier

On Wed, May 22, 2019 at 11:36 PM, Alan Snow (<alansnow21@...>) wrote:
You are indeed correct Sean. Thanks for catching that and providing the correct answer! It seems I skimmed through the original message too fast.



--
Javier Lopatin
Institute of Geography and Geoecology
Karlsruhe Institute of Technology (KIT)
javier.lopatin@...


Re: problem with window sizes when parallelizing a function

Alan Snow
 

You are indeed correct Sean. Thanks for catching that and providing the correct answer! It seems I skimmed through the original message too fast.


Re: problem with window sizes when parallelizing a function

Sean Gillies
 

Alan: the function does indeed write to a new file with blocksizes set appropriately.

Javier: the TIFF spec states that tile width (and this goes for height, I presume) must be a multiple of 16. Neither GDAL's GeoTIFF format page nor Rasterio docs make this clear. I'll add a note. I think that if you try 512 instead of 128, you'll have success.
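The multiple-of-16 constraint is easy to check up front, before writing a tiled GeoTIFF; a minimal sketch:

```python
def valid_tiff_tile_size(size):
    """TIFF tiles must have a width/height that is a positive multiple of 16."""
    return size > 0 and size % 16 == 0

print(valid_tiff_tile_size(512))  # True
print(valid_tiff_tile_size(256))  # True
print(valid_tiff_tile_size(100))  # False -- rejected by the TIFF spec
```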


On Wed, May 22, 2019 at 7:29 AM Alan Snow <alansnow21@...> wrote:
Hi Javier,

The blocksize is not a dynamic attribute. It represents the blocksize of the file on disk. So, if you want to change the blocksize, you will need to write the data to a new file with a new blocksize.

Hopefully this helps,
Alan



--
Sean Gillies


Re: problem with window sizes when parallelizing a function

Alan Snow
 

Hi Javier,

The blocksize is not a dynamic attribute. It represents the blocksize of the file on disk. So, if you want to change the blocksize, you will need to write the data to a new file with a new blocksize.
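To illustrate: a dataset's profile behaves like a dict of creation options, so re-blocking amounts to copying the source profile, overriding the block options, and opening the destination with it (rasterio.open(dst, "w", **new_profile)). A minimal dict-only sketch:

```python
# profile of a hypothetical source file, as rasterio would report it
profile = {"driver": "GTiff", "tiled": True, "blockxsize": 256, "blockysize": 256}

# copy it and override the block options for the new file on disk
new_profile = dict(profile)
new_profile.update(blockxsize=512, blockysize=512)

print(new_profile["blockxsize"], profile["blockxsize"])  # 512 256
```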

Hopefully this helps,
Alan
