Date   

Re: concurrent.futures.ThreadPoolExecutor for s3 COG reads fails - CPLReleaseMutex: Error

Sean Gillies
 

Hi Darren,

I've seen this error message before, and opened an issue at https://github.com/OSGeo/gdal/issues/2278. I haven't tried to reproduce the error with GDAL 3.3 yet.

On Fri, May 7, 2021 at 2:51 PM <dweber.consulting@...> wrote:
This could be similar to https://github.com/mapbox/rasterio/issues/1686

In an AWS-Lambda python 3.7 runtime, there is an error like:
 
    CPLReleaseMutex: Error = 1 (Operation not permitted)
    [WARNING] 2021-05-07T17:38:33.315Z 3e8b4b55-9066-426c-9319-540e39e17709
    CPLE_AppDefined in HTTP response code on https://bucket-xxx-us-west-2.s3.amazonaws.com/prefix/geotiff.tif: 0
    2021-05-07 17:38:33,366 | ERROR | '/vsis3/bucket-xxx-us-west-2/prefix/geotiff.tif' not recognized as a supported file format.

The python concurrency that wraps the rasterio/gdal reader is a common pattern like:

    dataset = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future_to_task = {}
        for task in peril_queries:
            future = executor.submit(extract_geotiff_data, task)
            future_to_task[future] = task
    for future in concurrent.futures.as_completed(future_to_task):
        try:
            data = future.result()
            dataset.extend(data)
        except Exception as exc:
            task = future_to_task[future]
            LOGGER.error("Failed Task: %s", task)
            raise
 

Where the 'extract_geotiff_data' function is basically given a task to read an s3-COG,
and that function encapsulates everything to setup a rasterio.Env with gdal env-var
settings, opening the s3 object and reading data from it.

What is causing the `CPLReleaseMutex: Error` and how is it debugged and avoided?

TIA,
Darren

--
Sean Gillies


Re: Reconciling transforms by from_origin and from_bounds

Luke
 
Edited

rasterio.transform.from_origin(westnorthxsizeysize)

Return an Affine transformation given upper left and pixel sizes.

https://rasterio.readthedocs.io/en/latest/api/rasterio.transform.html#rasterio.transform.from_origin

After changing your height calculation from bottom - top to top - bottom, for your coords I get similar transforms:

import math
from rasterio.transform import (from_origin, from_bounds, array_bounds)
from rasterio.coords import BoundingBox

bounds = BoundingBox(
left=1679712.0, bottom=5860848.0, right=1862208.0, top=6029312.0)
width = math.ceil((bounds.right - bounds.left) /
100) # resolution of 100
height = math.ceil((bounds.top - bounds.bottom) / 100) # resolution of 100

T_o = from_origin(bounds.left, bounds.top, 100, 100)

T_b = from_bounds(*bounds
, width, height)

print(T_o)
print(T_b)

print(array_bounds(height, width, T_o))
print(array_bounds(height, width, T_b))

| 100.00, 0.00, 1679712.00| | 0.00,-100.00, 6029312.00| | 0.00, 0.00, 1.00| | 100.00, 0.00, 1679712.00| | 0.00,-99.98, 6029312.00| | 0.00, 0.00, 1.00| (1679712.0, 5860812.0, 1862212.0, 6029312.0) (1679712.0, 5860848.0, 1862208.0, 6029312.0)


Reconciling transforms by from_origin and from_bounds

ayr035@...
 

I am trying to reconcile the transforms that I get by using transform.from_origin and transform.from_bounds

bounds = BoundingBox(left=1679712.0, bottom=5860848.0, right=1862208.0, top=6029312.0)
width = math.ceil((bounds.right - bounds.left) / 100) # resolution of 100
height = math.ceil((bounds.bottom - bounds.top) / 100)
# resolution of 100

T_o = transform.from_origin(bounds.left, bounds.top, width, height)
Affine(1825.0, 0.0, 1679712.0,
      0.0, -1685.0, 6029312.0)

T_b = transform.from_bounds(*bounds, width, height)
Affine(99.99780821917808, 0.0, 1679712.0,
      0.0, -99.9786350148368, 6029312.0)

The transforms seem to differ quite significantly. If we compute the array bounds, I expect to recover the original bounds
transform.array_bounds(height, width, T_o) ->
(1679712.0, 3190087.0, 5010337.0, 6029312.0)   # bottom and right bounds are wrong?
transform.array_bounds(height, width, T_b) -> (
1679712.0, 5860848.0, 1862208.0, 6029312.0)   # These are my original bounds

Am I misusing transform.from_origin?


concurrent.futures.ThreadPoolExecutor for s3 COG reads fails - CPLReleaseMutex: Error

dweber.consulting@...
 

This could be similar to https://github.com/mapbox/rasterio/issues/1686

In an AWS-Lambda python 3.7 runtime, there is an error like:
 
    CPLReleaseMutex: Error = 1 (Operation not permitted)
    [WARNING] 2021-05-07T17:38:33.315Z 3e8b4b55-9066-426c-9319-540e39e17709
    CPLE_AppDefined in HTTP response code on https://bucket-xxx-us-west-2.s3.amazonaws.com/prefix/geotiff.tif: 0
    2021-05-07 17:38:33,366 | ERROR | '/vsis3/bucket-xxx-us-west-2/prefix/geotiff.tif' not recognized as a supported file format.

The python concurrency that wraps the rasterio/gdal reader is a common pattern like:

    dataset = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future_to_task = {}
        for task in peril_queries:
            future = executor.submit(extract_geotiff_data, task)
            future_to_task[future] = task
    for future in concurrent.futures.as_completed(future_to_task):
        try:
            data = future.result()
            dataset.extend(data)
        except Exception as exc:
            task = future_to_task[future]
            LOGGER.error("Failed Task: %s", task)
            raise
 

Where the 'extract_geotiff_data' function is basically given a task to read an s3-COG,
and that function encapsulates everything to setup a rasterio.Env with gdal env-var
settings, opening the s3 object and reading data from it.

What is causing the `CPLReleaseMutex: Error` and how is it debugged and avoided?

TIA,
Darren


Re: Handle to underlying GDAL dataset

Sean Gillies
 

Hi Søren,

It's not possible for a GDAL dataset handle to be shared between rasterio and GDAL's Python bindings. I discourage people from combining these modules. Unless one is extremely careful and understands the internals of each, it's a great way to create perplexing bugs.

Rasterio's dataset.read() works at a higher level than GDAL's DatasetReadAsArray(). The latter is effectively the C function that rasterio uses internally. Rasterio has data type and nodata checking overhead that GDAL's function lacks and perhaps some logic that could be more optimized. Is the overhead really 30%? Would you be willing to use Python's timeit to race the last two lines of your code against each other? I would expect the difference to diminish as the size of the window grows.


On Tue, May 4, 2021 at 7:15 AM soren.rasmussen via groups.io <soren.rasmussen=alexandra.dk@groups.io> wrote:
Hi!

I would like to get a handle for the underlying GDAL dataset behind a RasterIO dataset.
Is this possible somehow?
 
The reason I want to do this is that, in some cases, GDAL operations are significantly faster. I have a case, where reading a subset of channels in a window with the just-released GDAL 3.3.0 is approximately 25% faster than RasterIO.
It would be nice to be able to use GDAL operations without having to open the dataset with both RasterIO and GDAL


from osgeo import gdal, gdal_array

rio_ds = rasterio.open(path)

gdal_ds = gdal.Open(path)
bands = [1,2,3]
resampling = rasterio.enums.Resampling.bilinear
win = ds.window(*bbox)
xoff, yoff, xsize, ysize = map(lambda x: int(round(x)), win.col_off, win.row_off, win.width, win.height)
rio_img = rio_ds.read(window=win, indexes=bands, out_shape=out_shape, resampling=resampling)
gdal_img = gdal_array.DatasetReadAsArray(gdal_ds, xoff=xoff, yoff=yoff, win_xsize=xsize, win_ysize=ysize, buf_xsize=out_shape[1], buf_ysize=out_shape[0], resample_alg=resampling, band_list=bands)
(Note that band_list is new in GDAL 3.3.0)

Best, Søren



--
Sean Gillies


Handle to underlying GDAL dataset

soren.rasmussen@...
 

Hi!

I would like to get a handle for the underlying GDAL dataset behind a RasterIO dataset.
Is this possible somehow?
 
The reason I want to do this is that, in some cases, GDAL operations are significantly faster. I have a case, where reading a subset of channels in a window with the just-released GDAL 3.3.0 is approximately 25% faster than RasterIO.
It would be nice to be able to use GDAL operations without having to open the dataset with both RasterIO and GDAL


from osgeo import gdal, gdal_array

rio_ds = rasterio.open(path)

gdal_ds = gdal.Open(path)
bands = [1,2,3]
resampling = rasterio.enums.Resampling.bilinear
win = ds.window(*bbox)
xoff, yoff, xsize, ysize = map(lambda x: int(round(x)), win.col_off, win.row_off, win.width, win.height)
rio_img = rio_ds.read(window=win, indexes=bands, out_shape=out_shape, resampling=resampling)
gdal_img = gdal_array.DatasetReadAsArray(gdal_ds, xoff=xoff, yoff=yoff, win_xsize=xsize, win_ysize=ysize, buf_xsize=out_shape[1], buf_ysize=out_shape[0], resample_alg=resampling, band_list=bands)
(Note that band_list is new in GDAL 3.3.0)

Best, Søren


Re: Need Help

Sean Gillies
 

Hello,

I don't know about MSAVI2, but once you have a numpy array of 0 and 1, you can mask another array using https://numpy.org/doc/stable/reference/generated/numpy.where.html. Rasterio only reads and writes datasets, it doesn't have built in analysis functions. All that is left up to the user and numpy.

For band names, if you create a new dataset in "w" mode or open an existing dataset in "r+" mode, you can pass a sequence of strings as shown in https://github.com/mapbox/rasterio/blob/master/tests/test_descriptions.py#L14.

On Sat, May 1, 2021 at 7:05 AM <serignemansour.diene@...> wrote:
First : I want to know how could i change the name of raster images’s band? Because i have a multispectral image with 8 band and the name is none when i do image.descrptions. 

Second : 
On this multispectral image, I want to compute an MSAVI2 spectral index and extract all non-ground pixels to mask another image.

MSAVI2 allows to mask the ground by giving it values 0 or 1.

Best regards

--
Sean Gillies


Need Help

serignemansour.diene@...
 

First : I want to know how could i change the name of raster images’s band? Because i have a multispectral image with 8 band and the name is none when i do image.descrptions. 

Second : 
On this multispectral image, I want to compute an MSAVI2 spectral index and extract all non-ground pixels to mask another image.

MSAVI2 allows to mask the ground by giving it values 0 or 1.

Best regards


snippet for minimum bounding snapped window?

Gregory, Matthew
 

Hi all,

Often I'm given unsnapped coordinates and want to find the minimum bounding snapped box based on a rasterio Dataset. I'm guessing someone here has a better way of doing it or I'm missing a method somewhere.

Here's a function with some comments:

def get_minimum_snapped_extent(src, bounds):
# Get window from bounds (assume bounds is sequence of
# left, bottom, right, top
w1 = src.window(*bounds)

# Round to include top-left corner (op="floor" is default)
w2 = w1.round_offsets()

# Union these to get a new window
w3 = rasterio.windows.union(w1, w2)

# Round shape to buffer to bottom-right corner
w4 = w3.round_shape(op="ceil")

# Return the coordinates associated with this window
return src.window_bounds(w4)

Obviously, I don't need all the temp variables, but is there a more succinct way of doing this?

thanks, matt


Rasterio 1.2.2

Sean Gillies
 

Hi all,

Rasterio 1.2.2 source distribution and wheels for selected platforms are on PyPI now. The wheels include GDAL 3.2.2 and PROJ 7.2.1. See 

--
Sean Gillies


Re: Help Saving .Tif File after Mask

Simon Tarr
 
Edited

Hi Luke, thanks for the reply. Your code has helped me move past my previous error.
I had to delete the indexes argument when calling `write' but apart from that, it works like a charm.

Thank you for your help!


On Wed, Apr 7, 2021 at 02:41 AM, Luke wrote:
with rasterio.open('/data/my_output.tif', 'w', 
               driver='GTiff',
               height=masked_raster.shape[1], 
               width=masked_raster.shape[2], 
               count=1, 
               dtype=rasterio.ubyte, 
               crs=file.crs, 
               transform=transform
              ) as outfile:
            outfile.write(masked_raster, indexes=1) 


Re: Using rasterio to mask/crop a raster results in AttributeError

Luke
 

You can also use json:

import json
poly = json.loads(obj.job_loc.geojson)

 

 


Re: Help Saving .Tif File after Mask

Luke
 
Edited

rasterio.mask.mask returns a two element tuple - element 0 is the array, element 1 is the transform.  Try something like (untested):

# Mask Raster
masked_raster, transform = rasterio.mask.mask(file, poly, crop=True, filled=True, invert=False)
 
with rasterio.open('/data/my_output.tif', 'w', 
               driver='GTiff',
               height=masked_raster.shape[1], 
               width=masked_raster.shape[2], 
               count=1, 
               dtype=rasterio.ubyte, 
               crs=file.crs, 
               transform=transform
              ) as outfile:
            outfile.write(masked_raster, indexes=1) 
 


Help Saving .Tif File after Mask

Simon Tarr
 

I have just used rasterio.mask.mask() to crop a larger raster using a Django PolygonField() polygon:

# Open large raster file
file = rasterio.open('/data/corine2018_100m/test.tif') print(file) <open DatasetReader name='/data/corine2018_100m/test.tif' mode='r'>

# Dummy Django model object
obj = newJob.objects.create(job_loc="SRID=4326;POLYGON ((0.9063720703125029 52.32023207609735, 0.8239746093749998 52.10819209746323, 1.170043945312496 52.14191683166823, 1.170043945312496 52.31351619974807, 0.9063720703125029 52.32023207609735))")poly = ast.literal_eval(obj.job_loc.geojson)

# Grab PolygonField(), serialise to GeoJSON, put into a list
poly = ast.literal_eval(obj.job_loc.geojson)
poly = [poly]
print(poly)
[{'type': 'Polygon', 'coordinates': [[[0.906372070312503, 52.32023207609735], [0.823974609375, 52.10819209746323], [1.170043945312496, 52.14191683166823], [1.170043945312496, 52.31351619974807], [0.906372070312503, 52.32023207609735]]]}]

# Mask Raster
masked_raster = rasterio.mask.mask(file, poly, crop=True, filled=True, invert=False)
print(masked_raster)
(array([[[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 0, 0, ..., 0, 0, 0],
...,
[ 0, 12, 12, ..., 0, 0, 0],
[ 0, 12, 12, ..., 0, 0, 0],
[ 0, 0, 0, ..., 0, 0, 0]]], dtype=uint8), Affine(0.0012806021160224688, 0.0, 0.8235730268502728,
0.0, -0.0012806021160261063, 52.32090505556345))

I now need to save the object masked_raster to disk. I have tried with the following code:

with rasterio.open('/data/my_output.tif', 'w', 
driver='GTiff',
height=masked_raster[0].shape[1],
width=masked_raster[0].shape[2],
count=1,
dtype=rasterio.ubyte,
crs=file.crs,
transform= # Not sure what this should map to!
) as outfile:
outfile.write(x, indexes=1)

But I get the error:

 File "rasterio/_io.pyx", line 1338, in rasterio._io.DatasetWriterBase.write
rasterio.errors.InvalidArrayError: Positional argument arr must be an array-like object

I'm clearly passing the wrong data from masked_raster and I don't understand what I'm supposed to input for the transform argument.
Any help would be gratefully received!

 


Re: Using rasterio to mask/crop a raster results in AttributeError

Simon Tarr
 

Thank you. I used the below to remove the quotes:

import ast
poly = ast.literal_eval(obj.job_loc.geojson)


Re: Using rasterio to mask/crop a raster results in AttributeError

Yves Moisan
 

Hi,


You have a len = 1 list the only element of which is a string.  That's what the error message tells you, essentially.  Get rid of the quotes and you'l have a one-item list the which is your dict :


 var2 = [{ "type": "Polygon", "coordinates": [ [ [ 0.906372070312503, 52.320232076097348 ], [ 0.823974609375, 52.108192097463231 ], [ 1.170043945312496, 52.141916831668233 ], [ 1.170043945312496, 52.313516199748072 ], [ 0.906372070312503, 52.320232076097348 ] ] ] }]
>>> type(var2)
<class 'list'>
>>> type(var2[0])
<class 'dict'>

Cheers,

I'm trying to use rasterio to mask a raster. According to the mask documentation, I need to load a raster file in 'r' mode, which I have done like so:

file = rasterio.open('/data/corine2018_100m/test.tif')
print(file)
<open DatasetReader name='/data/corine2018_100m/test.tif' mode='r'>

I also need a shapes which is an iterable GeoJSON-like object. I have GeoJSON serialised a Polygon object from a Django PolygonField() and added it to a list (my interpretation of the documentation's 'iterable object'):

obj = newJob.objects.create(job_loc="SRID=4326;POLYGON ((0.9063720703125029 52.32023207609735, 0.8239746093749998 52.10819209746323, 1.170043945312496 52.14191683166823, 1.170043945312496 52.31351619974807, 0.9063720703125029 52.32023207609735))")

poly = [obj.job_loc.geojson]
print(poly)
['{ "type": "Polygon", "coordinates": [ [ [ 0.906372070312503, 52.320232076097348 ], [ 0.823974609375, 52.108192097463231 ], [ 1.170043945312496, 52.141916831668233 ], [ 1.170043945312496, 52.313516199748072 ], [ 0.906372070312503, 52.320232076097348 ] ] ] }']

print(type(poly))
<class 'list'>

I then attempt the mask with:

masked_band, masked_transform = rasterio.mask.mask(file, poly, crop=True)

However, I get the error:

AttributeError: 'str' object has no attribute 'get'

I have tried following this (i.e. adding my JSON to a list) but I still get the error. Can someone help me mask this raster?



Using rasterio to mask/crop a raster results in AttributeError

Simon Tarr
 

I'm trying to use rasterio to mask a raster. According to the mask documentation, I need to load a raster file in 'r' mode, which I have done like so:

file = rasterio.open('/data/corine2018_100m/test.tif')
print(file)
<open DatasetReader name='/data/corine2018_100m/test.tif' mode='r'>

I also need a shapes which is an iterable GeoJSON-like object. I have GeoJSON serialised a Polygon object from a Django PolygonField() and added it to a list (my interpretation of the documentation's 'iterable object'):

obj = newJob.objects.create(job_loc="SRID=4326;POLYGON ((0.9063720703125029 52.32023207609735, 0.8239746093749998 52.10819209746323, 1.170043945312496 52.14191683166823, 1.170043945312496 52.31351619974807, 0.9063720703125029 52.32023207609735))")

poly = [obj.job_loc.geojson]
print(poly)
['{ "type": "Polygon", "coordinates": [ [ [ 0.906372070312503, 52.320232076097348 ], [ 0.823974609375, 52.108192097463231 ], [ 1.170043945312496, 52.141916831668233 ], [ 1.170043945312496, 52.313516199748072 ], [ 0.906372070312503, 52.320232076097348 ] ] ] }']

print(type(poly))
<class 'list'>

I then attempt the mask with:

masked_band, masked_transform = rasterio.mask.mask(file, poly, crop=True)

However, I get the error:

AttributeError: 'str' object has no attribute 'get'

I have tried following this (i.e. adding my JSON to a list) but I still get the error. Can someone help me mask this raster?



Is it possible to build a .vrt file from multiple files with Rasterio?

Pierrick Rambaud
 

I would like to build a vrt file from multiple dataset. I know the gdalbuildvrt but i don't find how I'm supposed to do the same using rasterio vrt object.

Does anyone have a simple example ?

Is it even possible ? 

 

My use case is very simple, when I export a big image from GEE it's tiled in smaller .tif. instead of merging them all and get a 10 Gb .tif, I would like to create a .vrt to gather them.


Re: Clarification on usage with QGIS

Ari Meyer
 

That's good to hear, Christina -- thanks!
Ari

On Thu, Mar 25, 2021 at 6:59 PM Ratcliff, Christina (A&F, Waite Campus) <christina.ratcliff@...> wrote:
Hi Ari,

Yes, I've also confirmed  my plugin & rasterio runs in QGIS 3.18 without any problems.

I have similar GDAL & GDAL backwards compatibility packages installed and haven't experienced issues with QGIS so it should be fine. I'm not a big user of Grass or SAGA, so can't comment on them.

Christina


Re: Clarification on usage with QGIS

Ratcliff, Christina (A&F, Waite Campus)
 

Hi Ari,

Yes, I've also confirmed  my plugin & rasterio runs in QGIS 3.18 without any problems.

I have similar GDAL & GDAL backwards compatibility packages installed and haven't experienced issues with QGIS so it should be fine. I'm not a big user of Grass or SAGA, so can't comment on them.

Christina

61 - 80 of 828