
Moving forward on RPC related coordinate transforms

Yann-Sebastien Tremblay-Johnston
 

A few months ago rasterio._transform._rpc_transform was introduced in 1.2.0, along with RPC support in reproject. See below for related pull request:

Thanks for the guidance Sean!

I'm wondering now what a sensible way forward is for introducing the functionality in _rpc_transform to higher level functions. Currently, methods like rasterio.transform.xy and rasterio.transform.rowcol are fairly affine-centric. For example, an instance of Affine is required as the first argument to each method (self.transform is supplied for the mixin case used by dataset objects). Since _rpc_transform leverages GDALCreateRPCTransformer and may require additional parameters, it doesn't easily fit into the current function signatures of xy and rowcol.

xy and rowcol could be overloaded to compute coordinate transforms using

  1. Affine transformation
  2. GCPs (compute transform from GCPs, then go to 1.)
  3. RPCs (not currently supported)

I can think of two ways of achieving this. The first is to change xy and rowcol themselves: either the transform argument is modified to accept Affine or rasterio.rpc.RPC (or a list of rasterio.control.GroundControlPoint as a convenience), or the transform argument becomes a keyword argument (and thus optional, partially reversing https://github.com/mapbox/rasterio/pull/829) alongside rpcs and gcps keyword arguments, with kwargs for options to pass to GDALCreateRPCTransformer. These seem like changes that would need to wait until at least 2.0.

Alternatively, implement alternate lower-level, but still user-facing, methods, e.g. rasterio.transform.rpc_xy and rasterio.transform.rpc_rowcol, so that _rpc_transform is exposed but does not interfere with the current coordinate transform methods; or implement a more general Transformer object similar to https://github.com/pyproj4/pyproj/blob/6b860dec5db6612328b7adffc0bddb0386101c38/pyproj/transformer.py#L171 such that we can cache the underlying GDAL transformer object.
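
To make the general Transformer idea a bit more concrete, here is a minimal sketch of what such an interface might look like. Everything in it is hypothetical: `TransformerBase` and `AffineTransformer` are illustrative names rather than existing rasterio classes, the affine math is a pure-Python stand-in, and an RPC-backed subclass (which would create its GDAL transformer once and free it in `close`) is left out because it needs GDAL. Reporting pixel centers from `xy` is also an assumption.

```python
import math
from abc import ABC, abstractmethod


class TransformerBase(ABC):
    """Hypothetical common interface; not an existing rasterio class."""

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()

    def close(self):
        """Release a cached native handle, if any (none in this sketch)."""

    @abstractmethod
    def xy(self, row, col):
        """Return the (x, y) coordinates of the center of pixel (row, col)."""

    @abstractmethod
    def rowcol(self, x, y):
        """Return the (row, col) of the pixel containing (x, y)."""


class AffineTransformer(TransformerBase):
    """Stand-in using x = a*col + b*row + c and y = d*col + e*row + f."""

    def __init__(self, a, b, c, d, e, f):
        self.coeffs = (a, b, c, d, e, f)

    def xy(self, row, col):
        a, b, c, d, e, f = self.coeffs
        col_c, row_c = col + 0.5, row + 0.5  # pixel center (an assumption)
        return (a * col_c + b * row_c + c, d * col_c + e * row_c + f)

    def rowcol(self, x, y):
        a, b, c, d, e, f = self.coeffs
        det = a * e - b * d  # determinant of the 2x2 linear part
        dx, dy = x - c, y - f
        col = (e * dx - b * dy) / det
        row = (a * dy - d * dx) / det
        return (math.floor(row), math.floor(col))


# North-up grid: 10-unit pixels, upper-left corner at (100, 200).
with AffineTransformer(10.0, 0.0, 100.0, 0.0, -10.0, 200.0) as transformer:
    x, y = transformer.xy(2, 3)
    row, col = transformer.rowcol(x, y)
```

The point of the context manager is that a subclass holding a native transformer would have one obvious place to release it, while callers of `xy` and `rowcol` would not need to know which kind of transform they were given.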

Thoughts? Sebastien


Re: add `VSINetworkStats*` methods in Rasterio

Alan Snow
 

It will be much simpler to add after this is merged: https://github.com/mapbox/rasterio/pull/2016


Re: add `VSINetworkStats*` methods in Rasterio

vincent.sarago@...
 

Thanks Sean,

I've had a look over the weekend and I'm not 100% sure I understand how to add the API in rasterio.

The gdal API `VSINetworkStatsReset` and `VSINetworkStatsGetAsSerializedJSON` are only available for gdal >= 3.2. Correct me if I'm wrong but I think this means we will need to add a `_shim32.pyx` file and then adapt setup.py. This seems a bit heavy just for `one` method?

I think we will agree that we don't want to add more `_shim` files every time we have a new gdal version ;-)

solutions:
- create a new _shim32.pyx
- backport `VSINetwork**` api in gdal 3.0 and 3.1 
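
For context, the shim approach amounts to picking one Cython source file at build time based on the detected GDAL version. A rough sketch of that selection logic follows; the file names, the version thresholds, and the `select_shim` helper are all illustrative assumptions and are not taken from rasterio's actual setup.py.

```python
import re

# (minimum GDAL version, shim source) pairs, newest first.
# The names and thresholds are hypothetical.
SHIMS = [
    ((3, 2), "_shim32.pyx"),
    ((3, 0), "_shim30.pyx"),
    ((2, 0), "_shim20.pyx"),
]


def select_shim(gdal_version):
    """Return the shim source for a version string such as '3.2.1'."""
    match = re.match(r"(\d+)\.(\d+)", gdal_version)
    if match is None:
        raise ValueError(f"unrecognized GDAL version: {gdal_version!r}")
    version = (int(match.group(1)), int(match.group(2)))
    for minimum, shim in SHIMS:
        if version >= minimum:
            return shim
    raise ValueError(f"no shim available for GDAL {gdal_version}")
```

Each new GDAL-gated function adds a row to a table like this plus another source file to maintain, which is the overhead being questioned above.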


Re: add `VSINetworkStats*` methods in Rasterio

Sean Gillies
 

Salut Vincent!

On Fri, Dec 25, 2020 at 2:43 PM <vincent.sarago@...> wrote:
In the latest GDAL two new methods have been added:
https://github.com/OSGeo/gdal/blob/687cfeab298df9524a36846de83ca6b5d8494c6d/gdal/port/cpl_vsi.h#L377-L378

It would be really nice to have access to those directly in rasterio but I'm not sure what the implementation will look like.

```
with rasterio.network() as log:  # gdal.NetworkStatsReset() on init or delete?
    # operation

    stats = log.data # gdal.NetworkStatsGetAsSerializedJSON() 
```

Yes, I think this is the form it should take. Like pytest's caplog: https://docs.pytest.org/en/stable/logging.html#caplog-fixture.
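
A caplog-style recorder along these lines could be sketched as below. This is only an illustration under stated assumptions: `NetworkLog` is a hypothetical name, the two callables passed to it stand in for the GDAL reset and serialized-JSON functions, and the fake backend (including the shape of its JSON) is invented so the example runs without GDAL. It resets on entry, which is one possible answer to the "on init or delete?" question.

```python
import json


class NetworkLog:
    """Hypothetical caplog-style recorder for network statistics."""

    def __init__(self, reset_stats, get_stats_json):
        # Both callables are stand-ins for the GDAL functions named above.
        self._reset_stats = reset_stats
        self._get_stats_json = get_stats_json

    def __enter__(self):
        self._reset_stats()  # start from a clean slate on entry
        return self

    def __exit__(self, *exc_info):
        return False  # never swallow exceptions

    @property
    def data(self):
        """Statistics collected so far, parsed from serialized JSON."""
        raw = self._get_stats_json()
        return json.loads(raw) if raw else {}


# Fake backend so the sketch runs without GDAL; the JSON shape is invented.
_fake = {"count": 0}


def fake_reset():
    _fake["count"] = 0


def fake_get_json():
    return json.dumps({"methods": {"GET": {"count": _fake["count"]}}})


def fake_request():
    _fake["count"] += 1


fake_request()  # traffic before the block should not be reported
with NetworkLog(fake_reset, fake_get_json) as log:
    fake_request()
    fake_request()
    stats = log.data
```

Reading `data` through a property means the statistics can be inspected at any point inside the block, as in the proposed usage.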

In a conversation with Even, I suggested that GDAL should just log more details about network requests and leave analysis up to us, but he felt that collecting and organizing the data would be a win for users, and I think that's a good call.

--
Sean Gillies


add `VSINetworkStats*` methods in Rasterio

vincent.sarago@...
 

In the latest GDAL two new methods have been added:
https://github.com/OSGeo/gdal/blob/687cfeab298df9524a36846de83ca6b5d8494c6d/gdal/port/cpl_vsi.h#L377-L378

It would be really nice to have access to those directly in rasterio but I'm not sure what the implementation will look like.

```
with rasterio.network() as log:  # gdal.NetworkStatsReset() on init or delete?
    # operation

    stats = log.data # gdal.NetworkStatsGetAsSerializedJSON() 
```


Re: Add Co-Register example to Rasterio Advanced Topics page

Sean Gillies
 

Hi,

On Wed, Oct 28, 2020 at 1:13 PM <afinkmiller@...> wrote:
Hi all,

We use rasterio quite a bit to "co-register" and merge multiple rasters to the CRS, extent, and cell size of a reference raster. Raster co-registration is useful in a number of situations such as matching rasters to a Digital Elevation Model (DEM) or matching raster data to ML labels. 

In Rasterio, a basic version of this operation that includes merging more than one input raster can be completed in a few dozen lines of code via WarpedVRT. The example is large enough to make rewriting it unwieldy but small enough to not necessarily warrant a separate library. I'm curious whether this is generic, broadly applicable, and commonly used enough to warrant inclusion at https://rasterio.readthedocs.io/en/latest/topics/. If so, I'd be happy to open a pull request to Rasterio docs.

Thanks!

Sounds good to me. Examples based on running code are the best and I'm eager to see it.

--
Sean Gillies


Add Co-Register example to Rasterio Advanced Topics page

afinkmiller@...
 

Hi all,

We use rasterio quite a bit to "co-register" and merge multiple rasters to the CRS, extent, and cell size of a reference raster. Raster co-registration is useful in a number of situations such as matching rasters to a Digital Elevation Model (DEM) or matching raster data to ML labels. 

In Rasterio, a basic version of this operation that includes merging more than one input raster can be completed in a few dozen lines of code via WarpedVRT. The example is large enough to make rewriting it unwieldy but small enough to not necessarily warrant a separate library. I'm curious whether this is generic, broadly applicable, and commonly used enough to warrant inclusion at https://rasterio.readthedocs.io/en/latest/topics/. If so, I'd be happy to open a pull request to Rasterio docs.

Thanks!


Re: Rasterio dataset

Alan Snow
 

Maybe this issue is more along the lines of what you are looking for: https://github.com/pydata/xarray/issues/4142


Re: Rasterio dataset

adrien.wehrle@...
 

Hi Alan,

thank you, but it seems the option I was talking about is not implemented... Or I missed it?

best,
Adrien


Re: Rasterio dataset

Alan Snow
 

Hi Adrien,

You may be interested in: https://github.com/corteva/rioxarray/

Best,
Alan


Rasterio dataset

adrien.wehrle@...
 

Hi all,

New in the group! I've been thinking about a tool that would open several files at the same time and create a rasterio dataset, along the same lines as e.g. an xarray dataset. Has this already been discussed for implementation?

best,
Adrien


Re: Rasterio Segmentation Fault in Apache

Angus Dickey
 

Sean,

Thanks for the response. I was playing with the '--no-binary' option and posted another comment before I saw your response. I can confirm in my case it does fix the issue and allows deploying to Apache with mod_ssl enabled.

I read the issue about the conflicting library names and why they are named 'lib-rasterio-hash.so.version'. Definitely related, but a slightly different problem as we don't have control of what Apache is doing outside of the Python space. Apache (my Apache install anyway) is using openssl 1.1.1 and Rasterio is using openssl 1.0.2u. They are both likely running in the same process and possibly writing incompatible data to the same memory address. Maybe something like this issue in psycopg2 that also uses openssl? Doesn't sound like there is a silver bullet solution from reading through that issue.

Anyway the '--no-binary' workaround is a valid option.

Thanks again,

Angus

 

 

 


Re: Rasterio Segmentation Fault in Apache

Sean Gillies
 

Hi Angus,

On Wed, Aug 26, 2020 at 11:09 AM <angus@...> wrote:

Hi,

I am porting an existing Flask application from the GDAL python bindings to Rasterio and have found some interesting behaviour. Rasterio works great and I really like the API, the GDAL Python API is nice too but the way it works always looks a little out of place in Python code. Anyway, the Rasterio/Flask app works in the Flask development server but when it is deployed to Apache (with mod_wsgi) it causes Apache to segfault and die (the GDAL version works fine in the same environment):

 AH00051: child pid 21925 exit signal Segmentation fault (11), possible coredump in /etc/apache2

I am using Ubuntu 18.04, Python 3.6.9, Rasterio 1.1.5, Apache 2.4.29, & mod_wsgi 4.5.17. Rasterio is installed in a venv with the standard pip install Rasterio. I did some digging around and found out it is the libcurl bundled with Rasterio (and used by libgdal) that causes the issue:
 
Thread 4 "apache2" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f3ff4eb1700 (LWP 7316)]
0x00007f3fde64cdab in ssl3_cleanup_key_block () from target:/var/www/app-name/env/lib/python3.6/site-packages/rasterio/../rasterio.libs/./libcurl-rasterio-ea538880.so.4.4.0

This is only a problem when using SSL with Apache (unfortunately this is a pretty common configuration). I am taking a guess here but could this be because Apache's mod_ssl and Rasterio's libcurl are using conflicting versions of libssl? It is so convenient having manylinux builds like Rasterio has, but could this be the problem?
 
Anybody have a similar issue? Any insight would be appreciated.
 
Thanks,
 
Angus

Thanks for the report! We've had a somewhat related problem before between rasterio and fiona, which is why the libcurl libs are named as they are: https://github.com/mapbox/rasterio/issues/1876#issuecomment-592138915. These libcurl libs statically link openssl 1.0.2u. The build configuration for openssl is here: https://github.com/matthew-brett/multibuild/blob/6b0ddb5281f59d976c8026c082c9d73faf274790/library_builders.sh#L320-L328 and the config for curl is here: https://github.com/matthew-brett/multibuild/blob/6b0ddb5281f59d976c8026c082c9d73faf274790/library_builders.sh#L287-L304.

One workaround is to avoid the manylinux wheels and install rasterio like `pip install --no-binary rasterio rasterio`.
 
--
Sean Gillies


Re: Rasterio Segmentation Fault in Apache

Angus Dickey
 

Now that I think about it this probably should have been posted to the general list as it is not explicitly dev related, sorry about that.

If anybody is experiencing an issue like this in the future, one workaround is to force pip to build rasterio from source instead of using the manylinux binary wheel:

pip install --no-binary rasterio rasterio

Another workaround is to disable Apache's mod_ssl and use a load balancer, proxy, or some other methods to get SSL.

Angus


Rasterio Segmentation Fault in Apache

Angus Dickey
 

Hi,

I am porting an existing Flask application from the GDAL python bindings to Rasterio and have found some interesting behaviour. Rasterio works great and I really like the API, the GDAL Python API is nice too but the way it works always looks a little out of place in Python code. Anyway, the Rasterio/Flask app works in the Flask development server but when it is deployed to Apache (with mod_wsgi) it causes Apache to segfault and die (the GDAL version works fine in the same environment):

 AH00051: child pid 21925 exit signal Segmentation fault (11), possible coredump in /etc/apache2

I am using Ubuntu 18.04, Python 3.6.9, Rasterio 1.1.5, Apache 2.4.29, & mod_wsgi 4.5.17. Rasterio is installed in a venv with the standard pip install Rasterio. I did some digging around and found out it is the libcurl bundled with Rasterio (and used by libgdal) that causes the issue:
 
Thread 4 "apache2" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f3ff4eb1700 (LWP 7316)]
0x00007f3fde64cdab in ssl3_cleanup_key_block () from target:/var/www/app-name/env/lib/python3.6/site-packages/rasterio/../rasterio.libs/./libcurl-rasterio-ea538880.so.4.4.0

This is only a problem when using SSL with Apache (unfortunately this is a pretty common configuration). I am taking a guess here but could this be because Apache's mod_ssl and Rasterio's libcurl are using conflicting versions of libssl? It is so convenient having manylinux builds like Rasterio has, but could this be the problem?
 
Anybody have a similar issue? Any insight would be appreciated.
 
Thanks,
 
Angus
 
 
 
 


Re: Add Reader for direct file access to VSILFILE

Sean Gillies
 

On Wed, Aug 5, 2020 at 7:53 AM <vincent.sarago@...> wrote:
Hi Sean,

The beauty of using VSIFile is that you don't need to care about which lib to use to access the data.

In the example we use a web-hosted file, but what if we need this for a file on S3, on gcp, locally... 

Bytes on S3 and GCP are accessed by HTTP, too. And for local files there is Python's io module.
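
For the local-file case, a minimal sketch with plain Python file objects might look like this. The `read_header` helper and the throwaway demo file are illustrative assumptions; the demo bytes only mimic the start of a little-endian TIFF and are not a valid image.

```python
import os
import struct
import tempfile


def read_header(path, offset=8, size=100):
    """Return the 4-byte signature and `size` bytes starting at `offset`."""
    with open(path, "rb") as f:
        signature = struct.unpack("B" * 4, f.read(4))
        f.seek(offset)
        return signature, f.read(size)


# Throwaway file: TIFF-like signature, 4 filler bytes, then a text block.
payload = b"II*\x00" + b"\x00" * 4 + b"LAYOUT=IFDS_BEFORE_DATA\n"
with tempfile.NamedTemporaryFile(suffix=".tif", delete=False) as tmp:
    tmp.write(payload)

try:
    signature, text = read_header(tmp.name, size=24)
finally:
    os.remove(tmp.name)  # clean up the demo file
```

The same seek-and-read pattern works on any binary file object, so only the code that opens the file needs to know where the bytes live.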

If a universal abstraction for all files, anywhere, is the best possible design, I suspect it would be built into all the software we use already. But it isn't :)
 

I agree that adding a new abstraction is not perfect. I was looking into adding this directly into `MemoryFile` because it's already using the VSI* functions but it won't be as clean as having a separate abstraction. 

To back up a bit: it looks to me like you want to inspect the structure of TIFFs to make up for a deficiency in GDAL right? This https://github.com/OSGeo/gdal/blob/master/gdal/swig/python/samples/validate_cloud_optimized_geotiff.py#L196-L208 is where GDAL works around a lack in its own API. GDAL should have API methods to show this metadata. I'm -1 on rasterio having a new class that wouldn't need to exist if we fixed GDAL.

--
Sean Gillies


Re: Add Reader for direct file access to VSILFILE

vincent.sarago@...
 

Hi Sean,

The beauty of using VSIFile is that you don't need to care about which lib to use to access the data.

In the example we use a web-hosted file, but what if we need this for a file on S3, on gcp, locally... 

I agree that adding a new abstraction is not perfect. I was looking into adding this directly into `MemoryFile` because it's already using the VSI* functions but it won't be as clean as having a separate abstraction. 

Vincent 


Re: Add Reader for direct file access to VSILFILE

Sean Gillies
 

Hi Vincent,

On Tue, Aug 4, 2020 at 9:34 AM <vincent.sarago@...> wrote:
ref: https://github.com/cogeotiff/rio-cogeo/issues/151

To resolve the issue above we need to have access to the binary content of the file directly. Sadly rasterio doesn't have this kind of access publicly available (please correct me if I'm wrong). 

If the contributors agree I'd love to add a simple Reader to give this kind of access 

Here is a sketch: 

cdef class VSIFileBase:
    # Sketch only: the attributes used below (_vsif, _path, closed) are
    # assumed to be declared elsewhere, e.g. in a matching .pxd file.

    def __init__(self, path):
        """
        Direct access to file.

        Parameters
        ----------
        path : str
            The filepath to open

        """
        path = parse_path(path).as_vsi()
        self._path = path.encode('utf-8')

        self._vsif = VSIFOpenL(self._path, "r")
        if self._vsif == NULL:
            raise IOError("Failed to open file.")

        self.closed = False

    def close(self):
        if self._vsif != NULL:
            VSIFCloseL(self._vsif)
        self._vsif = NULL
        self.closed = True

    def seek(self, offset, whence=0):
        return VSIFSeekL(self._vsif, offset, whence)

    def tell(self):
        if self._vsif != NULL:
            return VSIFTellL(self._vsif)
        else:
            return 0

    def read(self, size):
        """Read size bytes from the file."""
        cdef bytes result
        cdef unsigned char *buffer = NULL

        buffer = <unsigned char *>CPLMalloc(size)

        try:
            # VSIFReadL returns the number of 1-byte objects actually read.
            objects_read = VSIFReadL(buffer, 1, size, self._vsif)
            result = <bytes>buffer[:objects_read]
        finally:
            CPLFree(buffer)

        return result

Example:

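import rasterio
from rasterio.io import VSIFile
import struct

with VSIFile("https://prod-is-usgs-sb-prod-publish.s3.amazonaws.com/5e7d36c1e4b01d5092751e09/Whiskeytown_2019-06-03_DSM_25cm_hll.tif") as src_dst:
    signature = struct.unpack('B' * 4, src_dst.read(4))
    src_dst.seek(8)
    print(src_dst.read(100).decode('LATIN1'))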

GDAL_STRUCTURAL_METADATA_SIZE=000140 bytes
LAYOUT=IFDS_BEFORE_DATA
BLOCK_ORDER=ROW_MAJOR
BLOCK_LEADE

Since we can make ranged requests using Python HTTP clients (and this is what GDAL's vsicurl handler uses, too), wouldn't it be more direct to do the following?

>>> from urllib3 import PoolManager
>>> http = PoolManager()
>>> resp = http.request("GET", "https://prod-is-usgs-sb-prod-publish.s3.amazonaws.com/5e7d36c1e4b01d5092751e09/Whiskeytown_2019-06-03_DSM_25cm_hll.tif", headers={"Range": "bytes=8-108"})
>>> resp
<urllib3.response.HTTPResponse object at 0x1069b3ef0>
>>> resp.status
206
>>> resp.data
b'GDAL_STRUCTURAL_METADATA_SIZE=000140 bytes\nLAYOUT=IFDS_BEFORE_DATA\nBLOCK_ORDER=ROW_MAJOR\nBLOCK_LEADER'

I'm hesitant to add a new abstraction to rasterio when existing ones will do. MemoryFile is a unique case because there wasn't an existing way to reach formatted GDAL datasets in memory. I don't see that we need a VSIFile as much as MemoryFile.

--
Sean Gillies


Add Reader for direct file access to VSILFILE

vincent.sarago@...
 

ref: https://github.com/cogeotiff/rio-cogeo/issues/151

To resolve the issue above we need to have access to the binary content of the file directly. Sadly rasterio doesn't have this kind of access publicly available (please correct me if I'm wrong). 

If the contributors agree I'd love to add a simple Reader to give this kind of access 

Here is a sketch: 

cdef class VSIFileBase:
    # Sketch only: the attributes used below (_vsif, _path, closed) are
    # assumed to be declared elsewhere, e.g. in a matching .pxd file.

    def __init__(self, path):
        """
        Direct access to file.

        Parameters
        ----------
        path : str
            The filepath to open

        """
        path = parse_path(path).as_vsi()
        self._path = path.encode('utf-8')

        self._vsif = VSIFOpenL(self._path, "r")
        if self._vsif == NULL:
            raise IOError("Failed to open file.")

        self.closed = False

    def close(self):
        if self._vsif != NULL:
            VSIFCloseL(self._vsif)
        self._vsif = NULL
        self.closed = True

    def seek(self, offset, whence=0):
        return VSIFSeekL(self._vsif, offset, whence)

    def tell(self):
        if self._vsif != NULL:
            return VSIFTellL(self._vsif)
        else:
            return 0

    def read(self, size):
        """Read size bytes from the file."""
        cdef bytes result
        cdef unsigned char *buffer = NULL

        buffer = <unsigned char *>CPLMalloc(size)

        try:
            # VSIFReadL returns the number of 1-byte objects actually read.
            objects_read = VSIFReadL(buffer, 1, size, self._vsif)
            result = <bytes>buffer[:objects_read]
        finally:
            CPLFree(buffer)

        return result

Example:

import rasterio
from rasterio.io import VSIFile
import struct
 
with VSIFile("https://prod-is-usgs-sb-prod-publish.s3.amazonaws.com/5e7d36c1e4b01d5092751e09/Whiskeytown_2019-06-03_DSM_25cm_hll.tif") as src_dst:
    signature = struct.unpack('B' * 4, src_dst.read(4))
    src_dst.seek(8)
    print(src_dst.read(100).decode('LATIN1'))

GDAL_STRUCTURAL_METADATA_SIZE=000140 bytes
LAYOUT=IFDS_BEFORE_DATA
BLOCK_ORDER=ROW_MAJOR
BLOCK_LEADE


Re: Adding resampling option to rasterio.merge

Sean Gillies
 

Hi Guillaume,

I'm so sorry I lost track of this email. Your proposal sounds fine to me.

Best wishes,


On Tue, Apr 14, 2020 at 10:49 AM Guillaume Lostis <g.lostis@...> wrote:

Hi,

I am using rasterio.merge to merge several datasets and downsample them in a single call thanks to the res argument of merge().

However I would like to control the resampling method that is being used in the call to read(), by exposing a resampling argument in the merge() function, which would default to Resampling.nearest.
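
To illustrate why the choice matters when downsampling, here is a toy, pure-Python sketch of a function with a `resampling` argument that defaults to nearest. It is not rasterio's or GDAL's implementation: `downsample` and the local `Resampling` enum are stand-ins invented for this example, and the "nearest" rule used here (one sample near each block's center) is a simplification.

```python
from enum import Enum


class Resampling(Enum):
    """Local stand-in for a resampling enum; only two members here."""
    nearest = "nearest"
    average = "average"


def downsample(grid, factor, resampling=Resampling.nearest):
    """Shrink a 2D list by an integer factor using the chosen method."""
    rows, cols = len(grid) // factor, len(grid[0]) // factor
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            r0, c0 = r * factor, c * factor  # upper-left of the source block
            if resampling is Resampling.nearest:
                # Keep a single sample near the center of the block.
                out_row.append(grid[r0 + factor // 2][c0 + factor // 2])
            elif resampling is Resampling.average:
                block = [
                    grid[r0 + i][c0 + j]
                    for i in range(factor)
                    for j in range(factor)
                ]
                out_row.append(sum(block) / len(block))
            else:
                raise ValueError(f"unsupported resampling: {resampling}")
        out.append(out_row)
    return out


grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
by_nearest = downsample(grid, 2)
by_average = downsample(grid, 2, resampling=Resampling.average)
```

Keeping nearest as the default means existing callers see no change, while callers who downsample heavily can opt into a method that uses every source pixel.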

Would that be fine? I prefer asking here before opening a PR on Github.

Thanks,

Guillaume Lostis



--
Sean Gillies
