Date   

Re: Rasterio.features.geometry_window error when geojson has GeometryCollection

filthyhands@...
 

Thanks again Brendan, very helpful. 


Re: Rasterio.features.geometry_window error when geojson has GeometryCollection

Brendan Ward
 

The target release date that will include this fix is 5/1/2020 (https://github.com/mapbox/rasterio/issues/1911)

If you need changes sooner, you have a few options:
1) clone the repo and checkout the maint-1.1 branch, and rebuild the Cython files.  This can be a bit involved depending on your OS / env; you'll need a compiler, GDAL, Cython, etc.  (https://github.com/mapbox/rasterio/blob/master/CONTRIBUTING.rst#development-environment)

2) coerce each GeometryCollection to FeatureCollection

3) use shapely to union each GeometryCollection into a single geometry object before passing into bounds()

4) calculate bounds on each individual geometry within a GeometryCollection, then union the bounds.  Here's how it was done in the Cython file (https://github.com/mapbox/rasterio/pull/1915/files#diff-83bc1acea1f8d435fd6c4798a58fc072R411-R426); you can do the same thing on your end.


Re: MemoryFile loses Profile information

ciaran.evans@...
 

I think the problem is that I'm trying to do the conversion in memory, not on disk. If I've read your latest reply correctly, I'd convert to JPEG2000 on disk and then copy to S3?

Can I do the conversion via rasterio.shutil.copy in memory?


Re: Rasterio.features.geometry_window error when geojson has GeometryCollection

filthyhands@...
 

How can I patch my install of Rasterio to incorporate your changes? 


Re: MemoryFile loses Profile information

Sean Gillies
 

Hi,

On Wed, Apr 29, 2020 at 9:40 AM Guillaume Lostis <g.lostis@...> wrote:

Like Ciaran I'm not very familiar with GDAL's python bindings, but I've tried running a little snippet which I believe does the same thing as what was done with rasterio: it writes a JP2 to a vsimem in-memory file, and then writes the bytes to a file on disk.

from osgeo import gdal

mem, out = "/vsimem/B01.jp2", "new_B01.jp2"
gdal.Translate(mem, "B01.jp2")

# Taken from https://lists.osgeo.org/pipermail/gdal-dev/2016-August/045030.html
f = gdal.VSIFOpenL(mem, "rb")
gdal.VSIFSeekL(f, 0, 2)  # seek to end
size = gdal.VSIFTellL(f)
gdal.VSIFSeekL(f, 0, 0)  # seek to beginning
data = gdal.VSIFReadL(1, size, f)
gdal.Unlink(mem)

with open(out, "wb") as ff:
    ff.write(data)

When I look at the gdalinfo of new_B01.jp2, it has a CRS and a transform, so I am not able to reproduce the behavior we have with rasterio.

(Side note: Interestingly enough, the pixel values of new_B01.jp2 have slightly changed with respect to the original B01.jp2 file, so there is some data loss somewhere in the process. But maybe that is expected of the JP2 format and could be avoided by passing extra arguments to gdal.Translate? I mainly have experience with writing GeoTIFFs, writing JP2s is new to me)

Guillaume

I've only now remembered (not being a regular jp2 user) that JPEG2000 is a create-copy format (see https://gdal.org/drivers/raster/index.html) and as such is not suited for uses cases like

    with rasterio.open("file.jp2", "w", driver="JP2OpenJPEG", ...) as dataset:
        dataset.write(data)

Rasterio tries to abstract over the differences between "create" and "create-copy" formats, but might falter in some cases. We're testing JPEG and PNG in the test suite, but not JPEG2000. If you're constructing a dataset from the bands of multiple other datasets, you should probably be using GeoTIFF as a format, it's well suited for this. And then convert to JPEG2000 before uploading to S3 using rasterio.shutil.copy, which calls on GDAL's GDALCreateCopy and is mostly equivalent to gdal_translate.

--
Sean Gillies


Re: Rasterio.features.geometry_window error when geojson has GeometryCollection

filthyhands@...
 

That's great Brendan, much appreciated. 


Re: MemoryFile loses Profile information

Even Rouault
 

On mercredi 29 avril 2020 08:40:07 CEST Guillaume Lostis wrote:

> Like Ciaran I'm not very familiar with GDAL's python bindings, but I've

> tried running a little snippet which I believe does the same thing as what

> was done with `rasterio`: it writes a JP2 to a `vsimem` in-memory file, and

> then writes the bytes to a file on disk.

>

> ```python

> from osgeo import gdal

>

> mem, out = "/vsimem/B01.jp2", "new_B01.jp2"

> gdal.Translate(mem, "B01.jp2")

>

> # Taken from

> https://lists.osgeo.org/pipermail/gdal-dev/2016-August/045030.html f =

> gdal.VSIFOpenL(mem, "rb")

> gdal.VSIFSeekL(f, 0, 2) # seek to end

> size = gdal.VSIFTellL(f)

> gdal.VSIFSeekL(f, 0, 0) # seek to beginning

> data = gdal.VSIFReadL(1, size, f)

> gdal.Unlink(mem)

>

> with open(out, "wb") as ff:

> ff.write(data)

> ```

>

> When I look at the `gdalinfo` of `new_B01.jp2`, it has a CRS and a

> transform, so I am not able to reproduce the behavior we have with

> `rasterio`.

>

> (Side note: Interestingly enough, the pixel values of `new_B01.jp2` have

> slightly changed with respect to the original `B01.jp2` file, so there is

> some data loss somewhere in the process. But maybe that is expected of the

> JP2 format and could be avoided by passing extra arguments to

> `gdal.Translate`?

 

By default, the JP2OpenJPEG driver uses a lossy compression (my personal take on JPEG2000 is that it has very limited interested when used in its lossless profile. Better use more conventional algorithms that are way faster).

See https://gdal.org/drivers/raster/jp2openjpeg.html#lossless-compression for options to pass to get a lossless JPEG2000 as output.

 

So all in all that should be something like

 

gdal.Translate(mem, "B01.jp2",

format = "JP2OpenJPEG", # in case you have several JP2K drivers available

options = ["REVERSIBLE=YES", "QUALITY=100"])

 

 

--

Spatialys - Geospatial professional services

http://www.spatialys.com


Re: Rasterio.features.geometry_window error when geojson has GeometryCollection

Brendan Ward
 

This looks like an oversight when we implemented bounds().

Rasterio currently supports FeatureCollection or individual geometry objects in this operation.

Logged here: https://github.com/mapbox/rasterio/issues/1914

Working on a quick fix now...


Re: MemoryFile loses Profile information

Guillaume Lostis
 

Like Ciaran I'm not very familiar with GDAL's python bindings, but I've tried running a little snippet which I believe does the same thing as what was done with rasterio: it writes a JP2 to a vsimem in-memory file, and then writes the bytes to a file on disk.

from osgeo import gdal

mem, out = "/vsimem/B01.jp2", "new_B01.jp2"
gdal.Translate(mem, "B01.jp2")

# Taken from https://lists.osgeo.org/pipermail/gdal-dev/2016-August/045030.html
f = gdal.VSIFOpenL(mem, "rb")
gdal.VSIFSeekL(f, 0, 2)  # seek to end
size = gdal.VSIFTellL(f)
gdal.VSIFSeekL(f, 0, 0)  # seek to beginning
data = gdal.VSIFReadL(1, size, f)
gdal.Unlink(mem)

with open(out, "wb") as ff:
    ff.write(data)

When I look at the gdalinfo of new_B01.jp2, it has a CRS and a transform, so I am not able to reproduce the behavior we have with rasterio.

(Side note: Interestingly enough, the pixel values of new_B01.jp2 have slightly changed with respect to the original B01.jp2 file, so there is some data loss somewhere in the process. But maybe that is expected of the JP2 format and could be avoided by passing extra arguments to gdal.Translate? I mainly have experience with writing GeoTIFFs, writing JP2s is new to me)

Guillaume


Re: MemoryFile loses Profile information

ciaran.evans@...
 

Would you say it's worth raising an issue on GDAL then?

I'm also not familiar with the Python GDAL bindings, Rasterio is my window in that haha.


Re: Read using multithreading

Sean Gillies
 

Hi,

On Wed, Apr 29, 2020 at 2:54 AM Carlos García Rodríguez <carlogarro@...> wrote:

Hello, I have an application that calls rasterio to open rasters in its main process. This application allows me to parallelize the process, so it will call the function where rasterio open is many times in parallel.
If I open and close the image every time by doing

with rasterio.open(self.file) as src:
    raster = src.read(window=window)

The process goes fine, but it has to open and close at every step of the raster (losing a bit of time). Then I tried to let the image opened at the beginning of the process but it is not stable. If I don't have too many workers, the application might work well, but if I increase the number of workers it will crash with the following error:

RasterioIOError: Caught RasterioIOError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "rasterio/_io.pyx", line 707, in rasterio._io.DatasetReaderBase._read
File "rasterio/shim_rasterioex.pxi", line 133, in rasterio._shim.io_multi_band
File "rasterio/_err.pyx", line 182, in rasterio._err.exc_wrap_int
rasterio._err.CPLE_AppDefinedError: ./sentinel2_tiled.tif, band 1: IReadBlock failed at X offset 58, Y offset 92: TIFFReadEncodedTile() failed.

What do you recommend me to do?

Thank you.

Dataset files can be accessed from multiple threads, but the dataset objects returned by rasterio.open can only be used by a single thread at a time. This is a constraint that we get from GDAL. You could try creating a set of dataset objects and then distribute them to the worker threads, making sure not to share them.

--
Sean Gillies


Re: MemoryFile loses Profile information

Sean Gillies
 



On Wed, Apr 29, 2020 at 1:38 AM <ciaran.evans@...> wrote:

As with Guillaume's answer, it's a Sentinel 2 band and has no side-car file, when I write straight to a file, the georeferencing is maintained.

Since there's no JP2 specific code in MemoryFile and things work with the GeoTIFF driver, I suspect there's a subtle bug involving the vsimem system and one of the JP2 drivers. It would be useful to try to reproduce this with GDAL's Python bindings, but I don't have them installed and won't be able to try that.

--
Sean Gillies


Rasterio.features.geometry_window error when geojson has GeometryCollection

filthyhands@...
 



I'm attempting to isolate areas in a rasterio raster object, using Fiona to open a geojson and then window the parts I want to look at. 

Where a geojson entry is a geometry collection rasterio.features.geometry_window throws a key error. It's expecting each entry in the geojson to hold a 'coordinates' key but in this geojson the 'coordinates' are inside a list a level below 'geometryCollection'

A workaround could be to dismantle the geometry collections into separate objects. It would work but it's not ideal as the collections are there to group various styles of object together.

Does anyone have experience with this issue, or can suggest another solution? My code etc below.

Linestring: 
[{'type': 'LineString',
  'coordinates': [(144.937247997, -26.3759745069999, 0.0),

Geometry Collection: 
{'type': 'GeometryCollection',
  'geometries': [{'type': 'LineString',
    'coordinates': [(145.619783254, -38.5831492689999, 0.0),


My code:
with fiona.open('./Methane gas detection docs/NationalOnshoreGasPipelines.geojson') as fi: shapes = [each['geometry']for each in fi] # isolate a pipeline area pipes = rasterio.features.geometry_window(win, shapes, pad_x = 0.001, pad_y=0.001)



Error:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-183-b9dff7a7db24> in <module>
     10 
     11     # isolate a pipeline area
---> 12     pipes = rasterio.features.geometry_window(win, shapes, pad_x = 0.001, pad_y=0.001)
     13 
     14     # print(shapes)

~\Anaconda3\envs\sat\lib\site-packages\rasterio\features.py in geometry_window(dataset, shapes, pad_x, pad_y, north_up, rotated, pixel_precision)
    419 
    420     if not rotated:
--> 421         all_bounds = [bounds(shape, north_up=north_up) for shape in shapes]
    422         lefts, bottoms, rights, tops = zip(*all_bounds)
    423 

~\Anaconda3\envs\sat\lib\site-packages\rasterio\features.py in <listcomp>(.0)
    419 
    420     if not rotated:
--> 421         all_bounds = [bounds(shape, north_up=north_up) for shape in shapes]
    422         lefts, bottoms, rights, tops = zip(*all_bounds)
    423 

~\Anaconda3\envs\sat\lib\site-packages\rasterio\features.py in bounds(geometry, north_up, transform)
    374 
    375     geom = geometry.get('geometry') or geometry
--> 376     return _bounds(geom, north_up=north_up, transform=transform)
    377 
    378 

rasterio\_features.pyx in rasterio._features._bounds()

KeyError: 'coordinates'


Read using multithreading

Carlos García Rodríguez
 

Hello, I have an application that calls rasterio to open rasters in its main process. This application allows me to parallelize the process, so it will call the function where rasterio open is many times in parallel.
If I open and close the image every time by doing

with rasterio.open(self.file) as src:
    raster = src.read(window=window)

The process goes fine, but it has to open and close at every step of the raster (losing a bit of time). Then I tried to let the image opened at the beginning of the process but it is not stable. If I don't have too many workers, the application might work well, but if I increase the number of workers it will crash with the following error:

RasterioIOError: Caught RasterioIOError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "rasterio/_io.pyx", line 707, in rasterio._io.DatasetReaderBase._read
File "rasterio/shim_rasterioex.pxi", line 133, in rasterio._shim.io_multi_band
File "rasterio/_err.pyx", line 182, in rasterio._err.exc_wrap_int
rasterio._err.CPLE_AppDefinedError: ./sentinel2_tiled.tif, band 1: IReadBlock failed at X offset 58, Y offset 92: TIFFReadEncodedTile() failed.

What do you recommend me to do?

Thank you.


Re: Speed up reading rasters

Carlos García Rodríguez
 

I have already fixed it. One of the main problems was that the tiled tiff was configured with size 256 and I was reading with windows of sizes 224. From here comes the disparity of the times of reading.
Also, I found faster to read pixel-based instead of band based. But it depends on the application.

Thank you all for your help!


Re: MemoryFile loses Profile information

ciaran.evans@...
 

As with Guillaume's answer, it's a Sentinel 2 band and has no side-car file, when I write straight to a file, the georeferencing is maintained.


Re: MemoryFile loses Profile information

Guillaume Lostis
 

Hi,

I guess the B01.jp2 file mentioned is a Sentinel-2 band, which is standalone, there is no side-car file attached to it.

I have tried downloading a single B01.jp2 file from a SAFE and running Vincent Sarago's snippet on it and I get the same result, the CRS and transform are lost when the jp2 is written through a MemoryFile.

Guillaume Lostis 

On Wed, 29 Apr 2020, 00:24 Sean Gillies via groups.io, <sean=mapbox.com@groups.io> wrote:
Hi,

On Tue, Apr 28, 2020 at 8:02 AM <ciaran.evans@...> wrote:

I tried this with GDAL 2.4.2 and GDAL 3.0 too

If anyone can point me to where I might look further to diagnose this, I can create an issue with some useful info, whether that's in Rasterio/GDAL :)


To confirm: you're right, there's no need to seek after writing using mem.write() because the dataset API doesn't change the MemoryFile's stream position. It remains at 0, the beginning of the file.

Is it possible that your B01.jp2 has auxiliary files? If so, they can be lost because MemoryFile.read() only returns bytes from the primary file and will overlook auxiliaries. For example, see the code below using rasterio's test data.

$ rio insp tests/data/RGB.byte.tif
>>> from rasterio.io import MemoryFile
>>> profile = src.profile
>>> del profile["tiled"]
>>> del profile["interleave"]
>>> profile["driver"] = "PNG"
>>> with MemoryFile(filename="lolwut.png") as memfile:
...     with memfile.open(**profile) as dataset_1:
...         dataset_1.write(src.read())
...         print(dataset_1.files)
...         print(dataset_1.profile)
...     with memfile.open() as dataset_2:
...         print(dataset_2.files)
...         print(dataset_2.profile)
...     with open("/tmp/lolwut.png", "wb") as f:
...         f.write(memfile.read())
...     with rasterio.open("/tmp/lolwut.png") as dataset_3:
...         print(dataset_3.files)
...         print(dataset_3.profile)
...
[]
{'driver': 'PNG', 'dtype': 'uint8', 'nodata': 0.0, 'width': 791, 'height': 718, 'count': 3, 'crs': CRS.from_epsg(32618), 'transform': Affine(300.0379266750948, 0.0, 101985.0,
       0.0, -300.041782729805, 2826915.0), 'tiled': False}
['/vsimem/lolwut.png', '/vsimem/lolwut.png.aux.xml']
{'driver': 'PNG', 'dtype': 'uint8', 'nodata': 0.0, 'width': 791, 'height': 718, 'count': 3, 'crs': CRS.from_epsg(32618), 'transform': Affine(300.0379266750948, 0.0, 101985.0,
       0.0, -300.041782729805, 2826915.0), 'tiled': False, 'interleave': 'pixel'}
775558
['/tmp/lolwut.png']
{'driver': 'PNG', 'dtype': 'uint8', 'nodata': 0.0, 'width': 791, 'height': 718, 'count': 3, 'crs': None, 'transform': Affine(1.0, 0.0, 0.0,
       0.0, 1.0, 0.0), 'tiled': False, 'interleave': 'pixel'}

dataset_1.files is an empty list because no files are written to the /vsimem virtual filesystem until dataset_1 is closed.

dataset_2.files show two files: the primary /vsimem/lolwut.png file and its auxiliary  '/vsimem/lolwut.png.aux.xml' file, containing the georeferencing information.

dataset_3.files shows only one file, the auxiliary file has been lost by memfile.read(). MemoryFile isn't any good for multiple-file formats or multiple-file format variants. It's fine for a profile of GeoTIFFs that keep their georeferencing and masks and overviews in a single file.

--
Sean Gillies


Re: MemoryFile loses Profile information

Sean Gillies
 

Hi,

On Tue, Apr 28, 2020 at 8:02 AM <ciaran.evans@...> wrote:

I tried this with GDAL 2.4.2 and GDAL 3.0 too

If anyone can point me to where I might look further to diagnose this, I can create an issue with some useful info, whether that's in Rasterio/GDAL :)


To confirm: you're right, there's no need to seek after writing using mem.write() because the dataset API doesn't change the MemoryFile's stream position. It remains at 0, the beginning of the file.

Is it possible that your B01.jp2 has auxiliary files? If so, they can be lost because MemoryFile.read() only returns bytes from the primary file and will overlook auxiliaries. For example, see the code below using rasterio's test data.

$ rio insp tests/data/RGB.byte.tif
>>> from rasterio.io import MemoryFile
>>> profile = src.profile
>>> del profile["tiled"]
>>> del profile["interleave"]
>>> profile["driver"] = "PNG"
>>> with MemoryFile(filename="lolwut.png") as memfile:
...     with memfile.open(**profile) as dataset_1:
...         dataset_1.write(src.read())
...         print(dataset_1.files)
...         print(dataset_1.profile)
...     with memfile.open() as dataset_2:
...         print(dataset_2.files)
...         print(dataset_2.profile)
...     with open("/tmp/lolwut.png", "wb") as f:
...         f.write(memfile.read())
...     with rasterio.open("/tmp/lolwut.png") as dataset_3:
...         print(dataset_3.files)
...         print(dataset_3.profile)
...
[]
{'driver': 'PNG', 'dtype': 'uint8', 'nodata': 0.0, 'width': 791, 'height': 718, 'count': 3, 'crs': CRS.from_epsg(32618), 'transform': Affine(300.0379266750948, 0.0, 101985.0,
       0.0, -300.041782729805, 2826915.0), 'tiled': False}
['/vsimem/lolwut.png', '/vsimem/lolwut.png.aux.xml']
{'driver': 'PNG', 'dtype': 'uint8', 'nodata': 0.0, 'width': 791, 'height': 718, 'count': 3, 'crs': CRS.from_epsg(32618), 'transform': Affine(300.0379266750948, 0.0, 101985.0,
       0.0, -300.041782729805, 2826915.0), 'tiled': False, 'interleave': 'pixel'}
775558
['/tmp/lolwut.png']
{'driver': 'PNG', 'dtype': 'uint8', 'nodata': 0.0, 'width': 791, 'height': 718, 'count': 3, 'crs': None, 'transform': Affine(1.0, 0.0, 0.0,
       0.0, 1.0, 0.0), 'tiled': False, 'interleave': 'pixel'}

dataset_1.files is an empty list because no files are written to the /vsimem virtual filesystem until dataset_1 is closed.

dataset_2.files show two files: the primary /vsimem/lolwut.png file and its auxiliary  '/vsimem/lolwut.png.aux.xml' file, containing the georeferencing information.

dataset_3.files shows only one file, the auxiliary file has been lost by memfile.read(). MemoryFile isn't any good for multiple-file formats or multiple-file format variants. It's fine for a profile of GeoTIFFs that keep their georeferencing and masks and overviews in a single file.

--
Sean Gillies


Re: MemoryFile loses Profile information

ciaran.evans@...
 

I tried this with GDAL 2.4.2 and GDAL 3.0 too

If anyone can point me to where I might look further to diagnose this, I can create an issue with some useful info, whether that's in Rasterio/GDAL :)


Re: Speed up reading rasters

Carlos García Rodríguez
 

I would also like to add that the first tiled read is not necessarily slow... 


El lun., 27 abr. 2020 20:06, Carlos García Rodríguez via groups.io <carlogarro=gmail.com@groups.io> escribió:
So, do you think it should be a good idea to increase the cache memory? If so, how to do it? I have plenty of ram so that should not be a problem. On the other side I checked I there is some relation between tiles proximity and time and didn't find it. You can see the position of each tile in the image. 
El lun., 27 abr. 2020 17:20, Sean Gillies <sean.gillies@...> escribió:
Hi,

On Mon, Apr 27, 2020 at 2:36 AM <carlogarro@...> wrote:
Hello, thank you so much for your recommendation, it speed it up x5. Very useful. Now I am having a problem that i do not understand.

I have the following script, where i access 10 random tiles of my raster. train_data is a vector [4822,2] of pixels position in the raster.
for i in range(10):
    idx = np.random.randint(4822)
    x_idx = train_data[idx][1]
    y_idx = train_data[idx][0]
    window = Window(y_idx, x_idx, 224, 224)
    start_time = time.time()
    with rasterio.open('./sentinel2_tiled.tif') as src:
        sentinel2 = src.read(window=window)
    end_time = (time.time() - start_time)

I do not understand why the times of loading a window are so different, as can be seen in the following image. Do you have some explanation?



 Thank you once more!

I can't say for sure about the time differences because I don't know much about your data files or your computer. However, know this: GDAL's I/O system caches blocks of raster data in memory, the size of the cache is generally 5% of your computers memory, and windowed reads may or may not be served directly from the cache depending on their size and adjacency to previously read data.

--
Sean Gillies

221 - 240 of 740