Date   
Re: Fast X-array rasterization

Philipp Kats
 

So what I am trying to achieve in the end is the same as basic `rasterio.feature.rasterize`, but with more flexible aggregation strategies, like median, mean, percentile and std - currently it supports only `add` and `replace`.

Re: Rasterio 1.0.3.post1

Sean Gillies
 

Even,

Thanks for the pointer!

In configuring my GDAL 2.3.1 builds, pkg-config and openjpeg were found:
checking for pkg-config... (cached) /usr/local/bin/pkg-config
checking pkg-config is at least version 0.25... yes
checking for OPENJPEG... yes
checking for opj_setup_decoder in -lopenjp2... yes

(full logs here:

and I'm able to open GDAL's jp2 test files with the 1.0.3.post1 wheel (for python 3.6 and OSX).

Denis, I think your issue is different than the one I've reported here. What's your operating system and did you install from the sdist or install a binary wheel?

On Mon, Sep 10, 2018 at 12:17 PM Even Rouault <even.rouault@...> wrote:
On lundi 10 septembre 2018 14:29:39 CEST Sean Gillies wrote:
> Thanks for the report, Denis. I had upgraded GDAL to 2.3.1 in these wheel
> builds, and was blaming my inability to access these jp2 files on a lack of
> credentials, but maybe something has changed between Rasterio and JP2 in
> 2.3.0? I'm going to revert to 2.2.4, the version in the 1.0.2 wheels.

Sean,

Your wheels use openjpeg for GDAL JPEG2000 capability, right ? There has been
a change in GDAL ./configure in 2.3.0 related to openjpeg detection. It now
requires pkg-config to be available to detect openjpeg.

Even


--
Sean Gillies

Re: Rasterio 1.0.3.post1

Denis Rykov
 

OS: Ubuntu 16.04.5
Wheel: rasterio-1.0.3.post1-cp35-cp35m-manylinux1_x86_64.whl

Full example:

$ pip install -U rasterio
Collecting rasterio
  Using cached https://files.pythonhosted.org/packages/48/9c/ed4e2b5b3150ee39052bf580de0e67c8c2e5b58eb89833a174c8e8694d5f/rasterio-1.0.3.post1-cp35-cp35m-manylinux1_x86_64.whl
Requirement already satisfied, skipping upgrade: attrs in ./env/lib/python3.5/site-packages (from rasterio) (17.4.0)
Requirement already satisfied, skipping upgrade: click-plugins in ./env/lib/python3.5/site-packages (from rasterio) (1.0.3)
Requirement already satisfied, skipping upgrade: snuggs>=1.4.1 in ./env/lib/python3.5/site-packages (from rasterio) (1.4.1)
Requirement already satisfied, skipping upgrade: cligj in ./env/lib/python3.5/site-packages (from rasterio) (0.4.0)
Requirement already satisfied, skipping upgrade: affine in ./env/lib/python3.5/site-packages (from rasterio) (2.2.0)
Requirement already satisfied, skipping upgrade: numpy in ./env/lib/python3.5/site-packages (from rasterio) (1.14.2)
Requirement already satisfied, skipping upgrade: click>=3.0 in ./env/lib/python3.5/site-packages (from click-plugins->rasterio) (6.7)
Requirement already satisfied, skipping upgrade: pyparsing in ./env/lib/python3.5/site-packages (from snuggs>=1.4.1->rasterio) (2.2.0)
Installing collected packages: rasterio
  Found existing installation: rasterio 1.0.2
    Uninstalling rasterio-1.0.2:
      Successfully uninstalled rasterio-1.0.2
Successfully installed rasterio-1.0.3.post1

$ python -c 'from rasterio import open; open("tests/data/raster/rgb.jp2")' Traceback (most recent call last): File "rasterio/_base.pyx", line 199, in rasterio._base.DatasetBase.__init__ File "rasterio/_shim.pyx", line 64, in rasterio._shim.open_dataset File "rasterio/_err.pyx", line 188, in rasterio._err.exc_wrap_pointer rasterio._err.CPLE_OpenFailedError: 'tests/data/raster/rgb.jp2' not recognized as a supported file format. During handling of the above exception, another exception occurred: Traceback (most recent call last): File "<string>", line 1, in <module> File "/home/rykov/sandbox/telluric/env/lib/python3.5/site-packages/rasterio/env.py", line 403, in wrapper return f(*args, **kwds) File "/home/rykov/sandbox/telluric/env/lib/python3.5/site-packages/rasterio/__init__.py", line 217, in open s = DatasetReader(path, driver=driver, **kwargs) File "rasterio/_base.pyx", line 201, in rasterio._base.DatasetBase.__init__ rasterio.errors.RasterioIOError: 'tests/data/raster/rgb.jp2' not recognized as a supported file format.

$ gdalinfo tests/data/raster/rgb.jp2 Driver: JP2OpenJPEG/JPEG-2000 driver based on OpenJPEG library Files: tests/data/raster/rgb.jp2 Size is 100, 100 Coordinate System is: PROJCS["WGS 84 / Pseudo-Mercator", GEOGCS["WGS 84", DATUM["WGS_1984", SPHEROID["WGS 84",6378137,298.257223563, AUTHORITY["EPSG","7030"]], AUTHORITY["EPSG","6326"]], PRIMEM["Greenwich",0, AUTHORITY["EPSG","8901"]], UNIT["degree",0.0174532925199433, AUTHORITY["EPSG","9122"]], AUTHORITY["EPSG","4326"]], PROJECTION["Mercator_1SP"], PARAMETER["central_meridian",0], PARAMETER["scale_factor",1], PARAMETER["false_easting",0], PARAMETER["false_northing",0], UNIT["metre",1, AUTHORITY["EPSG","9001"]], AXIS["X",EAST], AXIS["Y",NORTH], EXTENSION["PROJ4","+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs"], AUTHORITY["EPSG","3857"]] Origin = (-6574807.424977720715106,-4070118.882129065692425) Pixel Size = (76.437028285175984,-76.437028285176893) Image Structure Metadata: INTERLEAVE=PIXEL Corner Coordinates: Upper Left (-6574807.425,-4070118.882) ( 59d 3'45.00"W, 34d18'25.72"S) Lower Left (-6574807.425,-4077762.585) ( 59d 3'45.00"W, 34d21'49.84"S) Upper Right (-6567163.722,-4070118.882) ( 58d59'37.81"W, 34d18'25.72"S) Lower Right (-6567163.722,-4077762.585) ( 58d59'37.81"W, 34d21'49.84"S) Center (-6570985.574,-4073940.734) ( 59d 1'41.40"W, 34d20' 7.79"S) Band 1 Block=100x100 Type=Byte, ColorInterp=Red Overviews: arbitrary Image Structure Metadata: COMPRESSION=JPEG2000 Band 2 Block=100x100 Type=Byte, ColorInterp=Green Overviews: arbitrary Image Structure Metadata: COMPRESSION=JPEG2000 Band 3 Block=100x100 Type=Byte, ColorInterp=Blue Overviews: arbitrary Image Structure Metadata: COMPRESSION=JPEG2000

$ pip install rasterio==1.0.2 Collecting rasterio==1.0.2 Using cached https://files.pythonhosted.org/packages/62/0b/aa07a8a3e5cef361113e6663647d0562c170feee2cf79d70ff79910cdd73/rasterio-1.0.2-cp35-cp35m-manylinux1_x86_64.whl Requirement already satisfied: snuggs>=1.4.1 in /home/rykov/sandbox/telluric/env/lib/python3.5/site-packages (from rasterio==1.0.2) (1.4.1) Requirement already satisfied: click-plugins in /home/rykov/sandbox/telluric/env/lib/python3.5/site-packages (from rasterio==1.0.2) (1.0.3) Requirement already satisfied: attrs in /home/rykov/sandbox/telluric/env/lib/python3.5/site-packages (from rasterio==1.0.2) (17.4.0) Requirement already satisfied: affine in /home/rykov/sandbox/telluric/env/lib/python3.5/site-packages (from rasterio==1.0.2) (2.2.0) Requirement already satisfied: cligj in /home/rykov/sandbox/telluric/env/lib/python3.5/site-packages (from rasterio==1.0.2) (0.4.0) Requirement already satisfied: numpy in /home/rykov/sandbox/telluric/env/lib/python3.5/site-packages (from rasterio==1.0.2) (1.14.2) Requirement already satisfied: pyparsing in /home/rykov/sandbox/telluric/env/lib/python3.5/site-packages (from snuggs>=1.4.1->rasterio==1.0.2) (2.2.0) Requirement already satisfied: click in /home/rykov/sandbox/telluric/env/lib/python3.5/site-packages (from snuggs>=1.4.1->rasterio==1.0.2) (6.7) Installing collected packages: rasterio Found existing installation: rasterio 1.0.3.post1 Uninstalling rasterio-1.0.3.post1: Successfully uninstalled rasterio-1.0.3.post1 Successfully installed rasterio-1.0.2 $ python -c 'from rasterio import open; open("tests/data/raster/rgb.jp2")' $

Re: Fast X-array rasterization

Brendan Ward
 

I'm not entirely sure I follow what you are aiming for here yet.  Are you trying to aggregate across features per pixel to calculate various statistics?  Like, what is the average feature size of all features that intersect this pixel?

Thus if the first pixel has features 1, 2, 3, you want to calculate statistics on an attribute of those features?  And if the second pixel has only features 1 and 3, you'd do the same?  So your 3rd dimension, and your basis for statistics, is the set of features per pixel?

How many features do you have?  As written here, I'd expect the performance to be something like number of features * time to rasterize a single feature PLUS converting the data between formats on each iteration of the loop (i.e., from internal GDAL representation to numpy).  I'd recommend timing rasterization of a single feature and see how many features you have - it might be that 3 hours is perfectly normal for that number of features.

In contrast, when you do all features at once, you are doing the data format conversions only once.

Some ideas:
1) rasterize feature IDs (index w/in gdf) instead of a floating point like value.  This would let rasterize work on an unsigned integer dtype instead of float64 (theoretically resulting in a much smaller memory footprint and less I/O during data format conversion).
This will make it so that you only have to rasterize once, but then can use gdf to create arrays on the fly by looking up feature ID to the attribute you want.

2) rasterize onto the smallest possible array for each feature.  The idea is that you figure out your target array dimensions, then calculate a "window" within that covered by the feature's bounds.  Then rasterize that smaller window; it will be a bit more efficient because you reduce the size of the arrays that need to be converted.  Then you paste the result of rasterization into your target array.  You'll have to keep the transforms aligned so that the windowed rasterization can be directly pasted into the target.  The performance gains (if any) are likely limited to those cases where your features individually only cover small areas of the target.

3) do some profiling / timing of individual operations - see if you can find those things that are taking the most time.  I'd be interested in seeing where your code is spending most of its time.

Re: Rasterio 1.0.3.post1

Sean Gillies
 

Hi Denis,

I discovered that I had to patch GDAL 2.3.1 to build with OpenJPEG for the ancient manylinux1 platform. I've uploaded new wheels with a "-1" build tag to https://pypi.org/manage/project/rasterio/release/1.0.3.post1/. I didn't know about build tags – I won't make post-releases to fix wheel build problems anymore.

On Mon, Sep 10, 2018 at 1:10 AM Denis Rykov <rykovd@...> wrote:
Hi Sean!

After updating rasterio I get the following error:

In [1]: from rasterio import open
 
In [2]: open('tests/data/raster/rgb.jp2')
---------------------------------------------------------------------------
CPLE_OpenFailedError                      Traceback (most recent call last)
rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()
 
rasterio/_shim.pyx in rasterio._shim.open_dataset()
 
rasterio/_err.pyx in rasterio._err.exc_wrap_pointer()
 
CPLE_OpenFailedError: 'tests/data/raster/rgb.jp2' not recognized as a supported file format.
 
During handling of the above exception, another exception occurred:
 
RasterioIOError                           Traceback (most recent call last)
<ipython-input-2-19e7b16f33e4> in <module>()
----> 1 open('tests/data/raster/rgb.jp2')
 
~/sandbox/telluric/env/lib/python3.5/site-packages/rasterio/env.py in wrapper(*args, **kwds)
    401             else:
    402                 pass
--> 403             return f(*args, **kwds)
    404 
    405     return wrapper
 
~/sandbox/telluric/env/lib/python3.5/site-packages/rasterio/__init__.py in open(fp, mode, driver, width, height, count, crs, transform, dtype, nodata, sharing, **kwargs)
    215         # None.
    216         if mode == 'r':
--> 217             s = DatasetReader(path, driver=driver, **kwargs)
    218         elif mode == 'r+':
    219             s = get_writer_for_path(path)(path, mode, driver=driver, **kwargs)
 
rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()
 
RasterioIOError: 'tests/data/raster/rgb.jp2' not recognized as a supported file format.

All works without any problems with rasterio 1.0.2:

In [1]: from rasterio import open
 
In [2]: open('tests/data/raster/rgb.jp2')
Out[2]: <open DatasetReader name='tests/data/raster/rgb.jp2' mode='r'>



--
Sean Gillies

Re: Rasterio 1.0.3.post1

Denis Rykov
 

Thank you very much, Sean!

Re: Fast X-array rasterization

Philipp Kats
 

Hi Brendan,

Thanks! 
Indeed, that what I am trying to achieve. and your hint to ints is great! In fact, I could boil it down to store binary masks - feature id would be stored in the Z dimension itself, and all pixels on one layer will likely have the same shared feature-based value after all. Moreover, this way I could calculate this 3d shape once - and then lookup values and perform computations on this cube for any feature's property I want. So this will definitely boost my computations by a lot.

The second one also sounds very legit but sounds a little trickier to implement - we'll see. 

Still, I wonder if it would make sense to (somehow) use gdal_rasterize bands here...

On Mon, Sep 10, 2018 at 11:06 PM Brendan Ward <brendan.charles.ward@...> wrote:
I'm not entirely sure I follow what you are aiming for here yet.  Are you trying to aggregate across features per pixel to calculate various statistics?  Like, what is the average feature size of all features that intersect this pixel?

Thus if the first pixel has features 1, 2, 3, you want to calculate statistics on an attribute of those features?  And if the second pixel has only features 1 and 3, you'd do the same?  So your 3rd dimension, and your basis for statistics, is the set of features per pixel?

How many features do you have?  As written here, I'd expect the performance to be something like number of features * time to rasterize a single feature PLUS converting the data between formats on each iteration of the loop (i.e., from internal GDAL representation to numpy).  I'd recommend timing rasterization of a single feature and see how many features you have - it might be that 3 hours is perfectly normal for that number of features.

In contrast, when you do all features at once, you are doing the data format conversions only once.

Some ideas:
1) rasterize feature IDs (index w/in gdf) instead of a floating point like value.  This would let rasterize work on an unsigned integer dtype instead of float64 (theoretically resulting in a much smaller memory footprint and less I/O during data format conversion).
This will make it so that you only have to rasterize once, but then can use gdf to create arrays on the fly by looking up feature ID to the attribute you want.

2) rasterize onto the smallest possible array for each feature.  The idea is that you figure out your target array dimensions, then calculate a "window" within that covered by the feature's bounds.  Then rasterize that smaller window; it will be a bit more efficient because you reduce the size of the arrays that need to be converted.  Then you paste the result of rasterization into your target array.  You'll have to keep the transforms aligned so that the windowed rasterization can be directly pasted into the target.  The performance gains (if any) are likely limited to those cases where your features individually only cover small areas of the target.

3) do some profiling / timing of individual operations - see if you can find those things that are taking the most time.  I'd be interested in seeing where your code is spending most of its time.

In-memory multi dataset object

Guy Doulberg
 

Hi
I wonder if I can create in-memory multi-dataset dataset, something like `gdalbuildvrt` but in memory

The use case I have in mind, is that I have a list of urls some of them construct the same band and some construct different band and I want to treat them as a single dataset

Theoretically, I an produce such dataset by calling gdalbuildvrt with those urls and them open the vrt created.  Is there another way of doing that?

If no, in environment that has only rasterio installed not gdal, how can I use  gdalbuildvrt?

Thanks Guy

Re: In-memory multi dataset object

Sean Gillies
 

Hi Guy,

Rasterio doesn't have first-class support for VRTs. I'm not rushing to do it because there are some VRT features that I frankly think are nuts, like embedding Python code.

However, we can use Python's XML libraries to generate XML texts and Rasterio will happily open the XML as a dataset. See the example at https://github.com/mapbox/rasterio/blob/master/tests/test_vrt.py#L9-L11. I'm doing this internally to implement Rasterio's boundless raster reads.


On Thu, Sep 13, 2018 at 7:02 AM Guy Doulberg <guyd@...> wrote:
Hi
I wonder if I can create in-memory multi-dataset dataset, something like `gdalbuildvrt` but in memory

The use case I have in mind, is that I have a list of urls some of them construct the same band and some construct different band and I want to treat them as a single dataset

Theoretically, I an produce such dataset by calling gdalbuildvrt with those urls and them open the vrt created.  Is there another way of doing that?

If no, in environment that has only rasterio installed not gdal, how can I use  gdalbuildvrt?

Thanks Guy



--
Sean Gillies

Re: In-memory multi dataset object

Guy Doulberg
 

Thanks, 

בתאריך יום ה׳, 13 בספט׳ 2018, 18:59, מאת Sean Gillies ‏<sean.gillies@...>:

Hi Guy,

Rasterio doesn't have first-class support for VRTs. I'm not rushing to do it because there are some VRT features that I frankly think are nuts, like embedding Python code.

However, we can use Python's XML libraries to generate XML texts and Rasterio will happily open the XML as a dataset. See the example at https://github.com/mapbox/rasterio/blob/master/tests/test_vrt.py#L9-L11. I'm doing this internally to implement Rasterio's boundless raster reads.


On Thu, Sep 13, 2018 at 7:02 AM Guy Doulberg <guyd@...> wrote:
Hi
I wonder if I can create in-memory multi-dataset dataset, something like `gdalbuildvrt` but in memory

The use case I have in mind, is that I have a list of urls some of them construct the same band and some construct different band and I want to treat them as a single dataset

Theoretically, I an produce such dataset by calling gdalbuildvrt with those urls and them open the vrt created.  Is there another way of doing that?

If no, in environment that has only rasterio installed not gdal, how can I use  gdalbuildvrt?

Thanks Guy


--
Sean Gillies

Rasterio 1.0.5

Sean Gillies
 

Hi all,

Rasterio 1.0.5 is on PyPI now. It fixes 3 bugs and provides a source distribution that will be compatible with Python 3.7. From the change log:
Bug fixes:

- The fill value for boundless reads was ignored in Rasterio versions 1-1.0.4
  but now applies (#1471).
- An invalid shortcut has been eliminated and Rasterio now produces a proper
  mask in the boundless masked read case (#1449).
- Loss of a row or column in geometry_window() and mask() has been fixed
  (#1472).
--
Sean Gillies

Rasterio 1.0.6

Sean Gillies
 

Hi all,

Rasterio 1.0.6 is on PyPI now with 3 significant bug fixes:
- If the build_overviews method of a dataset is passed a list of factors that specify more than one 1x1 pixel overview (#1333), Rasterio raises an exception.
- Calling calculate_default_transform for large extents should no longer result in the out of memory error reported in #1131. The rio-warp command should also now run more quickly and with a smaller memory footprint.
- We have a more general fix for the problem of filling the empty regions of boundless reads (#1471).
Thanks for the excellent issue reports and debugging help!

--
Sean Gillies

Click 7.0 breaks Rasterio

Sean Gillies
 

Hi all,

Heads up: click 7.0 was released to PyPI yesterday and rasterio is not compatible with it. If you pin `click==6.7` in your applications, you'll be fine. The change log entry for 7 is looong and I don't immediately see the problem: https://github.com/pallets/click/blob/master/CHANGES.rst#version-70.

--
Sean Gillies

Re: Click 7.0 breaks Rasterio

Alan Snow
 

I bet the subcommand remame from underscore to dash will break a lot.

Re: Click 7.0 breaks Rasterio

Sean Gillies
 

It turns out that the culprit is the cligj package. I'm uploading a new release to PyPI now.

On Wed, Sep 26, 2018 at 9:31 AM Alan Snow <alansnow21@...> wrote:
I bet the subcommand remame from underscore to dash will break a lot.



--
Sean Gillies

Please upgrade cligj to 0.5

Sean Gillies
 

Hi all,

A rasterio dependency, cligj 0.4.0, is not compatible with click 7 (released yesterday) and if click has been upgraded in your environment you will notice that the rio commands are broken. Upgrading cligj to 0.5.0 will solve the problem and maintain compatibility with older versions of click as well.


The next rasterio release (1.0.7) will require cligj 0.5.0.

--
Sean Gillies

Rasterio 1.0.7

Sean Gillies
 

Hi all,

Rasterio 1.0.7 is compatible with click versions >= 4 and < 8 (not released yet, of course) including the new click version 7.0, and fixes a couple other small bugs as well. Please upgrade when you can.

--
Sean Gillies

Rasterio 1.0.3 and Python 3.4

Denis Rykov
 
Edited

Hello folks,

I have problems with installation rasterio==1.0.3 and higher on Python 3.4. There are no problems with 3.5 and 3.6.

Is it an expected behavior?

$ sudo docker run --name python34 -it python:3.4 bash
root@e306b109d4c0:/# pip install numpy
Collecting numpy
  Downloading https://files.pythonhosted.org/packages/14/1c/546724245c8b3aad39d807a0bed14a37b39943860c6b34456a363076c65b/numpy-1.15.2-cp34-cp34m-manylinux1_x86_64.whl (13.8MB)
    100% |████████████████████████████████| 13.8MB 3.1MB/s 
Installing collected packages: numpy
Successfully installed numpy-1.15.2
root@e306b109d4c0:/# pip install rasterio
Collecting rasterio
  Downloading https://files.pythonhosted.org/packages/ce/e3/86ef71d178887f2e365763607f974d5a70be9fbd422e4378bd8c16b8c306/rasterio-1.0.7.tar.gz (1.8MB)
    100% |████████████████████████████████| 1.8MB 8.7MB/s 
    Complete output from command python setup.py egg_info:
    WARNING:root:Failed to get options via gdal-config: [Errno 2] No such file or directory: 'gdal-config'
    ERROR: A GDAL API version must be specified. Provide a path to gdal-config using a GDAL_CONFIG environment variable or use a GDAL_VERSION environment variable.

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-bv7euygw/rasterio/

Re: Rasterio 1.0.3 and Python 3.4

Denis Rykov
 

There is no Linux wheel for Python 3.4 here: https://pypi.org/project/rasterio/#files

Re: Rasterio 1.0.3 and Python 3.4

Sean Gillies
 

Hi Denis,

Since we switched over to making  wheels using the multibuild project we're blocked from releasing *well tested* Python 3.4 linux wheels by (my hypothesis) bugs in trusty's python 3.4.3. The issue is described here: https://github.com/sgillies/rasterio-wheels/issues/5.

I'm short on time to solve this problem, but there are 2 solid workarounds:

1) build and install rasterio from source `GDAL_CONFIG=path/to/gdal-config pip install rasterio` (requires build-essential, gdal-dev, other system packages).
2) switch to Python 3.5 or 3.6 from the deadsnakes PPA and then `pip install rasterio` to get a wheel for one of those versions.


On Fri, Sep 28, 2018 at 9:38 AM Denis Rykov <rykovd@...> wrote:
There is no Linux wheel for Python 3.4 here: https://pypi.org/project/rasterio/#files



--
Sean Gillies