Date   
Re: In-memory multi dataset object

Sean Gillies
 

Hi Guy,

Rasterio doesn't have first-class support for VRTs. I'm not rushing to do it because there are some VRT features that I frankly think are nuts, like embedding Python code.

However, we can use Python's XML libraries to generate XML texts and Rasterio will happily open the XML as a dataset. See the example at https://github.com/mapbox/rasterio/blob/master/tests/test_vrt.py#L9-L11. I'm doing this internally to implement Rasterio's boundless raster reads.


On Thu, Sep 13, 2018 at 7:02 AM Guy Doulberg <guyd@...> wrote:
Hi
I wonder if I can create in-memory multi-dataset dataset, something like `gdalbuildvrt` but in memory

The use case I have in mind, is that I have a list of urls some of them construct the same band and some construct different band and I want to treat them as a single dataset

Theoretically, I an produce such dataset by calling gdalbuildvrt with those urls and them open the vrt created.  Is there another way of doing that?

If no, in environment that has only rasterio installed not gdal, how can I use  gdalbuildvrt?

Thanks Guy



--
Sean Gillies

In-memory multi dataset object

Guy Doulberg
 

Hi
I wonder if I can create in-memory multi-dataset dataset, something like `gdalbuildvrt` but in memory

The use case I have in mind, is that I have a list of urls some of them construct the same band and some construct different band and I want to treat them as a single dataset

Theoretically, I an produce such dataset by calling gdalbuildvrt with those urls and them open the vrt created.  Is there another way of doing that?

If no, in environment that has only rasterio installed not gdal, how can I use  gdalbuildvrt?

Thanks Guy

Re: Fast X-array rasterization

Philipp Kats
 

Hi Brendan,

Thanks! 
Indeed, that what I am trying to achieve. and your hint to ints is great! In fact, I could boil it down to store binary masks - feature id would be stored in the Z dimension itself, and all pixels on one layer will likely have the same shared feature-based value after all. Moreover, this way I could calculate this 3d shape once - and then lookup values and perform computations on this cube for any feature's property I want. So this will definitely boost my computations by a lot.

The second one also sounds very legit but sounds a little trickier to implement - we'll see. 

Still, I wonder if it would make sense to (somehow) use gdal_rasterize bands here...

On Mon, Sep 10, 2018 at 11:06 PM Brendan Ward <brendan.charles.ward@...> wrote:
I'm not entirely sure I follow what you are aiming for here yet.  Are you trying to aggregate across features per pixel to calculate various statistics?  Like, what is the average feature size of all features that intersect this pixel?

Thus if the first pixel has features 1, 2, 3, you want to calculate statistics on an attribute of those features?  And if the second pixel has only features 1 and 3, you'd do the same?  So your 3rd dimension, and your basis for statistics, is the set of features per pixel?

How many features do you have?  As written here, I'd expect the performance to be something like number of features * time to rasterize a single feature PLUS converting the data between formats on each iteration of the loop (i.e., from internal GDAL representation to numpy).  I'd recommend timing rasterization of a single feature and see how many features you have - it might be that 3 hours is perfectly normal for that number of features.

In contrast, when you do all features at once, you are doing the data format conversions only once.

Some ideas:
1) rasterize feature IDs (index w/in gdf) instead of a floating point like value.  This would let rasterize work on an unsigned integer dtype instead of float64 (theoretically resulting in a much smaller memory footprint and less I/O during data format conversion).
This will make it so that you only have to rasterize once, but then can use gdf to create arrays on the fly by looking up feature ID to the attribute you want.

2) rasterize onto the smallest possible array for each feature.  The idea is that you figure out your target array dimensions, then calculate a "window" within that covered by the feature's bounds.  Then rasterize that smaller window; it will be a bit more efficient because you reduce the size of the arrays that need to be converted.  Then you paste the result of rasterization into your target array.  You'll have to keep the transforms aligned so that the windowed rasterization can be directly pasted into the target.  The performance gains (if any) are likely limited to those cases where your features individually only cover small areas of the target.

3) do some profiling / timing of individual operations - see if you can find those things that are taking the most time.  I'd be interested in seeing where your code is spending most of its time.

Re: Rasterio 1.0.3.post1

Denis Rykov
 

Thank you very much, Sean!

Re: Rasterio 1.0.3.post1

Sean Gillies
 

Hi Denis,

I discovered that I had to patch GDAL 2.3.1 to build with OpenJPEG for the ancient manylinux1 platform. I've uploaded new wheels with a "-1" build tag to https://pypi.org/manage/project/rasterio/release/1.0.3.post1/. I didn't know about build tags – I won't make post-releases to fix wheel build problems anymore.

On Mon, Sep 10, 2018 at 1:10 AM Denis Rykov <rykovd@...> wrote:
Hi Sean!

After updating rasterio I get the following error:

In [1]: from rasterio import open
 
In [2]: open('tests/data/raster/rgb.jp2')
---------------------------------------------------------------------------
CPLE_OpenFailedError                      Traceback (most recent call last)
rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()
 
rasterio/_shim.pyx in rasterio._shim.open_dataset()
 
rasterio/_err.pyx in rasterio._err.exc_wrap_pointer()
 
CPLE_OpenFailedError: 'tests/data/raster/rgb.jp2' not recognized as a supported file format.
 
During handling of the above exception, another exception occurred:
 
RasterioIOError                           Traceback (most recent call last)
<ipython-input-2-19e7b16f33e4> in <module>()
----> 1 open('tests/data/raster/rgb.jp2')
 
~/sandbox/telluric/env/lib/python3.5/site-packages/rasterio/env.py in wrapper(*args, **kwds)
    401             else:
    402                 pass
--> 403             return f(*args, **kwds)
    404 
    405     return wrapper
 
~/sandbox/telluric/env/lib/python3.5/site-packages/rasterio/__init__.py in open(fp, mode, driver, width, height, count, crs, transform, dtype, nodata, sharing, **kwargs)
    215         # None.
    216         if mode == 'r':
--> 217             s = DatasetReader(path, driver=driver, **kwargs)
    218         elif mode == 'r+':
    219             s = get_writer_for_path(path)(path, mode, driver=driver, **kwargs)
 
rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()
 
RasterioIOError: 'tests/data/raster/rgb.jp2' not recognized as a supported file format.

All works without any problems with rasterio 1.0.2:

In [1]: from rasterio import open
 
In [2]: open('tests/data/raster/rgb.jp2')
Out[2]: <open DatasetReader name='tests/data/raster/rgb.jp2' mode='r'>



--
Sean Gillies

Re: Fast X-array rasterization

Brendan Ward
 

I'm not entirely sure I follow what you are aiming for here yet.  Are you trying to aggregate across features per pixel to calculate various statistics?  Like, what is the average feature size of all features that intersect this pixel?

Thus if the first pixel has features 1, 2, 3, you want to calculate statistics on an attribute of those features?  And if the second pixel has only features 1 and 3, you'd do the same?  So your 3rd dimension, and your basis for statistics, is the set of features per pixel?

How many features do you have?  As written here, I'd expect the performance to be something like number of features * time to rasterize a single feature PLUS converting the data between formats on each iteration of the loop (i.e., from internal GDAL representation to numpy).  I'd recommend timing rasterization of a single feature and see how many features you have - it might be that 3 hours is perfectly normal for that number of features.

In contrast, when you do all features at once, you are doing the data format conversions only once.

Some ideas:
1) rasterize feature IDs (index w/in gdf) instead of a floating point like value.  This would let rasterize work on an unsigned integer dtype instead of float64 (theoretically resulting in a much smaller memory footprint and less I/O during data format conversion).
This will make it so that you only have to rasterize once, but then can use gdf to create arrays on the fly by looking up feature ID to the attribute you want.

2) rasterize onto the smallest possible array for each feature.  The idea is that you figure out your target array dimensions, then calculate a "window" within that covered by the feature's bounds.  Then rasterize that smaller window; it will be a bit more efficient because you reduce the size of the arrays that need to be converted.  Then you paste the result of rasterization into your target array.  You'll have to keep the transforms aligned so that the windowed rasterization can be directly pasted into the target.  The performance gains (if any) are likely limited to those cases where your features individually only cover small areas of the target.

3) do some profiling / timing of individual operations - see if you can find those things that are taking the most time.  I'd be interested in seeing where your code is spending most of its time.

Re: Rasterio 1.0.3.post1

Denis Rykov
 

OS: Ubuntu 16.04.5
Wheel: rasterio-1.0.3.post1-cp35-cp35m-manylinux1_x86_64.whl

Full example:

$ pip install -U rasterio
Collecting rasterio
  Using cached https://files.pythonhosted.org/packages/48/9c/ed4e2b5b3150ee39052bf580de0e67c8c2e5b58eb89833a174c8e8694d5f/rasterio-1.0.3.post1-cp35-cp35m-manylinux1_x86_64.whl
Requirement already satisfied, skipping upgrade: attrs in ./env/lib/python3.5/site-packages (from rasterio) (17.4.0)
Requirement already satisfied, skipping upgrade: click-plugins in ./env/lib/python3.5/site-packages (from rasterio) (1.0.3)
Requirement already satisfied, skipping upgrade: snuggs>=1.4.1 in ./env/lib/python3.5/site-packages (from rasterio) (1.4.1)
Requirement already satisfied, skipping upgrade: cligj in ./env/lib/python3.5/site-packages (from rasterio) (0.4.0)
Requirement already satisfied, skipping upgrade: affine in ./env/lib/python3.5/site-packages (from rasterio) (2.2.0)
Requirement already satisfied, skipping upgrade: numpy in ./env/lib/python3.5/site-packages (from rasterio) (1.14.2)
Requirement already satisfied, skipping upgrade: click>=3.0 in ./env/lib/python3.5/site-packages (from click-plugins->rasterio) (6.7)
Requirement already satisfied, skipping upgrade: pyparsing in ./env/lib/python3.5/site-packages (from snuggs>=1.4.1->rasterio) (2.2.0)
Installing collected packages: rasterio
  Found existing installation: rasterio 1.0.2
    Uninstalling rasterio-1.0.2:
      Successfully uninstalled rasterio-1.0.2
Successfully installed rasterio-1.0.3.post1

$ python -c 'from rasterio import open; open("tests/data/raster/rgb.jp2")' Traceback (most recent call last): File "rasterio/_base.pyx", line 199, in rasterio._base.DatasetBase.__init__ File "rasterio/_shim.pyx", line 64, in rasterio._shim.open_dataset File "rasterio/_err.pyx", line 188, in rasterio._err.exc_wrap_pointer rasterio._err.CPLE_OpenFailedError: 'tests/data/raster/rgb.jp2' not recognized as a supported file format. During handling of the above exception, another exception occurred: Traceback (most recent call last): File "<string>", line 1, in <module> File "/home/rykov/sandbox/telluric/env/lib/python3.5/site-packages/rasterio/env.py", line 403, in wrapper return f(*args, **kwds) File "/home/rykov/sandbox/telluric/env/lib/python3.5/site-packages/rasterio/__init__.py", line 217, in open s = DatasetReader(path, driver=driver, **kwargs) File "rasterio/_base.pyx", line 201, in rasterio._base.DatasetBase.__init__ rasterio.errors.RasterioIOError: 'tests/data/raster/rgb.jp2' not recognized as a supported file format.

$ gdalinfo tests/data/raster/rgb.jp2 Driver: JP2OpenJPEG/JPEG-2000 driver based on OpenJPEG library Files: tests/data/raster/rgb.jp2 Size is 100, 100 Coordinate System is: PROJCS["WGS 84 / Pseudo-Mercator", GEOGCS["WGS 84", DATUM["WGS_1984", SPHEROID["WGS 84",6378137,298.257223563, AUTHORITY["EPSG","7030"]], AUTHORITY["EPSG","6326"]], PRIMEM["Greenwich",0, AUTHORITY["EPSG","8901"]], UNIT["degree",0.0174532925199433, AUTHORITY["EPSG","9122"]], AUTHORITY["EPSG","4326"]], PROJECTION["Mercator_1SP"], PARAMETER["central_meridian",0], PARAMETER["scale_factor",1], PARAMETER["false_easting",0], PARAMETER["false_northing",0], UNIT["metre",1, AUTHORITY["EPSG","9001"]], AXIS["X",EAST], AXIS["Y",NORTH], EXTENSION["PROJ4","+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs"], AUTHORITY["EPSG","3857"]] Origin = (-6574807.424977720715106,-4070118.882129065692425) Pixel Size = (76.437028285175984,-76.437028285176893) Image Structure Metadata: INTERLEAVE=PIXEL Corner Coordinates: Upper Left (-6574807.425,-4070118.882) ( 59d 3'45.00"W, 34d18'25.72"S) Lower Left (-6574807.425,-4077762.585) ( 59d 3'45.00"W, 34d21'49.84"S) Upper Right (-6567163.722,-4070118.882) ( 58d59'37.81"W, 34d18'25.72"S) Lower Right (-6567163.722,-4077762.585) ( 58d59'37.81"W, 34d21'49.84"S) Center (-6570985.574,-4073940.734) ( 59d 1'41.40"W, 34d20' 7.79"S) Band 1 Block=100x100 Type=Byte, ColorInterp=Red Overviews: arbitrary Image Structure Metadata: COMPRESSION=JPEG2000 Band 2 Block=100x100 Type=Byte, ColorInterp=Green Overviews: arbitrary Image Structure Metadata: COMPRESSION=JPEG2000 Band 3 Block=100x100 Type=Byte, ColorInterp=Blue Overviews: arbitrary Image Structure Metadata: COMPRESSION=JPEG2000

$ pip install rasterio==1.0.2 Collecting rasterio==1.0.2 Using cached https://files.pythonhosted.org/packages/62/0b/aa07a8a3e5cef361113e6663647d0562c170feee2cf79d70ff79910cdd73/rasterio-1.0.2-cp35-cp35m-manylinux1_x86_64.whl Requirement already satisfied: snuggs>=1.4.1 in /home/rykov/sandbox/telluric/env/lib/python3.5/site-packages (from rasterio==1.0.2) (1.4.1) Requirement already satisfied: click-plugins in /home/rykov/sandbox/telluric/env/lib/python3.5/site-packages (from rasterio==1.0.2) (1.0.3) Requirement already satisfied: attrs in /home/rykov/sandbox/telluric/env/lib/python3.5/site-packages (from rasterio==1.0.2) (17.4.0) Requirement already satisfied: affine in /home/rykov/sandbox/telluric/env/lib/python3.5/site-packages (from rasterio==1.0.2) (2.2.0) Requirement already satisfied: cligj in /home/rykov/sandbox/telluric/env/lib/python3.5/site-packages (from rasterio==1.0.2) (0.4.0) Requirement already satisfied: numpy in /home/rykov/sandbox/telluric/env/lib/python3.5/site-packages (from rasterio==1.0.2) (1.14.2) Requirement already satisfied: pyparsing in /home/rykov/sandbox/telluric/env/lib/python3.5/site-packages (from snuggs>=1.4.1->rasterio==1.0.2) (2.2.0) Requirement already satisfied: click in /home/rykov/sandbox/telluric/env/lib/python3.5/site-packages (from snuggs>=1.4.1->rasterio==1.0.2) (6.7) Installing collected packages: rasterio Found existing installation: rasterio 1.0.3.post1 Uninstalling rasterio-1.0.3.post1: Successfully uninstalled rasterio-1.0.3.post1 Successfully installed rasterio-1.0.2 $ python -c 'from rasterio import open; open("tests/data/raster/rgb.jp2")' $

Re: Rasterio 1.0.3.post1

Sean Gillies
 

Even,

Thanks for the pointer!

In configuring my GDAL 2.3.1 builds, pkg-config and openjpeg were found:
checking for pkg-config... (cached) /usr/local/bin/pkg-config
checking pkg-config is at least version 0.25... yes
checking for OPENJPEG... yes
checking for opj_setup_decoder in -lopenjp2... yes

(full logs here:

and I'm able to open GDAL's jp2 test files with the 1.0.3.post1 wheel (for python 3.6 and OSX).

Denis, I think your issue is different than the one I've reported here. What's your operating system and did you install from the sdist or install a binary wheel?

On Mon, Sep 10, 2018 at 12:17 PM Even Rouault <even.rouault@...> wrote:
On lundi 10 septembre 2018 14:29:39 CEST Sean Gillies wrote:
> Thanks for the report, Denis. I had upgraded GDAL to 2.3.1 in these wheel
> builds, and was blaming my inability to access these jp2 files on a lack of
> credentials, but maybe something has changed between Rasterio and JP2 in
> 2.3.0? I'm going to revert to 2.2.4, the version in the 1.0.2 wheels.

Sean,

Your wheels use openjpeg for GDAL JPEG2000 capability, right ? There has been
a change in GDAL ./configure in 2.3.0 related to openjpeg detection. It now
requires pkg-config to be available to detect openjpeg.

Even


--
Sean Gillies

Re: Fast X-array rasterization

Philipp Kats
 

So what I am trying to achieve in the end is the same as basic `rasterio.feature.rasterize`, but with more flexible aggregation strategies, like median, mean, percentile and std - currently it supports only `add` and `replace`.

Re: Fast X-array rasterization

Philipp Kats
 

oh sure!
here is the link:

And indeed, I am using geoseries.apply, but it is pretty much the same as looping - and obviously converting N 2d rasters into 3d will take some time as well - so I wonder if
there is a better solution here...




On Mon, Sep 10, 2018 at 11:43 AM Brendan Ward <brendan.charles.ward@...> wrote:
Hi Philipp,

How are you looping over each value to create a new raster?  Are you able to share a gist of what you are doing?

Re: Rasterio 1.0.3.post1

Even Rouault
 

On lundi 10 septembre 2018 14:29:39 CEST Sean Gillies wrote:
Thanks for the report, Denis. I had upgraded GDAL to 2.3.1 in these wheel
builds, and was blaming my inability to access these jp2 files on a lack of
credentials, but maybe something has changed between Rasterio and JP2 in
2.3.0? I'm going to revert to 2.2.4, the version in the 1.0.2 wheels.
Sean,

Your wheels use openjpeg for GDAL JPEG2000 capability, right ? There has been
a change in GDAL ./configure in 2.3.0 related to openjpeg detection. It now
requires pkg-config to be available to detect openjpeg.

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: Fast X-array rasterization

Brendan Ward
 

Hi Philipp,

How are you looping over each value to create a new raster?  Are you able to share a gist of what you are doing?

Fast X-array rasterization

Philipp Kats
 

Hi! I am trying to produce a map using rasterio's rasterization and datashader, following this gist

It works perfectly well, when I rasterize all the features at once, into one 2d numpy array;
However, what I want now is to produce a "cube", having separate raster for each feature separately, with a corresponding value (say, number of category). Plan is then to perform aggregation - say, take moda, from this cube.

However, that takes a lot of time on my machine (~3 hours vs 30 sec for one raster map), so I wonder if there are any suggestions/hacks on how I can speed it up.

Thank you in advance! 



Re: Rasterio 1.0.3.post1

Sean Gillies
 

Thanks for the report, Denis. I had upgraded GDAL to 2.3.1 in these wheel builds, and was blaming my inability to access these jp2 files on a lack of credentials, but maybe something has changed between Rasterio and JP2 in 2.3.0? I'm going to revert to 2.2.4, the version in the 1.0.2 wheels.


On Mon, Sep 10, 2018 at 7:10 AM Denis Rykov <rykovd@...> wrote:
Hi Sean!

After updating rasterio I get the following error:

In [1]: from rasterio import open
 
In [2]: open('tests/data/raster/rgb.jp2')
---------------------------------------------------------------------------
CPLE_OpenFailedError                      Traceback (most recent call last)
rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()
 
rasterio/_shim.pyx in rasterio._shim.open_dataset()
 
rasterio/_err.pyx in rasterio._err.exc_wrap_pointer()
 
CPLE_OpenFailedError: 'tests/data/raster/rgb.jp2' not recognized as a supported file format.
 
During handling of the above exception, another exception occurred:
 
RasterioIOError                           Traceback (most recent call last)
<ipython-input-2-19e7b16f33e4> in <module>()
----> 1 open('tests/data/raster/rgb.jp2')
 
~/sandbox/telluric/env/lib/python3.5/site-packages/rasterio/env.py in wrapper(*args, **kwds)
    401             else:
    402                 pass
--> 403             return f(*args, **kwds)
    404 
    405     return wrapper
 
~/sandbox/telluric/env/lib/python3.5/site-packages/rasterio/__init__.py in open(fp, mode, driver, width, height, count, crs, transform, dtype, nodata, sharing, **kwargs)
    215         # None.
    216         if mode == 'r':
--> 217             s = DatasetReader(path, driver=driver, **kwargs)
    218         elif mode == 'r+':
    219             s = get_writer_for_path(path)(path, mode, driver=driver, **kwargs)
 
rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()
 
RasterioIOError: 'tests/data/raster/rgb.jp2' not recognized as a supported file format.

All works without any problems with rasterio 1.0.2:

In [1]: from rasterio import open
 
In [2]: open('tests/data/raster/rgb.jp2')
Out[2]: <open DatasetReader name='tests/data/raster/rgb.jp2' mode='r'>



--
Sean Gillies

Re: Rasterio 1.0.3.post1

Denis Rykov
 

Hi Sean!

After updating rasterio I get the following error:

In [1]: from rasterio import open
 
In [2]: open('tests/data/raster/rgb.jp2')
---------------------------------------------------------------------------
CPLE_OpenFailedError                      Traceback (most recent call last)
rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()
 
rasterio/_shim.pyx in rasterio._shim.open_dataset()
 
rasterio/_err.pyx in rasterio._err.exc_wrap_pointer()
 
CPLE_OpenFailedError: 'tests/data/raster/rgb.jp2' not recognized as a supported file format.
 
During handling of the above exception, another exception occurred:
 
RasterioIOError                           Traceback (most recent call last)
<ipython-input-2-19e7b16f33e4> in <module>()
----> 1 open('tests/data/raster/rgb.jp2')
 
~/sandbox/telluric/env/lib/python3.5/site-packages/rasterio/env.py in wrapper(*args, **kwds)
    401             else:
    402                 pass
--> 403             return f(*args, **kwds)
    404 
    405     return wrapper
 
~/sandbox/telluric/env/lib/python3.5/site-packages/rasterio/__init__.py in open(fp, mode, driver, width, height, count, crs, transform, dtype, nodata, sharing, **kwargs)
    215         # None.
    216         if mode == 'r':
--> 217             s = DatasetReader(path, driver=driver, **kwargs)
    218         elif mode == 'r+':
    219             s = get_writer_for_path(path)(path, mode, driver=driver, **kwargs)
 
rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()
 
RasterioIOError: 'tests/data/raster/rgb.jp2' not recognized as a supported file format.

All works without any problems with rasterio 1.0.2:

In [1]: from rasterio import open
 
In [2]: open('tests/data/raster/rgb.jp2')
Out[2]: <open DatasetReader name='tests/data/raster/rgb.jp2' mode='r'>

Rasterio 1.0.3.post1

Sean Gillies
 

Hi all,

The 1.0.3 wheels I built for OSX and uploaded to PyPI included a GDAL library with no curl support. I've made a post-release that fixes this problem. If you installed one of these wheels, please `pip install -U pip` to get 1.0.3.post1.

--
Sean Gillies

Rasterio 1.0.3

Sean Gillies
 

Hi all,

Rasterio 1.0.3 is on PyPI now, finally. It fixes 2 bugs found since 1.0.2:

- A regression in GeoJSON source handling in rio-rasterize (#1425) has been fixed.
- Rasterization of linear rings (#1431) is now supported.

Furthermore, we've made a switch to a multibuild-based infrastructure for building wheels: https://github.com/sgillies/rasterio-wheels. Please let me know if the new wheels don't behave as expected.

--
Sean Gillies

Re: rio stack with compression

Sean Gillies
 

Hi Luke,

I don't understand the issue entirely, but as Even Rouault explains in https://lists.osgeo.org/pipermail/gdal-dev/2015-June/041917.html a "naive" use of the write() method of a Rasterio dataset (and rio-stack does use it naively) can result in redundant blocks of data written to the GeoTIFF file. The workarounds are to specify band interleaving (as you discovered), or increase GDAL_CACHEMAX to allow the entire output file to be kept in memory until it is written. Remember: a dataset's write() method causes data to be written to the GDAL block cache and data is written from that cache to the GeoTIFF file when the block is bumped out by new data coming in, or when the dataset object is closed.

These are the issues that I need to bring up in Rasterio's documentation.

Hope this helps!

On Thu, Sep 6, 2018 at 5:37 AM Luke Pinner <lukepinnerau@...> wrote:
I stacked some singleband rasters into a multiband raster using `rio stack` with a compress=deflate creation option.  The compressed output filesize was approx 1.4x the sum of the uncompressed input filesizes.  Specifying --co interleave=band when running `rio stack` got the output filesize down to 0.25x the uncompressed input (per issue #70).   Running `gdalbuildvrt` to stack them, then `rio convert` or `gdal_translate` on the VRT with the same creation options as the original rio stack command also results in an output raster of around 0.25x the uncompressed input, but with default pixel interleaving.

I realise you're planning to document issues with compression (#77) but in the meantime, do you have any idea why compression is so poor with default pixel interleaving using `rio stack` but quite ok when using `rio convert` (with default pixel interleaving) on an already stacked VRT?

Luke



--
Sean Gillies

rio stack with compression

Luke Pinner
 

I stacked some singleband rasters into a multiband raster using `rio stack` with a compress=deflate creation option.  The compressed output filesize was approx 1.4x the sum of the uncompressed input filesizes.  Specifying --co interleave=band when running `rio stack` got the output filesize down to 0.25x the uncompressed input (per issue #70).   Running `gdalbuildvrt` to stack them, then `rio convert` or `gdal_translate` on the VRT with the same creation options as the original rio stack command also results in an output raster of around 0.25x the uncompressed input, but with default pixel interleaving.

I realise you're planning to document issues with compression (#77) but in the meantime, do you have any idea why compression is so poor with default pixel interleaving using `rio stack` but quite ok when using `rio convert` (with default pixel interleaving) on an already stacked VRT?

Luke

Re: read masked and overviews when no-data value is defined

Guy Doulberg
 

clicked send too soon:

2. when using masked overviews, it looked like it was using the overview


later today I will try to run tests again

Do you care on how we created the original raster?


On Wed, Aug 8, 2018 at 11:43 AM, Guy Doulberg <guyd@...> wrote:
He Sean,

We are creating  cog files using our code that takes as input any raster, tiles it add overviews and rearrange the tif to be a cog.

For doing so we are using rasetio shutils copy and rasterio build_overview.

The source rasters that we are working on have nodata

We did a variant of code in which we prior for creating the overviews we also created Band Mask

1. when we had no-data values, and read using the overviews it looked like it was constructing the overview from the original size

On Wed, Aug 8, 2018 at 12:14 AM, Sean Gillies <sean.gillies@...> wrote:
Hi Guy,

A month ago we resolved an issue in Rasterio that has some commonalities with yours. There was a report that overviews were being missed when reading masked arrays and it turned out that the overviews themselves were in error.


In Rasterio, we will skip reading a mask from the dataset if the dataset has no mask, but otherwise we don't call GDAL functions differently in the band mask vs nodata cases.

Can you tell us what nodata value is implicated in this problem and what software produced the data files?


On Mon, Aug 6, 2018 at 8:52 AM <guyd@...> wrote:
Hi all,

We are trying to read masked array using `DatasetReaderBase.read`, reading the documentation it looks that if we pass masked=True we will get what we wanted. 

I was testing the read on raster with pre-built overviews and I used the overviews in read. From functionality point of view, I always got the right result, but I was tesing the data block that are being used, and I noticed this behaviors:


1. When using band-mask, there was a read only from the overview to construct the mask -- as excpected
2. When using no-data value there were reads the data of the original resolution and not from the nearest overview? when mask=False it reads data only from the overview,  why is that?

A workaround for that is to read with no mask and to construct the mask myself using the no-data value.

Thanks.


--
Sean Gillies