Date   
Clarification on rio bounds --sequence/--collection

Yann-Sebastien Tremblay-Johnston
 

Hi,

In the usage description of `rio bounds --help` 

```
...

 --sequence / --collection  Write a LF-delimited sequence of texts containing
                             individual objects (the default) or write a
                             single JSON text containing a feature collection
                             object.
...

 --collection               Output as GeoJSON feature collection(s).
```

confuses me. The first instance of `--collection` seems to imply that if several input files are passed like `rio bounds --collection *.tif`, then the output would be a single GeoJSON document with a single FeatureCollection containing multiple Features corresponding to each tif file expanded in the shell. The second instance of `--collection` is perhaps more honest in setting expectation that a FeatureCollection object will be created for each tif file. I'm wondering if the first I described was ever meant to be intended behavior? or has there been a regression? 

Thank you so much!

Re: Occasional "not recognized as a supported file format" errors when reading from S3

Dion Häfner
 

Hey Sean,

I don't know if you've seen our update on the rasterio issue tracker:

https://github.com/mapbox/rasterio/issues/1686#issuecomment-541737735

As it turns out, `GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR` was the culprit. Do you know if that is related to a known bug in GDAL, or whether this is something that should be reported separately?

For now, we have disabled the flag in Terracotta, and everything seems to work fine, but we did take a performance hit from that. So in the long run, it would be great if that could be fixed.

Best,
Dion

On 15/06/2019 16.51, Sean Gillies via Groups.Io wrote:
Dion, Jonas:
There is a new report in the FIona tracker that may be related: https://github.com/Toblerity/Fiona/issues/761. There I advised turning on CPL_CURL_VERBOSE=YES to see more details about the HTTP requests that are failing. It will cause the volume of your logs to explode and it may be a challenge to find the signal in the noise, but I believe we will find some clues in the HTTP responses.
On Wed, Jun 12, 2019 at 1:55 AM Jonas <@j08lue <mailto:@j08lue>> wrote:
OK, sorry, forget what I said - the same map tile that earlier was
failing is now served fine. :/
--
Sean Gillies

Re: Flaw in rasterio design?

Sean Gillies
 

Hi all,

Since I haven't written much about Rasterio's design principles or goals beyond what is at https://rasterio.readthedocs.io/en/latest/intro.html#philosophy it is only natural that these questions come up.

Rasterio's stated design goals are to eliminate the biggest "gotchas" of the GDAL Python bindings and make GDAL configuration a little more Pythonic. The unstated design principles include minimizing the invention of new concepts and minimizing the risk of project failure. Rasterio's new concept of using a context manager to patch GDAL's runtime configuration has already used up most of our new invention budget, I believe. Whether or not this was the right choice depends on your perspective. It has certainly made my own team's code less brittle and more easy to reason about.

Not developing application-specific classes has been one way of minimizing the risk of project failure. Rasterio is going to stay useful for a long time by staying relatively small and behind the cutting edge. We have very limited resources and if we had used them to build a GIS application framework we would have been at risk (in my view) of missing the mark or being swept away by the new flavor of the month as we've seen in other domains. In my team's projects that use Rasterio we do have application classes, but they are constantly changing and I'm very glad not to have published them and committed myself to supporting their use, forever, for free.

On the other hand, count me in for helping design raster data protocols and standards. I'm all for them, but I do not think Rasterio is the right place to experiment with them.


On Tue, Oct 15, 2019 at 3:33 AM Dion Häfner <dion.haefner@...> wrote:
There are some libraries that try and implement a raster datatype, which
rasterio explicitly does not. Strangely, none of them is as successful
as rasterio.

I see rasterio as a tool to get the job done with minimal abstraction,
and I have the highest respect for this type of design. I like not
having to wrap my head around yet another container type. But others can
speak more about the original reasoning behind that design choice, which
I don't know.

I agree that in an ideal world with well-funded open source development
we would probably have pandas or xarray for raster data that doesn't
leak GDALisms into its abstractions. Until then, I suggest you come up
with a more concrete suggestion on how to improve the tools we have.

Best,
Dion

On 15/10/2019 11.14, Amine Aboufirass via Groups.Io wrote:
> Hi Dion,
>
> As an experienced user you find yourself juggling between dataset
> objects and arrays/transform objects way too often. It slows down the
> workflow and makes rasterio a pain to use. The choice of where to burn a
> file or not should be left up to the user. Dataset objects should be
> abstract and in memory at all times unless the user wishes for them to
> be written as final output.
>
> I feel like the attachment to these file abstractions is an artefact
> from whatever rasterio is built on top of (GDAL? C++?) and is not
> pythonic at all. Take for instance the pandas dataframe object. All
> operations are based on the in-memory object, or a pointer thereto. The
> choice of whether to write to excel, csv or heaven-knows-what other kind
> of file format is only a method of the dataframe object and not some
> kind of intrinsic part of it.
>
> In python the abstraction should represent the physical file and not the
> opposite.
>
> On Tue, Oct 15, 2019 at 11:05 AM Dion Häfner <dion.haefner@...
> <mailto:dion.haefner@...>> wrote:
>
>     Hey Amine,
>
>     usually there are enough functions to allow you to have your cake and
>     eat it, too.
>
>     Instead of reproject, I usually find myself using a VRT. An alternative
>     to mask that returns an ndarray is raster_geometry_mask. Some of this
>     you will have to roll yourself, though.
>
>     I think most of the functions that burn rasters directly (such as mask)
>     are meant for less experienced users that just want to get the job
>     done.
>     Maybe there should ideally always be two versions of each function, one
>     operating at dataset level, one at data (+ metadata) level. But as an
>     experienced user you can pretty much always work around that limitation.
>
>     Hope that helps,
>     Dion
>
>     On 15/10/2019 10.51, Amine Aboufirass via Groups.Io wrote:
>      > Dear All,
>      >
>      > Does anyone have any difficulty with input and output datatypes for
>      > rasterio functions in general? Many of the python functions in the
>      > library have inconsistent input and output. For instance,
>     sometimes they
>      > have ndarrays and transforms as input/output and sometimes they have
>      > physical datasets as input output. This is very annoying when
>     trying to
>      > develop a generic workflow. Here is a prime example:
>      > *
>      > *
>      > *rasterio.warp.reproject * takes ndarrays and source/destination
>      > transforms to do its work while *mask.mask *takes a dataset
>     opened in
>      > "r" mode. So then what if I first want to open a dataset, clip
>     and then
>      > reproject in my script? I would have to open the file as a
>     dataset in
>      > "r" mode, clip and then burn it into a file. Finally I would have to
>      > open this new file, extract its contents and transform and pass that
>      > into the *reproject *function and burn a third file.
>      >
>      > At first I thought memory files were a good solution, but in fact
>     they
>      > are not because we still see many functions which require
>     non-dataset
>      > formats to do their work. I really think this is a flaw in rasterio
>      > design. There should be one object type which can be passed into
>      > different functions regardless of what they do.
>      >
>      > In the meantime how can I adapt my code so I don't have to be
>     aware of
>      > input and output datatypes?
>      >
>
>
>
>





--
Sean Gillies

Re: Flaw in rasterio design?

Amine Aboufirass
 

Dear Alan, 

I just had a look. It looks like they are in fact taking those things into consideration. Very exciting. I can't wait to put this to use.

On another note, such a library would not have been possible without rasterio, so where rasterio shines (IMHO) is as a lower-level library upon which more pythonic libraries are constructed.

Regards,

Amine

On Tue, Oct 15, 2019 at 3:09 PM Amine Aboufirass via Groups.Io <amine.aboufirass=gmail.com@groups.io> wrote:
Hi Alan,

Thanks for your response. I am now a bit more intrigued by xarray, but I have a couple of questions:
  • Wouldn't you then lose all geospatial information? Things like affine and CRS are really important
  • Would I then be able to pass such an xarrayish object to rasterio functions or do similar things based on the rasterio package? I still need access to functions like masking, reprojecting, resampling and warping.
I'll take a look myself as well to check if it can reduce my suffering :P.

Regards,

Amine

On Tue, Oct 15, 2019 at 3:05 PM Alan Snow <alansnow21@...> wrote:
Hi Amine,

rasterio is definitely a great tool. But, if you are looking for an interface to it similar to pandas, you may be interested in rioxarray. It wraps the rasterio code and gives you an xarray (n-dimensional pandas) interface.

Here is a link to some examples that show how to do operations such as reproject and clip.
https://corteva.github.io/rioxarray/html/examples/examples.html

Hope this helps,
Alan

Re: Flaw in rasterio design?

Amine Aboufirass
 

Hi Alan,

Thanks for your response. I am now a bit more intrigued by xarray, but I have a couple of questions:
  • Wouldn't you then lose all geospatial information? Things like affine and CRS are really important
  • Would I then be able to pass such an xarrayish object to rasterio functions or do similar things based on the rasterio package? I still need access to functions like masking, reprojecting, resampling and warping.
I'll take a look myself as well to check if it can reduce my suffering :P.

Regards,

Amine

On Tue, Oct 15, 2019 at 3:05 PM Alan Snow <alansnow21@...> wrote:
Hi Amine,

rasterio is definitely a great tool. But, if you are looking for an interface to it similar to pandas, you may be interested in rioxarray. It wraps the rasterio code and gives you an xarray (n-dimensional pandas) interface.

Here is a link to some examples that show how to do operations such as reproject and clip.
https://corteva.github.io/rioxarray/html/examples/examples.html

Hope this helps,
Alan

Re: Flaw in rasterio design?

Alan Snow
 

Hi Amine,

rasterio is definitely a great tool. But, if you are looking for an interface to it similar to pandas, you may be interested in rioxarray. It wraps the rasterio code and gives you an xarray (n-dimensional pandas) interface.

Here is a link to some examples that show how to do operations such as reproject and clip.
https://corteva.github.io/rioxarray/html/examples/examples.html

Hope this helps,
Alan

Re: Flaw in rasterio design?

Amine Aboufirass
 

If it is intentional design then there is probably a reason for it. However it comes at a severe cost of useability. One of the main advantages of python is that users don't really need to know much about the internal workings of each library in order to use it. Some libraries are better encapsulated than others. Rasterio is definitely not the worst. I remember coming from GDAL's python bindings to rasterio and breathing a sigh of relief when I realized how much higher level it is.

However, due to this design it requires users to mess around with things like affine matrices, crs objects and also keeping track of array shapes. This partly defeats the purpose.

I hope you see my point and also that it is a valid point. I of course enjoy many of rasterio's features, but this one is particularly problematic.

On Tue, Oct 15, 2019 at 11:33 AM Dion Häfner <dion.haefner@...> wrote:
There are some libraries that try and implement a raster datatype, which
rasterio explicitly does not. Strangely, none of them is as successful
as rasterio.

I see rasterio as a tool to get the job done with minimal abstraction,
and I have the highest respect for this type of design. I like not
having to wrap my head around yet another container type. But others can
speak more about the original reasoning behind that design choice, which
I don't know.

I agree that in an ideal world with well-funded open source development
we would probably have pandas or xarray for raster data that doesn't
leak GDALisms into its abstractions. Until then, I suggest you come up
with a more concrete suggestion on how to improve the tools we have.

Best,
Dion

On 15/10/2019 11.14, Amine Aboufirass via Groups.Io wrote:
> Hi Dion,
>
> As an experienced user you find yourself juggling between dataset
> objects and arrays/transform objects way too often. It slows down the
> workflow and makes rasterio a pain to use. The choice of where to burn a
> file or not should be left up to the user. Dataset objects should be
> abstract and in memory at all times unless the user wishes for them to
> be written as final output.
>
> I feel like the attachment to these file abstractions is an artefact
> from whatever rasterio is built on top of (GDAL? C++?) and is not
> pythonic at all. Take for instance the pandas dataframe object. All
> operations are based on the in-memory object, or a pointer thereto. The
> choice of whether to write to excel, csv or heaven-knows-what other kind
> of file format is only a method of the dataframe object and not some
> kind of intrinsic part of it.
>
> In python the abstraction should represent the physical file and not the
> opposite.
>
> On Tue, Oct 15, 2019 at 11:05 AM Dion Häfner <dion.haefner@...
> <mailto:dion.haefner@...>> wrote:
>
>     Hey Amine,
>
>     usually there are enough functions to allow you to have your cake and
>     eat it, too.
>
>     Instead of reproject, I usually find myself using a VRT. An alternative
>     to mask that returns an ndarray is raster_geometry_mask. Some of this
>     you will have to roll yourself, though.
>
>     I think most of the functions that burn rasters directly (such as mask)
>     are meant for less experienced users that just want to get the job
>     done.
>     Maybe there should ideally always be two versions of each function, one
>     operating at dataset level, one at data (+ metadata) level. But as an
>     experienced user you can pretty much always work around that limitation.
>
>     Hope that helps,
>     Dion
>
>     On 15/10/2019 10.51, Amine Aboufirass via Groups.Io wrote:
>      > Dear All,
>      >
>      > Does anyone have any difficulty with input and output datatypes for
>      > rasterio functions in general? Many of the python functions in the
>      > library have inconsistent input and output. For instance,
>     sometimes they
>      > have ndarrays and transforms as input/output and sometimes they have
>      > physical datasets as input output. This is very annoying when
>     trying to
>      > develop a generic workflow. Here is a prime example:
>      > *
>      > *
>      > *rasterio.warp.reproject * takes ndarrays and source/destination
>      > transforms to do its work while *mask.mask *takes a dataset
>     opened in
>      > "r" mode. So then what if I first want to open a dataset, clip
>     and then
>      > reproject in my script? I would have to open the file as a
>     dataset in
>      > "r" mode, clip and then burn it into a file. Finally I would have to
>      > open this new file, extract its contents and transform and pass that
>      > into the *reproject *function and burn a third file.
>      >
>      > At first I thought memory files were a good solution, but in fact
>     they
>      > are not because we still see many functions which require
>     non-dataset
>      > formats to do their work. I really think this is a flaw in rasterio
>      > design. There should be one object type which can be passed into
>      > different functions regardless of what they do.
>      >
>      > In the meantime how can I adapt my code so I don't have to be
>     aware of
>      > input and output datatypes?
>      >
>
>
>
>



Re: Flaw in rasterio design?

Dion Häfner
 

There are some libraries that try and implement a raster datatype, which rasterio explicitly does not. Strangely, none of them is as successful as rasterio.

I see rasterio as a tool to get the job done with minimal abstraction, and I have the highest respect for this type of design. I like not having to wrap my head around yet another container type. But others can speak more about the original reasoning behind that design choice, which I don't know.

I agree that in an ideal world with well-funded open source development we would probably have pandas or xarray for raster data that doesn't leak GDALisms into its abstractions. Until then, I suggest you come up with a more concrete suggestion on how to improve the tools we have.

Best,
Dion

On 15/10/2019 11.14, Amine Aboufirass via Groups.Io wrote:
Hi Dion,
As an experienced user you find yourself juggling between dataset objects and arrays/transform objects way too often. It slows down the workflow and makes rasterio a pain to use. The choice of where to burn a file or not should be left up to the user. Dataset objects should be abstract and in memory at all times unless the user wishes for them to be written as final output.
I feel like the attachment to these file abstractions is an artefact from whatever rasterio is built on top of (GDAL? C++?) and is not pythonic at all. Take for instance the pandas dataframe object. All operations are based on the in-memory object, or a pointer thereto. The choice of whether to write to excel, csv or heaven-knows-what other kind of file format is only a method of the dataframe object and not some kind of intrinsic part of it.
In python the abstraction should represent the physical file and not the opposite.
On Tue, Oct 15, 2019 at 11:05 AM Dion Häfner <dion.haefner@... <mailto:dion.haefner@...>> wrote:
Hey Amine,
usually there are enough functions to allow you to have your cake and
eat it, too.
Instead of reproject, I usually find myself using a VRT. An alternative
to mask that returns an ndarray is raster_geometry_mask. Some of this
you will have to roll yourself, though.
I think most of the functions that burn rasters directly (such as mask)
are meant for less experienced users that just want to get the job
done.
Maybe there should ideally always be two versions of each function, one
operating at dataset level, one at data (+ metadata) level. But as an
experienced user you can pretty much always work around that limitation.
Hope that helps,
Dion
On 15/10/2019 10.51, Amine Aboufirass via Groups.Io wrote:
> Dear All,
>
> Does anyone have any difficulty with input and output datatypes for
> rasterio functions in general? Many of the python functions in the
> library have inconsistent input and output. For instance,
sometimes they
> have ndarrays and transforms as input/output and sometimes they have
> physical datasets as input output. This is very annoying when
trying to
> develop a generic workflow. Here is a prime example:
> *
> *
> *rasterio.warp.reproject * takes ndarrays and source/destination
> transforms to do its work while *mask.mask *takes a dataset
opened in
> "r" mode. So then what if I first want to open a dataset, clip
and then
> reproject in my script? I would have to open the file as a
dataset in
> "r" mode, clip and then burn it into a file. Finally I would have to
> open this new file, extract its contents and transform and pass that
> into the *reproject *function and burn a third file.
>
> At first I thought memory files were a good solution, but in fact
they
> are not because we still see many functions which require
non-dataset
> formats to do their work. I really think this is a flaw in rasterio
> design. There should be one object type which can be passed into
> different functions regardless of what they do.
>
> In the meantime how can I adapt my code so I don't have to be
aware of
> input and output datatypes?
>

Re: Flaw in rasterio design?

Amine Aboufirass
 

Hi Dion,

As an experienced user you find yourself juggling between dataset objects and arrays/transform objects way too often. It slows down the workflow and makes rasterio a pain to use. The choice of where to burn a file or not should be left up to the user. Dataset objects should be abstract and in memory at all times unless the user wishes for them to be written as final output. 

I feel like the attachment to these file abstractions is an artefact from whatever rasterio is built on top of (GDAL? C++?) and is not pythonic at all. Take for instance the pandas dataframe object. All operations are based on the in-memory object, or a pointer thereto. The choice of whether to write to excel, csv or heaven-knows-what other kind of file format is only a method of the dataframe object and not some kind of intrinsic part of it.

In python the abstraction should represent the physical file and not the opposite.

On Tue, Oct 15, 2019 at 11:05 AM Dion Häfner <dion.haefner@...> wrote:
Hey Amine,

usually there are enough functions to allow you to have your cake and
eat it, too.

Instead of reproject, I usually find myself using a VRT. An alternative
to mask that returns an ndarray is raster_geometry_mask. Some of this
you will have to roll yourself, though.

I think most of the functions that burn rasters directly (such as mask)
are meant for less experienced users that just want to get the job done.
Maybe there should ideally always be two versions of each function, one
operating at dataset level, one at data (+ metadata) level. But as an
experienced user you can pretty much always work around that limitation.

Hope that helps,
Dion

On 15/10/2019 10.51, Amine Aboufirass via Groups.Io wrote:
> Dear All,
>
> Does anyone have any difficulty with input and output datatypes for
> rasterio functions in general? Many of the python functions in the
> library have inconsistent input and output. For instance, sometimes they
> have ndarrays and transforms as input/output and sometimes they have
> physical datasets as input output. This is very annoying when trying to
> develop a generic workflow. Here is a prime example:
> *
> *
> *rasterio.warp.reproject * takes ndarrays and source/destination
> transforms to do its work while *mask.mask *takes a dataset opened in
> "r" mode. So then what if I first want to open a dataset, clip and then
> reproject in my script? I would have to open the file as a dataset in
> "r" mode, clip and then burn it into a file. Finally I would have to
> open this new file, extract its contents and transform and pass that
> into the *reproject *function and burn a third file.
>
> At first I thought memory files were a good solution, but in fact they
> are not because we still see many functions which require non-dataset
> formats to do their work. I really think this is a flaw in rasterio
> design. There should be one object type which can be passed into
> different functions regardless of what they do.
>
> In the meantime how can I adapt my code so I don't have to be aware of
> input and output datatypes?
>



Re: Flaw in rasterio design?

Dion Häfner
 

Hey Amine,

usually there are enough functions to allow you to have your cake and eat it, too.

Instead of reproject, I usually find myself using a VRT. An alternative to mask that returns an ndarray is raster_geometry_mask. Some of this you will have to roll yourself, though.

I think most of the functions that burn rasters directly (such as mask) are meant for less experienced users that just want to get the job done. Maybe there should ideally always be two versions of each function, one operating at dataset level, one at data (+ metadata) level. But as an experienced user you can pretty much always work around that limitation.

Hope that helps,
Dion

On 15/10/2019 10.51, Amine Aboufirass via Groups.Io wrote:
Dear All,
Does anyone have any difficulty with input and output datatypes for rasterio functions in general? Many of the python functions in the library have inconsistent input and output. For instance, sometimes they have ndarrays and transforms as input/output and sometimes they have physical datasets as input output. This is very annoying when trying to develop a generic workflow. Here is a prime example:
*
*
*rasterio.warp.reproject * takes ndarrays and source/destination transforms to do its work while *mask.mask *takes a dataset opened in "r" mode. So then what if I first want to open a dataset, clip and then reproject in my script? I would have to open the file as a dataset in "r" mode, clip and then burn it into a file. Finally I would have to open this new file, extract its contents and transform and pass that into the *reproject *function and burn a third file.
At first I thought memory files were a good solution, but in fact they are not because we still see many functions which require non-dataset formats to do their work. I really think this is a flaw in rasterio design. There should be one object type which can be passed into different functions regardless of what they do.
In the meantime how can I adapt my code so I don't have to be aware of input and output datatypes?

Flaw in rasterio design?

Amine Aboufirass
 

Dear All,

Does anyone have any difficulty with input and output datatypes for rasterio functions in general? Many of the python functions in the library have inconsistent input and output. For instance, sometimes they have ndarrays and transforms as input/output and sometimes they have physical datasets as input output. This is very annoying when trying to develop a generic workflow. Here is a prime example:

rasterio.warp.reproject  takes ndarrays and source/destination transforms to do its work while mask.mask takes a dataset opened in "r" mode. So then what if I first want to open a dataset, clip and then reproject in my script? I would have to open the file as a dataset in "r" mode, clip and then burn it into a file. Finally I would have to open this new file, extract its contents and transform and pass that into the reproject function and burn a third file.

At first I thought memory files were a good solution, but in fact they are not because we still see many functions which require non-dataset formats to do their work. I really think this is a flaw in rasterio design. There should be one object type which can be passed into different functions regardless of what they do. 

In the meantime how can I adapt my code so I don't have to be aware of input and output datatypes? 

masking using another raster

Gregory, Matthew
 

Hi all,

Simple question, but I want to make sure I'm doing this the canonical way.

Assume two single-band rasters with identical projections/transforms and one of the rasters has NODATA pixels defining an irregular shape and 1 otherwise. Is the best way to mask the other raster to use rio calc? e.g.:

rio calc "(* (read 1) (read 2))" -o out.tif input.tif mask.tif

This works, but didn't know if a different utility is better suited.

thanks, matt

Rasterio 1.1.0

Sean Gillies
 

Hi all,

Rasterio 1.1.0 is on PyPI now. There have been no changes since 1.1b3.

Thanks for your help, everyone!

--
Sean Gillies

Rasterio 1.1b3

Sean Gillies
 

Vincent Sarago brought a regression to my attention. It is fixed in 1.1b3, which is on PyPI now. I've used a couple of my team's projects to smoke test 1.1b3 and it seems to be ready to go. Let me know if you discover any problems that would block a release on Monday.

--
Sean Gillies

Re: gdal_translate -projwin with rasterio

James McBride
 

Haha, then I didn't my due diligence either. :) 

Re: gdal_translate -projwin with rasterio

Guillaume Lostis
 

You are right James, I had not done my due diligence thoroughly :)

The issue you are referring to must be this PR: https://github.com/mapbox/rasterio/pull/1565

There was also some more recent discussion on the subject here: https://github.com/mapbox/rasterio/issues/1143#issuecomment-503202993


On Wed, 2 Oct 2019 at 21:04, James McBride <jdmcbr@...> wrote:

  • Why is there no way to do a 'round' (not a 'ceil' or a 'floor') with the window.round_lengths() function? If it existed, it would make my code simpler.
I've wondered the same (and think even asked about it within another issue? though maybe I just thought about doing that). I'd be happy to add support for `round` if there's no opposition from Sean or other maintainers. 

Re: gdal_translate -projwin with rasterio

James McBride
 


  • Why is there no way to do a 'round' (not a 'ceil' or a 'floor') with the window.round_lengths() function? If it existed, it would make my code simpler.
I've wondered the same (and think even asked about it within another issue? though maybe I just thought about doing that). I'd be happy to add support for `round` if there's no opposition from Sean or other maintainers. 

gdal_translate -projwin with rasterio

Guillaume Lostis
 

Hi all,

I am trying to rewrite some legacy code that was using gdal_translate -projwin to use rasterio. The goal of the code is just to crop an image, given geographical bounds in the CRS of the image.

I struggled to understand what gdal_translate was doing when the projection of the projwin bounds into image coordinates did not align perfectly with the pixels, but I think I have it figured out now:

  • the coordinates of the upper left corner of projwin are "floored" to match integer pixel values
  • the pixel width and height derived from projwin are "rounded" to match integer values

(the projwin documentation doesn't really explain what happens in that case, I had to find through experiments).

I now have a python function doing nearly exactly the same thing as what gdal_translate -projwin does:

import subprocess

import rasterio


def rio_crop(src_path, dst_path, ulx, uly, lrx, lry):
    with rasterio.open(src_path) as src:
        bounds = (ulx, lry, lrx, uly)

        # Get the pixel coordinates of the bounds in src_path
        window = src.window(*bounds)

        # Do a 'ceil' operation on the offsets
        window = window.round_offsets()

        # Do a 'round' operation on the lengths
        # window.round_lengths() only offers to do a 'ceil' or 'floor', not a 'round'
        width = round(window.width)
        height = round(window.height)
        window = rasterio.windows.Window(window.col_off, window.row_off, width, height)

        profile = src.profile
        profile.update(
            {
                "driver": "GTiff",
                "compress": "deflate",
                "height": height,
                "width": width,
                "transform": src.window_transform(window),
            }
        )

        with rasterio.open(dst_path, "w", **profile) as out:
            out.write(src.read(window=window))


def gdal_crop(src_path, dst_path, ulx, uly, lrx, lry):
    cmd = [
        "gdal_translate",
        src_path,
        dst_path,
        "-of",
        "GTiff",
        "-projwin",
        str(ulx),
        str(uly),
        str(lrx),
        str(lry),
    ]
    subprocess.check_output(cmd)


if __name__ == "__main__":
    src_path = "/vsicurl/https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/176/039/LC08_L1TP_176039_20180411_20180417_01_T1/LC08_L1TP_176039_20180411_20180417_01_T1_B8.TIF"
    ulx = 318753
    uly = 3319235
    lrx = 321313
    lry = 3316675
    for delta in range(0, 30, 6):
        print(delta)
        rio_crop(src_path, f"rio_{delta}.tif", ulx, uly, lrx + delta, lry + delta)
        gdal_crop(src_path, f"gdal_{delta}.tif", ulx, uly, lrx + delta, lry + delta)
        with rasterio.open(f"rio_{delta}.tif") as rio:
            with rasterio.open(f"gdal_{delta}.tif") as gdal:
                assert rio.bounds == gdal.bounds
                print(rio.tags()["AREA_OR_POINT"])
                print(gdal.tags()["AREA_OR_POINT"])

My questions are:

  1. Why is there no way to do a 'round' (not a 'ceil' or a 'floor') with the window.round_lengths() function? If it existed, it would make my code simpler.
  2. When I compare the images output by rio_crop and by gdal_crop, they differ by their value of AREA_OR_POINT. GDAL preserves the original value (POINT), whereas rasterio changes it to AREA. I am always confused by the meaning of this variable, but I was wondering why rasterio doesn't behave like GDAL in that case?

Thanks!

Guillaume Lostis

Rasterio 1.1b1

Sean Gillies
 

Hi all,

A pre-release of Rasterio 1.1 is on PyPI now. Please try it out if you can and see if you can find any new bugs before we have a 1.1.0 release as soon as tomorrow.

Thanks!

--
Sean Gillies

Re: Serializing and deserializing rasterio object

Juliano
 

Thank you!