
Combining rasters using rasterio

James David Smith
 

Hello,

Thanks for the help last week with the OVR files. I think I've solved that now.

My next question is about combining rasters that share space but have different extents. Each raster has one band. I want to combine them into one raster, keeping the highest cell value where they overlap. Hopefully the attached image explains it better than I can in writing. If anyone can point me in the right direction or provide some examples I'd be very grateful. A previous colleague completed this work using ArcPy, and I'm doing my best to re-write the code with open-source tools. Thanks!

Capture.PNG
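A minimal sketch of the core idea in NumPy (assuming the rasters are already on the same grid, with each raster's position given by a row/column offset into a common output extent; the arrays, offsets, and nodata value below are made up for illustration):

```python
import numpy as np

nodata = -9999

# Two single-band rasters on the same grid but with different extents,
# expressed here as (row, col) offsets into a common 3x3 output grid.
rasters = [
    (np.array([[1, 5], [2, 8]]), (0, 0)),   # covers rows 0-1, cols 0-1
    (np.array([[7, 3], [4, 6]]), (1, 1)),   # covers rows 1-2, cols 1-2
]

# Build the mosaic, keeping the highest value wherever rasters overlap.
out = np.full((3, 3), nodata, dtype=float)
for data, (r, c) in rasters:
    window = out[r:r + data.shape[0], c:c + data.shape[1]]
    # nodata is far below any real value, so an elementwise max works;
    # out=window writes the result back into the mosaic in place.
    np.maximum(window, data, out=window)

print(out)
```

Newer versions of rasterio's `rasterio.merge.merge` also accept a `method="max"` argument that does this in one call on open datasets, which may save you the bookkeeping above.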


Re: Understanding `+wktext` more

Sean Gillies
 

Hi,

On Mon, Jul 1, 2019 at 9:54 AM David Hoese <dhoese@...> wrote:
I posted this originally on github, but have moved it here at Sean's request. Additional details of this question/issue are discussed in the `pyproj` repository [here](https://github.com/pyproj4/pyproj/issues/357).
 
I discovered that `nsper` and `ob_tran` projections required the `+wktext` parameter to be accepted by rasterio's `CRS` objects. It wasn't a big deal to add this until recently, when I started using pyproj's `CRS` objects as my library's container for CRS information. Using pyproj 2.0+ and its `CRS` objects, the `+wktext` of my PROJ dictionary is being ignored/removed. This means that doing `rasterio.crs.CRS.from_dict(CRS.to_dict())` fails because rasterio needs the `+wktext` parameter.
 
I'm not sure if this should be considered a bug in pyproj or in rasterio or not a bug at all. I found the related issues in this repository that mentioned `+wktext` but I had trouble understanding what it was for and why it was needed. Any help and information would be appreciated.
 
## Examples
 
from pyproj import CRS
from rasterio.crs import CRS as rioCRS

proj_dict = {'proj': 'ob_tran', 'o_proj': 'eqc', 'o_lat_p': 30.0, 'o_lon_p': 10.0, 'lon_0': -10.0, 'a': 6371000.0, 'b': 6371000.0, 'wktext': True}

CRS.from_dict(proj_dict).to_dict()
# {'proj': 'ob_tran', 'o_proj': 'eqc', 'o_lat_p': 30.0, 'o_lon_p': 10.0, 'lon_0': -10.0, 'a': 6371000.0, 'b': 6371000.0, 'type': 'crs'}

rioCRS.from_dict(proj_dict)
# CRS.from_wkt('PROJCS["unnamed",GEOGCS["unnamed ellipse",DATUM["unknown",SPHEROID["unnamed",6371000,0]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["custom_proj4"],EXTENSION["PROJ4","+proj=ob_tran +o_proj=eqc +o_lat_p=30.0 +o_lon_p=10.0 +lon_0=-10.0 +a=6371000.0 +b=6371000.0 +wktext=True"]]')

rioCRS.from_dict(CRS.from_dict(proj_dict).to_dict())
# CRSError: The PROJ4 dict could not be understood. OGR Error code 5

The latest version of pyproj uses PROJ version 6, which is based on WKT (version 2) and not on the old PROJ.4 key/value representation of projections. I think `+wktext` may be gone from PROJ 6, as it's no longer needed to carry extra WKT information into the PROJ.4 realm. If you really want to use the pyproj.CRS class like this, you'll need to seek help from that project.

Furthermore, dicts are no longer the highest-fidelity interface between pyproj and rasterio. To make a rasterio CRS object from a pyproj CRS object, export WKT (version 1) from the latter [1] and send that to rasterio.crs.CRS.from_wkt(). I haven't tested this, but am confident that it's the best approach.


--
Sean Gillies


Understanding `+wktext` more

David Hoese
 

I posted this originally on github, but have moved it here at Sean's request. Additional details of this question/issue are discussed in the `pyproj` repository [here](https://github.com/pyproj4/pyproj/issues/357).
 
I discovered that `nsper` and `ob_tran` projections required the `+wktext` parameter to be accepted by rasterio's `CRS` objects. It wasn't a big deal to add this until recently, when I started using pyproj's `CRS` objects as my library's container for CRS information. Using pyproj 2.0+ and its `CRS` objects, the `+wktext` of my PROJ dictionary is being ignored/removed. This means that doing `rasterio.crs.CRS.from_dict(CRS.to_dict())` fails because rasterio needs the `+wktext` parameter.
 
I'm not sure if this should be considered a bug in pyproj or in rasterio or not a bug at all. I found the related issues in this repository that mentioned `+wktext` but I had trouble understanding what it was for and why it was needed. Any help and information would be appreciated.
 
## Examples
 

from pyproj import CRS
from rasterio.crs import CRS as rioCRS

proj_dict = {'proj': 'ob_tran', 'o_proj': 'eqc', 'o_lat_p': 30.0, 'o_lon_p': 10.0, 'lon_0': -10.0, 'a': 6371000.0, 'b': 6371000.0, 'wktext': True}

CRS.from_dict(proj_dict).to_dict()
# {'proj': 'ob_tran', 'o_proj': 'eqc', 'o_lat_p': 30.0, 'o_lon_p': 10.0, 'lon_0': -10.0, 'a': 6371000.0, 'b': 6371000.0, 'type': 'crs'}

rioCRS.from_dict(proj_dict)
# CRS.from_wkt('PROJCS["unnamed",GEOGCS["unnamed ellipse",DATUM["unknown",SPHEROID["unnamed",6371000,0]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["custom_proj4"],EXTENSION["PROJ4","+proj=ob_tran +o_proj=eqc +o_lat_p=30.0 +o_lon_p=10.0 +lon_0=-10.0 +a=6371000.0 +b=6371000.0 +wktext=True"]]')

rioCRS.from_dict(CRS.from_dict(proj_dict).to_dict())
# CRSError: The PROJ4 dict could not be understood. OGR Error code 5
 


Re: Writing raster to GeoPackage layer

Sean Gillies
 

It may be that rasterio doesn't support multilayer raster datasets. I'll make a note to open an issue in the tracker about this. 


On Sat, Jun 29, 2019, 10:19 AM Ryan <code@...> wrote:
Installed rasterio through conda:

conda create -n gpkg-test -c conda-forge python=3.7 rasterio


Re: Writing raster to GeoPackage layer

Ryan
 

Installed rasterio through conda:

conda create -n gpkg-test -c conda-forge python=3.7 rasterio


Re: Writing ESRI sidecar files

Sean Gillies
 

Hi,

On Fri, Jun 28, 2019 at 10:00 AM James David Smith <james.david.smith@...> wrote:
Thanks again Sean, and Even too. It seems I just need to get the right
combination of options.

The options that you have shown, Even... how are they implemented?
They don't seem to go in the Env like the other options?

Correct, those dataset creation options need to be passed into rasterio.open().

One way to do that is to add to your metadata dict below

  metadata["profile"] = "GeoTIFF"

Any extra keyword argument passed to rasterio.open() is collected into a kwargs dict (see https://rasterio.readthedocs.io/en/latest/api/rasterio.html#rasterio.open) and items of that dict are turned into GDAL dataset creation options (by uppercasing the keyword argument names and converting their values appropriately).


My code at the moment is as follows. It creates the .tif and .ovr files.
If anyone can improve it, and help me get the aux.xml file too, I'd be
very grateful. I need the ovr and aux.xml files to be created
independently as it's what the client needs.
-------
import rasterio
from rasterio.enums import Resampling

with rasterio.open("meri50year.tif") as file:
    data               = file.read(1)
    metadata           = file.profile
    metadata['nodata'] = 255
    data[data == 1 ]   = 4
    data[data == 15 ]  = 255

outfile = 'meri50_processed.tif'

with rasterio.Env(GDAL_PAM_ENABLED=True, ESRI_XML_PAM=True, TIFF_USE_OVR=True):
    with rasterio.open(outfile, 'w', **metadata) as new_temp_file:
        new_temp_file.write(data, 1)
        overviews = [2,4,8,16]
        new_temp_file.build_overviews(overviews, Resampling.nearest)
--------


On Fri, 28 Jun 2019 at 16:09, Even Rouault <even.rouault@...> wrote:
>
> On vendredi 28 juin 2019 08:26:35 CEST Sean Gillies wrote:
> > If you are creating a GeoTIFF it is possible that no XML file will be
> > written. As Even Rouault explains in
> > https://lists.osgeo.org/pipermail/gdal-dev/2010-May/024522.html, the PAM
> > file is only created if needed.
>
> Indeed. You can play with the GTIFF specific PROFILE creation option
> (see https://gdal.org/drivers/raster/gtiff.html#creation-issues)
>
> PROFILE=BASELINE: .aux.xml only created if presence of georeferencing and/or
> user metadata
> PROFILE=GEOTIFF: .aux.xml only created if there's user metadata
> PROFILE=GDALGeoTIFF: .aux.xml only created if user metadata doesn't fit in
> TIFF tag
>
> Even
>
> --
> Spatialys - Geospatial professional services
> http://www.spatialys.com
>
>
>





--
Sean Gillies


Re: Writing ESRI sidecar files

James David Smith
 

Thanks again Sean, and Even too. It seems I just need to get the right
combination of options.

The options that you have shown, Even... how are they implemented?
They don't seem to go in the Env like the other options?

My code at the moment is as follows. It creates the .tif and .ovr files.
If anyone can improve it, and help me get the aux.xml file too, I'd be
very grateful. I need the ovr and aux.xml files to be created
independently as it's what the client needs.
-------
import rasterio
from rasterio.enums import Resampling

with rasterio.open("meri50year.tif") as file:
    data               = file.read(1)
    metadata           = file.profile
    metadata['nodata'] = 255
    data[data == 1 ]   = 4
    data[data == 15 ]  = 255

outfile = 'meri50_processed.tif'

with rasterio.Env(GDAL_PAM_ENABLED=True, ESRI_XML_PAM=True, TIFF_USE_OVR=True):
    with rasterio.open(outfile, 'w', **metadata) as new_temp_file:
        new_temp_file.write(data, 1)
        overviews = [2, 4, 8, 16]
        new_temp_file.build_overviews(overviews, Resampling.nearest)
--------

On Fri, 28 Jun 2019 at 16:09, Even Rouault <even.rouault@spatialys.com> wrote:

On vendredi 28 juin 2019 08:26:35 CEST Sean Gillies wrote:
If you are creating a GeoTIFF it is possible that no XML file will be
written. As Even Rouault explains in
https://lists.osgeo.org/pipermail/gdal-dev/2010-May/024522.html, the PAM
file is only created if needed.
Indeed. You can play with the GTIFF specific PROFILE creation option
(see https://gdal.org/drivers/raster/gtiff.html#creation-issues)

PROFILE=BASELINE: .aux.xml only created if presence of georeferencing and/or
user metadata
PROFILE=GEOTIFF: .aux.xml only created if there's user metadata
PROFILE=GDALGeoTIFF: .aux.xml only created if user metadata doesn't fit in
TIFF tag

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com



Re: Writing ESRI sidecar files

Even Rouault
 

On vendredi 28 juin 2019 08:26:35 CEST Sean Gillies wrote:
If you are creating a GeoTIFF it is possible that no XML file will be
written. As Even Rouault explains in
https://lists.osgeo.org/pipermail/gdal-dev/2010-May/024522.html, the PAM
file is only created if needed.
Indeed. You can play with the GTIFF specific PROFILE creation option
(see https://gdal.org/drivers/raster/gtiff.html#creation-issues)

PROFILE=BASELINE: .aux.xml only created if presence of georeferencing and/or
user metadata
PROFILE=GEOTIFF: .aux.xml only created if there's user metadata
PROFILE=GDALGeoTIFF: .aux.xml only created if user metadata doesn't fit in
TIFF tag

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com


Re: Writing ESRI sidecar files

Sean Gillies
 

If you are creating a GeoTIFF it is possible that no XML file will be written. As Even Rouault explains in https://lists.osgeo.org/pipermail/gdal-dev/2010-May/024522.html, the PAM file is only created if needed.

In the script below, I'm using JPEG as an output format and do see creation of a PAM file.

from rasterio import Env
from rasterio.shutil import copy

with Env(GDAL_PAM_ENABLED=True, ESRI_XML_PAM=True):
    copy("/Users/seang/code/rasterio/tests/data/RGB.byte.tif", "/tmp/foo.jpg", driver="JPEG")

$ ls -l foo.jpg*
-rw-r--r--  1 seang  wheel  103112 Jun 28 08:19 foo.jpg
-rw-r--r--  1 seang  wheel    2252 Jun 28 08:19 foo.jpg.aux.xml

I hope this gets you on the right track!


On Fri, Jun 28, 2019 at 7:36 AM James David Smith <james.david.smith@...> wrote:
This didn't work unfortunately, Sean. No errors... but nothing else
was written out as I was hoping. If you get a few minutes would you mind
testing to ensure I'm not going crazy? Thanks, James

On Thu, 27 Jun 2019 at 23:05, Sean Gillies via Groups.Io
<sean=mapbox.com@groups.io> wrote:
>
> Sorry, James, I messed up and used the GDAL vernacular. Replace YES with True and I think you'll be good to go.
>
> On Thu, Jun 27, 2019 at 3:09 PM James David Smith <james.david.smith@...> wrote:
>>
>> Thanks for the tip Sean, but that doesn't work for me.
>>
>> with rasterio.Env(TIFF_USE_OVR=True, GDAL_PAM_ENABLED=YES, ESRI_XML_PAM=YES):
>> ----- name 'YES' is not defined
>>
>> If I change 'YES' to True then the code runs .... but I'm not sure
>> it's recognizing the commands as the aux.xml file isn't created (but
>> the .ovr is).
>>
>> Anyone got a good idea?
>>
>>
>> On Thu, 27 Jun 2019 at 18:53, Sean Gillies via Groups.Io
>> <sean=mapbox.com@groups.io> wrote:
>> >
>> > Hi James,
>> >
>> > I have tried this myself, but I think you'll want to add both GDAL_PAM_ENABLED=YES and ESRI_XML_PAM=YES to the Env.
>> >
>> > https://trac.osgeo.org/gdal/wiki/ConfigOptions#GDAL_PAM_ENABLED
>> > https://gdal.org/drivers/raster/gtiff.html#configuration-options
>> >
>> >
>> > On Thu, Jun 27, 2019 at 11:11 AM James David Smith <james.david.smith@...> wrote:
>> >>
>> >> Hello,
>> >>
>> >> I'm trying to use rasterio to manipulate a file and then write it out,
>> >> along with accompanying ESRI-type files. So the output files will be:
>> >>
>> >> raster.tif,
>> >> raster.tif.aux.xml
>> >> raster.tif.ovr
>> >>
>> >> I think before writing the file I can do something like this to create
>> >> the ovr file.
>> >>
>> >> rasterio.Env(TIFF_USE_OVR=True)
>> >>
>> >> Is there a similar command to create the aux.xml file please?
>> >>
>> >> Thanks, James
>> >>
>> >>
>> >>
>> >
>> >
>> > --
>> > Sean Gillies
>> >
>>
>>
>>
>
>
> --
> Sean Gillies
>





--
Sean Gillies


Re: Writing ESRI sidecar files

James David Smith
 

This didn't work unfortunately, Sean. No errors... but nothing else
was written out as I was hoping. If you get a few minutes would you mind
testing to ensure I'm not going crazy? Thanks, James

On Thu, 27 Jun 2019 at 23:05, Sean Gillies via Groups.Io
<sean=mapbox.com@groups.io> wrote:

Sorry, James, I messed up and used the GDAL vernacular. Replace YES with True and I think you'll be good to go.

On Thu, Jun 27, 2019 at 3:09 PM James David Smith <james.david.smith@gmail.com> wrote:

Thanks for the tip Sean, but that doesn't work for me.

with rasterio.Env(TIFF_USE_OVR=True, GDAL_PAM_ENABLED=YES, ESRI_XML_PAM=YES):
----- name 'YES' is not defined

If I change 'YES' to True then the code runs .... but I'm not sure
it's recognizing the commands as the aux.xml file isn't created (but
the .ovr is).

Anyone got a good idea?


On Thu, 27 Jun 2019 at 18:53, Sean Gillies via Groups.Io
<sean=mapbox.com@groups.io> wrote:

Hi James,

I have tried this myself, but I think you'll want to add both GDAL_PAM_ENABLED=YES and ESRI_XML_PAM=YES to the Env.

https://trac.osgeo.org/gdal/wiki/ConfigOptions#GDAL_PAM_ENABLED
https://gdal.org/drivers/raster/gtiff.html#configuration-options


On Thu, Jun 27, 2019 at 11:11 AM James David Smith <james.david.smith@gmail.com> wrote:

Hello,

I'm trying to use rasterio to manipulate a file and then write it out,
along with accompanying ESRI-type files. So the output files will be:

raster.tif,
raster.tif.aux.xml
raster.tif.ovr

I think before writing the file I can do something like this to create
the ovr file.

rasterio.Env(TIFF_USE_OVR=True)

Is there a similar command to create the aux.xml file please?

Thanks, James



--
Sean Gillies


--
Sean Gillies


Re: Writing raster to GeoPackage layer

Sean Gillies
 

Hi,

Can you tell us how you installed rasterio? If you installed one of the binary wheels from PyPI (which I built), you may be out of luck. I haven't paid any attention to making sure that they have support for geopackage rasters. I hope someone else can report whether they do or do not.

Also note that when you open an existing dataset in r+ mode, all of the keyword arguments you provide will be ignored, so the properties dict you've made in your code is unnecessary.


On Thu, Jun 20, 2019 at 12:48 PM Ryan <code@...> wrote:
I'm trying to add a raster layer to an existing GPKG with rasterio:

properties = {
        'driver':'GPKG',
        'width':array.shape[1],
        'height':array.shape[0],
        'count':1,
        'crs':crs,
        'transform':transform,
        'dtype':np.float32,
        'nodata':-9999,
        'layer':'rasterlayer'}
with rasterio.open('testing.gpkg', 'r+', **properties) as dst_dataset:
    dst_dataset.write(array)

However I get the following error:

Traceback (most recent call last):
  File "rasterio/_base.pyx", line 76, in rasterio._base.get_dataset_driver
  File "rasterio/_err.pyx", line 205, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError: 'testing.gpkg' not recognized as a supported file format.

I can do this with "gdal_translate" so I at least know the GPKG and rasters are all properly formatted:

gdal_translate -of GPKG raster.tiff testing.gpkg -co APPEND_SUBDATASET=YES -co RASTER_TABLE=rasterlayer


Side Question: Is there a way to format code in groups.io? I can't seem to find anything on the edit toolbar :-/



--
Sean Gillies


Re: Writing ESRI sidecar files

Sean Gillies
 

Sorry, James, I messed up and used the GDAL vernacular. Replace YES with True and I think you'll be good to go.


On Thu, Jun 27, 2019 at 3:09 PM James David Smith <james.david.smith@...> wrote:
Thanks for the tip Sean, but that doesn't work for me.

with rasterio.Env(TIFF_USE_OVR=True, GDAL_PAM_ENABLED=YES, ESRI_XML_PAM=YES):
----- name 'YES' is not defined

If I change 'YES' to True then the code runs .... but I'm not sure
it's recognizing the commands as the aux.xml file isn't created (but
the .ovr is).

Anyone got a good idea?


On Thu, 27 Jun 2019 at 18:53, Sean Gillies via Groups.Io
<sean=mapbox.com@groups.io> wrote:
>
> Hi James,
>
> I have tried this myself, but I think you'll want to add both GDAL_PAM_ENABLED=YES and ESRI_XML_PAM=YES to the Env.
>
> https://trac.osgeo.org/gdal/wiki/ConfigOptions#GDAL_PAM_ENABLED
> https://gdal.org/drivers/raster/gtiff.html#configuration-options
>
>
> On Thu, Jun 27, 2019 at 11:11 AM James David Smith <james.david.smith@...> wrote:
>>
>> Hello,
>>
>> I'm trying to use rasterio to manipulate a file and then write it out,
>> along with accompanying ESRI-type files. So the output files will be:
>>
>> raster.tif,
>> raster.tif.aux.xml
>> raster.tif.ovr
>>
>> I think before writing the file I can do something like this to create
>> the ovr file.
>>
>> rasterio.Env(TIFF_USE_OVR=True)
>>
>> Is there a similar command to create the aux.xml file please?
>>
>> Thanks, James
>>
>>
>>
>
>
> --
> Sean Gillies
>





--
Sean Gillies


Re: Writing ESRI sidecar files

James David Smith
 

Thanks for the tip Sean, but that doesn't work for me.

with rasterio.Env(TIFF_USE_OVR=True, GDAL_PAM_ENABLED=YES, ESRI_XML_PAM=YES):
----- name 'YES' is not defined

If I change 'YES' to True then the code runs .... but I'm not sure
it's recognizing the commands as the aux.xml file isn't created (but
the .ovr is).

Anyone got a good idea?


On Thu, 27 Jun 2019 at 18:53, Sean Gillies via Groups.Io
<sean=mapbox.com@groups.io> wrote:

Hi James,

I have tried this myself, but I think you'll want to add both GDAL_PAM_ENABLED=YES and ESRI_XML_PAM=YES to the Env.

https://trac.osgeo.org/gdal/wiki/ConfigOptions#GDAL_PAM_ENABLED
https://gdal.org/drivers/raster/gtiff.html#configuration-options


On Thu, Jun 27, 2019 at 11:11 AM James David Smith <james.david.smith@gmail.com> wrote:

Hello,

I'm trying to use rasterio to manipulate a file and then write it out,
along with accompanying ESRI-type files. So the output files will be:

raster.tif,
raster.tif.aux.xml
raster.tif.ovr

I think before writing the file I can do something like this to create
the ovr file.

rasterio.Env(TIFF_USE_OVR=True)

Is there a similar command to create the aux.xml file please?

Thanks, James



--
Sean Gillies


Re: Writing ESRI sidecar files

Sean Gillies
 

Hi James,

I have tried this myself, but I think you'll want to add both GDAL_PAM_ENABLED=YES and ESRI_XML_PAM=YES to the Env.



On Thu, Jun 27, 2019 at 11:11 AM James David Smith <james.david.smith@...> wrote:
Hello,

I'm trying to use rasterio to manipulate a file and then write it out,
along with accompanying ESRI-type files. So the output files will be:

raster.tif,
raster.tif.aux.xml
raster.tif.ovr

I think before writing the file I can do something like this to create
the ovr file.

rasterio.Env(TIFF_USE_OVR=True)

Is there a similar command to create the aux.xml file please?

Thanks, James





--
Sean Gillies


Writing ESRI sidecar files

James David Smith
 

Hello,

I'm trying to use rasterio to manipulate a file and then write it out,
along with accompanying ESRI-type files. So the output files will be:

raster.tif,
raster.tif.aux.xml
raster.tif.ovr

I think before writing the file I can do something like this to create
the ovr file.

rasterio.Env(TIFF_USE_OVR=True)

Is there a similar command to create the aux.xml file please?

Thanks, James


Writing raster to GeoPackage layer

Ryan
 

I'm trying to add a raster layer to an existing GPKG with rasterio:

properties = {
        'driver':'GPKG',
        'width':array.shape[1],
        'height':array.shape[0],
        'count':1,
        'crs':crs,
        'transform':transform,
        'dtype':np.float32,
        'nodata':-9999,
        'layer':'rasterlayer'}
with rasterio.open('testing.gpkg', 'r+', **properties) as dst_dataset:
    dst_dataset.write(array)

However I get the following error:

Traceback (most recent call last):
  File "rasterio/_base.pyx", line 76, in rasterio._base.get_dataset_driver
  File "rasterio/_err.pyx", line 205, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError: 'testing.gpkg' not recognized as a supported file format.

I can do this with "gdal_translate" so I at least know the GPKG and rasters are all properly formatted:

gdal_translate -of GPKG raster.tiff testing.gpkg -co APPEND_SUBDATASET=YES -co RASTER_TABLE=rasterlayer


Side Question: Is there a way to format code in groups.io? I can't seem to find anything on the edit toolbar :-/


Re: Occasional "not recognized as a supported file format" errors when reading from S3

Sean Gillies
 

Dion, Jonas:

There is a new report in the Fiona tracker that may be related: https://github.com/Toblerity/Fiona/issues/761. There I advised turning on CPL_CURL_VERBOSE=YES to see more details about the HTTP requests that are failing. It will cause the volume of your logs to explode and it may be a challenge to find the signal in the noise, but I believe we will find some clues in the HTTP responses.
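The option can also be set as a plain environment variable before launching the process (a sketch; `app.py` stands in for whatever script is doing the S3 reads):

```shell
# Make GDAL's curl layer log every HTTP exchange (very noisy),
# capturing stderr so the output can be searched afterwards.
CPL_CURL_VERBOSE=YES python app.py 2> curl-debug.log
```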


On Wed, Jun 12, 2019 at 1:55 AM Jonas <j08lue@...> wrote:
OK, sorry, forget what I said - the same map tile that earlier was failing is now served fine. :/



--
Sean Gillies


Re: problem with window sizes when parallelizing a funciton

javier lopatin
 

Dear Dion, thanks! This was perfect. I didn't get it the first time, but now it runs smoothly.
Many thanks! 
Javier

El jue., 13 jun. 2019 a las 16:35, Dion Häfner (<dion.haefner@...>) escribió:

As I mentioned, the canonical way to do this is to leverage functools.partial.

For example:


    
from functools import partial
from concurrent.futures import ProcessPoolExecutor

def worker_function(x, arg1, arg2):
    ...

if __name__ == '__main__':
    do_work = partial(worker_function, arg1=0, arg2='foo')
    with ProcessPoolExecutor(4) as executor:
        executor.map(do_work, [1, 2, 3])


The arguments have to be picklable, I think, but they should be in your case.


On 13/06/2019 16.29, javier lopatin via Groups.Io wrote:
Hi Dion, thanks for the tip. I thought that might be the problem. Any ideas on how to pass a function that needs several arguments to executor.map? So far I'm only able to pass functions with a dstack as the only argument, like:

def main(x):
    ...
    return something
executor.map(main, x)

so if I put the function main outside in the global scope I now need to have several args in the function:

def main(x, arg1, arg2, arg3, ...):
    ...
   return something
 
I have no idea how to pass these args to executor.map(). Any ideas?
I really appreciate this!
Cheers,
Javier

El jue., 13 jun. 2019 a las 15:12, Dion Häfner (<dion.haefner@...>) escribió:

Hey Javier,

this is not really a rasterio problem, mostly a Python peculiarity.

In general, you cannot pass functions defined in a local closure to a different process. Just move your main function to global scope and you should be fine. You can use functools.partial to fix some optional arguments of your main function and achieve the same thing as with your closure.

Best,
Dion

On 13/06/2019 15.05, javier lopatin via Groups.Io wrote:
Hi Guys!

I'm still working with the parallelization of raster processes with rasterio, and I came across a problem that you might know about. I have some raster stacks of time series data from which I want to obtain an interpolation signature. In practice I have a function to process the raster chunks in parallel with rasterio (calling a function main()):

def _parallel_process(inData, outData, main, count,  n_jobs=4, chuckSize=256):
    with rasterio.Env():
        with rasterio.open(inData) as src:
            profile = src.profile
            profile.update(blockxsize=chuckSize, blockysize=chuckSize,
                           count=count, dtype='float64', tiled=True)
            with rasterio.open(outData, "w", **profile) as dst:
                windows = [window for ij, window in dst.block_windows()]
                data_gen = (src.read(window=window) for window in windows)
                with concurrent.futures.ProcessPoolExecutor(
                    max_workers=n_jobs
                ) as executor:
                    for window, result in zip(
                        tqdm(windows), executor.map(main, data_gen)
                    ):
                        dst.write(result, window=window)

And I have another function that calls _parallel_process:

def PhenoShape(inData, outData, dates=None, nan_replace=None, rollWindow=None, nGS=46, chuckSize=256, n_jobs=4):
    def main(dstack):
        # load raster as a xarray
        xarray = xr.DataArray(dstack)
        xarray.coords['dim_0'] = dates
        xarray.coords['doy'] = xarray.dim_0.dt.dayofyear
        # sort bands according to day-of-the-year
        xarray = xarray.sortby('doy')
        # rearrange time dimension for smoothing and interpolation
        xarray['time'] = xarray['doy']
        # turn a value to NaN
        if nan_replace is not None:
            xarray = xarray.where(xarray.values != nan_replace)
        # rolling average using moving window
        if rollWindow is not None:
            xarray = xarray.rolling(dim_0=rollWindow, center=True).mean()
        # prepare inputs to getPheno
        x = xarray.doy.values
        y = xarray.values
        # get phenology shape across the time axis
        return np.apply_along_axis(_getPheno, 0, y, x, nGS)
    # apply PhenoShape with parallel processing
    try:
        _parallel_process(inData, outData, main, nGS, n_jobs, chuckSize)
    except AttributeError:
        print('ERROR somehow.. :(')
 
This works really fine! But when I import the script as a library (functions inside phenology.py):

import phenology as phen
phen.PhenoShape(inData=inData, outData=outData, dates=dates, rollWindow=3,
    nan_replace=-32767, nGS=46, chuckSize=256, n_jobs=4)

I get the following error:

  File "/home/javier/miniconda3/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/javier/miniconda3/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'PhenoShape.<locals>.main'
Traceback (most recent call last):
  File "/home/javier/miniconda3/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/javier/miniconda3/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'PhenoShape.<locals>.main'
Traceback (most recent call last):
  File "/home/javier/miniconda3/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/javier/miniconda3/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'PhenoShape.<locals>.main'
Traceback (most recent call last):
  File "/home/javier/miniconda3/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/javier/miniconda3/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'PhenoShape.<locals>.main'
Traceback (most recent call last):
  File "/home/javier/miniconda3/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/javier/miniconda3/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'PhenoShape.<locals>.main'

Any ideas? The main idea here is to call these functions to have a cleaner script at the end.
Many thanks in advance.
Cheers,
Javier L.

El jue., 23 may. 2019 a las 11:13, javier lopatin via Groups.Io (<javierlopatin=gmail.com@groups.io>) escribió:
Hi Sean, thanks a lot. This was very helpful indeed, and the performance improved a bit when changing the size to 512.
Great job on the library, congrats!
Cheers,
Javier

El mié., 22 may. 2019 a las 23:36, Alan Snow (<alansnow21@...>) escribió:
You are indeed correct Sean. Thanks for catching that and providing the correct answer! It seems I skimmed through the original message too fast.


--
Javier Lopatin
Institute of Geography and Geoecology
Karlsruhe Institute of Technology (KIT)
javier.lopatin@...




Re: problem with window sizes when parallelizing a funciton

Dion Häfner
 

As I mentioned, the canonical way to do this is to leverage functools.partial.

For example:


    
from functools import partial
from concurrent.futures import ProcessPoolExecutor

def worker_function(x, arg1, arg2):
    ...

if __name__ == '__main__':
    do_work = partial(worker_function, arg1=0, arg2='foo')
    with ProcessPoolExecutor(4) as executor:
        executor.map(do_work, [1, 2, 3])


The arguments have to be picklable, I think, but they should be in your case.
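Another option, if you'd rather not use partial: executor.map accepts multiple iterables (like the built-in map), so fixed arguments can be passed with itertools.repeat. A small sketch, with a toy worker just for illustration:

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import repeat

def worker_function(x, arg1, arg2):
    # toy worker: combine the chunk value with the fixed arguments
    return x * arg1 + arg2

if __name__ == '__main__':
    xs = [1, 2, 3]
    with ProcessPoolExecutor(4) as executor:
        # repeat() supplies arg1/arg2 once per element of xs;
        # map stops at the shortest iterable and preserves input order
        results = list(executor.map(worker_function, xs, repeat(10), repeat(1)))
    print(results)  # [11, 21, 31]
```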


On 13/06/2019 16.29, javier lopatin via Groups.Io wrote:
Hi Dion, thanks for the tip. I thought that might be the problem. Any ideas on how to pass a function to executor.map that needs several arguments? So far I'm only able to pass functions with a dstack as the only argument, like:

def main(x):
    ...
    return something
executor.map(main, x)

So if I move the function main to the global scope, it now needs several args:

def main(x, arg1, arg2, arg3, ...):
    ...
    return something
 
I have no idea how to pass these args to executor.map(). Any ideas?
I really appreciate this!
Cheers,
Javier

Re: problem with window sizes when parallelizing a function

javier lopatin
 

Hi Dion, thanks for the tip. I thought that might be the problem. Any ideas on how to pass a function to executor.map that needs several arguments? So far I'm only able to pass functions with a dstack as the only argument, like:

def main(x):
    ...
    return something
executor.map(main, x)

So if I move the function main to the global scope, it now needs several args:

def main(x, arg1, arg2, arg3, ...):
    ...
    return something
 
I have no idea how to pass these args to executor.map(). Any ideas?
I really appreciate this!
Cheers,
Javier

On Thu, 13 Jun 2019 at 15:12, Dion Häfner (<dion.haefner@...>) wrote:

Hey Javier,

this is not really a rasterio problem, mostly a Python peculiarity.

In general, you cannot pass functions defined in a local closure to a different process. Just move your main function to global scope and you should be fine. You can use functools.partial to fix some optional arguments of your main function and achieve the same thing as with your closure.

Best,
Dion
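Dion's point can be reproduced in isolation; a minimal sketch (the function names here are just illustrative, not from the thread):

```python
import pickle

def module_level(x):
    # defined at module scope: pickled by reference, so a worker
    # process can look it up by name and import it
    return x + 1

def make_closure():
    def nested(x):
        return x + 1
    return nested

# a module-level function round-trips through pickle
restored = pickle.loads(pickle.dumps(module_level))
print(restored(1))  # 2

# a function defined inside another function cannot be pickled,
# which is exactly what multiprocessing trips over when shipping
# the worker to child processes
try:
    pickle.dumps(make_closure())
except (AttributeError, pickle.PicklingError) as err:
    print(type(err).__name__)
```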

On 13/06/2019 15.05, javier lopatin via Groups.Io wrote:
Hi Guys!

I'm still working on parallelizing raster processes with rasterio, and I came across a problem that you might know about. I have some raster stacks of time series data from which I want to obtain an interpolation signature. In practice I have a function that processes the raster chunks in parallel with rasterio (calling a function main()):

def _parallel_process(inData, outData, main, count,  n_jobs=4, chuckSize=256):
    with rasterio.Env():
        with rasterio.open(inData) as src:
            profile = src.profile
            profile.update(blockxsize=chuckSize, blockysize=chuckSize,
                           count=count, dtype='float64', tiled=True)
            with rasterio.open(outData, "w", **profile) as dst:
                windows = [window for ij, window in dst.block_windows()]
                data_gen = (src.read(window=window) for window in windows)
                with concurrent.futures.ProcessPoolExecutor(
                    max_workers=n_jobs
                ) as executor:
                    for window, result in zip(
                        tqdm(windows), executor.map(main, data_gen)
                    ):
                        dst.write(result, window=window)

And I have another function that calls _parallel_process:

def PhenoShape(inData, outData, dates=None, nan_replace=None, rollWindow=None, nGS=46, chuckSize=256, n_jobs=4):
    def main(dstack):
        # load raster as a xarray
        xarray = xr.DataArray(dstack)
        xarray.coords['dim_0'] = dates
        xarray.coords['doy'] = xarray.dim_0.dt.dayofyear
        # sort bands according to day-of-the-year
        xarray = xarray.sortby('doy')
        # rearrange time dimension for smoothing and interpolation
        xarray['time'] = xarray['doy']
        # turn a value to NaN
        if nan_replace is not None:
            xarray = xarray.where(xarray.values != nan_replace)
        # rolling average using moving window
        if rollWindow is not None:
            xarray = xarray.rolling(dim_0=rollWindow, center=True).mean()
        # prepare inputs to getPheno
        x = xarray.doy.values
        y = xarray.values
        # get phenology shape across the time axis
        return np.apply_along_axis(_getPheno, 0, y, x, nGS)
    # apply PhenoShape with parallel processing
    try:
        _parallel_process(inData, outData, main, nGS, n_jobs, chuckSize)
    except AttributeError:
        print('ERROR somehow.. :(')
 
This works really fine! But when I import the script as a library (functions inside phenology.py):

import phenology as phen
phen.PhenoShape(inData=inData, outData=outData, dates=dates, rollWindow=3,
    nan_replace=-32767, nGS=46, chuckSize=256, n_jobs=4)

I get the following error:

Traceback (most recent call last):
  File "/home/javier/miniconda3/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/javier/miniconda3/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'PhenoShape.<locals>.main'

(the same traceback is repeated several times)

Any ideas? The main idea here is to call these functions so the final script is cleaner.
Many thanks in advance.
Cheers,
Javier L.


--
Javier Lopatin
Institute of Geography and Geoecology
Karlsruhe Institute of Technology (KIT)
javier.lopatin@...


