Combining rasters using rasterio
Hello. Thanks for the help last week with the OVR files; I think I've solved that now. My next question is about combining rasters that share space but have different extents. Each raster has one band. I want to combine them into one raster, keeping the highest cell value where they overlap. Hopefully the attached image explains it better than I can in writing. If anyone can point me in the right direction or provide some examples I'd be very grateful. A previous colleague completed this work using ArcPy, and I'm doing my best to rewrite the code with open-source tools. Thanks!
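A minimal, untested sketch of one possible approach, assuming rasterio >= 1.1 (where rasterio.merge.merge accepts a method argument) and using placeholder file names:

import rasterio
from rasterio.merge import merge

# "a.tif" and "b.tif" stand in for the overlapping single-band rasters.
sources = [rasterio.open(p) for p in ["a.tif", "b.tif"]]

# method="max" keeps the highest cell value wherever the inputs overlap.
mosaic, out_transform = merge(sources, method="max")

profile = sources[0].profile
profile.update(height=mosaic.shape[1], width=mosaic.shape[2], transform=out_transform)

with rasterio.open("combined.tif", "w", **profile) as dst:
    dst.write(mosaic)

for src in sources:
    src.close()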
|
|
Re: Understanding `+wktext` more
Hi, On Mon, Jul 1, 2019 at 9:54 AM David Hoese < dhoese@...> wrote:
I discovered that `nsper` and `ob_tran` projections required the `+wktext` parameter to be accepted by rasterio's `CRS` objects. It wasn't a big deal to add this until recently, when I started using pyproj's `CRS` objects as my library's container for CRS information. Using pyproj 2.0+ and its `CRS` objects, the `+wktext` of my PROJ dictionary is being ignored/removed. This means that doing `rasterio.crs.CRS.from_dict(CRS.to_dict())` fails because rasterio needs the `+wktext` parameter.
I'm not sure if this should be considered a bug in pyproj or in rasterio or not a bug at all. I found the related issues in this repository that mentioned `+wktext` but I had trouble understanding what it was for and why it was needed. Any help and information would be appreciated.
## Examples
from pyproj import CRS
from rasterio.crs import CRS as rioCRS
proj_dict = {'proj': 'ob_tran', 'o_proj': 'eqc', 'o_lat_p': 30.0, 'o_lon_p': 10.0, 'lon_0': -10.0, 'a': 6371000.0, 'b': 6371000.0, 'wktext': True}
CRS.from_dict(proj_dict).to_dict()
# {'proj': 'ob_tran', 'o_proj': 'eqc', 'o_lat_p': 30.0, 'o_lon_p': 10.0, 'lon_0': -10.0, 'a': 6371000.0, 'b': 6371000.0, 'type': 'crs'}
rioCRS.from_dict(proj_dict)
# CRS.from_wkt('PROJCS["unnamed",GEOGCS["unnamed ellipse",DATUM["unknown",SPHEROID["unnamed",6371000,0]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["custom_proj4"],EXTENSION["PROJ4","+proj=ob_tran +o_proj=eqc +o_lat_p=30.0 +o_lon_p=10.0 +lon_0=-10.0 +a=6371000.0 +b=6371000.0 +wktext=True"]]')
rioCRS.from_dict(CRS.from_dict(proj_dict).to_dict())
# CRSError: The PROJ4 dict could not be understood. OGR Error code 5

The latest version of pyproj uses PROJ version 6, which is based on WKT (version 2) rather than the old PROJ.4 key/value representation of projections. I think +wktext may be gone from PROJ 6, as it's no longer needed to carry extra WKT information into the PROJ.4 realm. If you really want to use the pyproj.CRS class like this, you'll need to seek help from that project.
Furthermore, dicts are no longer the highest-fidelity interface between pyproj and rasterio. To make a rasterio CRS object from a pyproj CRS object, export WKT (version 1) from the latter [1] and send that to rasterio.crs.CRS.from_wkt(). I haven't tested this, but am confident that it's the best approach.
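A minimal, untested sketch of that WKT route, reusing the projection dict from the examples above and assuming a pyproj recent enough to accept to_wkt(version="WKT1_GDAL"):

from pyproj import CRS
from rasterio.crs import CRS as rioCRS

proj_dict = {'proj': 'ob_tran', 'o_proj': 'eqc', 'o_lat_p': 30.0, 'o_lon_p': 10.0,
             'lon_0': -10.0, 'a': 6371000.0, 'b': 6371000.0}

# Export WKT version 1 from pyproj and hand that to rasterio instead of a dict.
pyproj_crs = CRS.from_dict(proj_dict)
rio_crs = rioCRS.from_wkt(pyproj_crs.to_wkt(version="WKT1_GDAL"))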
--
|
|
Understanding `+wktext` more
I posted this originally on github, but have moved it here at Sean's request. Additional details of this question/issue are discussed in the `pyproj` repository [here](https://github.com/pyproj4/pyproj/issues/357).
I discovered that `nsper` and `ob_tran` projections required the `+wktext` parameter to be accepted by rasterio's `CRS` objects. It wasn't a big deal to add this until recently, when I started using pyproj's `CRS` objects as my library's container for CRS information. Using pyproj 2.0+ and its `CRS` objects, the `+wktext` of my PROJ dictionary is being ignored/removed. This means that doing `rasterio.crs.CRS.from_dict(CRS.to_dict())` fails because rasterio needs the `+wktext` parameter.
I'm not sure if this should be considered a bug in pyproj or in rasterio or not a bug at all. I found the related issues in this repository that mentioned `+wktext` but I had trouble understanding what it was for and why it was needed. Any help and information would be appreciated.
## Examples
from pyproj import CRS
from rasterio.crs import CRS as rioCRS
proj_dict = {'proj': 'ob_tran', 'o_proj': 'eqc', 'o_lat_p': 30.0, 'o_lon_p': 10.0, 'lon_0': -10.0, 'a': 6371000.0, 'b': 6371000.0, 'wktext': True}
CRS.from_dict(proj_dict).to_dict()
# {'proj': 'ob_tran', 'o_proj': 'eqc', 'o_lat_p': 30.0, 'o_lon_p': 10.0, 'lon_0': -10.0, 'a': 6371000.0, 'b': 6371000.0, 'type': 'crs'}
rioCRS.from_dict(proj_dict)
# CRS.from_wkt('PROJCS["unnamed",GEOGCS["unnamed ellipse",DATUM["unknown",SPHEROID["unnamed",6371000,0]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["custom_proj4"],EXTENSION["PROJ4","+proj=ob_tran +o_proj=eqc +o_lat_p=30.0 +o_lon_p=10.0 +lon_0=-10.0 +a=6371000.0 +b=6371000.0 +wktext=True"]]')
rioCRS.from_dict(CRS.from_dict(proj_dict).to_dict())
# CRSError: The PROJ4 dict could not be understood. OGR Error code 5
|
|
Re: Writing raster to GeoPackage layer
It may be that rasterio doesn't support multilayer raster datasets. I'll make a note to open an issue in the tracker about this.
On Sat, Jun 29, 2019, 10:19 AM Ryan <code@...> wrote:

Installed rasterio through conda:
conda create -n gpkg-test -c conda-forge python=3.7 rasterio
|
|
Re: Writing raster to GeoPackage layer
Installed rasterio through conda:
conda create -n gpkg-test -c conda-forge python=3.7 rasterio
|
|
Re: Writing ESRI sidecar files
Hi,
Thanks again Sean, and Even too. It seems I just need to get the right
combination of options.
The options that you have shown, Even... how are they implemented?
They don't seem to go in the Env like the other options?
Correct, those dataset creation options need to be passed into rasterio.open().
One way to do that is to add this to your metadata dict below:
metadata["profile"] = "GeoTIFF"
My code at the moment is as follows. It creates the .tif and the .ovr file.
If anyone can improve it, and help me get the aux.xml file too, I'd be
very grateful. I need the ovr and aux.xml files to be created
independently, as that's what the client needs.
-------
import rasterio
from rasterio.enums import Resampling

with rasterio.open("meri50year.tif") as file:
    data = file.read(1)
    metadata = file.profile
    metadata['nodata'] = 255
    data[data == 1] = 4
    data[data == 15] = 255

outfile = 'meri50_processed.tif'

with rasterio.Env(GDAL_PAM_ENABLED=True, ESRI_XML_PAM=True, TIFF_USE_OVR=True):
    with rasterio.open(outfile, 'w', **metadata) as new_temp_file:
        new_temp_file.write(data, 1)
        overviews = [2, 4, 8, 16]
        new_temp_file.build_overviews(overviews, Resampling.nearest)
--------
On Fri, 28 Jun 2019 at 16:09, Even Rouault <even.rouault@...> wrote:
>
> On Friday, 28 June 2019 08:26:35 CEST Sean Gillies wrote:
> > If you are creating a GeoTIFF it is possible that no XML file will be
> > written. As Even Rouault explains in
> > https://lists.osgeo.org/pipermail/gdal-dev/2010-May/024522.html, the PAM
> > file is only created if needed.
>
> Indeed. You can play with the GTiff-specific PROFILE creation option
> (see https://gdal.org/drivers/raster/gtiff.html#creation-issues)
>
> PROFILE=BASELINE: .aux.xml only created in the presence of georeferencing
> and/or user metadata
> PROFILE=GEOTIFF: .aux.xml only created if there's user metadata
> PROFILE=GDALGeoTIFF: .aux.xml only created if user metadata doesn't fit in a
> TIFF tag
>
> Even
>
> --
> Spatialys - Geospatial professional services
> http://www.spatialys.com
>
>
>
-- Sean Gillies
|
|
Re: Writing ESRI sidecar files
Thanks again Sean, and Even too. It seems I just need to get the right combination of options.
The options that you have shown, Even... how are they implemented? They don't seem to go in the Env like the other options?
My code at the moment is as follows. It creates the .tif and the .ovr file. If anyone can improve it, and help me get the aux.xml file too, I'd be very grateful. I need the ovr and aux.xml files to be created independently, as that's what the client needs.

-------
import rasterio
from rasterio.enums import Resampling

with rasterio.open("meri50year.tif") as file:
    data = file.read(1)
    metadata = file.profile
    metadata['nodata'] = 255
    data[data == 1] = 4
    data[data == 15] = 255

outfile = 'meri50_processed.tif'

with rasterio.Env(GDAL_PAM_ENABLED=True, ESRI_XML_PAM=True, TIFF_USE_OVR=True):
    with rasterio.open(outfile, 'w', **metadata) as new_temp_file:
        new_temp_file.write(data, 1)
        overviews = [2, 4, 8, 16]
        new_temp_file.build_overviews(overviews, Resampling.nearest)
--------
On Fri, 28 Jun 2019 at 16:09, Even Rouault <even.rouault@spatialys.com> wrote:
On Friday, 28 June 2019 08:26:35 CEST Sean Gillies wrote:

If you are creating a GeoTIFF it is possible that no XML file will be written. As Even Rouault explains in https://lists.osgeo.org/pipermail/gdal-dev/2010-May/024522.html, the PAM file is only created if needed.

Indeed. You can play with the GTiff-specific PROFILE creation option (see https://gdal.org/drivers/raster/gtiff.html#creation-issues):

PROFILE=BASELINE: .aux.xml only created in the presence of georeferencing and/or user metadata
PROFILE=GEOTIFF: .aux.xml only created if there's user metadata
PROFILE=GDALGeoTIFF: .aux.xml only created if user metadata doesn't fit in a TIFF tag
Even
-- Spatialys - Geospatial professional services http://www.spatialys.com
|
|
Re: Writing ESRI sidecar files
In the script below, I'm using JPEG as an output format and do see creation of a PAM file.
from rasterio import Env
from rasterio.shutil import copy

with Env(GDAL_PAM_ENABLED=True, ESRI_XML_PAM=True):
    copy("/Users/seang/code/rasterio/tests/data/RGB.byte.tif", "/tmp/foo.jpg", driver="JPEG")

$ ls -l foo.jpg*
-rw-r--r--  1 seang  wheel  103112 Jun 28 08:19 foo.jpg
-rw-r--r--  1 seang  wheel    2252 Jun 28 08:19 foo.jpg.aux.xml
I hope this gets you on the right track!
This didn't work unfortunately, Sean. No errors... but nothing else was
written out as I was hoping. If you get a few minutes, would you mind
testing to ensure I'm not going crazy? Thanks, James
On Thu, 27 Jun 2019 at 23:05, Sean Gillies via Groups.Io
<sean=mapbox.com@groups.io> wrote:
>
> Sorry, James, I messed up and used the GDAL vernacular. Replace YES with True and I think you'll be good to go.
>
> On Thu, Jun 27, 2019 at 3:09 PM James David Smith <james.david.smith@...> wrote:
>>
>> Thanks for the tip Sean, but that doesn't work for me.
>>
>> with rasterio.Env(TIFF_USE_OVR=True, GDAL_PAM_ENABLED=YES, ESRI_XML_PAM=YES):
>> ----- name 'YES' is not defined
>>
>> If I change 'YES' to True then the code runs .... but I'm not sure
>> it's recognizing the commands as the aux.xml file isn't created (but
>> the .ovr is).
>>
>> Anyone got a good idea?
>>
>>
>> On Thu, 27 Jun 2019 at 18:53, Sean Gillies via Groups.Io
>> <sean=mapbox.com@groups.io> wrote:
>> >
>> > Hi James,
>> >
>> > I have tried this myself, but I think you'll want to add both GDAL_PAM_ENABLED=YES and ESRI_XML_PAM=YES to the Env.
>> >
>> > https://trac.osgeo.org/gdal/wiki/ConfigOptions#GDAL_PAM_ENABLED
>> > https://gdal.org/drivers/raster/gtiff.html#configuration-options
>> >
>> >
>> > On Thu, Jun 27, 2019 at 11:11 AM James David Smith <james.david.smith@...> wrote:
>> >>
>> >> Hello,
>> >>
>> >> I'm trying to use rasterio to manipulate a file and then write it out,
>> >> along with accompanying ESRI-type files. So the output files will be:
>> >>
>> >> raster.tif,
>> >> raster.tif.aux.xml
>> >> raster.tif.ovr
>> >>
>> >> I think before writing the file I can do something like this to create
>> >> the ovr file.
>> >>
>> >> rasterio.Env(TIFF_USE_OVR=True)
>> >>
>> >> Is there a similar command to create the aux.xml file please?
>> >>
>> >> Thanks, James
>> >>
>> >>
>> >>
>> >
>> >
>> > --
>> > Sean Gillies
>> >
>>
>>
>>
>
>
> --
> Sean Gillies
>
|
|
Re: Writing ESRI sidecar files
This didn't work unfortunately, Sean. No errors... but nothing else was written out as I was hoping. If you get a few minutes, would you mind testing to ensure I'm not going crazy? Thanks, James

On Thu, 27 Jun 2019 at 23:05, Sean Gillies via Groups.Io <sean=mapbox.com@groups.io> wrote:
Sorry, James, I messed up and used the GDAL vernacular. Replace YES with True and I think you'll be good to go.
On Thu, Jun 27, 2019 at 3:09 PM James David Smith <james.david.smith@gmail.com> wrote:
Thanks for the tip Sean, but that doesn't work for me.
with rasterio.Env(TIFF_USE_OVR=True, GDAL_PAM_ENABLED=YES, ESRI_XML_PAM=YES):
----- name 'YES' is not defined
If I change 'YES' to True then the code runs .... but I'm not sure it's recognizing the commands as the aux.xml file isn't created (but the .ovr is).
Anyone got a good idea?
On Thu, 27 Jun 2019 at 18:53, Sean Gillies via Groups.Io <sean=mapbox.com@groups.io> wrote:
Hi James,
I have tried this myself, but I think you'll want to add both GDAL_PAM_ENABLED=YES and ESRI_XML_PAM=YES to the Env.
https://trac.osgeo.org/gdal/wiki/ConfigOptions#GDAL_PAM_ENABLED
https://gdal.org/drivers/raster/gtiff.html#configuration-options
On Thu, Jun 27, 2019 at 11:11 AM James David Smith <james.david.smith@gmail.com> wrote:
Hello,
I'm trying to use rasterio to manipulate a file and then write it out, along with accompanying ESRI-type files. So the output files will be:
raster.tif
raster.tif.aux.xml
raster.tif.ovr
I think before writing the file I can do something like this to create the ovr file.
rasterio.Env(TIFF_USE_OVR=True)
Is there a similar command to create the aux.xml file please?
Thanks, James
-- Sean Gillies
-- Sean Gillies
|
|
Re: Writing raster to GeoPackage layer
Hi,
Can you tell us how you installed rasterio? If you installed one of the binary wheels from PyPI (which I built), you may be out of luck. I haven't paid any attention to making sure that they have support for geopackage rasters. I hope someone else can report whether they do or do not.
Also note that when you open an existing dataset in r+ mode, all of the keyword arguments you provide will be ignored, so the properties dict you've made in your code is unnecessary.
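For what it's worth, an untested sketch of the 'w'-mode route, assuming the GDAL behind rasterio was built with GeoPackage raster support. rasterio passes extra keyword arguments to the driver as creation options, so raster_table and append_subdataset below are intended to mirror the -co RASTER_TABLE and -co APPEND_SUBDATASET flags of the gdal_translate command quoted further down; the array, transform, and CRS are placeholders:

import numpy as np
import rasterio
from rasterio.crs import CRS
from rasterio.transform import from_origin

array = np.zeros((100, 100), dtype="float32")  # placeholder data
transform = from_origin(0.0, 100.0, 1.0, 1.0)  # placeholder transform

with rasterio.open(
    "testing.gpkg",
    "w",
    driver="GPKG",
    width=array.shape[1],
    height=array.shape[0],
    count=1,
    dtype="float32",
    crs=CRS.from_epsg(3857),
    transform=transform,
    nodata=-9999,
    raster_table="rasterlayer",    # -> RASTER_TABLE creation option
    append_subdataset="YES",       # -> APPEND_SUBDATASET creation option
) as dst:
    dst.write(array, 1)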
On Thu, Jun 20, 2019 at 12:48 PM Ryan <code@...> wrote:

I'm trying to add a raster layer to an existing GPKG with rasterio:
properties = {
    'driver': 'GPKG',
    'width': array.shape[1],
    'height': array.shape[0],
    'count': 1,
    'crs': crs,
    'transform': transform,
    'dtype': np.float32,
    'nodata': -9999,
    'layer': 'rasterlayer'}

with rasterio.open('testing.gpkg', 'r+', **properties) as dst_dataset:
    dst_dataset.write(array)
However I get the following error:
Traceback (most recent call last):
  File "rasterio/_base.pyx", line 76, in rasterio._base.get_dataset_driver
  File "rasterio/_err.pyx", line 205, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError: 'testing.gpkg' not recognized as a supported file format.
I can do this with "gdal_translate" so I at least know the GPKG and rasters are all properly formatted:
gdal_translate -of GPKG raster.tiff testing.gpkg -co APPEND_SUBDATASET=YES -co RASTER_TABLE=rasterlayer
Side Question: Is there a way to format code in groups.io? I can't seem to find anything on the edit toolbar :-/
|
|
Re: Writing ESRI sidecar files
Sorry, James, I messed up and used the GDAL vernacular. Replace YES with True and I think you'll be good to go.
Thanks for the tip Sean, but that doesn't work for me.
with rasterio.Env(TIFF_USE_OVR=True, GDAL_PAM_ENABLED=YES, ESRI_XML_PAM=YES):
----- name 'YES' is not defined
If I change 'YES' to True then the code runs .... but I'm not sure
it's recognizing the commands as the aux.xml file isn't created (but
the .ovr is).
Anyone got a good idea?
On Thu, 27 Jun 2019 at 18:53, Sean Gillies via Groups.Io
<sean=mapbox.com@groups.io> wrote:
>
> Hi James,
>
> I have tried this myself, but I think you'll want to add both GDAL_PAM_ENABLED=YES and ESRI_XML_PAM=YES to the Env.
>
> https://trac.osgeo.org/gdal/wiki/ConfigOptions#GDAL_PAM_ENABLED
> https://gdal.org/drivers/raster/gtiff.html#configuration-options
>
>
> On Thu, Jun 27, 2019 at 11:11 AM James David Smith <james.david.smith@...> wrote:
>>
>> Hello,
>>
>> I'm trying to use rasterio to manipulate a file and then write it out,
>> along with accompanying ESRI-type files. So the output files will be:
>>
>> raster.tif,
>> raster.tif.aux.xml
>> raster.tif.ovr
>>
>> I think before writing the file I can do something like this to create
>> the ovr file.
>>
>> rasterio.Env(TIFF_USE_OVR=True)
>>
>> Is there a similar command to create the aux.xml file please?
>>
>> Thanks, James
>>
>>
>>
>
>
> --
> Sean Gillies
>
|
|
Re: Writing ESRI sidecar files
Thanks for the tip Sean, but that doesn't work for me.

with rasterio.Env(TIFF_USE_OVR=True, GDAL_PAM_ENABLED=YES, ESRI_XML_PAM=YES):
----- name 'YES' is not defined

If I change 'YES' to True then the code runs... but I'm not sure it's recognizing the commands as the aux.xml file isn't created (but the .ovr is).

Anyone got a good idea?

On Thu, 27 Jun 2019 at 18:53, Sean Gillies via Groups.Io <sean=mapbox.com@groups.io> wrote:
Hi James,
I have tried this myself, but I think you'll want to add both GDAL_PAM_ENABLED=YES and ESRI_XML_PAM=YES to the Env.
https://trac.osgeo.org/gdal/wiki/ConfigOptions#GDAL_PAM_ENABLED
https://gdal.org/drivers/raster/gtiff.html#configuration-options
On Thu, Jun 27, 2019 at 11:11 AM James David Smith <james.david.smith@gmail.com> wrote:
Hello,
I'm trying to use rasterio to manipulate a file and then write it out, along with accompanying ESRI-type files. So the output files will be:
raster.tif
raster.tif.aux.xml
raster.tif.ovr
I think before writing the file I can do something like this to create the ovr file.
rasterio.Env(TIFF_USE_OVR=True)
Is there a similar command to create the aux.xml file please?
Thanks, James
-- Sean Gillies
|
|
Re: Writing ESRI sidecar files
Hi James,
I have tried this myself, but I think you'll want to add both GDAL_PAM_ENABLED=YES and ESRI_XML_PAM=YES to the Env.
Hello,
I'm trying to use rasterio to manipulate a file and then write it out,
along with accompanying ESRI-type files. So the output files will be:
raster.tif,
raster.tif.aux.xml
raster.tif.ovr
I think before writing the file I can do something like this to create
the ovr file.
rasterio.Env(TIFF_USE_OVR=True)
Is there a similar command to create the aux.xml file please?
Thanks, James
|
|
Writing ESRI sidecar files
Hello,
I'm trying to use rasterio to manipulate a file and then write it out, along with accompanying ESRI-type files. So the output files will be:
raster.tif
raster.tif.aux.xml
raster.tif.ovr
I think before writing the file I can do something like this to create the ovr file.
rasterio.Env(TIFF_USE_OVR=True)
Is there a similar command to create the aux.xml file please?
Thanks, James
|
|
Writing raster to GeoPackage layer
I'm trying to add a raster layer to an existing GPKG with rasterio:
properties = {
    'driver': 'GPKG',
    'width': array.shape[1],
    'height': array.shape[0],
    'count': 1,
    'crs': crs,
    'transform': transform,
    'dtype': np.float32,
    'nodata': -9999,
    'layer': 'rasterlayer'}

with rasterio.open('testing.gpkg', 'r+', **properties) as dst_dataset:
    dst_dataset.write(array)
However I get the following error:
Traceback (most recent call last):
  File "rasterio/_base.pyx", line 76, in rasterio._base.get_dataset_driver
  File "rasterio/_err.pyx", line 205, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError: 'testing.gpkg' not recognized as a supported file format.
I can do this with "gdal_translate" so I at least know the GPKG and rasters are all properly formatted:
gdal_translate -of GPKG raster.tiff testing.gpkg -co APPEND_SUBDATASET=YES -co RASTER_TABLE=rasterlayer
Side Question: Is there a way to format code in groups.io? I can't seem to find anything on the edit toolbar :-/
|
|
Re: Occasional "not recognized as a supported file format" errors when reading from S3
Dion, Jonas:
There is a new report in the Fiona tracker that may be related: https://github.com/Toblerity/Fiona/issues/761. There I advised turning on CPL_CURL_VERBOSE=YES to see more details about the HTTP requests that are failing. It will cause the volume of your logs to explode and it may be a challenge to find the signal in the noise, but I believe we will find some clues in the HTTP responses.
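For reference, an untested sketch of switching that config option on from rasterio, with a placeholder S3 path:

import rasterio

with rasterio.Env(CPL_CURL_VERBOSE=True):
    with rasterio.open("s3://example-bucket/example.tif") as src:
        src.read(1)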
On Wed, Jun 12, 2019 at 1:55 AM Jonas <j08lue@...> wrote:

OK, sorry, forget what I said - the same map tile that earlier was failing is now served fine. :/
|
|
Re: problem with window sizes when parallelizing a function
Dear Dion, thanks! This was perfect. I didn't get it the first time, but now it runs smoothly. Many thanks! Javier
As I mentioned, the canonical way to do this is to leverage functools.partial. For example:

from functools import partial
from concurrent.futures import ProcessPoolExecutor

def worker_function(x, arg1, arg2):
    ...

if __name__ == '__main__':
    do_work = partial(worker_function, arg1=0, arg2='foo')
    with ProcessPoolExecutor(4) as executor:
        executor.map(do_work, [1, 2, 3])

The arguments have to be picklable, I think, but they should be in your case.
--
Javier Lopatin
Institute of Geography and Geoecology
Karlsruhe Institute of Technology (KIT)
javier.lopatin@...
|
|
Re: problem with window sizes when parallelizing a function
As I mentioned, the canonical way to do this is to leverage functools.partial. For example:

from functools import partial
from concurrent.futures import ProcessPoolExecutor

def worker_function(x, arg1, arg2):
    ...

if __name__ == '__main__':
    do_work = partial(worker_function, arg1=0, arg2='foo')
    with ProcessPoolExecutor(4) as executor:
        executor.map(do_work, [1, 2, 3])

The arguments have to be picklable, I think, but they should be in your case.
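Applied to the PhenoShape case quoted further down in this thread, that advice might look roughly like the untested sketch below, where process_stack is a hypothetical module-level stand-in for the nested main:

from functools import partial
from concurrent.futures import ProcessPoolExecutor

def process_stack(dstack, dates=None, nan_replace=None, rollWindow=None, nGS=46):
    # Module-level worker: picklable, unlike the nested PhenoShape.<locals>.main.
    # The real body would hold the xarray smoothing/interpolation steps.
    return dstack

if __name__ == '__main__':
    do_work = partial(process_stack, dates=None, nan_replace=-32767, rollWindow=3, nGS=46)
    with ProcessPoolExecutor(max_workers=4) as executor:
        # In the real script, data_gen (the windowed reads) would replace this toy list.
        results = list(executor.map(do_work, [[1, 2], [3, 4]]))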
|
|
Re: problem with window sizes when parallelizing a function
Hi Dion,

Thanks for the tip. I thought that might be the problem. Any ideas on how to pass a function to executor.map that needs several arguments? So far I'm only able to pass functions that take a dstack as their only argument, like:

def main(x):
    ...
    return something

executor.map(main, x)

So if I put the function main out in the global scope, I now need to have several args in the function:

def main(x, arg1, arg2, arg3, ...):
    ...
    return something

I have no idea how to pass these args to executor.map(). Any ideas? I really appreciate this!

Cheers,
Javier
Hey Javier,

this is not really a rasterio problem, mostly a Python peculiarity. In general, you cannot pass functions defined in a local closure to a different process. Just move your main function to global scope and you should be fine. You can use functools.partial to fix some optional arguments of your main function and achieve the same thing as with your closure.

Best,
Dion

On 13/06/2019 15.05, javier lopatin via Groups.Io wrote:
Hi Guys!

I'm still working with the parallelization of raster processes with rasterio, and I came across a problem that you might know about. I have some raster stacks of time series data from which I want to obtain an interpolation signature. In practice I have a function to process the raster chunks in parallel with rasterio (calling a function main()):
def _parallel_process(inData, outData, main, count, n_jobs=4, chuckSize=256):
    with rasterio.Env():
        with rasterio.open(inData) as src:
            profile = src.profile
            profile.update(blockxsize=chuckSize, blockysize=chuckSize,
                           count=count, dtype='float64', tiled=True)
            with rasterio.open(outData, "w", **profile) as dst:
                windows = [window for ij, window in dst.block_windows()]
                data_gen = (src.read(window=window) for window in windows)
                with concurrent.futures.ProcessPoolExecutor(max_workers=n_jobs) as executor:
                    for window, result in zip(tqdm(windows), executor.map(main, data_gen)):
                        dst.write(result, window=window)

And I have another function that calls _parallel_process:
def PhenoShape(inData, outData, dates=None, nan_replace=None, rollWindow=None, nGS=46, chuckSize=256, n_jobs=4):
    def main(dstack):
        # load raster as an xarray
        xarray = xr.DataArray(dstack)
        xarray.coords['dim_0'] = dates
        xarray.coords['doy'] = xarray.dim_0.dt.dayofyear
        # sort bands according to day-of-the-year
        xarray = xarray.sortby('doy')
        # rearrange time dimension for smoothing and interpolation
        xarray['time'] = xarray['doy']
        # turn a value to NaN
        if nan_replace is not None:
            xarray = xarray.where(xarray.values != nan_replace)
        # rolling average using moving window
        if rollWindow is not None:
            xarray = xarray.rolling(dim_0=rollWindow, center=True).mean()
        # prepare inputs to getPheno
        x = xarray.doy.values
        y = xarray.values
        # get phenology shape across the time axis
        return np.apply_along_axis(_getPheno, 0, y, x, nGS)
    # apply PhenoShape with parallel processing
    try:
        _parallel_process(inData, outData, main, nGS, n_jobs, chuckSize)
    except AttributeError:
        print('ERROR somehow.. :(')

This works really fine! But when I import the script as a library (functions inside phenology.py):

import phenology as phen

phen.PhenoShape(inData=inData, outData=outData, dates=dates, rollWindow=3,
                nan_replace=-32767, nGS=46, chuckSize=256, n_jobs=4)
I get the following error:

  File "/home/javier/miniconda3/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/javier/miniconda3/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'PhenoShape.<locals>.main'

(the same traceback is repeated several times, once per worker)
Any ideas? The main idea here is to call these functions to have a cleaner script in the end.

Many thanks in advance.

Cheers,
Javier L.

On Thu, 23 May 2019 at 11:13, javier lopatin via Groups.Io (<javierlopatin=gmail.com@groups.io>) wrote:

Hi Sean, thanks a lot. This was very helpful indeed, and the performance improved a bit when changing the size to 512.
Great job on the library, congrats!

Cheers,
Javier

On Wed, 22 May 2019 at 23:36, Alan Snow (<alansnow21@...>) wrote:

You are indeed correct, Sean. Thanks for catching that and providing the correct answer! It seems I skimmed through the original message too fast.

--
Javier Lopatin
Institute of Geography and Geoecology
Karlsruhe Institute of Technology (KIT)
javier.lopatin@...
|
|