Problem with window sizes when parallelizing a function
javierlopatin@...
Hi all,
I'm currently using rasterio for time series analysis of raster data. Because the raster tiles are too big, I'm trying to implement a windowed function with parallel processing. I tried your tutorial on concurrent processing (https://github.com/mapbox/rasterio/blob/master/docs/topics/concurrency.rst) and it works beautifully with my own functions. I'm just wondering why it only works with a block size of 128 (profile.update(blockxsize=128, blockysize=128, tiled=True)), just like in the example. If I change these values to e.g. 200 or 500, it no longer works. I'm currently testing with a 3,000 x 3,000 raster. Because loops are slow, I assume that increasing the window size a bit could help.

The error message that I receive if I change blockxsize is:

Traceback (most recent call last):
  File "TSA.py", line 252, in <module>
    main(args.inputImage, args.outputImage, args.j)
  File "TSA.py", line 225, in main
    parallel_process(infile, outfile, n_jobs)
  File "TSA.py", line 187, in parallel_process
    with rasterio.open(outfile, "w", **profile) as dst:
  File "/home/javier/miniconda3/lib/python3.6/site-packages/rasterio/env.py", line 398, in wrapper
    return f(*args, **kwds)
  File "/home/javier/miniconda3/lib/python3.6/site-packages/rasterio/__init__.py", line 226, in open
    **kwargs)
  File "rasterio/_io.pyx", line 1129, in rasterio._io.DatasetWriterBase.__init__
  File "rasterio/_err.pyx", line 194, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_AppDefinedError: _TIFFVSetField:/home/javier/Documents/SF_delta/Sentinel/TSA/X-004_Y-001/2015-2019_001-365_LEVEL4_TSA_SEN2L_EVI_C0_S0_FAVG_TY_C95T_FBY_TSA.tif: Bad value 500 for "TileWidth" tag
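
A likely explanation, offered as a sketch rather than a confirmed diagnosis: the TIFF specification requires TileWidth and TileLength to be multiples of 16, which 128 satisfies but 200 and 500 do not. The helper below rounds a requested block size up to the next multiple of 16. The name `next_valid_tile_size` is hypothetical and not part of rasterio.

```python
def next_valid_tile_size(requested, multiple=16):
    """Round a requested tile size up to the next multiple of `multiple`.

    Hypothetical helper, not part of rasterio. The default of 16 reflects
    the TIFF rule that tile dimensions must be multiples of 16.
    """
    if requested <= 0:
        raise ValueError("tile size must be positive")
    # Ceiling division using integers only, then scale back up.
    return -(-requested // multiple) * multiple


for size in (128, 200, 500):
    print(size, "->", next_valid_tile_size(size))
# 128 -> 128
# 200 -> 208
# 500 -> 512
```

The returned value could then be passed as both blockxsize and blockysize in profile.update(...).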
The function I use is below (see the whole script at https://github.com/JavierLopatin/Python-Remote-Sensing-Scripts/blob/master/TSA.py):

def parallel_process(infile, outfile, n_jobs):
    """
    Process infile block-by-block with parallel processing
    and write to a new file.
    """
    from tqdm import tqdm  # progress bar

    with rasterio.Env():

        with rasterio.open(infile) as src:

            # Create a destination dataset based on source params. The
            # destination will be tiled, and we'll process the tiles
            # concurrently.
            profile = src.profile
            profile.update(blockxsize=128, blockysize=128,
                           count=6, dtype='float64', tiled=True)

            with rasterio.open(outfile, "w", **profile) as dst:

                # Materialize a list of destination block windows
                # that we will use in several statements below.
                windows = [window for ij, window in dst.block_windows()]

                # This generator comprehension gives us raster data
                # arrays for each window. Later we will zip a mapping
                # of it with the windows list to get (window, result)
                # pairs.
                data_gen = (src.read(window=window) for window in windows)

                with concurrent.futures.ProcessPoolExecutor(
                    max_workers=n_jobs
                ) as executor:

                    # We map the TSA() function over the raster
                    # data generator, zip the resulting iterator with
                    # the windows list, and as pairs come back we
                    # write data to the destination dataset.
                    for window, result in zip(
                        tqdm(windows), executor.map(TSA, data_gen)
                    ):
                        dst.write(result, window=window)

Hope you guys can help.

Cheers,
Javier
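
One possible workaround, sketched under assumptions: the processing windows do not have to coincide with the file's internal blocks. The hypothetical generator below (`chunk_windows` is an illustrative name, not a rasterio function) yields (col_off, row_off, width, height) tuples that cover a raster in chunks of an arbitrary size. Each tuple could be turned into a rasterio.windows.Window and used in place of dst.block_windows(), while the output file keeps a valid block size such as 128.

```python
def chunk_windows(width, height, chunk):
    """Yield (col_off, row_off, win_width, win_height) tuples covering a raster.

    Hypothetical helper: edge chunks are clipped so they never extend
    past the raster bounds.
    """
    if chunk <= 0:
        raise ValueError("chunk size must be positive")
    for row_off in range(0, height, chunk):
        win_height = min(chunk, height - row_off)
        for col_off in range(0, width, chunk):
            win_width = min(chunk, width - col_off)
            yield (col_off, row_off, win_width, win_height)


# A 3,000 x 3,000 raster split into 500 x 500 chunks.
windows = list(chunk_windows(3000, 3000, 500))
print(len(windows))    # 36
print(windows[0])      # (0, 0, 500, 500)
print(windows[-1])     # (2500, 2500, 500, 500)
```

Writes that straddle several internal blocks may be slower than block-aligned writes, so it is worth timing both approaches on real data.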