Problem with window sizes when parallelizing a function


javierlopatin@...
 

Hi all,
I'm currently using rasterio for time series analysis of raster data. Because the raster tiles are too big, I'm trying to implement a windowed function with parallel processing. I tried your tutorial on concurrent processing (https://github.com/mapbox/rasterio/blob/master/docs/topics/concurrency.rst) and it works beautifully with my own functions. I'm just wondering why it only works with window sizes of 128 (profile.update(blockxsize=128, blockysize=128, tiled=True)), just like in the example. If I change these values to e.g. 200 or 500, it no longer works. I'm currently testing with a 3,000 x 3,000 raster. Because loops are slow, I assume that increasing the window size a bit could help. The error message I receive if I change blockxsize is:

Traceback (most recent call last):
  File "TSA.py", line 252, in <module>
    main(args.inputImage, args.outputImage, args.j)
  File "TSA.py", line 225, in main
    parallel_process(infile, outfile, n_jobs)
  File "TSA.py", line 187, in parallel_process
    with rasterio.open(outfile, "w", **profile) as dst:
  File "/home/javier/miniconda3/lib/python3.6/site-packages/rasterio/env.py", line 398, in wrapper
    return f(*args, **kwds)
  File "/home/javier/miniconda3/lib/python3.6/site-packages/rasterio/__init__.py", line 226, in open
    **kwargs)
  File "rasterio/_io.pyx", line 1129, in rasterio._io.DatasetWriterBase.__init__
  File "rasterio/_err.pyx", line 194, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_AppDefinedError: _TIFFVSetField:/home/javier/Documents/SF_delta/Sentinel/TSA/X-004_Y-001/2015-2019_001-365_LEVEL4_TSA_SEN2L_EVI_C0_S0_FAVG_TY_C95T_FBY_TSA.tif: Bad value 500 for "TileWidth" tag
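
A possible explanation, not confirmed against the rasterio or GDAL sources: the TIFF 6.0 specification requires TileWidth and TileLength to be multiples of 16. 128 satisfies that, while 200 and 500 do not, which would match the "Bad value 500" message above. The sketch below only checks that arithmetic. `is_valid_tile_size` and `nearest_valid_tile_sizes` are hypothetical helper names, not rasterio functions.

```python
# Assumption: TIFF tile dimensions must be multiples of 16 (TIFF 6.0 tile rules).
TILE_MULTIPLE = 16


def is_valid_tile_size(size):
    """Return True if `size` is a positive multiple of 16."""
    return size > 0 and size % TILE_MULTIPLE == 0


def nearest_valid_tile_sizes(size):
    """Return the closest multiples of 16 at or below and at or above `size`.

    For sizes below 16 the lower value is 0, which is not a usable tile size.
    """
    lower = (size // TILE_MULTIPLE) * TILE_MULTIPLE
    upper = lower if lower == size else lower + TILE_MULTIPLE
    return lower, upper


for size in (128, 200, 500):
    print(size, is_valid_tile_size(size), nearest_valid_tile_sizes(size))
```

If that reading is right, values such as 192, 208, 256, 496 or 512 would be candidates to try for blockxsize and blockysize instead of 200 or 500.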
 

The function I'm using is below (see the whole script at https://github.com/JavierLopatin/Python-Remote-Sensing-Scripts/blob/master/TSA.py):

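import concurrent.futures

import rasterio


# TSA() is the per-window analysis function defined elsewhere in TSA.py.
def parallel_process(infile, outfile, n_jobs):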
    """
    Process infile block-by-block with parallel processing
    and write to a new file.
    """
    from tqdm import tqdm # progress bar
 
    with rasterio.Env():
 
        with rasterio.open(infile) as src:
 
            # Create a destination dataset based on source params. The
            # destination will be tiled, and we'll process the tiles
            # concurrently.
            profile = src.profile
            profile.update(blockxsize=128, blockysize=128,
                           count=6, dtype='float64', tiled=True)
 
            with rasterio.open(outfile, "w", **profile) as dst:
 
                # Materialize a list of destination block windows
                # that we will use in several statements below.
                windows = [window for ij, window in dst.block_windows()]
 
                # This generator comprehension gives us raster data
                # arrays for each window. Later we will zip a mapping
                # of it with the windows list to get (window, result)
                # pairs.
                data_gen = (src.read(window=window) for window in windows)
 
                with concurrent.futures.ProcessPoolExecutor(
                    max_workers=n_jobs
                ) as executor:
 
                    # We map the TSA() function over the raster
                    # data generator, zip the resulting iterator with
                    # the windows list, and as pairs come back we
                    # write data to the destination dataset.
                    for window, result in zip(
                        tqdm(windows), executor.map(TSA, data_gen)
                    ):
                        dst.write(result, window=window)
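
To put rough numbers on why larger windows are attractive, here is a small sketch that counts how many block windows the loop above would iterate over for a 3,000 x 3,000 raster. It assumes square blocks and that partial blocks at the right and bottom edges each count as one window. `count_block_windows` is a hypothetical helper, and the block sizes are only illustrative.

```python
import math


def count_block_windows(width, height, block_size):
    """Count square block windows covering a width x height raster.

    Assumption: edge blocks that only partly overlap the raster still
    count as one window each.
    """
    cols = math.ceil(width / block_size)
    rows = math.ceil(height / block_size)
    return rows * cols


for block_size in (128, 256, 512):
    print(block_size, count_block_windows(3000, 3000, block_size))
```

Under these assumptions, going from 128 to 512 reduces the number of windows, and therefore the number of tasks sent to the executor, from 576 to 36.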

Hope you guys can help. 

Cheers, Javier
