Re: Asyncio + Rasterio for slow network requests?


kylebarron2@...
 

Sorry for the slow response. As Vincent noted, just moving back to GDAL 2.4 made the process ~8x faster: reading each source tile went from ~1.7s to ~200ms.

> A constant time regardless of the amount of overlap suggests to me that your source files may lack the proper tiling.

According to the AWS NAIP docs, the COG sources were created with

gdal_translate -b 1 -b 2 -b 3 -of GTiff -co tiled=yes -co BLOCKXSIZE=512 -co BLOCKYSIZE=512 -co COMPRESS=DEFLATE -co PREDICTOR=2 src_dataset dst_dataset

gdaladdo -r average -ro src_dataset 2 4 8 16 32 64

gdal_translate -b 1 -b 2 -b 3 -of GTiff -co TILED=YES -co BLOCKXSIZE=512 -co BLOCKYSIZE=512 -co COMPRESS=JPEG -co JPEG_QUALITY=85 -co PHOTOMETRIC=YCBCR -co COPY_SRC_OVERVIEWS=YES --config GDAL_TIFF_OVR_BLOCKSIZE 512 src_dataset dst_dataset

> asyncio's run_in_executor does the exact same thing as using a thread pool

That makes sense, and I ultimately expected to not be able to make progress since it's GDAL making the low level requests.
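
To illustrate the point about run_in_executor for anyone reading later, here is a minimal sketch. `blocking_read` is a hypothetical stand-in for a blocking GDAL/rasterio read (it just sleeps), not a real rasterio call; the tile IDs and worker count are made up as well. Both code paths hand the same blocking function to the same `ThreadPoolExecutor`, so asyncio adds no concurrency beyond what the threads already give you.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_read(tile_id):
    """Hypothetical stand-in for a blocking GDAL/rasterio tile read."""
    time.sleep(0.05)  # simulated network latency, not a real request
    return tile_id * 2


async def read_with_asyncio(tile_ids, pool):
    # run_in_executor submits the blocking call to the thread pool and
    # wraps the result in an awaitable; the work still runs in a thread.
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(pool, blocking_read, t) for t in tile_ids]
    return await asyncio.gather(*futures)


def read_with_threads(tile_ids, pool):
    # The plain thread-pool version: same function, same worker threads.
    return list(pool.map(blocking_read, tile_ids))


tile_ids = [1, 2, 3, 4]
with ThreadPoolExecutor(max_workers=4) as pool:
    async_results = asyncio.run(read_with_asyncio(tile_ids, pool))
    thread_results = read_with_threads(tile_ids, pool)

print(async_results == thread_results)
```

Either way the blocking call occupies a worker thread for its full duration, so the time per read is whatever the underlying library takes.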

> Usually, reading a tile from S3 takes something like 10-100ms if you do it right.

Moving back to GDAL 2.4 got me to around these speeds.

> At the moment, GDAL reads are not thread-safe!

That's really great to keep in mind! It means I'll probably shy away from attempting concurrency with GDAL in general.
