First of all, I'm not very familiar with rio-tiler. Hopefully, Vincent will help us out.
I'm trying to improve performance of dynamic satellite imagery tiling, using
which combines source Cloud-Optimized GeoTIFFs into a web mercator tile on the
fly. I'm using AWS Landsat and NAIP imagery stored in S3 buckets, and running
code on AWS Lambda in the same region.
Since NAIP imagery doesn't overlap cleanly with web mercator tiles, at zoom 12 I
have to load on average [6 assets to create one mercator
While profiling the AWS Lambda instance using AWS X-Ray, I found that the
biggest bottleneck was the [base
to `WarpedVRT.read()`. That call always takes [between 1.7 and 2.0
for each tile, regardless of the amount of overlap with the mercator tile.
A constant time regardless of the amount of overlap suggests to me that your source files may lack the proper tiling. If the sources are tiled, the number of bytes transferred (and time) would scale roughly with the amount of overlap.
Can you verify that your sources have overviews? If you're accessing 6 sources to fill a web mercator tile, overviews will help dramatically.
When testing tile load times on an EC2 t2.nano in the same region, for the first
tile load, CPU time is 120 ms but wall time is 1.1 seconds. That leads me to
believe that the bottleneck is S3 latency.
If the code running on Lambda shares the same 90% proportion spent on latency
for each asset, that would imply that 9 seconds total are spent waiting on
Using multithreading with a `ThreadPoolExecutor` takes longer than running
single-threaded. Given the situation, it would seem ideal to use `asyncio` for
the COG network requests to improve performance.
I wonder if Vincent can tell us from his experience if there is a risk of overwhelming GDAL's raster block cache on Lambda when making many current reads? I've seen programs appear to hang when the cache is too small.
Has this been attempted ever with Rasterio? I saw a [Rasterio example of using
to improve performance on a CPU bound function, and plan to try that out, but
I'm pessimistic about that approach directly because I'd think that the `async`
calls would need to be applied on the core fetch calls directly.
That asyncio example is dated and could be hard to generalize to your problem. I'd love to see a good working example.
You're right that there's only so much we can do in Python about maximizing this conconcurrency. At some level, it's code in GDAL that is making the HTTP requests for parts of the COGs and using a strategy that we can't entirely control from Python.