Asyncio + Rasterio for slow network requests?

kylebarron2@...
 

I'm trying to improve performance of dynamic satellite imagery tiling, using
[`cogeo-mosaic-tiler`](https://github.com/developmentseed/cogeo-mosaic-tiler)/[`rio-tiler`](https://github.com/cogeotiff/rio-tiler),
which combines source Cloud-Optimized GeoTIFFs into a web mercator tile on the
fly. I'm using AWS Landsat and NAIP imagery stored in S3 buckets, and running
code on AWS Lambda in the same region.
 
Since NAIP imagery doesn't overlap cleanly with web mercator tiles, at zoom 12 I
have to load on average [6 assets to create one mercator
tile](https://user-images.githubusercontent.com/15164633/77286861-cfc7df00-6c99-11ea-84e9-8ed584b030c0.png).
While profiling the AWS Lambda instance using AWS X-Ray, I found that the
biggest bottleneck was the [base
call](https://github.com/cogeotiff/rio-tiler/blob/6b0d4df0b6aa1454c50312e8d352ed57f0a4e3cb/rio_tiler/utils.py#L449-L455)
to `WarpedVRT.read()`. That call always takes [between 1.7 and 2.0
seconds](https://user-images.githubusercontent.com/15164633/77289999-c5f5aa00-6ca0-11ea-816a-5aaf248a782c.png)
for each source asset, regardless of how much it overlaps the mercator tile.
 
When testing tile load times on an EC2 t2.nano in the same region, for the first
tile load, CPU time is 120 ms but wall time is 1.1 seconds. That leads me to
believe that the bottleneck is S3 latency.
 
If the code running on Lambda spends the same roughly 90% of each asset's load
time waiting on the network, then loading 6 assets at about 1.7 seconds each
would imply that roughly 9 seconds in total are spent waiting on latency.
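
The estimate can be checked with a few lines of arithmetic. This is only a sketch built from the figures quoted in this post; it assumes the assets are read one after another and that the EC2 ratio of CPU time to wall time carries over to Lambda, neither of which has been verified. The helper names are made up for illustration.

```python
# Figures quoted in this post
cpu_s = 0.120                # CPU time for one tile load on the EC2 instance
wall_s = 1.1                 # wall time for the same load
assets_per_tile = 6          # average NAIP assets per zoom 12 mercator tile
lambda_read_s = (1.7, 2.0)   # observed range per WarpedVRT.read() call on Lambda


def wait_fraction(cpu, wall):
    """Share of wall time that is not accounted for by CPU work."""
    return 1.0 - cpu / wall


def total_wait(n_assets, read_s, fraction):
    """Estimated waiting time if the assets are read sequentially."""
    return n_assets * read_s * fraction


frac = wait_fraction(cpu_s, wall_s)
low, high = (total_wait(assets_per_tile, t, frac) for t in lambda_read_s)
print(f"wait fraction: {frac:.0%}")
print(f"estimated wait per mercator tile: {low:.1f} to {high:.1f} s")
```

With these inputs the waiting share comes out just under 90%, and the estimated wait lands between roughly 9 and 11 seconds per mercator tile.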
 
Using multithreading with a `ThreadPoolExecutor` takes longer than running
single-threaded. Given the situation, it would seem ideal to use `asyncio` for
the COG network requests to improve performance.
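
For reference, the threaded fan-out can be sketched as below. `read_asset` is a hypothetical stand-in that only sleeps to mimic network latency; it is not the `rio-tiler` call, and the asset names are placeholders. With a pure sleep, the threaded version should finish in about the time of a single read, so this sketch does not reproduce the slowdown described above and says nothing about its cause.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def read_asset(address, latency_s=0.1):
    """Hypothetical blocking read; sleeps to mimic S3 latency."""
    time.sleep(latency_s)
    return address


def read_all_serial(addresses):
    """Read every asset one after another."""
    return [read_asset(a) for a in addresses]


def read_all_threaded(addresses, max_workers=6):
    """Read the assets in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_asset, addresses))


def timed(fn, *args):
    """Return the function's result together with its elapsed wall time."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


addresses = [f"asset-{i}.tif" for i in range(6)]  # placeholder names
serial, serial_s = timed(read_all_serial, addresses)
threaded, threaded_s = timed(read_all_threaded, addresses)
print(f"serial: {serial_s:.2f} s, threaded: {threaded_s:.2f} s")
```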
 
Has this ever been attempted with Rasterio? I saw a [Rasterio example of using
async](https://github.com/mapbox/rasterio/blob/master/examples/async-rasterio.py)
to improve performance on a CPU-bound function, and plan to try that out, but
I'm pessimistic about that approach because I'd think that the `async` calls
would need to be applied to the core fetch calls directly.
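
To make the concern concrete, here is a minimal sketch of wrapping blocking reads with `asyncio`. `blocking_read` is a hypothetical placeholder for a call such as `WarpedVRT.read()`, not a Rasterio function. Note that `run_in_executor` hands each call to a thread pool, so the event loop only coordinates the work; the read itself stays blocking, which is why wrapping at this level may not behave differently from plain threads.

```python
import asyncio
import time


def blocking_read(address, latency_s=0.1):
    """Hypothetical stand-in for a blocking read; sleeps to mimic latency."""
    time.sleep(latency_s)
    return f"pixels:{address}"


async def read_assets(addresses):
    """Run the blocking reads concurrently and return results in input order."""
    loop = asyncio.get_running_loop()
    # Each call runs in the default thread pool executor; nothing here
    # makes the underlying network fetch asynchronous.
    tasks = [loop.run_in_executor(None, blocking_read, a) for a in addresses]
    return await asyncio.gather(*tasks)


def main(addresses):
    """Synchronous entry point for the coroutine above."""
    return asyncio.run(read_assets(addresses))
```

Calling `main(["a.tif", "b.tif"])` returns one result per address, in the same order as the input.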
 
 
Reproduction for tile loading:
```py
import os

from rio_tiler.main import tile

# Point curl at the system CA bundle and opt in to requester-pays S3 access
os.environ['CURL_CA_BUNDLE'] = '/etc/ssl/certs/ca-certificates.crt'
os.environ['AWS_REQUEST_PAYER'] = 'requester'

address = 's3://naip-visualization/ca/2018/60cm/rgb/34118/m_3411861_ne_11_060_20180723_20190208.tif'
x = 701
y = 1635
z = 12
tilesize = 512

# %time is an IPython magic, so run this in IPython or Jupyter
%time data, mask = tile(address, x, y, z, tilesize)
```
```
CPU times: user 119 ms, sys: 20.3 ms, total: 140 ms
Wall time: 1.1 s
```
