Asyncio + Rasterio for slow network requests?
kylebarron2@...
I'm trying to improve performance of dynamic satellite imagery tiling, using
[`cogeo-mosaic-tiler`](https://github.com/developmentseed/cogeo-mosaic-tiler)/[`rio-tiler`](https://github.com/cogeotiff/rio-tiler),
which combines source Cloud-Optimized GeoTIFFs into a web mercator tile on the
fly. I'm using AWS Landsat and NAIP imagery stored in S3 buckets, and running
code on AWS Lambda in the same region.
Since NAIP imagery doesn't overlap cleanly with web mercator tiles, at zoom 12 I
have to load on average [6 assets to create one mercator
tile](https://user-images.githubusercontent.com/15164633/77286861-cfc7df00-6c99-11ea-84e9-8ed584b030c0.png).
While profiling the AWS Lambda instance using AWS X-Ray, I found that the
biggest bottleneck was the [base
call](https://github.com/cogeotiff/rio-tiler/blob/6b0d4df0b6aa1454c50312e8d352ed57f0a4e3cb/rio_tiler/utils.py#L449-L455)
to `WarpedVRT.read()`. That call always takes [between 1.7 and 2.0
seconds](https://user-images.githubusercontent.com/15164633/77289999-c5f5aa00-6ca0-11ea-816a-5aaf248a782c.png)
for each tile, regardless of the amount of overlap with the mercator tile.
When testing tile load times on an EC2 t2.nano in the same region, for the first
tile load, CPU time is 120 ms but wall time is 1.1 seconds. That leads me to
believe that the bottleneck is S3 latency.
If the code running on Lambda spends a similar ~90% of each asset's read time
waiting on the network, that would imply roughly 9 seconds in total spent on
latency across the ~6 assets per tile.
Using multithreading with a `ThreadPoolExecutor` takes longer than running
single-threaded. Given the situation, it would seem ideal to use `asyncio` for
the COG network requests to improve performance.
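For reference, here is roughly what that threaded attempt looks like. This is a minimal sketch, assuming the same `rio_tiler.main.tile` call used in the reproduction below; the asset list is hypothetical and stands in for the ~6 COGs that intersect one mercator tile.

```py
from concurrent.futures import ThreadPoolExecutor

from rio_tiler.main import tile

# Hypothetical list of source COGs intersecting the mercator tile
assets = [
    's3://naip-visualization/ca/2018/60cm/rgb/34118/m_3411861_ne_11_060_20180723_20190208.tif',
    # ... the other assets for this tile
]
x, y, z, tilesize = 701, 1635, 12, 512

def fetch(asset):
    # Blocking read of one source asset; each call runs in its own worker thread
    return tile(asset, x, y, z, tilesize)

with ThreadPoolExecutor(max_workers=len(assets)) as pool:
    results = list(pool.map(fetch, assets))
```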
Has this ever been attempted with Rasterio? I saw a [Rasterio example of using
async](https://github.com/mapbox/rasterio/blob/master/examples/async-rasterio.py)
to improve performance of a CPU-bound function, and plan to try that out, but
I'm pessimistic about that approach because I'd expect the `async` calls would
need to wrap the core fetch calls themselves.
Reproduction for tile loading:
```py
import os
from rio_tiler.main import tile
os.environ['CURL_CA_BUNDLE'] = '/etc/ssl/certs/ca-certificates.crt'
# The NAIP bucket is requester-pays
os.environ['AWS_REQUEST_PAYER'] = 'requester'
address = 's3://naip-visualization/ca/2018/60cm/rgb/34118/m_3411861_ne_11_060_20180723_20190208.tif'
x = 701
y = 1635
z = 12
tilesize = 512
%time data, mask = tile(address, x, y, z, tilesize)
```
```
CPU times: user 119 ms, sys: 20.3 ms, total: 140 ms
Wall time: 1.1 s
```
Sean Gillies
Hi, First of all, I'm not very familiar with rio-tiler. Hopefully, Vincent will help us out.
A constant time regardless of the amount of overlap suggests to me that your source files may lack the proper tiling. If the sources are tiled, the number of bytes transferred (and time) would scale roughly with the amount of overlap. Can you verify that your sources have overviews? If you're accessing 6 sources to fill a web mercator tile, overviews will help dramatically.
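One quick way to check from Python whether a source has internal tiling and overviews. This is a sketch, not part of the original thread; the dataset path is taken from the reproduction above.

```py
import rasterio

src_path = 's3://naip-visualization/ca/2018/60cm/rgb/34118/m_3411861_ne_11_060_20180723_20190208.tif'

with rasterio.Env(AWS_REQUEST_PAYER='requester'):
    with rasterio.open(src_path) as src:
        # A tiled GeoTIFF reports square internal blocks; a striped one reports full-width rows
        print(src.profile.get('tiled'), src.block_shapes)
        # Non-empty decimation factors mean band 1 has overviews
        print(src.overviews(1))
```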
That asyncio example is dated and could be hard to generalize to your problem. I'd love to see a good working example. You're right that there's only so much we can do in Python about maximizing this concurrency. At some level, it's code in GDAL that is making the HTTP requests for parts of the COGs and using a strategy that we can't entirely control from Python. Sean Gillies
Dion Häfner <dion.haefner@...>
Hey Kyle,
maybe I can help out here.

- asyncio's `run_in_executor` does the exact same thing as using a thread pool, it's just a different API. Until both GDAL and rasterio explicitly support this, you cannot use "real" asynchronous (non-blocking) IO.
- I can second Sean's comment that multithreading should speed up tile retrieval, and I suspect that something is off with your code and/or your raster. Usually, reading a tile from S3 takes something like 10-100ms if you do it right.
- At the moment, GDAL reads are not thread-safe! This leads to seemingly random failing tile reads (we struggled a lot with this in Terracotta). Now we use a process pool that we spawn at server start, which seems to work OK both performance and reliability-wise (https://github.com/DHI-GRAS/terracotta/blob/master/terracotta/drivers/raster_base.py).

Best, Dion
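To illustrate the first point, here is a sketch of the `run_in_executor` pattern; the helper and its arguments are hypothetical. Each `tile()` call still blocks a thread-pool worker, so the event loop gains nothing over a plain `ThreadPoolExecutor`.

```py
import asyncio
from concurrent.futures import ThreadPoolExecutor

from rio_tiler.main import tile

async def read_assets(assets, x, y, z, tilesize=512):
    loop = asyncio.get_event_loop()
    with ThreadPoolExecutor() as pool:
        # Each blocking tile() call is simply handed to a worker thread;
        # asyncio only schedules the futures, the IO itself is not non-blocking
        futures = [
            loop.run_in_executor(pool, tile, asset, x, y, z, tilesize)
            for asset in assets
        ]
        return await asyncio.gather(*futures)
```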
vincent.sarago@...
Hi All,
I'll answer for Kyle, but he can jump back in if needed. The problem Kyle was facing was due to GDAL 3 (running on AWS Lambda, CentOS) being extremely slow for image reprojection. We faced this in https://github.com/RemotePixel/amazonlinux/issues/16 and thought it was fixed when we updated the sqlite lib (https://github.com/RemotePixel/amazonlinux/pull/17), but while this made things a bit faster, it seems there is still a `huge` difference between gdal2/proj5 and gdal3/proj6.

We still went through some testing with async, but because Kyle uses AWS Lambda and https://github.com/vincentsarago/lambda-proxy, which is not async compatible, we just switched to gdal2 and to threading. FYI, I've updated another tiling project to use async, but I need to run benchmarks: https://github.com/developmentseed/titiler/blob/master/titiler/api/api_v1/endpoints/tiles.py#L26

Vincent
Sean Gillies
Hi Vincent, Thanks for the update. This situation points out a downside of using the warped VRT: it abstracts everything (network, reprojection, caching) and makes diagnosing problems difficult. Sean Gillies
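One way to peek under that abstraction is to turn on GDAL's debug output around the read. A sketch under stated assumptions: it reuses the address and tile coordinates from the reproduction above, and assumes the standard GDAL config options `CPL_DEBUG` and `CPL_CURL_VERBOSE` set on the outer environment are honored inside the `rio_tiler` call.

```py
import rasterio
from rio_tiler.main import tile

address = 's3://naip-visualization/ca/2018/60cm/rgb/34118/m_3411861_ne_11_060_20180723_20190208.tif'

# CPL_DEBUG=ON logs GDAL driver/VSI activity; CPL_CURL_VERBOSE=YES prints each HTTP range request
with rasterio.Env(CPL_DEBUG='ON', CPL_CURL_VERBOSE='YES', AWS_REQUEST_PAYER='requester'):
    data, mask = tile(address, 701, 1635, 12, 512)
```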
kylebarron2@...
Sorry for the slow response. As Vincent noted, just moving back to GDAL 2.4 made the process ~8x faster, from ~1.7 s to ~200 ms to read each source tile.
> A constant time regardless of the amount of overlap suggests to me that your source files may lack the proper tiling.

According to the AWS NAIP docs, the COG sources were created with:

    gdal_translate -b 1 -b 2 -b 3 -of GTiff -co tiled=yes -co BLOCKXSIZE=512 -co BLOCKYSIZE=512 -co COMPRESS=DEFLATE -co PREDICTOR=2 src_dataset dst_dataset
    gdaladdo -r average -ro src_dataset 2 4 8 16 32 64
    gdal_translate -b 1 -b 2 -b 3 -of GTiff -co TILED=YES -co BLOCKXSIZE=512 -co BLOCKYSIZE=512 -co COMPRESS=JPEG -co JPEG_QUALITY=85 -co PHOTOMETRIC=YCBCR -co COPY_SRC_OVERVIEWS=YES --config GDAL_TIFF_OVR_BLOCKSIZE 512 src_dataset dst_dataset
Sean Gillies
Hi Kyle, Dion: Thank you for the details.
Dion, can you say a little more about reads not being thread-safe? It's intended that we can call GDAL's RasterIO functions in different threads concurrently as long as we don't share dataset handles between threads. If we observe otherwise, then there is a GDAL bug that we can fix. There is an additional consideration for VRTs explained in https://gdal.org/drivers/raster/vrt.html#multi-threading-issues. If we have multiple VRTs, used in different threads, pointing to the same URLs, we need to take an extra step to prevent GDAL from accidentally sharing those non-VRT dataset handles between the threads. Sean Gillies
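A sketch of that per-thread-handle pattern with plain rasterio (the paths and window below are placeholders, not taken from the thread): each worker opens its own dataset rather than sharing one handle.

```py
from concurrent.futures import ThreadPoolExecutor

import rasterio
from rasterio.windows import Window

def read_window(path):
    # Each thread opens and closes its own dataset handle; no handle is shared across threads
    with rasterio.open(path) as src:
        return src.read(1, window=Window(0, 0, 512, 512))

paths = ['example1.tif', 'example2.tif']  # placeholder paths

with ThreadPoolExecutor() as pool:
    chunks = list(pool.map(read_window, paths))
```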
Dion Häfner <dion.haefner@...>
Hey Sean,
Sorry, I should have been clearer. As it stands, my statement is false: GDAL is of course designed to be thread-safe, so doing concurrent reads in different threads *should* work. But in our experience, it doesn't, to the point that we have given up on threads entirely. Relevant issues from last year:

https://github.com/mapbox/rasterio/issues/1686
https://github.com/OSGeo/gdal/issues/1960
https://github.com/OSGeo/gdal/issues/1244

Even though GDAL#1244 was closed as fixed, we still observed the problem, so I suspect there is another race condition somewhere within GDAL. Anyway, this wasn't meant as a general statement, just a personal word of advice. To me, multiprocessing seems like a saner alternative at the moment, but YMMV.

Best, Dion
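For completeness, a minimal sketch of that multiprocessing alternative, along the lines of the Terracotta approach described above: a process pool spawned once at server start and reused for every request. The pool size and helper are assumptions, not the Terracotta code itself.

```py
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

from rio_tiler.main import tile

# Created once at server start and reused for every request; each worker
# process keeps its own GDAL state, sidestepping thread-safety issues entirely
_pool = ProcessPoolExecutor(max_workers=4, mp_context=multiprocessing.get_context('spawn'))

def read_tile(asset, x, y, z, tilesize=512):
    # Hand the blocking read to a worker process and wait for the result
    return _pool.submit(tile, asset, x, y, z, tilesize).result()
```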