Occasional "not recognized as a supported file format" errors when reading from S3
Dion Häfner <dion.haefner@...>
Dear rasterio group,
(I initially posted this at https://github.com/mapbox/rasterio/issues/1686) Lately, we have encountered a strange bug in Terracotta. It basically always leads to errors like these: (from DHI-GRAS/terracotta#139)
or (from DHI-GRAS/terracotta#10 (comment))
The errors occur on different versions of rasterio, although anecdotally it wasn't a problem pre-1.0.15. It also seems to occur both during The problem is that we have only observed it with huge raster files, and we haven't been able to reproduce this reliably, or in a way where I could share it with you. Does anyone have any intuition why this might be happening / what we could look at to debug this?
Cheers,
|
|
vincent.sarago@...
This is not something I've been seen for now. does this happens with Rasterio wheels and rasterio build on gdal source ?
|
|
Dion Häfner <dion.haefner@...>
We've only tried rasterio wheels. On 23/05/2019 21.23, vincent.sarago via
Groups.Io wrote:
This is not something I've been seen for now. does this happens with Rasterio wheels and rasterio build on gdal source ?
|
|
Sean Gillies
Hi Dion, That error comes from these lines in GDALOpenEx when the function fails to return successfully: https://github.com/OSGeo/gdal/blob/fe0b5dd644abb4ac0c9869db28e9cf977181fbce/gdal/gcore/gdaldataset.cpp#L3455. The way the errors are re-raised obscures the problem. First, are the /vsis3/ identifiers for the datasets correct? Is Rasterio mangling them? Can you access the data successfully any time at all? Are you using any advanced HTTP features? HTTP/2? The Rasterio wheels use a slightly old version of curl and that project is ever fixing bugs. Is multithreading involved? I recommend that you consult https://gdal.org/gdal_vrttut.html#gdal_vrttut_mt as soon as it is back online. Multithreaded access to VRTs is complicated. If the error occurs every time
On Fri, May 24, 2019 at 8:13 AM Dion Häfner <dion.haefner@...> wrote:
--
Sean Gillies
|
|
Dion Häfner <dion.haefner@...>
Hey Sean, thank you very much for the detailed assessment. I think this already sheds some light on some of the problems. To answer your questions: - The /vsis3/ identifiers are correct and work most of the time, we only see these errors from time to time. - This is just a default read from a private S3 bucket. I don't know which protocol boto (or GDAL?) uses under the hood. - In the first case, there is indeed multithreading involved, so this is a hot lead. I will have a look at the documentation as soon as I can find it somewhere :) It working most of the time would of course suggest some sort of race condition. In the second case, the setup is actually more complicated. We experimented with "cloud-optimized" VRT files, that we created by 1. gdalbuildvrt on many small-ish COG 2. gdaladdo on the resulting VRT into an external .vrt.ovr file We could then serve these COG mosaics up as a single dataset via Terracotta, but apparently the read from the VRT would fail when we zoomed in too far. Thank you for your time! Dion
On 24/05/2019 20.49, Sean Gillies via
Groups.Io wrote:
|
|
Sean Gillies
I found the VRT multi-threading guidance at https://gdal.org/drivers/raster/vrt.html#multi-threading-issues. It moved during the site migration. This smells like a GDAL bug to me, perhaps a manifestation of https://github.com/OSGeo/gdal/issues/1244 or https://github.com/OSGeo/gdal/issues/1031.
On Mon, May 27, 2019 at 2:50 AM Dion Häfner <dion.haefner@...> wrote:
-- Sean Gillies
|
|
Here is some more insight on the issue. Remember, we are seeing this when reading small chunks from a cloud-optimized GeoTIFF on S3 with our Terracotta tile server.
On our latest encounters of the issue, we are getting it consistently on the same map tile but not on adjacent tiles that are reading the exact same GeoTIFF. So this challenges the assumption that this is a race condition, does it not? Btw, we also tried the solution from https://github.com/OSGeo/gdal/issues/1244#issuecomment-487164897 ( CPL_VSIL_CURL_NON_CACHED="/vsis3/" ) to no avail.
|
|
OK, sorry, forget what I said - the same map tile that earlier was failing is now served fine. :/
|
|
Sean Gillies
Dion, Jonas: There is a new report in the FIona tracker that may be related: https://github.com/Toblerity/Fiona/issues/761. There I advised turning on CPL_CURL_VERBOSE=YES to see more details about the HTTP requests that are failing. It will cause the volume of your logs to explode and it may be a challenge to find the signal in the noise, but I believe we will find some clues in the HTTP responses.
On Wed, Jun 12, 2019 at 1:55 AM Jonas <j08lue@...> wrote: OK, sorry, forget what I said - the same map tile that earlier was failing is now served fine. :/ -- Sean Gillies
|
|
Dion Häfner <dion.haefner@...>
Hey Sean,
toggle quoted messageShow quoted text
I don't know if you've seen our update on the rasterio issue tracker: https://github.com/mapbox/rasterio/issues/1686#issuecomment-541737735 As it turns out, `GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR` was the culprit. Do you know if that is related to a known bug in GDAL, or whether this is something that should be reported separately? For now, we have disabled the flag in Terracotta, and everything seems to work fine, but we did take a performance hit from that. So in the long run, it would be great if that could be fixed. Best, Dion
On 15/06/2019 16.51, Sean Gillies via Groups.Io wrote:
Dion, Jonas:
|
|