Re: Occasional "not recognized as a supported file format" errors when reading from S3
OK, sorry, forget what I said - the same map tile that earlier was failing is now served fine. :/
|
|
Re: Occasional "not recognized as a supported file format" errors when reading from S3
Here is some more insight on the issue. Remember, we are seeing this when reading small chunks from a cloud-optimized GeoTIFF on S3 with our Terracotta tile server.
On our latest encounters of the issue, we are getting it consistently on the same map tile but not on adjacent tiles that are reading the exact same GeoTIFF. So this challenges the assumption that this is a race condition, does it not? Btw, we also tried the solution from https://github.com/OSGeo/gdal/issues/1244#issuecomment-487164897 ( CPL_VSIL_CURL_NON_CACHED="/vsis3/" ) to no avail.
|
|
Rasterio 1.0.24
Sean Gillies
Hi all, Rasterio 1.0.24 is on PyPI now: https://pypi.org/project/rasterio/1.0.24/#files. This release fixes a pretty major bug that potentially affected multi-threaded programs or programs that reopened datasets while changing opening options like overview level (see issue #1504). Upgrade when you can. The only other change to note is that the wheels on PyPI now include GDAL 2.4.1. Sean Gillies
|
|
Re: gdalinfo's and rasterio's reading problem in LUSTRE FS with NetCDF file (ubuntu:bionic)
Erick Palacios Moreno
Hi Sean,
Thank you for your answer. Even Rouault answered that Lustre is causing this behaviour and that my best option is to build GDAL and libnetcdf against a libhdf5 version that plays well with Lustre. On Ubuntu Xenial we didn't have these problems, so we will discuss whether using a different version of libhdf5 is an option. Thank you, Erick
----- Original message ----- From: "Sean Gillies" <sean.gillies@gmail.com> To: main@rasterio.groups.io Sent: Wednesday, June 5, 2019 13:25:55 Subject: Re: [rasterio] gdalinfo's and rasterio's reading problem in LUSTRE FS with NetCDF file (ubuntu:bionic)
On Wed, Jun 5, 2019 at 9:22 AM <epalacios@conabio.gob.mx> wrote:
Hi, --
Sean Gillies
|
|
Re: gdalinfo's and rasterio's reading problem in LUSTRE FS with NetCDF file (ubuntu:bionic)
Sean Gillies
Hi Erick, I'm not familiar with Lustre and only slightly familiar with the details of GDAL's netCDF driver. I think, since the problem manifests with gdalinfo as well as rasterio programs, that the best source of help will be the gdal-dev list: https://lists.osgeo.org/mailman/listinfo/gdal-dev/. There have been recent discussions related to HDF5 and netCDF4 there and GDAL's developer, Even Rouault, will probably have some insights. I hate to redirect you to another email list, but gdal-dev seems to be the best place to ask in this case. When you get help there, I'll make sure to follow up here.
On Wed, Jun 5, 2019 at 9:22 AM <epalacios@...> wrote: Hi, -- Sean Gillies
|
|
Re: Clipping a raster (grib2), end up with zeros on left edge when cropping with mask
Shane Mill - NOAA Affiliate
Hey Sean, As always, thanks for the feedback. To follow up, it turns out that I had to add "filled=False". In my particular situation, the user drags a bounding box over the original raster and gets back the subsetted raster. With filled=False, as in out_image, out_transform = mask(src, geoms, filled=False, crop=True), I get the desired result. Thanks! Shane Mill
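To illustrate why filled=False helps, here is a small NumPy-only sketch (no rasterio needed). It relies on the behaviour described in the rasterio.mask.mask docstring: with filled=True, pixels outside the shapes are set to the nodata value, which falls back to 0 when the raster has none, whereas filled=False keeps the original values and flags outside pixels in the mask of a masked array. The array values are made up.

```python
import numpy as np

# Made-up 3x4 band whose first column lies outside the clipping shape.
# Two pixels *inside* the shape legitimately have the value 0.
data = np.array([[5, 0, 7, 8],
                 [6, 3, 2, 1],
                 [4, 9, 0, 2]], dtype="int16")
outside = np.zeros(data.shape, dtype=bool)
outside[:, 0] = True  # True marks pixels outside the shapes

# Mimics filled=True with no nodata set: outside pixels become 0,
# so they cannot be told apart from real zeros inside the shape.
filled = np.where(outside, 0, data)

# Mimics filled=False: original data kept, outside pixels masked.
masked = np.ma.masked_array(data, mask=outside)

print(filled[:, 0])              # [0 0 0]
print(masked.count())            # 9 valid (unmasked) pixels
print(int((filled == 0).sum()))  # 5 zeros: 3 fill values + 2 real zeros
```

With the masked array, downstream code can call masked.filled(some_value) or check masked.mask explicitly instead of guessing which zeros are real.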
On Fri, May 24, 2019 at 10:00 AM Sean Gillies <sean.gillies@...> wrote:
--
Shane Mill Meteorological Application Developer, AceInfo Solutions Meteorological Development Laboratory (NWS) Phone: 301-427-9452
|
|
Re: Occasional "not recognized as a supported file format" errors when reading from S3
Sean Gillies
I found the VRT multi-threading guidance at https://gdal.org/drivers/raster/vrt.html#multi-threading-issues. It moved during the site migration. This smells like a GDAL bug to me, perhaps a manifestation of https://github.com/OSGeo/gdal/issues/1244 or https://github.com/OSGeo/gdal/issues/1031.
On Mon, May 27, 2019 at 2:50 AM Dion Häfner <dion.haefner@...> wrote:
-- Sean Gillies
|
|
Re: Occasional "not recognized as a supported file format" errors when reading from S3
Dion Häfner <dion.haefner@...>
Hey Sean, thank you very much for the detailed assessment. I think this already sheds some light on some of the problems. To answer your questions:
- The /vsis3/ identifiers are correct and work most of the time; we only see these errors from time to time.
- This is just a default read from a private S3 bucket. I don't know which protocol boto (or GDAL?) uses under the hood.
- In the first case, there is indeed multithreading involved, so this is a hot lead. I will have a look at the documentation as soon as I can find it somewhere :) That it works most of the time would of course suggest some sort of race condition.
In the second case, the setup is actually more complicated. We experimented with "cloud-optimized" VRT files that we created by:
1. running gdalbuildvrt on many small-ish COGs, and
2. running gdaladdo on the resulting VRT to create an external .vrt.ovr file.
We could then serve these COG mosaics as a single dataset via Terracotta, but apparently the read from the VRT would fail when we zoomed in too far. Thank you for your time! Dion
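The two-step VRT workflow described above could be scripted roughly as follows. This is a sketch: the file names, the overview factors and the average resampling are assumptions, and by default it only builds the command lines; actually running them needs GDAL's command-line tools on PATH. gdalbuildvrt takes the output VRT followed by the inputs, and gdaladdo with -ro opens the dataset read-only so the overviews are written to an external .ovr file next to it (here mosaic.vrt.ovr).

```python
import subprocess

def build_commands(vrt_path, cog_paths, levels=(2, 4, 8, 16)):
    """Return the gdalbuildvrt and gdaladdo command lines as argv lists."""
    buildvrt = ["gdalbuildvrt", vrt_path, *cog_paths]
    # -ro opens the VRT read-only, so overviews go to an external
    # <vrt_path>.ovr file instead of being written into the VRT.
    addo = ["gdaladdo", "-ro", "-r", "average", vrt_path, *map(str, levels)]
    return buildvrt, addo

def run(commands):
    """Run each command in order, raising CalledProcessError on failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)

cmds = build_commands("mosaic.vrt", ["tile_0.tif", "tile_1.tif"])
for cmd in cmds:
    print(" ".join(cmd))
# gdalbuildvrt mosaic.vrt tile_0.tif tile_1.tif
# gdaladdo -ro -r average mosaic.vrt 2 4 8 16
```

Calling run(cmds) would execute both steps in order and stop at the first failing command.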
On 24/05/2019 20.49, Sean Gillies via Groups.Io wrote:
|
|
Re: Occasional "not recognized as a supported file format" errors when reading from S3
Sean Gillies
Hi Dion, That error comes from these lines in GDALOpenEx when the function fails to return successfully: https://github.com/OSGeo/gdal/blob/fe0b5dd644abb4ac0c9869db28e9cf977181fbce/gdal/gcore/gdaldataset.cpp#L3455. The way the errors are re-raised obscures the problem. First, are the /vsis3/ identifiers for the datasets correct? Is Rasterio mangling them? Can you access the data successfully any time at all? Are you using any advanced HTTP features? HTTP/2? The Rasterio wheels use a slightly old version of curl and that project is ever fixing bugs. Is multithreading involved? I recommend that you consult https://gdal.org/gdal_vrttut.html#gdal_vrttut_mt as soon as it is back online. Multithreaded access to VRTs is complicated. If the error occurs every time
On Fri, May 24, 2019 at 8:13 AM Dion Häfner <dion.haefner@...> wrote:
--
Sean Gillies
|
|
Re: Occasional "not recognized as a supported file format" errors when reading from S3
Dion Häfner <dion.haefner@...>
We've only tried rasterio wheels.
On 23/05/2019 21.23, vincent.sarago via Groups.Io wrote:
This is not something I've seen so far. Does this happen both with the Rasterio wheels and with Rasterio built against a GDAL compiled from source?
|
|
Re: Clipping a raster (grib2), end up with zeros on left edge when cropping with mask
Sean Gillies
Hi Shane, It looks like your feature geometries extend beyond the extent of your raster, yes? In that case, crop can create an extra row or column of pixels, which will be empty. This seems like a bug to me, or undefined behavior, at least. I suggest passing crop=False in your situation.
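A simplified way to see where the extra column can come from: for a north-up raster, the column offset of a bounding box's left edge is floor((minx - x_origin) / pixel_width). If the box starts left of the raster, that offset is negative, so a crop window computed from it gains a column that lies outside the data and ends up as fill. The sketch below is plain Python with hypothetical helper names; it assumes square pixels and no rotation, and it is not rasterio's actual cropping code. Clipping the box to the raster extent first is one possible workaround, alongside crop=False.

```python
import math

def bounds_to_window(bounds, origin_x, origin_y, res):
    """Hypothetical helper: (col_off, row_off, width, height) for a bbox.

    Assumes a north-up grid with square pixels of size `res` and a
    top-left corner at (origin_x, origin_y).
    """
    minx, miny, maxx, maxy = bounds
    col_off = math.floor((minx - origin_x) / res)
    row_off = math.floor((origin_y - maxy) / res)
    col_end = math.ceil((maxx - origin_x) / res)
    row_end = math.ceil((origin_y - miny) / res)
    return col_off, row_off, col_end - col_off, row_end - row_off

def clip_bounds(bounds, raster_bounds):
    """Hypothetical helper: intersect a bbox with the raster extent."""
    minx, miny, maxx, maxy = bounds
    rminx, rminy, rmaxx, rmaxy = raster_bounds
    return (max(minx, rminx), max(miny, rminy),
            min(maxx, rmaxx), min(maxy, rmaxy))

raster_bounds = (0.0, 0.0, 10.0, 10.0)  # 10 x 10 pixels, 1 unit each
user_box = (-0.5, 2.0, 5.0, 8.0)        # starts half a pixel left of the raster

print(bounds_to_window(user_box, 0.0, 10.0, 1.0))
# (-1, 2, 6, 6): col_off -1 means one column lies outside the raster
print(bounds_to_window(clip_bounds(user_box, raster_bounds), 0.0, 10.0, 1.0))
# (0, 2, 5, 6)
```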
On Thu, May 23, 2019 at 8:52 AM Shane Mill - NOAA Affiliate via Groups.Io <shane.mill=noaa.gov@groups.io> wrote: Hi everyone, -- Sean Gillies
|
|
Re: Occasional "not recognized as a supported file format" errors when reading from S3
vincent.sarago@...
This is not something I've seen so far. Does this happen both with the Rasterio wheels and with Rasterio built against a GDAL compiled from source?
|
|
Occasional "not recognized as a supported file format" errors when reading from S3
Dion Häfner <dion.haefner@...>
Dear rasterio group,
(I initially posted this at https://github.com/mapbox/rasterio/issues/1686) Lately, we have encountered a strange bug in Terracotta. It basically always leads to errors like these: (from DHI-GRAS/terracotta#139)
or (from DHI-GRAS/terracotta#10 (comment))
The errors occur on different versions of rasterio, although anecdotally it wasn't a problem pre-1.0.15. It also seems to occur both during The problem is that we have only observed it with huge raster files, and we haven't been able to reproduce this reliably, or in a way where I could share it with you. Does anyone have any intuition why this might be happening / what we could look at to debug this?
Cheers,
|
|
Clipping a raster (grib2), end up with zeros on left edge when cropping with mask
Shane Mill - NOAA Affiliate
Hi everyone,
I am currently clipping a grib2 file and am ending up with zeros on the left edge of the raster. I'm not a GIS expert, so maybe this is a common problem, but I was curious whether anyone else has experienced this before. I think the issue is where the mask function is used with cropping. In the picture below, the blue edge shows the zeros. For context, here is the driving code that performs the action:
|
|
Re: problem with window sizes when parallelizing a function
javier lopatin
Hi Sean, thanks a lot. This was very helpful indeed, and the performance improved a bit when changing the size to 512. Great job on the library, congrats! Cheers, Javier
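To put rough numbers on why a larger block size can help here: a 3,000 x 3,000 raster split into 128-pixel blocks gives 24 x 24 = 576 windows, whereas 512-pixel blocks give 6 x 6 = 36, so far fewer tasks have to be pickled and dispatched to the worker processes, which likely reduces per-task overhead. The sketch below mimics the block grid that dst.block_windows() iterates over, with edge blocks truncated; it is a plain-Python illustration rather than rasterio's implementation, and the (row_off, col_off, height, width) tuples stand in for rasterio Window objects.

```python
def block_windows(height, width, block):
    """Yield (row_off, col_off, win_height, win_width) for a tiled grid.

    Blocks along the right and bottom edges are truncated to fit.
    """
    for row_off in range(0, height, block):
        for col_off in range(0, width, block):
            yield (row_off, col_off,
                   min(block, height - row_off),
                   min(block, width - col_off))

for block in (128, 512):
    windows = list(block_windows(3000, 3000, block))
    print(block, len(windows), windows[-1])
# 128 576 (2944, 2944, 56, 56)
# 512 36 (2560, 2560, 440, 440)
```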
On Wed, May 22, 2019 at 23:36, Alan Snow (<alansnow21@...>) wrote:
--
Javier Lopatin Institute of Geography and Geoecology Karlsruhe Institute of Technology (KIT) javier.lopatin@...
|
|
Re: problem with window sizes when parallelizing a function
Alan Snow
You are indeed correct Sean. Thanks for catching that and providing the correct answer! It seems I skimmed through the original message too fast.
|
|
Re: problem with window sizes when parallelizing a function
Sean Gillies
Alan: the function does indeed write to a new file with blocksizes set appropriately. Javier: the TIFF spec states that tile width (and this goes for height, I presume) must be a multiple of 16. Neither GDAL's GeoTIFF format page nor Rasterio docs make this clear. I'll add a note. I think that if you try 512 instead of 128, you'll have success.
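A small pre-flight check along the lines Sean describes: validate the tile size before passing blockxsize/blockysize to rasterio.open(..., "w"), and suggest the next multiple of 16 for an arbitrary request. The helper names are hypothetical; the multiple-of-16 rule is the TIFF tiling requirement quoted above.

```python
TIFF_TILE_MULTIPLE = 16  # TIFF requires tile width/height to be multiples of 16

def valid_tile_size(size):
    """Return True if `size` is a positive multiple of 16."""
    return size > 0 and size % TIFF_TILE_MULTIPLE == 0

def round_up_tile_size(size):
    """Round `size` up to the nearest multiple of 16 (ceiling division)."""
    return -(-size // TIFF_TILE_MULTIPLE) * TIFF_TILE_MULTIPLE

def tiled_profile_update(size):
    """Return keyword arguments for profile.update(), validated first."""
    if not valid_tile_size(size):
        raise ValueError(
            f"tile size {size} is not a multiple of {TIFF_TILE_MULTIPLE}; "
            f"try {round_up_tile_size(size)}"
        )
    return {"blockxsize": size, "blockysize": size, "tiled": True}

print(valid_tile_size(500), valid_tile_size(512))        # False True
print(round_up_tile_size(200), round_up_tile_size(500))  # 208 512
```

In the script from the original question this would be used as profile.update(**tiled_profile_update(512)), so a bad value fails with a readable message before GDAL reports "Bad value 500 for TileWidth tag".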
On Wed, May 22, 2019 at 7:29 AM Alan Snow <alansnow21@...> wrote: Hi Javier, -- Sean Gillies
|
|
Re: problem with window sizes when parallelizing a funciton
Alan Snow
Hi Javier,
The blocksize is not a dynamic attribute. It represents the blocksize of the file on disk. So, if you want to change the blocksize, you will need to write it to a new file with a new blocksize. Hopefully this helps, Alan
|
|
problem with window sizes when parallelizing a function
javier lopatin
Hi all,
I'm currently using rasterio for time-series analysis. Because the raster tiles are too big, I'm trying to implement a windowed function with parallel processing. I tried your tutorial on concurrent processing (https://github.com/mapbox/rasterio/blob/master/docs/topics/concurrency.rst) and it works beautifully with my own functions. I'm just wondering why it only works with window sizes of 128 (profile.update(blockxsize=128, blockysize=128, tiled=True)), just like in the example. If I change these values to e.g. 200 or 500, it does not work anymore. I'm currently trying with a 3,000 x 3,000 raster. Because loops are slow, I assume that increasing the window size a bit could help. The error message that I receive if I change blockxsize is:
Traceback (most recent call last):
  File "TSA.py", line 252, in <module>
    main(args.inputImage, args.outputImage, args.j)
  File "TSA.py", line 225, in main
    parallel_process(infile, outfile, n_jobs)
  File "TSA.py", line 187, in parallel_process
    with rasterio.open(outfile, "w", **profile) as dst:
  File "/home/javier/miniconda3/lib/python3.6/site-packages/rasterio/env.py", line 398, in wrapper
    return f(*args, **kwds)
  File "/home/javier/miniconda3/lib/python3.6/site-packages/rasterio/__init__.py", line 226, in open
    **kwargs)
  File "rasterio/_io.pyx", line 1129, in rasterio._io.DatasetWriterBase.__init__
  File "rasterio/_err.pyx", line 194, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_AppDefinedError: _TIFFVSetField:/home/javier/Documents/SF_delta/Sentinel/TSA/X-004_Y-001/2015-2019_001-365_LEVEL4_TSA_SEN2L_EVI_C0_S0_FAVG_TY_C95T_FBY_TSA.tif: Bad value 500 for "TileWidth" tag
The function I'm using is below (see the whole script at https://github.com/JavierLopatin/Python-Remote-Sensing-Scripts/blob/master/TSA.py):
import concurrent.futures
import rasterio
def parallel_process(infile, outfile, n_jobs):
    """
    Process infile block-by-block with parallel processing
    and write to a new file.
    """
    from tqdm import tqdm  # progress bar
    with rasterio.Env():
        with rasterio.open(infile) as src:
            # Create a destination dataset based on source params. The
            # destination will be tiled, and we'll process the tiles
            # concurrently.
            profile = src.profile
            profile.update(blockxsize=128, blockysize=128,
                           count=6, dtype='float64', tiled=True)
            with rasterio.open(outfile, "w", **profile) as dst:
                # Materialize a list of destination block windows
                # that we will use in several statements below.
                windows = [window for ij, window in dst.block_windows()]
                # This generator comprehension gives us raster data
                # arrays for each window. Later we will zip a mapping
                # of it with the windows list to get (window, result)
                # pairs.
                data_gen = (src.read(window=window) for window in windows)
                with concurrent.futures.ProcessPoolExecutor(
                    max_workers=n_jobs
                ) as executor:
                    # We map the TSA() function (defined elsewhere in
                    # TSA.py) over the raster data generator, zip the
                    # resulting iterator with the windows list, and as
                    # pairs come back we write data to the destination
                    # dataset.
                    for window, result in zip(
                        tqdm(windows), executor.map(TSA, data_gen)
                    ):
                        dst.write(result, window=window)
Hope you guys can help. Cheers, Javier
|
|
Re: Python rasterio for saveing GeoTIFF files and read in ArcGIS or QGIS
Gabriel Cotlier
Dear Sean,
Thank you very much for your help. I appreciate it. Regards Gabriel
On Monday, May 20, 2019, Sean Gillies <sean.gillies@...> wrote:
|
|