MemoryFile loses Profile information
ciaran.evans@...
Hi there,
I'm currently re-writing a process that was stacking .jp2 bands into one image and writing the resultant stack to disk. I'm moving this up to AWS so want to use MemoryFile's instead. My code looks something like: s3_r = boto3.resource('s3') If I download that image and open it in QGIS, the image has no geo-coding at all. If I process the same image, but write to disk like: profile = source_image.profile.copy() I get a geo-coded image and it appears where I'd expect. |
|
ciaran.evans@...
On further inspection, when I switched to saving my outputs as GeoTIFF, the resultant images are both geo-coded fine (Disk vs. In Memory to S3). I'm guessing something is going wrong when Any suggestions on next steps?
|
|
ciaran.evans@...
Revised: Don't require the
For reference, doing:
results in a correctly geocoded GeoTiff. Appears it is an issue with |
|
vincent.sarago@...
I can confirm that writing a Jpeg2000 from memory losses the geo information
toggle quoted message
Show quoted text
|
|
ciaran.evans@...
I tried this with GDAL 2.4.2 and GDAL 3.0 too If anyone can point me to where I might look further to diagnose this, I can create an issue with some useful info, whether that's in Rasterio/GDAL :) |
|
Sean Gillies
Hi, On Tue, Apr 28, 2020 at 8:02 AM <ciaran.evans@...> wrote:
To confirm: you're right, there's no need to seek after writing using mem.write() because the dataset API doesn't change the MemoryFile's stream position. It remains at 0, the beginning of the file. Is it possible that your B01.jp2 has auxiliary files? If so, they can be lost because MemoryFile.read() only returns bytes from the primary file and will overlook auxiliaries. For example, see the code below using rasterio's test data. $ rio insp tests/data/RGB.byte.tif >>> from rasterio.io import MemoryFile >>> profile = src.profile >>> del profile["tiled"] >>> del profile["interleave"] >>> profile["driver"] = "PNG" >>> with MemoryFile(filename="lolwut.png") as memfile: ... with memfile.open(**profile) as dataset_1: ... dataset_1.write(src.read()) ... print(dataset_1.files) ... print(dataset_1.profile) ... with memfile.open() as dataset_2: ... print(dataset_2.files) ... print(dataset_2.profile) ... with open("/tmp/lolwut.png", "wb") as f: ... f.write(memfile.read()) ... with rasterio.open("/tmp/lolwut.png") as dataset_3: ... print(dataset_3.files) ... print(dataset_3.profile) ... [] {'driver': 'PNG', 'dtype': 'uint8', 'nodata': 0.0, 'width': 791, 'height': 718, 'count': 3, 'crs': CRS.from_epsg(32618), 'transform': Affine(300.0379266750948, 0.0, 101985.0, 0.0, -300.041782729805, 2826915.0), 'tiled': False} ['/vsimem/lolwut.png', '/vsimem/lolwut.png.aux.xml'] {'driver': 'PNG', 'dtype': 'uint8', 'nodata': 0.0, 'width': 791, 'height': 718, 'count': 3, 'crs': CRS.from_epsg(32618), 'transform': Affine(300.0379266750948, 0.0, 101985.0, 0.0, -300.041782729805, 2826915.0), 'tiled': False, 'interleave': 'pixel'} 775558 ['/tmp/lolwut.png'] {'driver': 'PNG', 'dtype': 'uint8', 'nodata': 0.0, 'width': 791, 'height': 718, 'count': 3, 'crs': None, 'transform': Affine(1.0, 0.0, 0.0, 0.0, 1.0, 0.0), 'tiled': False, 'interleave': 'pixel'} dataset_1.files is an empty list because no files are written to the /vsimem virtual filesystem until dataset_1 is closed. dataset_2.files show two files: the primary /vsimem/lolwut.png file and its auxiliary '/vsimem/lolwut.png.aux.xml' file, containing the georeferencing information. dataset_3.files shows only one file, the auxiliary file has been lost by memfile.read(). MemoryFile isn't any good for multiple-file formats or multiple-file format variants. It's fine for a profile of GeoTIFFs that keep their georeferencing and masks and overviews in a single file. Sean Gillies |
|
Guillaume Lostis <g.lostis@...>
Hi, I guess the B01.jp2 file mentioned is a Sentinel-2 band, which is standalone, there is no side-car file attached to it. I have tried downloading a single B01.jp2 file from a SAFE and running Vincent Sarago's snippet on it and I get the same result, the CRS and transform are lost when the jp2 is written through a MemoryFile. Guillaume Lostis
|
|
ciaran.evans@...
As with Guillaume's answer, it's a Sentinel 2 band and has no side-car file, when I write straight to a file, the georeferencing is maintained. |
|
Sean Gillies
On Wed, Apr 29, 2020 at 1:38 AM <ciaran.evans@...> wrote:
Since there's no JP2 specific code in MemoryFile and things work with the GeoTIFF driver, I suspect there's a subtle bug involving the vsimem system and one of the JP2 drivers. It would be useful to try to reproduce this with GDAL's Python bindings, but I don't have them installed and won't be able to try that. Sean Gillies |
|
ciaran.evans@...
Would you say it's worth raising an issue on GDAL then? I'm also not familiar with the Python GDAL bindings, Rasterio is my window in that haha. |
|
Guillaume Lostis <g.lostis@...>
Like Ciaran I'm not very familiar with GDAL's python bindings, but I've tried running a little snippet which I believe does the same thing as what was done with
When I look at the (Side note: Interestingly enough, the pixel values of Guillaume |
|
Even Rouault
On mercredi 29 avril 2020 08:40:07 CEST Guillaume Lostis wrote: > Like Ciaran I'm not very familiar with GDAL's python bindings, but I've > tried running a little snippet which I believe does the same thing as what > was done with `rasterio`: it writes a JP2 to a `vsimem` in-memory file, and > then writes the bytes to a file on disk. > > ```python > from osgeo import gdal > > mem, out = "/vsimem/B01.jp2", "new_B01.jp2" > gdal.Translate(mem, "B01.jp2") > > # Taken from > https://lists.osgeo.org/pipermail/gdal-dev/2016-August/045030.html f = > gdal.VSIFOpenL(mem, "rb") > gdal.VSIFSeekL(f, 0, 2) # seek to end > size = gdal.VSIFTellL(f) > gdal.VSIFSeekL(f, 0, 0) # seek to beginning > data = gdal.VSIFReadL(1, size, f) > gdal.Unlink(mem) > > with open(out, "wb") as ff: > ff.write(data) > ``` > > When I look at the `gdalinfo` of `new_B01.jp2`, it has a CRS and a > transform, so I am not able to reproduce the behavior we have with > `rasterio`. > > (Side note: Interestingly enough, the pixel values of `new_B01.jp2` have > slightly changed with respect to the original `B01.jp2` file, so there is > some data loss somewhere in the process. But maybe that is expected of the > JP2 format and could be avoided by passing extra arguments to > `gdal.Translate`?
By default, the JP2OpenJPEG driver uses a lossy compression (my personal take on JPEG2000 is that it has very limited interested when used in its lossless profile. Better use more conventional algorithms that are way faster). See https://gdal.org/drivers/raster/jp2openjpeg.html#lossless-compression for options to pass to get a lossless JPEG2000 as output.
So all in all that should be something like
gdal.Translate(mem, "B01.jp2", format = "JP2OpenJPEG", # in case you have several JP2K drivers available options = ["REVERSIBLE=YES", "QUALITY=100"])
-- Spatialys - Geospatial professional services http://www.spatialys.com |
|
Sean Gillies
Hi, On Wed, Apr 29, 2020 at 9:40 AM Guillaume Lostis <g.lostis@...> wrote:
I've only now remembered (not being a regular jp2 user) that JPEG2000 is a create-copy format (see https://gdal.org/drivers/raster/index.html) and as such is not suited for uses cases like with rasterio.open("file.jp2", "w", driver="JP2OpenJPEG", ...) as dataset: dataset.write(data) Rasterio tries to abstract over the differences between "create" and "create-copy" formats, but might falter in some cases. We're testing JPEG and PNG in the test suite, but not JPEG2000. If you're constructing a dataset from the bands of multiple other datasets, you should probably be using GeoTIFF as a format, it's well suited for this. And then convert to JPEG2000 before uploading to S3 using rasterio.shutil.copy, which calls on GDAL's GDALCreateCopy and is mostly equivalent to gdal_translate. Sean Gillies |
|
ciaran.evans@...
I think the problem is that I'm trying to do the conversion in memory, not on disk. If I've read your latest reply correctly, I'd convert to JPEG2000 on disk and then copy to S3? Can I do the conversion via rasterio.shutil.copy in memory? |
|