Re: MemoryFile loses Profile information


Guillaume Lostis
 

Hi,

I guess the B01.jp2 file mentioned is a Sentinel-2 band, which is standalone: there is no sidecar file attached to it.

I downloaded a single B01.jp2 file from a SAFE, ran Vincent Sarago's snippet on it, and got the same result: the CRS and transform are lost when the JP2 is written through a MemoryFile.

Guillaume Lostis 

On Wed, 29 Apr 2020, 00:24 Sean Gillies via groups.io, <sean=mapbox.com@groups.io> wrote:
Hi,

On Tue, Apr 28, 2020 at 8:02 AM <ciaran.evans@...> wrote:

I tried this with GDAL 2.4.2 and GDAL 3.0 too.

If anyone can point me to where I might look further to diagnose this, I can create an issue with some useful info, whether that's in Rasterio/GDAL :)


To confirm: you're right, there's no need to seek after writing using mem.write() because the dataset API doesn't change the MemoryFile's stream position. It remains at 0, the beginning of the file.

Is it possible that your B01.jp2 has auxiliary files? If so, they can be lost because MemoryFile.read() only returns bytes from the primary file and will overlook auxiliaries. For example, see the code below using rasterio's test data.

$ rio insp tests/data/RGB.byte.tif
>>> from rasterio.io import MemoryFile
>>> profile = src.profile
>>> del profile["tiled"]
>>> del profile["interleave"]
>>> profile["driver"] = "PNG"
>>> with MemoryFile(filename="lolwut.png") as memfile:
...     with memfile.open(**profile) as dataset_1:
...         dataset_1.write(src.read())
...         print(dataset_1.files)
...         print(dataset_1.profile)
...     with memfile.open() as dataset_2:
...         print(dataset_2.files)
...         print(dataset_2.profile)
...     with open("/tmp/lolwut.png", "wb") as f:
...         f.write(memfile.read())
...     with rasterio.open("/tmp/lolwut.png") as dataset_3:
...         print(dataset_3.files)
...         print(dataset_3.profile)
...
[]
{'driver': 'PNG', 'dtype': 'uint8', 'nodata': 0.0, 'width': 791, 'height': 718, 'count': 3, 'crs': CRS.from_epsg(32618), 'transform': Affine(300.0379266750948, 0.0, 101985.0,
       0.0, -300.041782729805, 2826915.0), 'tiled': False}
['/vsimem/lolwut.png', '/vsimem/lolwut.png.aux.xml']
{'driver': 'PNG', 'dtype': 'uint8', 'nodata': 0.0, 'width': 791, 'height': 718, 'count': 3, 'crs': CRS.from_epsg(32618), 'transform': Affine(300.0379266750948, 0.0, 101985.0,
       0.0, -300.041782729805, 2826915.0), 'tiled': False, 'interleave': 'pixel'}
775558
['/tmp/lolwut.png']
{'driver': 'PNG', 'dtype': 'uint8', 'nodata': 0.0, 'width': 791, 'height': 718, 'count': 3, 'crs': None, 'transform': Affine(1.0, 0.0, 0.0,
       0.0, 1.0, 0.0), 'tiled': False, 'interleave': 'pixel'}

dataset_1.files is an empty list because no files are written to the /vsimem virtual filesystem until dataset_1 is closed.

dataset_2.files shows two files: the primary /vsimem/lolwut.png file and its auxiliary /vsimem/lolwut.png.aux.xml file, which contains the georeferencing information.
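One way to spot this situation before calling memfile.read() is to compare the dataset's file list against its primary path. The helper below is only a sketch: auxiliary_files is a hypothetical name, not part of rasterio, and it assumes the caller already knows the primary path (here taken from the output above).

```python
def auxiliary_files(files, primary):
    """Return every path in `files` other than the primary dataset file.

    Hypothetical helper, not a rasterio function. `files` is a list such as
    the one printed for dataset_2.files above.
    """
    return [path for path in files if path != primary]


# File list copied from the dataset_2.files output above.
files = ["/vsimem/lolwut.png", "/vsimem/lolwut.png.aux.xml"]
extras = auxiliary_files(files, "/vsimem/lolwut.png")
print(extras)  # ['/vsimem/lolwut.png.aux.xml']
```

A non-empty result means memfile.read() would leave something behind.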

dataset_3.files shows only one file: the auxiliary file has been lost by memfile.read(), which returns only the primary file's bytes. MemoryFile isn't any good for multiple-file formats or multiple-file format variants. It's fine for GeoTIFF profiles that keep their georeferencing, masks, and overviews in a single file.

--
Sean Gillies
