tempfile.NamedTemporaryFile behaving as /vsimem and eating all the machine memory


vincent.sarago@...
 

While working on https://github.com/cogeotiff/rio-cogeo/pull/75 we noticed strange behavior with the `vsimem` driver (this could be a GDAL bug, TBH).

1. When using `tempfile.NamedTemporaryFile()`, Rasterio uses the `vsimem` driver

with tempfile.NamedTemporaryFile() as tmpfile:
    print(tmpfile)
    with rasterio.open(tmpfile, "w", **meta) as tmp_dst:
        print(tmp_dst)

<tempfile._TemporaryFileWrapper object at 0x1151bb2e8>

<open DatasetWriter name='/vsimem/d26d4650-3010-4d39-8b0c-3d947b94f1d5.' mode='w+'>

Here I was expecting Rasterio/GDAL to behave as if `tempfile` were a regular file.

2. When closing a `vsimem` file (`MemoryFile` or `tempfile`), we observe a huge memory surge when working with big images.

code: 
https://github.com/cogeotiff/rio-cogeo/pull/75#issuecomment-482745580



Tested on macOS and Linux with Python 3.7 (GDAL 2.4 and 2.3)

Thanks 


Sean Gillies
 

Hi Vincent,

This is expected (if not well-documented) behavior. tempfile.NamedTemporaryFile() returns an open Python file object, not a filename. GDAL can't use a Python file object, so in that case rasterio.open reads all the bytes from the file object, copies them to the vsimem filesystem, and opens that vsimem file.

I think what you want to do is pass the name of the temp file object to GDAL. Like this:

with tempfile.NamedTemporaryFile() as temp:
    with rasterio.open(temp.name) as dataset:
        print(dataset)

No copy in the vsimem filesystem will be made.
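
To make the distinction concrete, here is a small standard-library sketch of what `tempfile.NamedTemporaryFile()` hands back: an open file object whose `.name` attribute is an ordinary path string. The `.tif` suffix is my own assumption, added only so the path looks like a GeoTIFF; the Rasterio call from the thread appears as a comment and is not executed here.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(suffix=".tif") as temp:
    # `temp` is an open Python file object (a wrapper with read/write methods),
    # which is what triggers the vsimem copy when passed to rasterio.open.
    is_file_object = hasattr(temp, "read") and hasattr(temp, "write")

    # `temp.name` is a plain string path to a real file on disk.
    path = temp.name
    path_is_str = isinstance(path, str)
    exists_while_open = os.path.exists(path)

    # As suggested in this thread, the path is what would be handed to GDAL:
    #     with rasterio.open(temp.name, "w", **meta) as dst: ...

# The file is deleted when the `with` block exits, so any dataset opened
# from `temp.name` has to be used and closed inside the block.
removed_on_exit = not os.path.exists(path)
```

Note that reopening a named temporary file by path while it is still open works on Unix (the platforms tested above) but not on Windows.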

In reply to <vincent.sarago@...>'s message of Tue, Apr 16, 2019 at 6:55 AM.

--
Sean Gillies


vincent.sarago@...
 

Thanks Sean, this is really helpful, and I love the `temp.name` solution.

About the second point, do you have any idea why the `/vsimem` driver needs so much memory when exiting/closing? Should I raise this on the GDAL list?


Sean Gillies
 

Vincent,

At https://github.com/mapbox/rasterio/blob/master/rasterio/__init__.py#L191, a big GeoTIFF is created in RAM. Then at https://github.com/mapbox/rasterio/blob/master/rasterio/__init__.py#L199 that GeoTIFF is read into memory *again* so that it can be written to the Python file object. There will be two copies in memory. It's terribly inefficient, but I don't want to spend the time to optimize this case when I should be documenting the limitations instead.
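
The "read it all, then write it" step described above can be illustrated with the standard library alone. This is only a sketch and not Rasterio's code: a plain temporary file stands in for the dataset, and the function names are made up for the illustration. It uses `tracemalloc` to compare the peak Python-level allocation of a whole-file `read()` against a chunked copy.

```python
import tempfile
import tracemalloc

def copy_whole(src, dst):
    """Read everything, then write: a full copy of the data lives in memory."""
    dst.write(src.read())

def copy_chunked(src, dst, chunk_size=64 * 1024):
    """Stream fixed-size chunks: only a chunk or two is in memory at a time."""
    for chunk in iter(lambda: src.read(chunk_size), b""):
        dst.write(chunk)

def peak_bytes(copy_func, size):
    """Return the peak traced allocation (bytes) while copying `size` bytes."""
    with tempfile.TemporaryFile() as src, tempfile.TemporaryFile() as dst:
        src.write(b"\0" * size)   # stand-in payload, not a real GeoTIFF
        src.flush()
        src.seek(0)
        tracemalloc.start()       # start measuring only around the copy
        try:
            copy_func(src, dst)
            _, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
    return peak

if __name__ == "__main__":
    size = 4 * 1024 * 1024
    print("whole  :", peak_bytes(copy_whole, size))
    print("chunked:", peak_bytes(copy_chunked, size))
```

In the case discussed here the source is already held in RAM by `/vsimem`, so the whole-file read adds a second full copy on top of the first, which matches the surge seen on close.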

In reply to <vincent.sarago@...>'s message of Tue, Apr 16, 2019 at 12:47 PM.

--
Sean Gillies