Reading NetCDF file as inMemoryFile


vincent.sarago@...
 

I'm seeking some advice here.

I'm trying to reduce the memory/disk usage of one of my scripts, in which I translate a netCDF file to COG. Ideally I'd love not to save any file to disk.

The problem I'm facing is that when I open the file from disk everything works fine and the file is recognized as `netCDF`, but when I load the same file in memory it is recognized as HDF5...

```
import rasterio
from rasterio.io import MemoryFile
 
src_path = "OR_ABI-L1b-RadF-M6C04_G16_s20193221600287_e20193221609595_c20193221610025.nc"
 
with rasterio.open(src_path) as src_dst:
    print(src_dst.name)
    print(src_dst.meta)
 
OR_ABI-L1b-RadF-M6C04_G16_s20193221600287_e20193221609595_c20193221610025.nc
{'driver': 'netCDF', 'dtype': 'float_', 'nodata': None, 'width': 512, 'height': 512, 'count': 0, 'crs': None, 'transform': Affine(1.0, 0.0, 0.0,
       0.0, 1.0, 0.0)}
 
with open(src_path, "rb") as f:
    with rasterio.open(f) as src_dst:
        with src_dst.open() as mem:
            print(mem.name)
            print(mem.meta)
 
/vsimem/8ee5dd37-9f49-47c9-bce7-a6732c72d4b5.nc
{'driver': 'HDF5', 'dtype': 'float_', 'nodata': None, 'width': 512, 'height': 512, 'count': 0, 'crs': None, 'transform': Affine(1.0, 0.0, 0.0,
       0.0, 1.0, 0.0)}
```


The problem with HDF5 is that it seems to lose all geographical information, so it is not usable for the process I'm doing.

Thanks for your help.


Even Rouault
 

On Friday 20 December 2019 05:03:36 CET vincent.sarago@... wrote:

> I'm seeking some advice here,
>
> I'm trying to reduce the memory/disk usage of one of my script where I try
> to translate a netcdf file to COG. Ideally I'd love not to save any file to
> disk.
>
> The problem I'm facing is when opening the file from disk everything works
> fine and the netcdf is recognise as `netCDF` but when load the same file in
> memory it is then recognized as a HDF5...

RTF(anstatic)M :-)

https://gdal.org/drivers/raster/netcdf.html#vsi-virtual-file-system-api-support

"Since GDAL 2.4, and with Linux kernel >=4.3 and libnetcdf >=4.5, read operations on /vsi file systems are supported."

When building GDAL, you should see "NetCDF has netcdf_mem.h: yes" in the summary output of ./configure.
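
As a quick sanity check, the version minimums quoted above can be compared against version strings gathered from the running system. This is only a sketch: the function names are illustrative, the caller has to supply the GDAL and libnetcdf versions, and passing these checks says nothing about the build-time `netcdf_mem.h` detection, which has to be read from the configure summary.

```python
import re


def version_tuple(text, parts=2):
    """Return the first `parts` integer components of a version string."""
    numbers = re.findall(r"\d+", text)
    return tuple(int(n) for n in numbers[:parts])


def meets_documented_minimums(gdal, kernel, libnetcdf):
    """Compare version strings against the minimums quoted from the GDAL docs.

    Illustrative helper, not part of GDAL or rasterio.
    """
    return {
        "GDAL >= 2.4": version_tuple(gdal) >= (2, 4),
        "Linux kernel >= 4.3": version_tuple(kernel) >= (4, 3),
        "libnetcdf >= 4.5": version_tuple(libnetcdf) >= (4, 5),
    }
```

On Linux, `platform.release()` returns the running kernel release (for example `4.9.184-linuxkit`), which can be passed as the `kernel` argument.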

If you don't meet those requirements, /vsimem/ access on netCDF4/HDF5 files will
fall back to the HDF5 driver, which has proper support for pluggable I/O since libhdf5 allows it.

The netCDF library has no pluggable I/O layer, hence the GDAL support uses the netCDF
in-memory file API combined with the Linux userfaultfd mechanism to populate the
in-memory mapping with data. That said, for a file hosted in /vsimem/ (as
opposed to the /vsi network file systems), we could probably improve that to avoid
any Linux-specific dependency.
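
The reason the HDF5 driver can open the same bytes at all is that a netCDF-4 file is stored as an HDF5 container, so it starts with the HDF5 signature rather than the classic netCDF `CDF` magic. A minimal sketch for checking which kind of file you have, assuming the signature sits at offset 0 (HDF5 also allows it at later offsets after a user block, which this sketch ignores); the function name and return labels are illustrative:

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
# Classic netCDF files start with "CDF" followed by a format version byte.
CLASSIC_VERSION_BYTES = (b"\x01", b"\x02", b"\x05")


def sniff_container(header):
    """Classify a file from its first bytes (illustrative helper)."""
    if header.startswith(HDF5_SIGNATURE):
        # netCDF-4 and plain HDF5 are indistinguishable at this level.
        return "hdf5"
    if header[:3] == b"CDF" and header[3:4] in CLASSIC_VERSION_BYTES:
        return "netcdf-classic"
    return "unknown"
```

Reading the first 8 bytes of the file in binary mode and passing them to `sniff_container` is enough for this check.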

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com


vincent.sarago@...
 

Thanks for your answer Even, and be assured that I always read the manual before asking questions :-)


```
$ more /proc/version
Linux version 4.9.184-linuxkit (root@a8c33e955a82) (gcc version 8.3.0 (Alpine 8.3.0) ) #1 SMP Tue Jul 2 22:58:16 UTC 2019
...
# From gdal /configure
  NetCDF support:            yes
  NetCDF has netcdf_mem.h:   yes
...
# NetCDF install
ENV NETCDF_VERSION=4.6.3

# NetCDF
RUN mkdir /tmp/netcdf \
&& curl -sfL https://github.com/Unidata/netcdf-c/archive/v$NETCDF_VERSION.tar.gz | tar zxf - -C /tmp/netcdf --strip-components=1

RUN cd /tmp/netcdf \
&& CPPFLAGS="-I${PREFIX}/include" LDFLAGS="-L${PREFIX}/lib" \
./configure \
--with-default-chunk-size=67108864 \
--with-chunk-cache-size=67108864 \
--prefix=$PREFIX \
--disable-static \
--enable-netcdf4 \
--enable-dap \
--with-pic \
&& make -j $(nproc) --silent && make install && make clean \
&& rm -rf /tmp/netcdf
```

My configuration meets the current spec from the docs, yet it still falls back to HDF5.


Sean Gillies
 

Vincent,

Can you try naming the driver when you open the in-memory dataset? Something like:

```
with MemoryFile(...) as memfile:
    with memfile.open(driver="netCDF") as dataset:
        ....
```



--
Sean Gillies


Even Rouault
 

On Friday 20 December 2019 07:22:11 CET vincent.sarago@... wrote:

> $ more /proc/version
> Linux version 4.9.184-linuxkit (root@a8c33e955a82) (gcc version 8.3.0
> (Alpine 8.3.0) ) #1 SMP Tue Jul 2 22:58:16 UTC 2019 ...
> # From gdal /configure
> NetCDF support: yes
> NetCDF has netcdf_mem.h: yes

Can you check for the following line too?
userfaultfd support: yes
It confirms that you actually built against sufficiently recent kernel headers.




vincent.sarago@...
 

Then I get "not recognized as a supported file format."
```
>>> f = open("/local/OR_ABI-L1b-RadF-M6C04_G16_s20193221600287_e20193221609595_c20193221610025.nc", "rb")
>>> with MemoryFile(f) as mem:
...     with mem.open(driver="netCDF") as mem_dst:
...             print(mem_dst.meta)
...
Traceback (most recent call last):
  File "rasterio/_base.pyx", line 216, in rasterio._base.DatasetBase.__init__
  File "rasterio/_shim.pyx", line 78, in rasterio._shim.open_dataset
  File "rasterio/_err.pyx", line 205, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError: '/vsimem/808d1a21-9154-43a9-a7a8-459294b33fe4.' not recognized as a supported file format.
```

I get the same error when using:

```
with open("/local/OR_ABI-L1b-RadF-M6C04_G16_s20193221600287_e20193221609595_c20193221610025.nc", "rb") as f:
     with rasterio.open(f, driver="netCDF") as src_dst:

Traceback (most recent call last):
  File "rasterio/_base.pyx", line 216, in rasterio._base.DatasetBase.__init__
  File "rasterio/_shim.pyx", line 78, in rasterio._shim.open_dataset
  File "rasterio/_err.pyx", line 205, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError: '/vsimem/b370432d-9a53-4c40-9315-bc9ad408589d.' not recognized as a supported file format.
```


vincent.sarago@...
 

Yes, it is set to yes.

Here is the full log: https://gist.github.com/vincentsarago/36473e6322336e84cf928ef445db64cc


Even Rouault
 

On Friday 20 December 2019 09:45:45 CET vincent.sarago@... wrote:

> Yes it is set to yes,

You were responding to "userfaultfd support: yes"?

Hmm, then I'm not sure. Are you running in a container? Maybe there are
some restrictions by default. Dunno. But you should see GDAL error messages
if the userfaultfd system calls fail at runtime. There are quite a lot of them in
https://github.com/OSGeo/gdal/blob/master/gdal/port/cpl_userfaultfd.cpp

Otherwise, fall back to the magic solution of open source projects (the reason why
we all use open source, right ;-)?): take your favorite debugger, break at
https://github.com/OSGeo/gdal/blob/master/gdal/frmts/netcdf/netcdfdataset.cpp#L7274
and follow what happens from there.
It should normally reach the call to nc_open_mem().
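
Until the in-memory netCDF path works in a given build, one possible stopgap is to hand GDAL a real path for the duration of the read and delete it straight afterwards. The sketch below is an assumption-laden workaround, not something proposed in this thread: it does write a file, the helper name `bytes_as_named_file` is hypothetical, and the `/dev/shm` hint only keeps the bytes in RAM if that tmpfs mount exists on the machine.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def bytes_as_named_file(data, suffix=".nc", dir=None):
    """Yield a path to a temporary file holding `data`; delete it on exit.

    `dir` is passed to tempfile.mkstemp. On Linux, dir="/dev/shm" would put
    the file on a RAM-backed tmpfs if that mount exists (assumption).
    """
    fd, path = tempfile.mkstemp(suffix=suffix, dir=dir)
    try:
        # Write the bytes and close the descriptor before yielding the path.
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        yield path
    finally:
        os.remove(path)
```

It would be used as `with bytes_as_named_file(data) as path, rasterio.open(path) as src: ...`, so the dataset is opened from a real path just like the on-disk case that is recognized as `netCDF` at the start of this thread.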
