
issue with opening/closing datasets


Amine Aboufirass <amine.aboufirass@...>
 

Hi All, 

I just filed an issue on the main GitHub issue tracker: https://github.com/mapbox/rasterio/issues/1659

If anyone could take a look I would be very grateful.

Kind Regards,

Amine


Sean Gillies
 

Hi Amine,

I think you have made an error in pasting code into the GitHub issue. The code you've given will fail at `dataset = memfile.open` because you haven't assigned `memfile` yet.

The message you see printed comes straight from the GDAL library. You haven't configured any GDAL error or log message handler and so the messages go directly to your terminal. Message handlers are configured if you run your code within a `with rasterio.Env()` block.

    import rasterio

    with rasterio.Env():
        # your code here

Also if you do

    with memfile.open(...) as dataset:

you won't see this message.





--
Sean Gillies


Amine Aboufirass <amine.aboufirass@...>
 

Hi Sean, the issue is that I am writing functions whose output is often a rasterio dataset. I don't know whether this can be accomplished using a with statement:

    import rasterio

    def open_raster(filename):
        # Open the dataset and hand it to the caller, who must close it.
        rasterio_dataset_object = rasterio.open(filename)
        return rasterio_dataset_object

    def do_stuff_with_raster(rasterio_dataset_object):
        # ... do stuff with raster ...
        return rasterio_dataset_object

    dataset = open_raster('raster.tif')
    new_raster = do_stuff_with_raster(dataset)
    new_raster.close()

Thanks,
Amine



Sean Gillies
 

Hi Amine,


The following will work if you are calling the two functions in the same module.

with open_raster('raster.tif') as dataset:  # this gives you an implicit Env around the contained statements.
    new_raster = do_stuff_with_raster(dataset)
    new_raster.close()

 

--
Sean Gillies


Amine Aboufirass <amine.aboufirass@...>
 

Dear Sean,

I am slightly confused. You state the following:

The following will work if you are calling the two functions in the same module.
with open_raster('raster.tif') as dataset:  # this gives you an implicit Env around the contained statements.
    new_raster = do_stuff_with_raster(dataset)
    new_raster.close()

I have two issues with the above statement:
  • Why are you closing the file if it is inside the with block? I thought one of the advantages of using a with block is that files are closed implicitly.
  • This doesn't completely answer my question, since I would like to return the file object and pass it from function to function so that I can pry it open in each function and read/write the necessary values to and from it.
To be more explicit, can I do something like this?

def open_dataset(filename):
    with rasterio.open(filename) as dataset:
        return dataset

def do_stuff_to_dataset(dataset):
    dataset.write()
    return modified_dataset

def get_info_from_dataset(dataset):
    dataset.information
    return information

If so, then what are the disadvantages of using the with block inside a function and returning the object to be used outside? Is this good practice? If not, what is the recommended way to write functions which use the rasterio library? The same question extends to with blocks containing rasterio.Env(): should I nest the with statement inside the function as shown above?
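
A note on the `open_dataset` pattern above, as a minimal sketch: when an object is returned from inside a `with` block, the context manager's `__exit__` runs as the function returns, so the caller receives an object that has already been closed. Rasterio datasets close themselves on leaving a `with` block in the same way. The `FakeDataset` class and both helper functions below are hypothetical stand-ins so that the example runs without any raster file.

```python
class FakeDataset:
    """Hypothetical stand-in for a dataset object; not part of rasterio."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Leaving the with block always closes the object.
        self.close()

    def close(self):
        self.closed = True


def open_inside_with(name):
    # __exit__ runs as this function returns, so the caller gets a closed object.
    with FakeDataset(name) as dataset:
        return dataset


def open_plain(name):
    # No with block here: the caller owns the object and must close it.
    return FakeDataset(name)


print(open_inside_with("raster.tif").closed)  # True

with open_plain("raster.tif") as dataset:
    print(dataset.closed)  # False: still usable inside the caller's with block
```

Under this pattern the function that opens the object returns it untouched, and the `with` statement lives in the calling code, which is also where the object's lifetime is easiest to see.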

I ask because I would like to avoid writing a dataset to a physical file until I am done modifying it. For instance, geopandas uses the GeoDataFrame construct, which is stored in memory and not attached to any physical file. Memory files in rasterio come close, but they are still attached to a temporary file.

It seems that in rasterio a raster object must be defined via three disjoint entities (NumPy array, affine transform and CRS). It would be nice to have one object which groups all these entities and is somehow detached from physical/temporary files: a sort of glorified NumPy array with metadata (a "GeoNumpy" array), just like geopandas glorifies the pandas DataFrame with metadata.

Of course this is just a naive (but hopefully constructive) suggestion, and perhaps also due to the fact that I do not completely understand how the library works :). 

Regards,

Amine






Sean Gillies
 

Amine,


I'm sorry about the confusion. I would rather not comment on the structure of your application. Your original question was about the warnings being printed in your shell, yes? I pointed out that in the absence of a custom error/warning handler, GDAL prints these directly to your shell. Rasterio does not register any custom handlers when you import it because I want to avoid import side effects that complicate testing of rasterio's modules.

The rasterio.env.Env class does register a custom error/warning handler when its __enter__() method is called. Within a `with Env():` block you should not see anything printed to the shell: messages will go to Python's logger instead, or be turned into Python exceptions.

I recommend ensuring that there is an activated Env within your functions. You could do this by putting `with rasterio.env.Env():` at the top of each function, or by using a decorator. There are examples of each in the rasterio code. The rasterio.open function itself is decorated in this way, and you could reuse that decorator: it is part of the public API.
 


There is a file in MemoryFile, yes, but it is a formatted file in memory, not on disk. See https://www.gdal.org/gdal_virtual_file_systems.html#gdal_virtual_file_systems_vsimem for a brief explanation.

Other programmers have shown interest in a "GeoNumpy" class, and I've seen at least one project like this on GitHub: Georaster, I think. Rasterio doesn't provide such a class and I'm not ready to add one at this time. I believe it's better for Rasterio to focus on reading and writing formatted datasets and leave application-specific classes up to application developers.
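
For what an application-specific class of this kind might look like, here is a minimal sketch. `GeoArray` is entirely hypothetical (it is neither part of rasterio nor the Georaster API); it simply groups pixel values, the six affine coefficients (a, b, c, d, e, f) and a CRS string, and uses only the standard library so that it runs on its own.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class GeoArray:
    """Hypothetical application-level container; not part of rasterio."""

    data: Sequence[Sequence[float]]  # rows of pixel values (a NumPy array would also do)
    transform: Tuple[float, float, float, float, float, float]  # affine (a, b, c, d, e, f)
    crs: str  # for example an EPSG code string

    def xy(self, row: int, col: int) -> Tuple[float, float]:
        """Return the map coordinates of the centre of pixel (row, col)."""
        a, b, c, d, e, f = self.transform
        x = a * (col + 0.5) + b * (row + 0.5) + c
        y = d * (col + 0.5) + e * (row + 0.5) + f
        return x, y


raster = GeoArray(
    data=[[1, 2], [3, 4]],
    transform=(10.0, 0.0, 500000.0, 0.0, -10.0, 4000000.0),
    crs="EPSG:32633",
)
print(raster.xy(0, 0))  # (500005.0, 3999995.0)
```

An object like this stays in memory until the application decides to write it out, at which point its three fields map directly onto the array, `transform` and `crs` that rasterio expects when opening a dataset for writing.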

--
Sean Gillies