Proposal: Allow other cloud object store providers


ashley.sommer@...
 

Hi Everyone,

I understand that AWS S3 is by far the most common cloud object store provider, especially in the geospatial community. Unfortunately, some users don't get to choose which object store provider they are given.

Rasterio already has Session support for five different cloud providers: GSSession, OSSSession, AzureSession, SwiftSession and AWSSession. (Here, sessions are "classes that configure access to secured resources".)

However, it seems that only AWS is a first-class citizen in Rasterio, for the following reasons:

  • In the docs, on the topic of the use of objects on the cloud as virtual files, only support for AWS S3 is shown: https://rasterio.readthedocs.io/en/latest/topics/vsi.html
    • Other cloud providers are not even mentioned. I didn't know other sessions existed until I looked into the code.
  • When installing rasterio, you can add support for credentializing AWS S3 (with boto3) using the package "extras" syntax: `pip install rasterio[s3]`.
    • No other cloud providers have their own addon available at install-time.
  • When configuring a rasterio context manager with `rasterio.Env()`, you can pass in an AWSSession or AWS credentials to credentialize your context.
    • If you don't pass in credentials, `Env` will make a session using `Session.aws_or_dummy()`, but it doesn't check whether another provider's session should be used.

I propose making the current object-store implementation less AWS-focused, with:
  • The ability to install the required packages for other cloud providers, using additional "extras" options:
    • e.g. `rasterio[gs]`, `rasterio[azure]`, `rasterio[swift]`
  • Modify the `rasterio.Env()` context manager to be cloud-platform agnostic, i.e. replace the "aws-or-dummy" behaviour with session selection based on which supporting packages are installed and which environment variables are present.
  • Document all of the different cloud providers that Rasterio supports, and how to configure and use them.
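
To illustrate the kind of provider-agnostic selection I have in mind, here is a minimal sketch. It is not existing Rasterio code: `PROVIDERS` and `detect_provider` are hypothetical names, and the pairing of packages with environment variables is only my assumption about how the rules might look. The environment variable names themselves are ones GDAL already reads.

```python
import importlib.util
import os

# Hypothetical registry: provider -> (optional Python package, env vars signalling credentials).
# The package requirements shown here are illustrative assumptions, not Rasterio's actual rules.
PROVIDERS = {
    "aws": ("boto3", ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")),
    "gs": (None, ("GOOGLE_APPLICATION_CREDENTIALS",)),
    "azure": (None, ("AZURE_STORAGE_CONNECTION_STRING",)),
    "swift": ("swiftclient", ("SWIFT_STORAGE_URL", "SWIFT_AUTH_TOKEN")),
}


def detect_provider(environ=None, is_installed=None):
    """Return the first provider whose env vars are set and whose package is importable.

    Returns None when nothing matches, which would correspond to a dummy session.
    """
    environ = os.environ if environ is None else environ
    if is_installed is None:
        # find_spec returns None for a top-level package that is not installed.
        is_installed = lambda name: importlib.util.find_spec(name) is not None

    for provider, (package, variables) in PROVIDERS.items():
        if not all(environ.get(var) for var in variables):
            continue  # credentials for this provider are not in the environment
        if package is not None and not is_installed(package):
            continue  # supporting package is missing
        return provider
    return None
```

With something like this, `Env` could pick a session class from the detected provider name instead of always falling back to AWS, and an explicitly passed session would still take priority.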

I understand this is a large piece of work, and a tall order for an open-source project. I'm usually one to scratch my own itch: my specific requirement is to be able to use rasterio with my existing OpenStack Swift object store. I want to use Swift by passing pre-configured application credentials and using the OpenStack `keystonev3` and `swiftclient` libraries to configure and credentialize the context used by GDAL. Of course, I could simply open a PR that patches that support into the existing SwiftSession, but I think a cleaner solution would address the issues highlighted above and make a more general fix, on top of the extra Swift features that I require.

Thanks for reading.
