Re: Proposal: Allow other cloud object store providers


Sean Gillies
 

Hi,

Thanks for bringing this up! To be sure that I understand, are you proposing that potentially any cloud platform would be made first class in rasterio, concretely, as in GDAL and as with rasterio/S3 today? AWS is special in rasterio because it's what I use at work and is where most of the important public raster datasets were hosted 2-3 years ago. GDAL makes all cloud platforms first class because people or organizations paid the maintainer to do it and because it's a less complex approach than making GDAL extendable.  

What would you think if rasterio were to take the opposite approach and require users to write ~5 lines of code themselves to adapt output of, say, keystonev3 and swiftclient to a standard interface in rasterio?
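To make the "~5 lines" idea concrete, here is a minimal sketch of what such an adapter might look like. Everything in it is an assumption for illustration: `CredentialSource`, `SwiftTokenAdapter` and `env_options` are hypothetical names, not rasterio API, and the storage URL and token are placeholders. The only non-hypothetical pieces are `SWIFT_STORAGE_URL` and `SWIFT_AUTH_TOKEN`, the configuration options GDAL's /vsiswift/ handler reads for pre-authenticated access.

```python
from typing import Dict, Protocol


class CredentialSource(Protocol):
    """Hypothetical "standard interface": any object that can emit GDAL config options."""

    def get_credential_options(self) -> Dict[str, str]:
        ...


class SwiftTokenAdapter:
    """User-written adapter wrapping a storage URL and token obtained elsewhere."""

    def __init__(self, storage_url: str, auth_token: str) -> None:
        self.storage_url = storage_url
        self.auth_token = auth_token

    def get_credential_options(self) -> Dict[str, str]:
        # Map the pre-authenticated pair onto the options /vsiswift/ reads.
        return {
            "SWIFT_STORAGE_URL": self.storage_url,
            "SWIFT_AUTH_TOKEN": self.auth_token,
        }


def env_options(source: CredentialSource) -> Dict[str, str]:
    """Stand-in for what the library side would do with any conforming object."""
    return dict(source.get_credential_options())


# Placeholder values; in practice they would come from the user's own
# keystone/swiftclient authentication code.
adapter = SwiftTokenAdapter("https://swift.example.org/v1/AUTH_demo", "token123")
options = env_options(adapter)
```

The point of the design is that rasterio would only need to know about the interface, while the provider-specific authentication stays in user code.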

On Sun, Jun 13, 2021 at 10:03 PM <ashley.sommer@...> wrote:

Hi Everyone,

I understand that AWS S3 is by far the most common cloud object store provider, especially in the geospatial community. Unfortunately some users don't get to choose which object store provider they're given to use.

Rasterio already has Session support for five different cloud providers: GSSession, OSSSession, AzureSession, SwiftSession and AWSSession. (In this case, sessions are "classes that configure access to secured resources".)

However, it seems that only AWS is a first-class citizen in rasterio, for the following reasons:

  • In the docs, on the topic of the use of objects on the cloud as virtual files, only support for AWS S3 is shown: https://rasterio.readthedocs.io/en/latest/topics/vsi.html
    • Other cloud providers are not even mentioned. I didn't know other sessions existed until I looked into the code.
  • When installing rasterio, you can add support for credentializing AWS S3 (with boto3) using the package "extras" syntax: `pip install rasterio[s3]`.
    • No other cloud providers have their own addon available at install-time.
  • When configuring a rasterio context manager with `rasterio.Env()` you have the ability to pass in an AWSSession or AWS credentials to credentialize your context.
    • If you don't pass in credentials, `Env` will make a session using `Session.aws_or_dummy()`, but doesn't attempt to check whether other provider sessions should be used.

I propose making the current object-store feature implementation less AWS-focused, with:
  • Ability to install requirement packages for other cloud providers, using additional "extras" options:
    • eg, `rasterio[gs]`, `rasterio[azure]`, `rasterio[swift]`
  • Modify the `rasterio.Env()` context manager to be cloud-platform agnostic, i.e. replace the "aws-or-dummy" behaviour with session selection based on which supporting packages are installed and which environment variables are present.
  • Document all of the different cloud providers that Rasterio supports, and how to configure and use them.
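As a rough sketch of the second point, session selection driven by installed packages and environment variables could look something like the following. This is not how rasterio works today: `PROVIDERS`, `is_installed` and `detect_provider` are hypothetical names, and the table of modules and environment variables is only an illustrative guess at what each provider might key on.

```python
import importlib.util
import os
from typing import Callable, Mapping, Optional

# Hypothetical table: provider -> (supporting module, env vars that signal intent).
PROVIDERS = {
    "aws": ("boto3", ("AWS_ACCESS_KEY_ID", "AWS_PROFILE")),
    "gs": ("google.cloud.storage", ("GOOGLE_APPLICATION_CREDENTIALS",)),
    "azure": ("azure.storage.blob", ("AZURE_STORAGE_CONNECTION_STRING", "AZURE_STORAGE_ACCOUNT")),
    "swift": ("swiftclient", ("SWIFT_STORAGE_URL", "OS_AUTH_URL")),
}


def is_installed(module: str) -> bool:
    """Return True if the module can be found without fully importing it."""
    try:
        return importlib.util.find_spec(module) is not None
    except (ImportError, ValueError):
        # A missing parent package raises instead of returning None.
        return False


def detect_provider(
    environ: Optional[Mapping[str, str]] = None,
    installed: Callable[[str], bool] = is_installed,
) -> Optional[str]:
    """Pick the first provider whose env vars are set and whose package is present."""
    environ = os.environ if environ is None else environ
    for name, (module, variables) in PROVIDERS.items():
        if any(environ.get(var) for var in variables) and installed(module):
            return name
    # No provider matched: the caller would fall back to an unauthenticated session.
    return None
```

For example, with only `OS_AUTH_URL` set and `swiftclient` installed, `detect_provider()` would select `"swift"` rather than falling through to an AWS or dummy session. The order of the table decides ties when several providers are configured at once, which a real design would need to make explicit.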

I understand this is a large piece of work, and a tall order for an open-source project. I'm usually one to scratch my own itch: my specific requirement here is to use rasterio with my existing OpenStack Swift object store. I want to pass pre-configured application credentials and use the OpenStack `keystonev3` and `swiftclient` libraries to configure and credentialize the context used for GDAL. Of course I could simply open a PR that patches that support into the existing SwiftSession, but I think a cleaner solution would address the issues highlighted above and make a more general fix, on top of the extra Swift features that I require.

Thanks for reading.



--
Sean Gillies
