Proposal: Allow other cloud object store providers
Sean Gillies
Hi Ashley,
On Fri, Jun 25, 2021 at 6:07 PM <ashley.sommer@...> wrote:
Yes, I think that should be the standard way. AWS would have to remain a special case until we deprecate the special parameters properly, but we can already do what you outlined above for AWS.
I'm not familiar with opendatacube, but I think it should be possible to use Swift now without modifying either opendatacube or rasterio. GDAL already supports four different authentication mechanisms (see https://gdal.org/user/virtual_file_systems.html#vsiswift) and, as far as I know, all of those options can be set using similarly named environment variables, not only through the GDAL API as rasterio does here: https://github.com/mapbox/rasterio/blob/db03b66e81b489d3f5f01c9edfb6fc720250a2c1/rasterio/env.py#L233-L246. For example, one could call rasterio.session.SwiftSession(...).get_credential_options() and then update os.environ with the result.

I hope this is a useful suggestion and not a red herring,
Sean Gillies
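The suggestion above, exporting credential options as environment variables, might be sketched as follows. This is only an illustration under stated assumptions: `exported_options` is a hypothetical helper, not part of rasterio or GDAL, and the dictionary uses placeholder values standing in for whatever `SwiftSession(...).get_credential_options()` returns. The option names `SWIFT_STORAGE_URL` and `SWIFT_AUTH_TOKEN` are taken from the GDAL /vsiswift/ documentation linked above.

```python
import os
from contextlib import contextmanager


@contextmanager
def exported_options(options):
    """Temporarily export a dict of config options as environment variables.

    Hypothetical helper for illustration; restores the previous
    environment when the block exits.
    """
    # Remember what was set before so it can be restored afterwards.
    previous = {key: os.environ.get(key) for key in options}
    os.environ.update({key: str(value) for key, value in options.items()})
    try:
        yield
    finally:
        for key, old in previous.items():
            if old is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old


# Placeholder values; in practice this dict would come from
# rasterio.session.SwiftSession(...).get_credential_options().
swift_options = {
    "SWIFT_STORAGE_URL": "https://swift.example.org/v1/AUTH_demo",
    "SWIFT_AUTH_TOKEN": "placeholder-token",
}

with exported_options(swift_options):
    # Code that opens /vsiswift/ paths would run here and pick the
    # options up from the process environment.
    inside = os.environ["SWIFT_AUTH_TOKEN"]
```

A plain `os.environ.update(...)` would do the same job without restoring the environment afterwards; the context manager is just one way to keep the credentials scoped to the code that needs them.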
ashley.sommer@...
Thanks for your reply, Sean.

> are you proposing that potentially any cloud platform would be made first class in rasterio, concretely, as in GDAL and as with rasterio/S3 today?

Yes, that's right.

> What would you think if rasterio were to take the opposite approach and require users to write ~5 lines of code themselves to adapt output of, say, keystonev3 and swiftclient to a standard interface in rasterio

Yeah, I actually had the same thought too, but wasn't sure if it would be received well. That is actually kind of how you can do it in rasterio already. You can manually create a `keystonev3` or `swiftclient` auth session and populate it with your credentials. You can then manually create a rasterio `SwiftSession` and give it that auth session. Then pass the pre-configured `SwiftSession` into `rasterio.Env()` and rasterio will use that as the cloud session. It's a bit of boilerplate code, but it already works across all of the existing `Session` subclasses for the different cloud platforms.

Should that be the "standard" way of doing it? Could it be cleaner? Would AWS S3 still be a special case?

My end goal is to be able to use OpenStack Swift ObjectStore as a storage backend for an opendatacube project. Opendatacube only supports AWS S3 for now, because it relies on rasterio's "first-class" interface to S3. I was told that if I want to get other cloud providers working natively in opendatacube, we need them to be fully supported by rasterio first (as you mentioned, they're already fully supported by GDAL).

Ashley Sommer
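The "adapt the output to a standard interface" idea discussed in this message could look roughly like the sketch below. It is an assumption-laden illustration, not rasterio's API: `SwiftCredentials` is a hypothetical class, and the only thing borrowed from the thread is the `get_credential_options()` method name. The storage URL and token are placeholders for values a user would obtain from their own `keystonev3` or `swiftclient` session.

```python
class SwiftCredentials:
    """Hypothetical adapter wrapping a Swift storage URL and auth token.

    The values are expected to come from whatever auth library the
    user prefers; this class only reshapes them.
    """

    def __init__(self, storage_url, auth_token):
        self.storage_url = storage_url
        self.auth_token = auth_token

    def get_credential_options(self):
        """Return the options as GDAL /vsiswift/ config option names."""
        return {
            "SWIFT_STORAGE_URL": self.storage_url,
            "SWIFT_AUTH_TOKEN": self.auth_token,
        }


# Placeholder values standing in for the result of a real auth session.
adapter = SwiftCredentials(
    storage_url="https://swift.example.org/v1/AUTH_demo",
    auth_token="placeholder-token",
)
options = adapter.get_credential_options()
```

Any object exposing the same method could then be handled uniformly, whichever cloud platform it represents.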
Sean Gillies
Hi,

Thanks for bringing this up! To be sure that I understand, are you proposing that potentially any cloud platform would be made first class in rasterio, concretely, as in GDAL and as with rasterio/S3 today?

AWS is special in rasterio because it's what I use at work and is where most of the important public raster datasets were hosted 2-3 years ago. GDAL makes all cloud platforms first class because people or organizations paid the maintainer to do it and because it's a less complex approach than making GDAL extendable.

What would you think if rasterio were to take the opposite approach and require users to write ~5 lines of code themselves to adapt output of, say, keystonev3 and swiftclient to a standard interface in rasterio?

On Sun, Jun 13, 2021 at 10:03 PM <ashley.sommer@...> wrote:
--
Sean Gillies
ashley.sommer@...
Hi Everyone,

I understand that AWS S3 is by far the most common cloud object store provider, especially in the geospatial community. Unfortunately, some users don't get to choose which object store provider they're given to use. Rasterio already has Session support for five different cloud platforms: GSSession, OSSSession, AzureSession, SwiftSession and AWSSession. (In this case, sessions are "classes that configure access to secured resources".) However, it seems that only AWS is a first-class citizen in rasterio, for the following reasons:
I propose changing the current object-store feature implementation to be less AWS-focused, with
I understand this is a large piece of work, and a tall order for an open-source project. I'm usually one to scratch my own itch; my specific requirement is to be able to use rasterio with my existing OpenStack Swift object store. I want to use Swift by passing pre-configured application credentials and using the OpenStack `keystonev3` and `swiftclient` libraries to configure and authenticate the context used for GDAL. Of course I could simply open a PR that patches that support into the existing SwiftSession, but I think a cleaner solution would involve looking at the issues highlighted above and making a more generalized fix overall, on top of the extra Swift features that I require.

Thanks for reading.