Handling non-AWS S3 services

Guillaume Lostis
 

Hi all,

I've been working with a non-AWS, S3-compatible storage service lately, so I had to tackle the question of how to use rasterio with it. The service (https://www.openio.io/) implements the S3 API, so it works with AWS libraries such as boto3 or awscli.

I've managed to make it work with rasterio, but in a manner that doesn't really satisfy me. I'm writing this message to ask if you would agree to make some changes to AWSSession in order to better handle non-AWS S3 providers.

Here is some context on the problem: since the endpoint_url of the service is different from GDAL's default (s3.amazonaws.com), I currently need to write code along the lines of:

import rasterio

with rasterio.Env(profile_name="openio", AWS_S3_ENDPOINT="my_endpoint.com"):
    with rasterio.open("s3://bucket/file.tiff") as src:
        print(src.shape)

The code works, but I don't like having to mix a rasterio-esque argument (profile_name, which sets up an AWSSession) with a GDAL-esque one (AWS_S3_ENDPOINT) to make up for an argument that AWSSession is missing.

The nice thing about AWSSession is that it uses a boto3.Session, which in turn reads my ~/.aws/config and ~/.aws/credentials files in which I've saved my OpenIO credentials and region name under a profile named openio (this way I can easily switch between AWS and OpenIO buckets).

The not-so-nice thing is that boto3.Session objects do not accept a custom endpoint_url. This is intentional: a Session is meant to talk to several services (EC2, S3, ...), each of which has its own URL (more info in the first few comments of this issue). A client created with boto3.Session.client, however, does accept a custom endpoint_url. For example, to have boto3 work with OpenIO, I do the following:

import boto3

session = boto3.Session(profile_name="openio")
client = session.client("s3", endpoint_url="https://my_endpoint.com")
# use the client to retrieve files, etc.

From what I've understood by reading the code, AWSSession uses a boto3.Session only to retrieve the credentials, which it then stores in a _creds attribute. After that, the boto3.Session is not used for anything else. Since a boto3.Session cannot supply a custom endpoint_url, would it be acceptable to add an endpoint_url argument to AWSSession? I have tested this patch and it does what I want, because I can run the following code (which, IMO, is nicer than the first one):

import rasterio

session = rasterio.session.AWSSession(profile_name="openio", endpoint_url="my_endpoint.com")
with rasterio.Env(session=session):
    with rasterio.open("s3://bucket/file.tiff") as src:
        print(src.shape)
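
To make the idea a bit more concrete, here is a minimal sketch of the mapping I have in mind. It is not rasterio's actual code: gdal_s3_options is a hypothetical helper name, and the key, secret and region values are placeholders. The only things it relies on are GDAL's documented configuration options for S3 (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION, AWS_S3_ENDPOINT, AWS_HTTPS). Since AWS_S3_ENDPOINT expects a bare host while boto3 expects a full URL, the sketch accepts both forms:

```python
from urllib.parse import urlparse


def gdal_s3_options(access_key, secret_key, region=None, endpoint_url=None):
    """Hypothetical helper (not part of rasterio): translate credentials and
    an optional endpoint into GDAL configuration options for S3 access."""
    options = {
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
    }
    if region:
        options["AWS_REGION"] = region
    if endpoint_url:
        # Accept both "host[:port]" and "scheme://host[:port]".
        if "://" not in endpoint_url:
            endpoint_url = "//" + endpoint_url
        parsed = urlparse(endpoint_url)
        # GDAL wants the host (and port) only, without the scheme.
        options["AWS_S3_ENDPOINT"] = parsed.netloc
        # Only set AWS_HTTPS when the caller gave an explicit scheme.
        if parsed.scheme:
            options["AWS_HTTPS"] = "YES" if parsed.scheme == "https" else "NO"
    return options


# Placeholder credentials, for illustration only.
opts = gdal_s3_options("my_key", "my_secret", region="us-east-1",
                       endpoint_url="https://my_endpoint.com")
print(opts)
```

With something along these lines, the same endpoint value could be shared between boto3 (which needs the scheme) and GDAL (which does not), and when no endpoint_url is given nothing changes with respect to the current behaviour.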

What do you think of this? Is it in the scope of rasterio's AWSSession, or not?

Thanks,

Guillaume Lostis
