Reading from S3
hughes.lloyd@...
I am trying to read a GeoTIFF from my private S3 bucket (mapping), but am receiving the following error message.
The code I am using to open the GeoTIFF is:

import rasterio
from rasterio.session import AWSSession

with rasterio.Env(session=AWSSession(aws_secret_access_key=S3_SECRET, aws_access_key_id=S3_KEY, region_name="us-west-1")) as env:
    rasterio.open("s3://mapping/bahamas/S1A_20190821T231100.tif")

I am using rasterio version 1.0.25 and the application is single-threaded. The file does exist and I can access it using s3fs and the AWS CLI. What am I missing? |
|
hughes.lloyd@...
I am trying to read a GeoTIFF from a private AWS S3 bucket. I have configured GDAL and the appropriate files ~/.aws/config and ~/.aws/credentials. I am using a non-standard AWS region as well, so I needed to set the AWS_S3_ENDPOINT environment variable. I am able to read the GeoTIFF information using both gdalinfo and rio:
and using rio:
However, when I try to read it in a script using the rasterio Python API, I receive the following error:
The code I am using, which produces the issue, is:
This is using Python 3.7, rasterio 1.0.25, and GDAL 2.4.2.

DEBUG:rasterio.env:Entering env context: <rasterio.env.Env object at 0x7f97fb41d898>
DEBUG:rasterio.env:Starting outermost env
DEBUG:rasterio.env:No GDAL environment exists
DEBUG:rasterio.env:New GDAL environment <rasterio._env.GDALEnv object at 0x7f97fb41d908> created
DEBUG:rasterio._env:GDAL_DATA found in environment: '/srv/conda/envs/notebook/share/gdal'.
DEBUG:rasterio._env:PROJ_LIB found in environment: '/srv/conda/envs/notebook/share/proj'.
DEBUG:rasterio._env:Started GDALEnv <rasterio._env.GDALEnv object at 0x7f97fb41d908>.
DEBUG:rasterio.env:Entered env context: <rasterio.env.Env object at 0x7f97fb41d898>
DEBUG:rasterio.env:Got a copy of environment <rasterio._env.GDALEnv object at 0x7f97fb41d908> options
DEBUG:rasterio.env:Entering env context: <rasterio.env.Env object at 0x7f97fb3c5898>
DEBUG:rasterio.env:Got a copy of environment <rasterio._env.GDALEnv object at 0x7f97fb41d908> options
DEBUG:rasterio.env:Entered env context: <rasterio.env.Env object at 0x7f97fb3c5898>
DEBUG:rasterio._base:Sharing flag: 32
DEBUG:rasterio.env:Exiting env context: <rasterio.env.Env object at 0x7f97fb3c5898>
DEBUG:rasterio.env:Cleared existing <rasterio._env.GDALEnv object at 0x7f97fb41d908> options
DEBUG:rasterio._env:Stopped GDALEnv <rasterio._env.GDALEnv object at 0x7f97fb41d908>.
DEBUG:rasterio.env:No GDAL environment exists
DEBUG:rasterio.env:New GDAL environment <rasterio._env.GDALEnv object at 0x7f97fb41d908> created
DEBUG:rasterio._env:GDAL_DATA found in environment: '/srv/conda/envs/notebook/share/gdal'.
DEBUG:rasterio._env:PROJ_LIB found in environment: '/srv/conda/envs/notebook/share/proj'.
DEBUG:rasterio._env:Started GDALEnv <rasterio._env.GDALEnv object at 0x7f97fb41d908>.
DEBUG:rasterio.env:Exited env context: <rasterio.env.Env object at 0x7f97fb3c5898>
DEBUG:rasterio.env:Exiting env context: <rasterio.env.Env object at 0x7f97fb41d898>
DEBUG:rasterio.env:Cleared existing <rasterio._env.GDALEnv object at 0x7f97fb41d908> options
DEBUG:rasterio._env:Stopped GDALEnv <rasterio._env.GDALEnv object at 0x7f97fb41d908>.
DEBUG:rasterio.env:Exiting outermost env
DEBUG:rasterio.env:Exited env context: <rasterio.env.Env object at 0x7f97fb41d898>
env: AWS_ACCESS_KEY_ID="XXXXXX"
env: AWS_SECRET_ACCESS_KEY="XXXXXXXX"
env: AWS_S3_ENDPOINT="us-west-1"
---------------------------------------------------------------------------
CPLE_OpenFailedError                      Traceback (most recent call last)
rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()
rasterio/_shim.pyx in rasterio._shim.open_dataset()
rasterio/_err.pyx in rasterio._err.exc_wrap_pointer()
CPLE_OpenFailedError: '/vsis3/s1-image-dataset/test.tif' does not exist in the file system, and is not recognized as a supported dataset name. |
|
Sean Gillies
Hi,

The following log message catches my eye:

env: AWS_S3_ENDPOINT="us-west-1"

If that is set in your notebook's environment, it will override the value you pass to Env() in your program, and it looks to be incorrect.
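Something like the following, run in a notebook cell, clears the stray variable before opening the dataset (a rough sketch; the bucket and key are placeholders, not your actual paths):

import os
import rasterio

# As noted above, a stray AWS_S3_ENDPOINT in the notebook's environment
# overrides the value passed to Env(), so drop it before opening.
os.environ.pop("AWS_S3_ENDPOINT", None)

with rasterio.Env():
    with rasterio.open("s3://your-bucket/your-image.tif") as src:
        print(src.profile)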
On Thu, Sep 5, 2019 at 8:17 AM <hughes.lloyd@...> wrote:

--
Sean Gillies |
|
Guillaume Lostis <g.lostis@...>
Hi,
I will add to the previous message that if you want to specify a non-default region, the environment variable you're looking for is probably AWS_REGION (or AWS_DEFAULT_REGION starting with GDAL 2.3), rather than AWS_S3_ENDPOINT (see https://gdal.org/user/virtual_file_systems.html#vsis3-aws-s3-files-random-reading).

Also, I have successfully used rasterio on private AWS S3 buckets without having to touch any environment variables, so unless I have misunderstood your case, no extra configuration should be necessary.
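For what it's worth, the kind of call that has worked for me looks roughly like this (bucket, key, and region are placeholders; credentials are picked up from ~/.aws/credentials):

import rasterio
from rasterio.session import AWSSession

# The region is passed through the session rather than AWS_S3_ENDPOINT;
# no environment variables are needed.
session = AWSSession(region_name="eu-west-1")

with rasterio.Env(session=session):
    with rasterio.open("s3://my-private-bucket/example.tif") as src:
        print(src.bounds)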
Best,
Guillaume Lostis |
|
hughes.lloyd@...
My bucket is hosted in the "us-gov-west-1" region, and if I don't set

AWS_S3_ENDPOINT=s3.us-gov-west-1.amazonaws.com

then neither gdalinfo nor rio works; they throw errors about the file not being found because they continue to hit the standard endpoint, even when I specify the correct region (I've tried ~/.aws/config as well as AWS_REGION and AWS_DEFAULT_REGION). You can see from the following that the endpoint remains incorrect:

WARNING:rasterio._env:CPLE_AppDefined in HTTP response code on https://s1-image-dataset.s3.amazonaws.com/test.tif: 403 |
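In Python the workaround amounts to roughly this (setting the variable with os.environ here stands in for exporting it in the shell):

import os
import rasterio

# Point GDAL at the GovCloud endpoint; without it the request goes to the
# default s3.amazonaws.com host and comes back 403, as in the warning above.
os.environ["AWS_S3_ENDPOINT"] = "s3.us-gov-west-1.amazonaws.com"

with rasterio.Env():
    with rasterio.open("s3://s1-image-dataset/test.tif") as src:
        print(src.meta)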
|
hughes.lloyd@...
The issue doesn't exist outside of Jupyter notebooks. It seems that once I am inside a notebook, rasterio does not function in the same manner, even when the environment variables are identical.
I would be interested to know if you have managed to read from S3 from inside a notebook. |
|
Sean Gillies
Hi Hughes,

Yes, I've been able to read raster data from S3 in a Jupyter notebook. What do you make of the observation I made earlier today about the env: AWS_S3_ENDPOINT="us-west-1" log message from your notebook? I think this might be the key.

On Thu, Sep 5, 2019 at 1:10 PM <hughes.lloyd@...> wrote:

The issue doesn't exist outside of Jupyter notebooks. It seems once I am inside a notebook that rasterio does not function in the same manner even when the environment variables are identical.

--
Sean Gillies |
|
hughes.lloyd@...
Hi Sean,
env: AWS_S3_ENDPOINT="us-west-1"

This was indeed an error, although changing it did not fix the problem. I have included "fresh" logs below to show that the problem still persists. Furthermore, as I stated above, I have to set the AWS_S3_ENDPOINT environment variable to the "us-gov-west-1" endpoint; otherwise rasterio does not work (as I showed above). And in the example below:

$ aws configure

Now that it works in the terminal when AWS_S3_ENDPOINT is set, let's turn back to the notebook example, where only the ~/.aws/config and ~/.aws/credentials files are configured, and AWS_S3_ENDPOINT is not.
This gives the following DEBUG log:

DEBUG:rasterio.env:Entering env context: <rasterio.env.Env object at 0x7f14328e5a58>
DEBUG:rasterio.env:Starting outermost env
DEBUG:rasterio.env:No GDAL environment exists
DEBUG:rasterio.env:New GDAL environment <rasterio._env.GDALEnv object at 0x7f1418524e80> created
DEBUG:rasterio._env:GDAL_DATA found in environment: '/srv/conda/envs/notebook/share/gdal'.
DEBUG:rasterio._env:PROJ_LIB found in environment: '/srv/conda/envs/notebook/share/proj'.
DEBUG:rasterio._env:Started GDALEnv <rasterio._env.GDALEnv object at 0x7f1418524e80>.
DEBUG:rasterio.env:Entered env context: <rasterio.env.Env object at 0x7f14328e5a58>
DEBUG:rasterio.env:Got a copy of environment <rasterio._env.GDALEnv object at 0x7f1418524e80> options
DEBUG:botocore.hooks:Changing event name from creating-client-class.iot-data to creating-client-class.iot-data-plane
DEBUG:botocore.hooks:Changing event name from before-call.apigateway to before-call.api-gateway
DEBUG:botocore.hooks:Changing event name from request-created.machinelearning.Predict to request-created.machine-learning.Predict
DEBUG:botocore.hooks:Changing event name from before-parameter-build.autoscaling.CreateLaunchConfiguration to before-parameter-build.auto-scaling.CreateLaunchConfiguration
DEBUG:botocore.hooks:Changing event name from before-parameter-build.route53 to before-parameter-build.route-53
DEBUG:botocore.hooks:Changing event name from request-created.cloudsearchdomain.Search to request-created.cloudsearch-domain.Search
DEBUG:botocore.hooks:Changing event name from docs.*.autoscaling.CreateLaunchConfiguration.complete-section to docs.*.auto-scaling.CreateLaunchConfiguration.complete-section
DEBUG:botocore.hooks:Changing event name from before-parameter-build.logs.CreateExportTask to before-parameter-build.cloudwatch-logs.CreateExportTask
DEBUG:botocore.hooks:Changing event name from docs.*.logs.CreateExportTask.complete-section to docs.*.cloudwatch-logs.CreateExportTask.complete-section
DEBUG:botocore.hooks:Changing event name from before-parameter-build.cloudsearchdomain.Search to before-parameter-build.cloudsearch-domain.Search
DEBUG:botocore.hooks:Changing event name from docs.*.cloudsearchdomain.Search.complete-section to docs.*.cloudsearch-domain.Search.complete-section
DEBUG:botocore.credentials:Looking for credentials via: env
DEBUG:botocore.credentials:Looking for credentials via: assume-role
DEBUG:botocore.credentials:Looking for credentials via: assume-role-with-web-identity
DEBUG:botocore.credentials:Looking for credentials via: shared-credentials-file
INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
DEBUG:rasterio.env:Entering env context: <rasterio.env.Env object at 0x7f1418396c50>
DEBUG:rasterio.env:Got a copy of environment <rasterio._env.GDALEnv object at 0x7f1418524e80> options
DEBUG:rasterio.env:Entered env context: <rasterio.env.Env object at 0x7f1418396c50>
DEBUG:rasterio._base:Sharing flag: 32
WARNING:rasterio._env:CPLE_AppDefined in HTTP response code on https://
Still the same problem persists (only when running this in a Jupyter notebook, though).

DEBUG:botocore.hooks:Changing event name from creating-client-class.iot-data to creating-client-class.iot-data-plane
DEBUG:botocore.hooks:Changing event name from before-call.apigateway to before-call.api-gateway
DEBUG:botocore.hooks:Changing event name from request-created.machinelearning.Predict to request-created.machine-learning.Predict
DEBUG:botocore.hooks:Changing event name from before-parameter-build.autoscaling.CreateLaunchConfiguration to before-parameter-build.auto-scaling.CreateLaunchConfiguration
DEBUG:botocore.hooks:Changing event name from before-parameter-build.route53 to before-parameter-build.route-53
DEBUG:botocore.hooks:Changing event name from request-created.cloudsearchdomain.Search to request-created.cloudsearch-domain.Search
DEBUG:botocore.hooks:Changing event name from docs.*.autoscaling.CreateLaunchConfiguration.complete-section to docs.*.auto-scaling.CreateLaunchConfiguration.complete-section
DEBUG:botocore.hooks:Changing event name from before-parameter-build.logs.CreateExportTask to before-parameter-build.cloudwatch-logs.CreateExportTask
DEBUG:botocore.hooks:Changing event name from docs.*.logs.CreateExportTask.complete-section to docs.*.cloudwatch-logs.CreateExportTask.complete-section
DEBUG:botocore.hooks:Changing event name from before-parameter-build.cloudsearchdomain.Search to before-parameter-build.cloudsearch-domain.Search
DEBUG:botocore.hooks:Changing event name from docs.*.cloudsearchdomain.Search.complete-section to docs.*.cloudsearch-domain.Search.complete-section
DEBUG:botocore.session:Setting config variable for region to 'us-gov-west-1'
DEBUG:rasterio.env:Entering env context: <rasterio.env.Env object at 0x7f0b73896c18>
DEBUG:rasterio.env:Starting outermost env
DEBUG:rasterio.env:No GDAL environment exists
DEBUG:rasterio.env:New GDAL environment <rasterio._env.GDALEnv object at 0x7f0b73896c50> created
DEBUG:rasterio._env:GDAL_DATA found in environment: '/srv/conda/envs/notebook/share/gdal'.
DEBUG:rasterio._env:PROJ_LIB found in environment: '/srv/conda/envs/notebook/share/proj'.
DEBUG:rasterio._env:Started GDALEnv <rasterio._env.GDALEnv object at 0x7f0b73896c50>.
DEBUG:rasterio.env:Entered env context: <rasterio.env.Env object at 0x7f0b73896c18>
DEBUG:rasterio.env:Got a copy of environment <rasterio._env.GDALEnv object at 0x7f0b73896c50> options
DEBUG:rasterio.env:Entering env context: <rasterio.env.Env object at 0x7f0b73896fd0>
DEBUG:rasterio.env:Got a copy of environment <rasterio._env.GDALEnv object at 0x7f0b73896c50> options
DEBUG:rasterio.env:Entered env context: <rasterio.env.Env object at 0x7f0b73896fd0>
DEBUG:rasterio._base:Sharing flag: 32
WARNING:rasterio._env:CPLE_AppDefined in HTTP response code on https://
Perhaps this is not a bug, but it seems counterintuitive to have to specify the AWS endpoint directly when a region is already specified, since the two are related. |
|
scott
Hughes,
Have you tried setting `os.environ['AWS_S3_ENDPOINT'] = 's3.us-gov-west-1.amazonaws.com'` before opening the file? This reminds me of a previous (but resolved) issue with requester-pays configuration: https://github.com/mapbox/rasterio/issues/692#issuecomment-362434388

Scott |
|
Sean Gillies
Hughes,

Hi Sean, ...
Perhaps this is not a bug, but it seems counter intuitive to directly need to specify AWS Endpoints when there is a region specification needed as well as the two are related.

The AWS_S3_ENDPOINT config option is intended to allow GDAL users to work with S3-compatible systems like https://min.io/index.html.
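For example, pointing GDAL at a MinIO-style service looks roughly like this (host, credentials, bucket, and key below are all placeholders):

import rasterio

# All values here stand in for a hypothetical S3-compatible deployment.
with rasterio.Env(
    AWS_S3_ENDPOINT="localhost:9000",   # host[:port] of the service
    AWS_HTTPS="NO",                     # local test deployments often use plain HTTP
    AWS_VIRTUAL_HOSTING="FALSE",        # path-style addressing instead of virtual hosting
    AWS_ACCESS_KEY_ID="xxxxx",
    AWS_SECRET_ACCESS_KEY="xxxxx",
):
    with rasterio.open("s3://my-bucket/example.tif") as src:
        print(src.profile)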
It shouldn't be needed for GovCloud; specifying the region should suffice, as you expect. I'm going to dig into rasterio and ask on gdal-dev. I'll follow up here soon.

--
Sean Gillies |
|
Sean Gillies
Hughes, would you be willing to run

CPL_CURL_VERBOSE=1 rio info "s3://s1-image-dataset/test.tif"

on your computer after unsetting AWS_S3_ENDPOINT, and show us the output after sanitizing it (replace your key with xxxxx, but otherwise leave the Authorization headers readable)? If you do this, we'll see information about the HTTP requests that are made and can check whether GDAL is failing to navigate a redirect or something like that.

On Fri, Sep 6, 2019 at 8:47 AM Sean Gillies via Groups.Io <sean.gillies=gmail.com@groups.io> wrote:
-- Sean Gillies |
|