Unable to read COG via /vsicurl using rasterio (but can access via GDAL)


henry@...
 

Hello! I am trying to access the Harmonized Landsat Sentinel (HLS) Cloud Optimized GeoTIFFs and am struggling to read the files using rasterio. I am running Ubuntu 20.04 and have installed GDAL 3.4.0 from the ubuntugis-unstable PPA and rasterio 1.2.10 from PyPI in a Python virtual environment.

This `gdalinfo` call behaves as expected:

```sh
gdalinfo /vsicurl/https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T13TEK.2021261T175011.v2.0/HLS.S30.T13TEK.2021261T175011.v2.0.B04.tif \
  --config CPL_VSIL_CURL_USE_HEAD FALSE \
  --config CPL_CURL_VERBOSE YES \
  --config GDAL_HTTP_COOKIEJAR /tmp/cookies.txt

```

But reading the raster using rasterio in a Python session does not!

```python
import rasterio as rio
url = 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T13TEK.2021261T175011.v2.0/HLS.S30.T13TEK.2021261T175011.v2.0.B04.tif'
with rio.Env(CPL_CURL_VERBOSE=True, GDAL_HTTP_COOKIEJAR='/tmp/cookies.txt', GDAL_DISABLE_READDIR_ON_OPEN=True, CPL_VSIL_CURL_ALLOWED_EXTENSIONS='TIF'):
    r = rio.open(url)

```
The tail of the error message with curl output looks like this:
```
< HTTP/1.1 303 See Other
< Content-Type: application/json
< Content-Length: 0
< Connection: keep-alive
< Server: CloudFront
< Date: Wed, 23 Feb 2022 21:03:31 GMT
< x-amzn-Remapped-x-amzn-RequestId: 984f04ce-4a4f-468f-82aa-d20cab3e1b7b
< x-amzn-Remapped-Content-Length: 0
< x-amzn-Remapped-Connection: keep-alive
< X-Request-Id: NHQIHdHSrm4e-4RWn2R0RiEGoHSoFTicTYI4nH62Q4-TzKX3X66xlA==
< x-amz-apigw-id: OA4djHpePHcFqoQ=
< Cache-Control: private, max-age=3540
< x-amzn-Remapped-Server: Server
< Location: https://d1nklfio7vscoe.cloudfront.net/s3-2d2df3a34830d5223d1e9547cd713408/lp-prod-protected.s3.us-west-2.amazonaws.com/HLSS30.020/HLS.S30.T13TEK.2021261T175011.v2.0/HLS.S30.T13TEK.2021261T175011.v2.0.B04.tif?A-userid=hrodman1&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAZLX6ZES42JF7SMUS%252F20220223%252Fus-west-2%252Fs3%252Faws4_request&X-Amz-Date=20220223T210331Z&X-Amz-Expires=3600&X-Amz-Security-Token=FwoGZXIvYXdzEDYaDPTQu3TsL8Js%252Fk0KxiK3AUa2TuxCD%252Fd1SRCht5WQihvNjeg1F8uQ2Dy%252FY1RJN%252Fayv5ZVAiXkTnYDfJiOgZpDiMw7gWI5fBntcpiz7m5a4yvzfVGucMaCMjlj4%252BdFaqJeOLgpSowuYWw%252B6H5f36uSGgKF1pQw7eeVnuRQ0j3Llp%252BXX1hyP2ymnL5HO6huutM%252Bd6BD%252Bu1ynCZmJqNYoVQqysmTXdp%252Fs2TKIW0R7agT4O1h21SIbZdZHZ%252F9hgtHzLQeCoIHp3HLuyijwtqQBjItdskFBhJKRIDVWU1dG7szzpdLHKN2KjzB%252BeiTo3YuFHXU4aENqpTBawLYBypq&X-Amz-SignedHeaders=host&X-Amz-Signature=35a9a4bc828179c5c565c8bcd9f62575ac006da0494a08a8bf925fa65d1c2549

< X-Amzn-Trace-Id: Root=1-6216a123-370f1e0a2e8065ed6ca302f9;Sampled=0
< x-amzn-Remapped-Date: Wed, 23 Feb 2022 21:03:31 GMT
< X-Content-Type-Options: nosniff
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< Strict-Transport-Security: max-age=31536000
< X-Forwarded-For: 75.134.137.166
< x-amzn-RequestId: abb6a9c7-2e15-4809-953d-2e8d16bf5f71
< X-Cache: Miss from cloudfront
< Via: 1.1 3dc94622fb840cab73b3ddb08a5c9680.cloudfront.net (CloudFront)
< X-Amz-Cf-Pop: MSP50-C1
< X-Amz-Cf-Id: NHQIHdHSrm4e-4RWn2R0RiEGoHSoFTicTYI4nH62Q4-TzKX3X66xlA==
* Failed writing header
* stopped the pause stream!
* Closing connection 0
Traceback (most recent call last):
  File "rasterio/_base.pyx", line 261, in rasterio._base.DatasetBase.__init__
  File "rasterio/_shim.pyx", line 78, in rasterio._shim.open_dataset
  File "rasterio/_err.pyx", line 216, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_HttpResponseError: HTTP response code: 303

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "hls.py", line 6, in <module>
    r = rio.open(url)
  File "/root/Envs/sequoia/lib/python3.8/site-packages/rasterio/env.py", line 437, in wrapper
    return f(*args, **kwds)
  File "/root/Envs/sequoia/lib/python3.8/site-packages/rasterio/__init__.py", line 220, in open
    s = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
  File "rasterio/_base.pyx", line 263, in rasterio._base.DatasetBase.__init__
rasterio.errors.RasterioIOError: HTTP response code: 303
```

When I install gdal and rasterio from conda-forge into a conda environment, everything works (without those GDAL flags)! Many of the system dependencies have later versions in the conda environment (curl version 7.81.0 in conda vs 7.68.0 without conda).

This works in the conda environment:
```python

import rasterio as rio
url = 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T13TEK.2021261T175011.v2.0/HLS.S30.T13TEK.2021261T175011.v2.0.B04.tif'
r = rio.open(url)

```

Does anyone have a clue about how to get rasterio working in this context without the conda installation?


vincent.sarago@...
 

My colleague Sean Harkins tells me that there was an issue in GDAL which was fixed in 3.3 (https://github.com/OSGeo/gdal/pull/4656). Could you make sure that you have gdal>=3.3 (for example via conda install), or that you are using the latest rasterio wheels (1.3a3)?
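
One way to check this from Python: rasterio wheels from PyPI ship with their own bundled GDAL, so the version rasterio uses can differ from the one the system `gdalinfo` reports. The sketch below compares `rasterio.__gdal_version__` against the 3.3 threshold mentioned above. The helper names (`parse_version`, `gdal_is_recent_enough`, `report_rasterio_gdal`) are hypothetical and only for illustration.

```python
import re

# Minimum GDAL version, taken from the reply above (fix reported for 3.3).
MIN_GDAL = (3, 3)


def parse_version(text):
    """Return the leading numeric components of a version string as ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def gdal_is_recent_enough(gdal_version, minimum=MIN_GDAL):
    """Compare only major.minor against the minimum."""
    return parse_version(gdal_version)[:2] >= minimum


def report_rasterio_gdal():
    """Print the GDAL version rasterio is linked against (needs rasterio)."""
    import rasterio  # imported lazily so the helpers above work without it

    print("rasterio:", rasterio.__version__)
    print("GDAL used by rasterio:", rasterio.__gdal_version__)
    print("meets minimum:", gdal_is_recent_enough(rasterio.__gdal_version__))
```

Calling `report_rasterio_gdal()` inside the failing virtual environment should show whether the GDAL that rasterio actually loads is older than the system one.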


henry@...
 

Thank you for your reply, Vincent! I installed rasterio==1.3a3 from pypi and set a few GDAL environment variables and now everything is working as expected in my non-conda installation.

Here are the GDAL environment variables that I need to set to read the file:

export CPL_VSIL_CURL_USE_HEAD=YES

export GDAL_DISABLE_READDIR_ON_OPEN=OPEN_DIR

export GDAL_HTTP_COOKIEJAR=/tmp/cookies.txt

export GDAL_HTTP_COOKIEFILE=/tmp/cookies.txt

export CPL_VSIL_CURL_ALLOWED_EXTENSIONS=TIF
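
For anyone who prefers to keep these settings in the script rather than the shell, here is a minimal sketch of two options: copying them into `os.environ` before opening the file, or passing them to `rasterio.Env` as in the original snippet. The values are copied verbatim from the exports above. The names `HLS_GDAL_SETTINGS`, `apply_gdal_settings` and `open_hls` are hypothetical, and the sketch assumes Earthdata access is already set up as in the working installation.

```python
import os

# Values copied verbatim from the exports above; they describe one working
# setup, not a set of recommended defaults.
HLS_GDAL_SETTINGS = {
    "CPL_VSIL_CURL_USE_HEAD": "YES",
    "GDAL_DISABLE_READDIR_ON_OPEN": "OPEN_DIR",
    "GDAL_HTTP_COOKIEJAR": "/tmp/cookies.txt",
    "GDAL_HTTP_COOKIEFILE": "/tmp/cookies.txt",
    "CPL_VSIL_CURL_ALLOWED_EXTENSIONS": "TIF",
}


def apply_gdal_settings(settings, environ=None):
    """Copy the settings into an environment mapping (os.environ by default)."""
    target = os.environ if environ is None else environ
    for name, value in settings.items():
        target[name] = value
    return target


def open_hls(url, settings=HLS_GDAL_SETTINGS):
    """Open a remote COG with the settings scoped to a rasterio.Env block."""
    import rasterio  # imported lazily; requires rasterio to be installed

    with rasterio.Env(**settings):
        with rasterio.open(url) as src:
            return src.profile
```

`open_hls(url)` keeps the options local to one call, while `apply_gdal_settings(HLS_GDAL_SETTINGS)` mirrors the shell exports for the whole process.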