On the names of datasets in GDAL virtual file systems

Sean Gillies

Hi all,

As you know, GDAL has a virtual filesystem abstraction [1] wherein datasets are treated as ordinary files at special, private mount points like /vsis3/, /vsihdfs/, /vsiswift/, etc. Rasterio can open datasets using these paths, but favors web-style URIs formatted like "scheme://authority/path".

AWS S3 was the first virtual filesystem supported by Rasterio and we standardized on "s3://bucket/key" for the names of S3 objects. "s3" is not a registered URI scheme, but it is used by the AWS CLI and in other projects. Note that AWS more formally uses ARNs (Amazon Resource Names) to identify resources [2].

Apache Hadoop introduced an "hdfs" scheme [3] and it seems good to reuse that in Rasterio instead of /vsihdfs/.

"gs" is used for Google Storage in the context of gsutil and BigQuery [4] and we've followed suit.
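To make the naming convention concrete, here is a minimal sketch of how a "scheme://authority/path" name could be translated into a GDAL-style path. It is an illustration only and not Rasterio's actual implementation: the SCHEME_TO_VSI table and the to_vsi_path function are hypothetical names, and the table covers only "s3" and "gs" for brevity.

```python
from urllib.parse import urlparse

# Hypothetical lookup table (an assumption for this sketch, not Rasterio's own):
# URI scheme -> GDAL virtual filesystem prefix.
SCHEME_TO_VSI = {"s3": "vsis3", "gs": "vsigs"}


def to_vsi_path(name):
    """Return a /vsi.../ path for a URI with a known scheme.

    Names with no scheme, or with a scheme missing from the table,
    are returned unchanged.
    """
    parts = urlparse(name)
    prefix = SCHEME_TO_VSI.get(parts.scheme)
    if prefix is None:
        return name
    # The URI authority (the bucket) becomes the first path segment.
    return "/{}/{}{}".format(prefix, parts.netloc, parts.path)
```

Under these assumptions, to_vsi_path("s3://bucket/key") gives "/vsis3/bucket/key", while a name that is already a GDAL path, such as "/vsis3/bucket/key", passes through untouched.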

As we add support for new storage systems, I'm discovering that some have no concept of a URI scheme. OpenStack Swift is one of these.

I'd like to take the temperature of this group on whether to coin a URI scheme for OpenStack Swift or to punt for now, rely on /vsiswift/ filenames, and see if the Swift project and its users rally around a particular URI scheme. Thoughts?


Sean Gillies