
Re: Numpy error when masking a Landsat image with a polygon [Rasterio 1.0.28]

Alan Snow
 

After looking at the docs, I think you need to remove `.read(1)` as that returns a numpy array and `mask` expects a dataset: https://rasterio.readthedocs.io/en/stable/api/rasterio.mask.html#rasterio.mask.mask


Numpy error when masking a Landsat image with a polygon [Rasterio 1.0.28]

juliano.ecc@...
 

Hi folks,
I'm getting the following errors when trying to mask a Landsat image with a KML polygon.

[Dbg]>>> a,b = rasterio.mask.mask(b2_bytes.open().read(1),roi['geometry'],crop=True)
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "C:\Anaconda3\lib\site-packages\rasterio\mask.py", line 174, in mask
    if dataset.nodata is not None:
AttributeError: 'numpy.ndarray' object has no attribute 'nodata'

[Dbg]>>> a,b = rasterio.mask.mask(b2_bytes.open().read(1),roi['geometry'],crop=True,nodata=0)
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "C:\Anaconda3\lib\site-packages\rasterio\mask.py", line 181, in mask
    pad=pad)
  File "C:\Anaconda3\lib\site-packages\rasterio\mask.py", line 76, in raster_geometry_mask
    north_up = dataset.transform.e <= 0
AttributeError: 'numpy.ndarray' object has no attribute 'transform'


The metadata for scene LC08_L1TP_221077_20190815_20190820_01_T1 is 
{'count': 1,
 'crs': CRS.from_dict(init='epsg:32622'),
 'driver': 'GTiff',
 'dtype': 'uint16',
 'height': 7801,
 'nodata': None,
 'transform': Affine(30.0, 0.0, 491985.0,
       0.0, -30.0, -2599185.0),
 'width': 7721}


My code
from google.cloud import storage
import fiona
import geopandas
import rasterio
import rasterio.mask
 
#test.kml PATH 221 ROW 77 #file is attached
#[SENSOR]/01/PATH/ROW/[scene_id]/
PATH = '221'
ROW = '077'
 
#-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
#Download Image B2 band from google cloud to memory

client = storage.Client.from_service_account_json('credential.json')
 
bucket = client.get_bucket('gcp-public-data-landsat')
l1 = list(bucket.list_blobs(prefix='LC08/01/'+PATH+'/'+ROW+'/', max_results=100000)) #,delimiter='T1'
l2 = [i.name for i in l1 if "_T1" in i.name.split('/')[4] ] #only T1 images
 
last_image = l2[len(l2)-1].split('/')[4] #scene folder


b2_binary_blob = bucket.get_blob('LC08/01/'+PATH+'/'+ROW+'/'+last_image+"/"+last_image+"_B2.TIF").download_as_string()  #LC08_L1TP_221077_20190815_20190820_01_T1
b2_bytes = rasterio.MemoryFile(b2_binary_blob)
del b2_binary_blob
#-----------------------------------------------------------------------------------------------------------------------------------------------------------------------

#-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
#Loading  polygon and trying to mask
fiona.drvsupport.supported_drivers['kml'] = 'rw' # enable KML support which is disabled by default
fiona.drvsupport.supported_drivers['KML'] = 'rw' # enable KML support which is disabled by default
geopandas.io.file.fiona.drvsupport.supported_drivers['kml'] = 'rw' # enable KML support which is disabled by default
geopandas.io.file.fiona.drvsupport.supported_drivers['KML'] = 'rw' # enable KML support which is disabled by default
 

roi = geopandas.GeoDataFrame.from_file("../data/geo/gmaps/teste_castro.kml") #roi.crs -> {'init': 'epsg:4326'}
roi = roi.to_crs(b2_bytes.open().meta['crs'].to_dict()) #convert polygon coordinate system to the one used on the image

#Mask
a,b = rasterio.mask.mask(b2_bytes.open().read(1),roi['geometry'],crop=True) #AttributeError: 'numpy.ndarray' object has no attribute 'nodata'
a,b = rasterio.mask.mask(b2_bytes.open().read(1),roi['geometry'],crop=True,nodata=0) #AttributeError: 'numpy.ndarray' object has no attribute 'transform'
 


Rasterio 1.0.28

Sean Gillies
 

Hi all,

Rasterio 1.0.27 broke a CLI plugin that my team uses at work and may have broken any of yours that also pass creation options like `BLOCKXSIZE=1024` directly to dataset constructors without coercing those numeric strings to ints. Sorry! The bug is fixed in 1.0.28, which is on PyPI now.

--
Sean Gillies


Re: Reading from S3

Sean Gillies
 

Hughes, would you be willing to run

CPL_CURL_VERBOSE=1 rio info "s3://s1-image-dataset/test.tif"

on your computer after unsetting AWS_S3_ENDPOINT and show us the output after sanitizing it (replace your key with xxxxx, but otherwise leave the Authorization headers readable)? If you do this, we'll see information about the HTTP requests that are made and can see if GDAL is failing to navigate a redirect or something like that.


On Fri, Sep 6, 2019 at 8:47 AM Sean Gillies via Groups.Io <sean.gillies=gmail.com@groups.io> wrote:
Hughes,

On Fri, Sep 6, 2019 at 3:48 AM <hughes.lloyd@...> wrote:
Hi Sean,

...

Perhaps this is not a bug, but it seems counterintuitive to need to specify the AWS endpoint directly when the region must be specified as well, since the two are related.


The AWS_S3_ENDPOINT config option is intended to allow GDAL users to work with S3-compatible systems like https://min.io/index.html. It shouldn't be needed for the Gov Cloud; specifying the region should suffice, as you expect. I'm going to dig into rasterio and ask on gdal-dev. I'll follow up here soon.

--
Sean Gillies



--
Sean Gillies


Re: Reading from S3

Sean Gillies
 

Hughes,

On Fri, Sep 6, 2019 at 3:48 AM <hughes.lloyd@...> wrote:
Hi Sean,

...

Perhaps this is not a bug, but it seems counterintuitive to need to specify the AWS endpoint directly when the region must be specified as well, since the two are related.


The AWS_S3_ENDPOINT config option is intended to allow GDAL users to work with S3-compatible systems like https://min.io/index.html. It shouldn't be needed for the Gov Cloud; specifying the region should suffice, as you expect. I'm going to dig into rasterio and ask on gdal-dev. I'll follow up here soon.

--
Sean Gillies


Re: Reading from S3

scott
 

Hughes,

Have you tried setting `os.environ['AWS_S3_ENDPOINT']='s3.us-gov-west-1.amazonaws.com'` before opening the file?

This reminds me of a previous (but resolved) issue with requester pays configuration: https://github.com/mapbox/rasterio/issues/692#issuecomment-362434388 

Scott


Re: Reading from S3

hughes.lloyd@...
 

Hi Sean,

env: AWS_S3_ENDPOINT="us-west-1"
This was indeed an error, although changing it did not fix the problem. I have included "fresh" logs below to show that the problem still persists. Furthermore, as I stated above, I have to set the AWS_S3_ENDPOINT environment variable to the "us-gov-west-1" endpoint (as shown above), otherwise rasterio does not work. In the example below:

$ aws configure
AWS Access Key ID [****************]:
AWS Secret Access Key [****************]:
Default region name [us-gov-west-1]:
Default output format [None]:

$ rio info "s3://s1-image-dataset/test.tif"
WARNING:rasterio._env:CPLE_AppDefined in HTTP response code on https://s1-image-dataset.s3.amazonaws.com/test.tif: 403 <- Region is not in endpoint even though it is configured!
Traceback (most recent call last):
File "rasterio/_base.pyx", line 216, in rasterio._base.DatasetBase.__init__
File "rasterio/_shim.pyx", line 64, in rasterio._shim.open_dataset
File "rasterio/_err.pyx", line 205, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_AWSError: The AWS Access Key Id you provided does not exist in our records.

$ export AWS_S3_ENDPOINT='s3.us-gov-west-1.amazonaws.com'
$ rio info "s3://s1-image-dataset/test.tif"
{"bounds": [689299.5634174921, 2622862.3065700093, 1028889.5634174921, 3007932.3065700093], ....

Now that it works in the terminal when setting AWS_S3_ENDPOINT, let's turn back to the notebook example, where only the ~/.aws/config and ~/.aws/credentials files are configured, and not AWS_S3_ENDPOINT:
import rasterio
path = "s3://s1-image-dataset/test.tif"
with rasterio.Env() as env:
    with rasterio.open(path) as f:
        print(f.meta)
This gives the following DEBUG log.
So let's try passing the region to an AWSSession, along with the key and secret:
import rasterio
from rasterio.session import AWSSession
path = "s3://s1-image-dataset/test.tif"
with rasterio.Env(AWSSession(aws_access_key_id="XXXX", aws_secret_access_key="XXXX", region_name="us-gov-west-1")) as env:
    with rasterio.open(path) as f:
        print(f.meta)
Still the same problem persists (though only when running this in a Jupyter notebook).
After upgrading to rasterio 1.0.26, the following works, but I have to specify AWS_S3_ENDPOINT explicitly; otherwise it does not work, as shown above:
import rasterio
from rasterio.session import AWSSession
path = "s3://s1-image-dataset/test.tif"
with rasterio.Env(AWSSession(aws_access_key_id="XXXX", aws_secret_access_key="XXXX", region_name="us-gov-west-1"), AWS_S3_ENDPOINT='s3.us-gov-west-1.amazonaws.com') as env:
    with rasterio.open(path) as f:
        print(f.meta)

Perhaps this is not a bug, but it seems counter intuitive to directly need to specify AWS Endpoints when there is a region specification needed as well as the two are related.
 
 


Re: Reading from S3

Sean Gillies
 

Hi Hughes,

Yes, I've been able to read raster data from S3 in a Jupyter notebook.

What do you make of the observation I made earlier today about the
env: AWS_S3_ENDPOINT="us-west-1"
log message from your notebook? I think this might be the key.

On Thu, Sep 5, 2019 at 1:10 PM <hughes.lloyd@...> wrote:
The issue doesn't exist outside of Jupyter notebooks. It seems once I am inside a notebook that rasterio does not function in the same manner even when the environment variables are identical.

Would be interested to know if you have managed to read from S3 from inside a notebook.



--
Sean Gillies


Rasterio 1.0.27

Sean Gillies
 

Hi all,

Rasterio 1.0.27 is on PyPI now and here is the list of the changes.
  • Resolve #1744 by adding a `dtype` keyword argument to the WarpedVRT constructor. It allows a user to specify the working data type for the warp operation and output.
  • All cases of deprecated affine right multiplication have been changed to be forward compatible with affine 3.0. The rasterio tests now pass without warnings.
  • The coordinate transformer used in _base._transform() is now properly deleted, fixing the memory leak reported in #1713.
  • An unavoidable warning about 4-channel colormap entries in DatasetWriterBase.write_colormap() has been removed.
  • All deprecated imports of abstract base classes for collections have been corrected, eliminating the warnings reported in #1742 and #1764.
  • DatasetWriterBase no longer requires that GeoTIFF block sizes be smaller than the raster size (#1760). Block sizes are however checked to ensure that they are multiples of 16.
  • DatasetBase.is_tiled has been made more reliable, fixing #1376.
  • Tests have been added to demonstrate that image corruption when writing block-wise to an image with extra large block sizes (#520) is no longer an issue.

I'm happy to say that Rasterio's tests run without warnings now. It's been a while since that was true. Thank you, Guillaume Lostis, for helping on this. Thank you, Jason Hight and Vincent Sarago, for reports and feedback on other issues.

There are no changes in the rasterio wheels, it's the same versions of GDAL and dependencies as in the 1.0.26 wheels, built the same way.

Related: the affine module that rasterio uses for affine transformation matrices had a 2.3.0 release and Juan Luis Cano Rodríguez and Darren Weber were instrumental in that.

Share and enjoy,

--
Sean Gillies


Re: Reading from S3

hughes.lloyd@...
 

The issue doesn't exist outside of Jupyter notebooks. It seems once I am inside a notebook that rasterio does not function in the same manner even when the environment variables are identical.

Would be interested to know if you have managed to read from S3 from inside a notebook.


Re: Reading from S3

hughes.lloyd@...
 

My bucket is hosted in "us-gov-west-1" region and if I don't set 
AWS_S3_ENDPOINT=s3.us-gov-west-1.amazonaws.com
then neither gdalinfo nor rio works; both throw errors about the file not being found, as they continue to access the standard endpoint even when I specify the correct region (I've tried ~/.aws/config as well as AWS_REGION and AWS_DEFAULT_REGION). You can see from the following that the endpoint remains incorrect:
WARNING:rasterio._env:CPLE_AppDefined in HTTP response code on https://s1-image-dataset.s3.amazonaws.com/test.tif: 403


Re: Reading from S3

Guillaume Lostis <g.lostis@...>
 

Hi,

I will add to the previous message that if you want to specify a non-default region, the environment variable you're looking for is probably AWS_REGION (or AWS_DEFAULT_REGION starting with GDAL 2.3), rather than AWS_S3_ENDPOINT (see https://gdal.org/user/virtual_file_systems.html#vsis3-aws-s3-files-random-reading)

Also, I have successfully used rasterio on private AWS S3 buckets without having to touch any environment variable, so unless I incorrectly understand your case, any extra configuration should not be necessary.

Best,

Guillaume Lostis


Re: Reading from S3

Sean Gillies
 

Hi,

The following log message catches my eye:
env: AWS_S3_ENDPOINT="us-west-1"
If that is set in your notebook's environment, it will override the value you pass to Env() in your program, and it looks to be incorrect.

On Thu, Sep 5, 2019 at 8:17 AM <hughes.lloyd@...> wrote:

I am trying to read a GeoTIFF from a private AWS S3 bucket. I have configured GDAL and the appropriate files ~/.aws/config and ~/.aws/credentials. I am using a non-standard AWS region as well, so I needed to set the AWS_S3_ENDPOINT environment variable.

I am able to read the GeoTIFF information using both gdalinfo and rio:

$ gdalinfo /vsis3/s1-image-dataset/test.tif
Driver: GTiff/GeoTIFF
Files: /vsis3/s1-image-dataset/test.tif
Size is 33959, 38507
Coordinate System is:
PROJCS["WGS 84 / UTM zone 17N",
....

and using rio:

$ rio info s3://s1-image-dataset/test.tif
{"bounds": [689299.5634174921, 2622862.3065700093, 1028889.5634174921, 3007932.3065700093], "colorinterp": ["gray"], "compress": "deflate", "count": 1, "crs": "EPSG:32617", "descriptions": [null], "driver": "GTiff" ....

However, when I try to read it in a script using the rasterio Python API, I receive the following error:

CPLE_OpenFailedError: '/vsis3/s1-image-dataset/test.tif' not recognized as a supported file format.

The code I am using which produced the issues is

import rasterio
path = "s3://s1-image-dataset/test.tif"
with rasterio.Env(AWS_S3_ENDPOINT='s3.<my region>.amazonaws.com'):
    with rasterio.open(path) as f:
        img = f.read()

This is using Python 3.7, rasterio 1.0.25, and GDAL 2.4.2

The problem only occurs when running this in a Jupyter notebook (Pangeo, to be precise), and it appears that Rasterio exits the environment prematurely:

DEBUG:rasterio.env:Entering env context: <rasterio.env.Env object at 0x7f97fb41d898>
DEBUG:rasterio.env:Starting outermost env
DEBUG:rasterio.env:No GDAL environment exists
DEBUG:rasterio.env:New GDAL environment <rasterio._env.GDALEnv object at 0x7f97fb41d908> created
DEBUG:rasterio._env:GDAL_DATA found in environment: '/srv/conda/envs/notebook/share/gdal'.
DEBUG:rasterio._env:PROJ_LIB found in environment: '/srv/conda/envs/notebook/share/proj'.
DEBUG:rasterio._env:Started GDALEnv <rasterio._env.GDALEnv object at 0x7f97fb41d908>.
DEBUG:rasterio.env:Entered env context: <rasterio.env.Env object at 0x7f97fb41d898>
DEBUG:rasterio.env:Got a copy of environment <rasterio._env.GDALEnv object at 0x7f97fb41d908> options
DEBUG:rasterio.env:Entering env context: <rasterio.env.Env object at 0x7f97fb3c5898>
DEBUG:rasterio.env:Got a copy of environment <rasterio._env.GDALEnv object at 0x7f97fb41d908> options
DEBUG:rasterio.env:Entered env context: <rasterio.env.Env object at 0x7f97fb3c5898>
DEBUG:rasterio._base:Sharing flag: 32
DEBUG:rasterio.env:Exiting env context: <rasterio.env.Env object at 0x7f97fb3c5898>
DEBUG:rasterio.env:Cleared existing <rasterio._env.GDALEnv object at 0x7f97fb41d908> options
DEBUG:rasterio._env:Stopped GDALEnv <rasterio._env.GDALEnv object at 0x7f97fb41d908>.
DEBUG:rasterio.env:No GDAL environment exists
DEBUG:rasterio.env:New GDAL environment <rasterio._env.GDALEnv object at 0x7f97fb41d908> created
DEBUG:rasterio._env:GDAL_DATA found in environment: '/srv/conda/envs/notebook/share/gdal'.
DEBUG:rasterio._env:PROJ_LIB found in environment: '/srv/conda/envs/notebook/share/proj'.
DEBUG:rasterio._env:Started GDALEnv <rasterio._env.GDALEnv object at 0x7f97fb41d908>.
DEBUG:rasterio.env:Exited env context: <rasterio.env.Env object at 0x7f97fb3c5898>
DEBUG:rasterio.env:Exiting env context: <rasterio.env.Env object at 0x7f97fb41d898>
DEBUG:rasterio.env:Cleared existing <rasterio._env.GDALEnv object at 0x7f97fb41d908> options
DEBUG:rasterio._env:Stopped GDALEnv <rasterio._env.GDALEnv object at 0x7f97fb41d908>.
DEBUG:rasterio.env:Exiting outermost env
DEBUG:rasterio.env:Exited env context: <rasterio.env.Env object at 0x7f97fb41d898>
env: AWS_ACCESS_KEY_ID="XXXXXX"
env: AWS_SECRET_ACCESS_KEY="XXXXXXXX"
env: AWS_S3_ENDPOINT="us-west-1"
---------------------------------------------------------------------------
CPLE_OpenFailedError                      Traceback (most recent call last)
rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()

rasterio/_shim.pyx in rasterio._shim.open_dataset()

rasterio/_err.pyx in rasterio._err.exc_wrap_pointer()

CPLE_OpenFailedError: '/vsis3/s1-image-dataset/test.tif' does not exist in the file system, and is not recognized as a supported dataset name.

--
Sean Gillies


Re: Reading from S3

hughes.lloyd@...
 

I am trying to read a GeoTIFF from a private AWS S3 bucket. I have configured GDAL and the appropriate files ~/.aws/config and ~/.aws/credentials. I am using a non-standard AWS region as well, so I needed to set the AWS_S3_ENDPOINT environment variable.

I am able to read the GeoTIFF information using both gdalinfo and rio:

$ gdalinfo /vsis3/s1-image-dataset/test.tif
Driver: GTiff/GeoTIFF
Files: /vsis3/s1-image-dataset/test.tif
Size is 33959, 38507
Coordinate System is:
PROJCS["WGS 84 / UTM zone 17N",
....

and using rio:

$ rio info s3://s1-image-dataset/test.tif
{"bounds": [689299.5634174921, 2622862.3065700093, 1028889.5634174921, 3007932.3065700093], "colorinterp": ["gray"], "compress": "deflate", "count": 1, "crs": "EPSG:32617", "descriptions": [null], "driver": "GTiff" ....

However, when I try to read it in a script using the rasterio Python API, I receive the following error:

CPLE_OpenFailedError: '/vsis3/s1-image-dataset/test.tif' not recognized as a supported file format.

The code I am using which produced the issues is

import rasterio
path = "s3://s1-image-dataset/test.tif"
with rasterio.Env(AWS_S3_ENDPOINT='s3.<my region>.amazonaws.com'):
    with rasterio.open(path) as f:
        img = f.read()

This is using Python 3.7, rasterio 1.0.25, and GDAL 2.4.2

The problem only occurs when running this in a Jupyter notebook (Pangeo, to be precise), and it appears that Rasterio exits the environment prematurely:

---------------------------------------------------------------------------
CPLE_OpenFailedError                      Traceback (most recent call last)
rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()

rasterio/_shim.pyx in rasterio._shim.open_dataset()

rasterio/_err.pyx in rasterio._err.exc_wrap_pointer()

CPLE_OpenFailedError: '/vsis3/s1-image-dataset/test.tif' does not exist in the file system, and is not recognized as a supported dataset name.






Reading from S3

hughes.lloyd@...
 

I am trying to read a GeoTIFF from my private S3 bucket (mapping), but am receiving the following error message:

CPLE_OpenFailedError: '/vsis3/mapping/bahamas/S1A_20190821T231100.tif' does not exist in the file system, and is not recognized as a supported dataset name
The code I am using to open the GeoTIFF is:

import rasterio
from rasterio.session import AWSSession

with rasterio.Env(session=AWSSession(aws_secret_access_key=S3_SECRET, aws_access_key_id=S3_KEY, region_name="us-west-1")) as env:
    rasterio.open("s3://mapping/bahamas/S1A_20190821T231100.tif")

I am using rasterio version 1.0.25 and the application is single-threaded. The file does exist and I can access it using s3fs and awscli. What am I missing?


Re: rasterio.windows.transform seems to not scale my windows correctly, am I using it wrong?

Sean Gillies
 

Hi Ryan,

I've been on vacation, just now getting the time to answer questions. Answers below.

On Wed, Aug 21, 2019 at 3:13 PM Ryan Avery <ravery@...> wrote:
I have a Landsat band that I have windowed so that I now have about 200 512x512 windows in a list called chip_list_full. I plotted these on top of my AOI to confirm that the windowing worked, and found that it did not work as expected. Following this Stack Overflow example code, I computed a transform for each window using the original dataset transform; the result is that the windows are spatially, diagonally offset from the upper left corner of the original image.



each window in the image above was plotted like so, with a unique transform called custom_window_transform:
chips_with_labels = []
fig, ax = plt.subplots()
gdf.plot(ax=ax)
for chip in chip_list_full:
    custom_window_transform = windows.transform(chip[0], band.transform)
    window_bbox = coords.BoundingBox(*windows.bounds(chip[0], custom_window_transform))
    window_poly = rio_bbox_to_polygon(window_bbox)
    gpd.GeoDataFrame(geometry=gpd.GeoSeries(window_poly), crs=band.meta['crs'].to_dict()).plot(ax = ax, cmap='cubehelix')

When I change the custom_window_transform to instead be the original band transform, the result is correct



chips_with_labels = []
fig, ax = plt.subplots()
gdf.plot(ax=ax)
for chip in chip_list_full:
    window_bbox = coords.BoundingBox(*windows.bounds(chip[0], band.transform))
    window_poly = rio_bbox_to_polygon(window_bbox)
    gpd.GeoDataFrame(geometry=gpd.GeoSeries(window_poly), crs=band.meta['crs'].to_dict()).plot(ax = ax, cmap='cubehelix')

where "band.transform" is my original image transform, and the dataset I have windowed.

My question is, what is the purpose of windows.transform and is there some other way I should be using it in this case? Or is my use of the original dataset transform correct?

Thanks for the feedback!

Your use of windows.transform in the first case is correct, and your use of windows.bounds in the second case is correct. Where you went wrong, probably due to sketchy documentation, is in

    window_bbox = coords.BoundingBox(*windows.bounds(chip[0], custom_window_transform))

The second argument of the windows.bounds function (see https://rasterio.readthedocs.io/en/latest/api/rasterio.windows.html#rasterio.windows.bounds) must be the affine transformation for the "dataset" on which we're applying the given window, not any other affine transformation. I really must explain this better in the docs, sorry about that.

--
Sean Gillies


Re: multi-dimensional support

Sean Gillies
 

Hi Norman, Howard,

I'm going to move this discussion over to https://rasterio.groups.io/g/dev/messages and continue there.


On Fri, Aug 23, 2019 at 11:30 AM Norman Barker <norman.barker@...> wrote:
I was one of the stakeholders for subdataset support in GDAL with netCDF and it worked well with what we were trying to achieve back then, serving regularly gridded time series netcdf data through a WCS, I believe others have used subdataset support in the same way. It was possible to make this work by using external indexes and subdatasets. 

I also agree with your comment that Rasterio is a relatively small project and the code needs to have active users.

The main benefit is a common API for multi-dimensional data access within GDAL. Currently, using gdalinfo against HDF, netCDF or TileDB requires reading the output to understand the available data, or writing a parser for each of these format drivers' metadata. These drivers have no common way to advertise through an API the dimensions and attributes they support. Because implementing subdataset support has been a little ad hoc, the access patterns are slightly different across drivers; the new API enforces a convention.

Killer features? A couple come to mind; Accessing data cubes with a common api to retrieve data along a z dimension, or sliced by time. These use cases would benefit from being supported in rasterio and using xarray/dask to process multi-dimensional data.

I will create a strawman for the API changes and if you and the community are interested then I can start on the code. 

Norman



On Fri, Aug 23, 2019 at 7:51 AM Howard Butler <howard@...> wrote:


On Aug 23, 2019, at 9:29 AM, Sean Gillies <sean.gillies@...> wrote:


I'm also a bit concerned about the small number of stakeholders for the new GDAL API. It appears to be only the HDF Group (yes?) with only three GDAL TC members voting to adopt it. The rest of the GDAL community seemed ambivalent.

Most folks are ambivalent about multi-dimensional support in GDAL, and they were ambivalent about subdatasets before that (which were a deficient implementation in a number of ways which precipitated the RFC). The RFC moved things forward in a positive direction, and it wasn't just about giving HDFLand a clean mapping to GDAL. It was about giving GDALLand the ability to more easily speak to an additional family of raster-like data. 

GDAL drivers that speak zarr, TileDB, Arrow, and HDF can now be adapted without the miserable compromises that subdatasets required in usability and data fidelity. That will allow people to bring the GDAL geo goodness to their data without reformatting simply to push it through the tool. I think these generic data structures are seeing much more action because they allow data-level interop without special purpose drivers across multiple software runtimes. The winds are blowing the same direction in point cloud land too.

Rasterio is a pretty small project and, in my opinion, can't afford to develop code that isn't going to be widely used.

A completely reasonable position.

Howard




--
Sean Gillies


Rasterio 1.0.26

Sean Gillies
 

Hi all,

Rasterio 1.0.26 wheels and source distribution are on PyPI today. There are eight bug fixes in this release.


Thank you for the reports and discussion, and big thanks to Alan Snow for the contributions around coordinate reference system interoperability.

I've finessed the Linux and OS X wheel builds so that for the first time we can include size and speed optimized GDAL shared libraries. An install of one of the Linux wheels now takes up "only" 52 MB, 39 MB of which are shared libraries. The wheels themselves are down to 15 MB.

Share and enjoy,

--
Sean Gillies


Re: multi-dimensional support

Norman Barker
 

I was one of the stakeholders for subdataset support in GDAL with netCDF and it worked well with what we were trying to achieve back then, serving regularly gridded time series netcdf data through a WCS, I believe others have used subdataset support in the same way. It was possible to make this work by using external indexes and subdatasets. 

I also agree with your comment that Rasterio is a relatively small project and the code needs to have active users.

The main benefit is a common API for multi-dimensional data access within GDAL. Currently, using gdalinfo against HDF, netCDF or TileDB requires reading the output to understand the available data, or writing a parser for each of these format drivers' metadata. These drivers have no common way to advertise through an API the dimensions and attributes they support. Because implementing subdataset support has been a little ad hoc, the access patterns are slightly different across drivers; the new API enforces a convention.

Killer features? A couple come to mind; Accessing data cubes with a common api to retrieve data along a z dimension, or sliced by time. These use cases would benefit from being supported in rasterio and using xarray/dask to process multi-dimensional data.

I will create a strawman for the API changes and if you and the community are interested then I can start on the code. 

Norman



On Fri, Aug 23, 2019 at 7:51 AM Howard Butler <howard@...> wrote:


On Aug 23, 2019, at 9:29 AM, Sean Gillies <sean.gillies@...> wrote:


I'm also a bit concerned about the small number of stakeholders for the new GDAL API. It appears to be only the HDF Group (yes?) with only three GDAL TC members voting to adopt it. The rest of the GDAL community seemed ambivalent.

Most folks are ambivalent about multi-dimensional support in GDAL, and they were ambivalent about subdatasets before that (which were a deficient implementation in a number of ways which precipitated the RFC). The RFC moved things forward in a positive direction, and it wasn't just about giving HDFLand a clean mapping to GDAL. It was about giving GDALLand the ability to more easily speak to an additional family of raster-like data. 

GDAL drivers that speak zarr, TileDB, Arrow, and HDF can now be adapted without the miserable compromises that subdatasets required in usability and data fidelity. That will allow people to bring the GDAL geo goodness to their data without reformatting simply to push it through the tool. I think these generic data structures are seeing much more action because they allow data-level interop without special purpose drivers across multiple software runtimes. The winds are blowing the same direction in point cloud land too.

Rasterio is a pretty small project and, in my opinion, can't afford to develop code that isn't going to be widely used.

A completely reasonable position.

Howard

