WarpedVRT and mixed data types


Thomas Maschler
 

Hi,

I noticed some odd behavior when using WarpedVRT in combination with a VRT with mixed Data Types. 

When trying to open the mixed data type raster using rasterio, it throws an error due to no support for mixed data types. However, when trying to open the same dataset as WarpedVRT, I do not get such error. Instead it returns an all-zero array.

I would either expect to get the same error message, receive an array with all data cast to the largest data type or receive an array with mixed data types.

 

import rasterio
import subprocess
from affine import Affine

from rasterio.vrt import WarpedVRT



# First raster saved as uint16
a = np.array([[0,1,2,3,4]]).astype("uint16")

with rasterio.open("uint16.tif", "w", width=5, height=1, count=1, dtype=rasterio.uint16, nodata=0) as dst:

    dst.write(a, 1)

# Second raster saved as int32
b = np.array([[1,2,3,4,0]]).astype("int32")

with rasterio.open("int32.tif", "w", width=5, height=1, count=1, dtype=rasterio.int32, nodata=0) as dst:

    dst.write(b, 1)

# create a VRT combining both datasets into a two band raster using gdalbuildvrt

subprocess.run(["gdalbuildvrt", "-separate", "mixed_datatype.tif", "uint16.tif", "int32.tif"], capture_output=True)

# Try to open the two band raster using rasterio

with rasterio.open("mixed_datatype.tif") as src:

    print(src.profile)

    print(src.read())  # This throws an error due to mixed data types

{'driver': 'VRT', 'dtype': 'uint16', 'nodata': 0.0, 'width': 5, 'height': 1, 'count': 2, 'crs': None, 'transform': Affine(1.0, 0.0, 0.0, 0.0, 1.0, 0.0), 'tiled': False}

Traceback (most recent call last):

  File "<input>", line 3, in <module>

  File "rasterio/_io.pyx", line 278, in rasterio._io.DatasetReaderBase.read

ValueError: more than one 'dtype' found

 

# Try to open the two band raster using WarpedVRT

with rasterio.open("mixed_datatype.tif") as src:

    with WarpedVRT(src, crs="EPSG:4326", transform=transform, width=5, height=1) as vrt:

        print(vrt.profile)

        print(vrt.read())  # This does not throw an error but returns an array with only zeros

    

{'driver': 'VRT', 'dtype': 'int32', 'nodata': 0.0, 'width': 5, 'height': 1, 'count': 2, 'crs': CRS.from_epsg(4326), 'transform': Affine(1.0, 0.0, 0.0,

       0.0, -1.0, 0.0), 'tiled': False}

[[[0 0 0 0 0]]

 [[0 0 0 0 0]]]




Sean Gillies
 

Hi,

Thanks for bringing this to the list! More response below.

On Wed, Jul 7, 2021 at 9:01 AM Thomas Maschler <thomas.maschler@...> wrote:

Hi,

I noticed some odd behavior when using WarpedVRT in combination with a VRT with mixed Data Types. 

When trying to open the mixed data type raster using rasterio, it throws an error due to no support for mixed data types. However, when trying to open the same dataset as WarpedVRT, I do not get such error. Instead it returns an all-zero array.

I would either expect to get the same error message, receive an array with all data cast to the largest data type or receive an array with mixed data types.

 

import rasterio
import subprocess
from affine import Affine

from rasterio.vrt import WarpedVRT



# First raster saved as uint16
a = np.array([[0,1,2,3,4]]).astype("uint16")

with rasterio.open("uint16.tif", "w", width=5, height=1, count=1, dtype=rasterio.uint16, nodata=0) as dst:

    dst.write(a, 1)

# Second raster saved as int32
b = np.array([[1,2,3,4,0]]).astype("int32")

with rasterio.open("int32.tif", "w", width=5, height=1, count=1, dtype=rasterio.int32, nodata=0) as dst:

    dst.write(b, 1)

# create a VRT combining both datasets into a two band raster using gdalbuildvrt

subprocess.run(["gdalbuildvrt", "-separate", "mixed_datatype.tif", "uint16.tif", "int32.tif"], capture_output=True)

# Try to open the two band raster using rasterio

with rasterio.open("mixed_datatype.tif") as src:

    print(src.profile)

    print(src.read())  # This throws an error due to mixed data types

{'driver': 'VRT', 'dtype': 'uint16', 'nodata': 0.0, 'width': 5, 'height': 1, 'count': 2, 'crs': None, 'transform': Affine(1.0, 0.0, 0.0, 0.0, 1.0, 0.0), 'tiled': False}

Traceback (most recent call last):

  File "<input>", line 3, in <module>

  File "rasterio/_io.pyx", line 278, in rasterio._io.DatasetReaderBase.read

ValueError: more than one 'dtype' found

 

# Try to open the two band raster using WarpedVRT

with rasterio.open("mixed_datatype.tif") as src:

    with WarpedVRT(src, crs="EPSG:4326", transform=transform, width=5, height=1) as vrt:

        print(vrt.profile)

        print(vrt.read())  # This does not throw an error but returns an array with only zeros

    

{'driver': 'VRT', 'dtype': 'int32', 'nodata': 0.0, 'width': 5, 'height': 1, 'count': 2, 'crs': CRS.from_epsg(4326), 'transform': Affine(1.0, 0.0, 0.0,

       0.0, -1.0, 0.0), 'tiled': False}

[[[0 0 0 0 0]]

 [[0 0 0 0 0]]]


Rasterio has no support right now for datasets with mixed data types. This certainly does pose a problem for using VRTs such as the one above.

In the first case, an exception is raised because read() can't return an array of mixed types. This doesn't exist in numpy. And also because GDAL's GDALDatasetRasterIO only accepts a single data type: https://gdal.org/api/raster_c_api.html#_CPPv419GDALDatasetRasterIO12GDALDatasetH10GDALRWFlagiiiiPvii12GDALDataTypeiPiiii.

In the second case I am surprised that the same exception is not raised. It may be because GDALCreateWarpedVRT (called by rasterio's WarpedVRT constructor) makes a mistake in determining the data types of the source bands and assigns "int32" to all of them. To check, I ran your code on my laptop (rasterio 1.2.6 and GDAL 3.3.0) with the addition of lines to print out src.dtypes and vrt.dtypes. I *do* get the same exception in both cases. Are you using different versions of rasterio and GDAL? There were a number of VRT-related fixes in 3.3.0.

--
Sean Gillies


Even Rouault
 


In the second case I am surprised that the same exception is not raised. It may be because GDALCreateWarpedVRT (called by rasterio's WarpedVRT constructor) makes a mistake in determining the data types of the source bands and assigns "int32" to all of them.

The warping kernel only supports one single working data type for all bands. The  GDALWarpResolveWorkingDataType() function

https://github.com/OSGeo/gdal/blob/c6f0ea7d8a4da08e0d633eb381677c23f3934be3/gdal/alg/gdalwarper.cpp#L1439

will select the "largest" data type from input bands.

Even

--

http://www.spatialys.com
My software is free, but my time generally not.