Rasterio read outshape resample does not give correct array


tinak.contact@...
 
Edited

Hi all, 

I have a population grid and want to decrease the resolution by a factor 2. So I would like to get a raster that has 2 times less rows and columns, a resolution that is 2 times as large as the original one, while retaining the same overall number of population.
 
This should be a straightforward operation, but the population numbers change incorrectly if I follow the documentation. When the resampled array is written to file, the numbers are incorrect.  
 
Shouldn't the resample method yield an array with resampled cell values to get the same number of total population? Any tips and hints would be appreciated!

factor = 2

with rasterio.open("mwi_ppp_2020.tif") as dataset:

    arr = dataset.read(1,masked=True)
    #Show total population of Malawi 
    print(arr.sum())

    data = dataset.read(1, masked=True, resampling=Resampling.bilinear,
        out_shape=(
            int(dataset.height * factor),
            int(dataset.width * factor)))

     #Show total population of array 
    print(data.sum())

     # scale image transform
    transform = dataset.transform * dataset.transform.scale(
            (dataset.width / data.shape[1]),


            (dataset.height / data.shape[0])
                                )


Hobart, Geordie (NRCan/RNCan)
 

Hello.

I think that your problem is a little more complex than just resampling. The bilinear interpolation averages the surrounding cells to create a smooth image.

You need to sum the four surrounding cells, and populate the new grid with the result.

Not sure exactly how to do it but a clever google should find what you need.

Good Luck.

 

 

From: main@rasterio.groups.io <main@rasterio.groups.io> On Behalf Of tinak.contact@...
Sent: March 16, 2021 14:26
To: main@rasterio.groups.io
Subject: [rasterio] Rasterio read outshape resample does not give correct array

 

[Edited Message Follows]

Hi all, 

I have a population grid and want to decrease the resolution by a factor 2. So I would like to get a raster that has 2 times less rows and columns, a resolution that is 2 times as large as the original one, while retaining the same overall number of population.

 

This should be a straightforward operation, but the population numbers change incorrectly if I follow the documentation. When the resampled array is written to file, the numbers are incorrect. I have a population grid and want to decrease the resolution by a factor 2. So I would like to get a raster that has 2 times less rows and columns, a resolution that is 2 times as large as the original one, while retaining the same number of population.

 

Shouldn't the resample method yield an array with resampled cell values to get the same number of total population? Any tips and hints would be appreciated!

 

factor = 2
 
with rasterio.open("mwi_ppp_2020.tif") as dataset:
 
    arr = dataset.read(1,masked=True)
    #Show total population of Malawi 
    print(arr.sum())
 
    data = dataset.read(1, masked=True, resampling=Resampling.bilinear,
        out_shape=(
            int(dataset.height * factor),
            int(dataset.width * factor)))
 
     #Show total population of array 
    print(data.sum())
 
     # scale image transform
    transform = dataset.transform * dataset.transform.scale(
            (dataset.width / data.shape[1]),

 

            (dataset.height / data.shape[0])
                                )


Sean Gillies
 

Hi,

The "average" resampling algorithm should give you better results than "bilinear". I'm not sure it is guaranteed to preserve the sum, especially in the case where the original dataset's width and height are not multiples of 2.

Documentation on the resampling algorithms can be found here: https://gdal.org/programs/gdal_translate.html#cmdoption-gdal_translate-r. Rasterio uses these same options.


On Tue, Mar 16, 2021 at 4:05 PM <tinak.contact@...> wrote:

[Edited Message Follows]

Hi all, 

I have a population grid and want to decrease the resolution by a factor 2. So I would like to get a raster that has 2 times less rows and columns, a resolution that is 2 times as large as the original one, while retaining the same overall number of population.
 
This should be a straightforward operation, but the population numbers change incorrectly if I follow the documentation. When the resampled array is written to file, the numbers are incorrect.  
 
Shouldn't the resample method yield an array with resampled cell values to get the same number of total population? Any tips and hints would be appreciated!

factor = 2

with rasterio.open("mwi_ppp_2020.tif") as dataset:

    arr = dataset.read(1,masked=True)
    #Show total population of Malawi 
    print(arr.sum())

    data = dataset.read(1, masked=True, resampling=Resampling.bilinear,
        out_shape=(
            int(dataset.height * factor),
            int(dataset.width * factor)))

     #Show total population of array 
    print(data.sum())

     # scale image transform
    transform = dataset.transform * dataset.transform.scale(
            (dataset.width / data.shape[1]),


            (dataset.height / data.shape[0])
                                )
_

--
Sean Gillies


tinak.contact@...
 
Edited

Thank you very much, I found  scikit block_reduce which did the trick, but I was wondering if there are any gdal or rasterio functions who could 
do the same. I found from (here) and ( here) that

GDAL 3.1 or later   does have     -r sum     or -r average

as interpolation method.
However, if I apply the gdal translate command I am not getting the correct numbers. Could this be a problem with my gdal?


 Solution with scikit block_reduce :

from skimage.measure import block_reduce
import matplotlib.pyplot as plt
factor = 2
with rasterio.open("mwi_ppp_2020_UTM.tif") as dataset:
    data = dataset.read(1)
    print('population file', data[np.where(data>0)].sum())
    # give zero to no dataval 
    data=np.where(data<=0,0, data) 
    
     #  reduce by factor 
    x=block_reduce(data, block_size=(factor,factor), func=np.sum)
   
    # mask out no data vals
    x_masked=np.ma.masked_equal(x,0) 
    plt.imshow(x_masked)
    plt.colorbar()
print('population resampled',x_masked.sum())



Using gdal translate -r average 

(decrease by factor 10) 
import subprocess
subprocess.call("""gdal_translate  -tr 0.0083  0.0083  -r average\
                /Users/admin/Downloads/mwi_ppp_2020.tif gdal_test.tif""", shell=True)
Population after 206120.61


Loïc Dutrieux
 

With count data, like population, it only makes sense to aggregate using a sum resampling function (like you did with block_reduce).
Using rasterio you can perform a decimated read (setting out_shape) using resampling=Resampling.sum. I see from the docs that it requires GDAL >= 3.1 though (https://rasterio.readthedocs.io/en/latest/api/rasterio.enums.html#rasterio.enums.Resampling)
It can give unexpected results if the file has precomputed overviews, then no matter which resampling method you specify, rasterio won't be able to ignore them (unless built against GDAL >= 3.3 which is very unlikely at the moment).

cheers,
Loïc
________________________________________
From: main@rasterio.groups.io <main@rasterio.groups.io> on behalf of tinak.contact@... <tinak.contact@...>
Sent: 17 March 2021 07:06:04
To: main@rasterio.groups.io
Subject: Re: [rasterio] Rasterio read outshape resample does not give correct array

[Edited Message Follows]
[Reason: added gdal option ]

Thank you very much, I found scikit block_reduce<https://urldefense.com/v3/__https://scikit-image.org/docs/dev/api/skimage.measure.html*block-reduce__;Iw!!DOxrgLBm!SYQfUIkp3oT6I4nkx5rhk0VhOIXLuJc03ZvgBB-xB81wfd0QZgYJdbZuqh9ZDbxz4GMJQqQ$> which did the trick, but I was wondering if there are any gdal or rasterio functions who could
do the same. I found from (here<https://urldefense.com/v3/__https://gis.stackexchange.com/questions/152661/downsampling-geotiff-using-summation-gdal-numpy__;!!DOxrgLBm!SYQfUIkp3oT6I4nkx5rhk0VhOIXLuJc03ZvgBB-xB81wfd0QZgYJdbZuqh9ZDbxz3Jieug0$>) and ( here<https://urldefense.com/v3/__https://gdal.org/programs/gdalwarp.html__;!!DOxrgLBm!SYQfUIkp3oT6I4nkx5rhk0VhOIXLuJc03ZvgBB-xB81wfd0QZgYJdbZuqh9ZDbxzYF_dZ3o$>) that
GDAL 3.1 or later does have -r sum or -r average
as interpolation method.
However, if I apply the gdal translate command I am not getting the correct numbers. Could this be a problem with my gdal?


Solution with scikit block_reduce :<https://urldefense.com/v3/__https://scikit-image.org/docs/dev/api/skimage.measure.html*block-reduce__;Iw!!DOxrgLBm!SYQfUIkp3oT6I4nkx5rhk0VhOIXLuJc03ZvgBB-xB81wfd0QZgYJdbZuqh9ZDbxz4GMJQqQ$>

from skimage.measure import block_reduce
import matplotlib.pyplot as plt
factor = 2
with rasterio.open("mwi_ppp_2020_UTM.tif") as dataset:
data = dataset.read(1)
print('population file', data[np.where(data>0)].sum())
# give zero to no dataval
data=np.where(data<=0,0, data)

# reduce by factor
x=block_reduce(data, block_size=(factor,factor), func=np.sum)

# mask out no data vals
x_masked=np.ma.masked_equal(x,0)
plt.imshow(x_masked)
plt.colorbar()
print('population resampled',x_masked.sum())
[cid:attach_0_166D0C1E101B79CD_3895@groups.io]


Using gdal translate -r average

(decrease by factor 10)
import subprocess
subprocess.call("""gdal_translate -tr 0.0083 0.0083 -r average\
/Users/admin/Downloads/mwi_ppp_2020.tif gdal_test.tif""", shell=True)

Population after 206120.61