Topics

Implementing -scale functionality of gdal_translate


linden.kyle@...
 

Is there any way to replicate the -scale functionality of gdal_translate in Rasterio to scale the input pixels of a geotiff to a different range?


Sean Gillies
 

Hi,

Rasterio's rio-convert program can linearly scale pixels and apply offsets:

$ rio convert --help
Usage: rio convert [OPTIONS] INPUTS... OUTPUT

  Copy and convert raster datasets to other data types and formats.

  Data values may be linearly scaled when copying by using the --scale-ratio
  and --scale-offset options. Destination raster values are calculated as

    dst = scale_ratio * src + scale_offset

  For example, to scale uint16 data with an actual range of 0-4095 to 0-255
  as uint8:

    $ rio convert in16.tif out8.tif --dtype uint8 --scale-ratio 0.0625

  Format specific creation options may also be passed using --co. To tile a
  new GeoTIFF output file, do the following.

    --co tiled=true --co blockxsize=256 --co blockysize=256

  To compress it using the LZW method, add

    --co compress=LZW

Options:
  -o, --output PATH               Path to output file (optional alternative to
                                  a positional arg).
  -f, --format, --driver TEXT     Output format driver
  -t, --dtype [ubyte|uint8|uint16|int16|uint32|int32|float32|float64]
                                  Output data type.
  --scale-ratio FLOAT             Source to destination scaling ratio.
  --scale-offset FLOAT            Source to destination scaling offset.
  --rgb                           Set RGB photometric interpretation.
  --co, --profile NAME=VALUE      Driver specific creation options.See the
                                  documentation for the selected output driver
                                  for more information.
  --help                          Show this message and exit.

Different forms of scaling are possible if you use the Python API, numpy, and scikit-image.

Hope this helps,

On Fri, Jan 25, 2019 at 10:44 AM <linden.kyle@...> wrote:
Is there any way to replicate the -scale functionality of gdal_translate in Rasterio to scale the input pixels of a geotiff to a different range?



--
Sean Gillies


linden.kyle@...
 

Thanks, Sean. I apologize for not clarifying this, but even though I referenced the gdal translate tool I was looking to do this through the Rasterio API. I was able to take a look at the code for rio convert, and I think I'll be able to figure it out from there.


linden.kyle@...
 

Hi Sean, I'm having some trouble with the implementation of this and was wondering if you might be able to give me a hand. Here's the code I am working with: https://pastebin.com/kecx0D2b. What I'm trying to do is take a geotiff and convert it to a set of TMS-formatted tiles (including reprojecting to web mercator). In this example `tile_bounds` are the bounds of the tile in web mercator. I originally had this code sample working until I attempted to add in the code to do the pixel scaling. Here's the original working code: https://pastebin.com/dZVRVKi2

Here's what the output looks like with the broken code: https://imgur.com/xLtWrGZ
And here's the correct version (but no support for pixel scaling): https://imgur.com/aoNCyIp


linden.kyle@...
 

Ok, so it appears the issue was that I was attempting to do a windowed read outside the bounds of the geotiff (for TMS edge tiles) and `src.read()` was only returning the data inside the bounds since I wasn't using the `boundless=True` option. However, the next problem I am running into is that I'd like to use `fill_value=numpy.nan` to fill in the edge data with NaN values. However, my data is UInt8. Is there any way to specify an option to cast the data read by the call to `read()` into float32 so that I can support this?


Sean Gillies
 

Hi,

I apologize for being slow to reply, I've been occupied with fixing bugs in the last couple of releases.

In theory it is possible to add an option to the read method which will cast the data because GDAL already supports this well in versions 2 and up. I haven't pursued this because converting unsigned int to float is trivial, if not cheap, with numpy:

   data = dataset.read().astype('float64')

And if you pass masked=True to read, you'll get a masked array and then can apply a fill_value of numpy.nan under the invalid data mask. Is this useful for you or have you already been down this path?

GDAL's data type casting would not produce an extra copy of the data and so would be optimal for some applications.


On Tue, Feb 5, 2019 at 11:09 AM <linden.kyle@...> wrote:
Ok, so it appears the issue was that I was attempting to do a windowed read outside the bounds of the geotiff (for TMS edge tiles) and `src.read()` was only returning the data inside the bounds since I wasn't using the `boundless=True` option. However, the next problem I am running into is that I'd like to use `fill_value=numpy.nan` to fill in the edge data with NaN values. However, my data is UInt8. Is there any way to specify an option to cast the data read by the call to `read()` into float32 so that I can support this?



--
Sean Gillies


linden.kyle@...
 

Sean, no problem. Thanks for writing back. I'll keep that in mind for the future. What I ended up doing instead was modifying the tile window transform (which is passed to the reproject method) in the case where either the left or top coordinate is outside the bounds of the geotiff. In that case I change the translation of the transform to the left and/or top bound of the geotiff respectively. Not sure if this is the "right" way to do it or not, but it seems to be working.


linden.kyle@...
 

So I'm seeing some odd behavior now. I ended up trying the masked_array solution, but I'm seeing breaks in my data at the tile boundaries.

Here's what the data looks like if I use reprojection with just the source band (code: https://pastebin.com/dZVRVKi2): https://imgur.com/WpFKfij

Here's what the data looks like I read into a masked array and use an ndarray source (code: https://pastebin.com/z1xd8mXu): https://imgur.com/2QkQQZf
Ironically enough I get the exact same result with my solution that doesn't use a masked array, but instead modifies the transform: https://pastebin.com/7ph1VYWX


Sean Gillies
 

Hi,

On Wed, Feb 6, 2019 at 2:32 PM <linden.kyle@...> wrote:
So I'm seeing some odd behavior now. I ended up trying the masked_array solution, but I'm seeing breaks in my data at the tile boundaries.

Here's what the data looks like if I use reprojection with just the source band (code: https://pastebin.com/dZVRVKi2): https://imgur.com/WpFKfij

Here's what the data looks like I read into a masked array and use an ndarray source (code: https://pastebin.com/z1xd8mXu): https://imgur.com/2QkQQZf
Ironically enough I get the exact same result with my solution that doesn't use a masked array, but instead modifies the transform: https://pastebin.com/7ph1VYWX

Your code looks okay. I recommend padding your read window by a few pixels on each side so that tile_window, tile_data, and tile_transform are slight dilated. I think this is very likely to eliminate what looks like an off-by-one error in the reprojection, an error that you don't see when you use the entire source dataset as the reprojection source. Does that make sense?

--
Sean Gillies


linden.kyle@...
 

That worked perfectly. Thanks for all your help.