Re: rasterio Google Cloud Storage (GCS) access

Sean Gillies
 

Hi,

Your GDAL library needs to be built on a curl library that is itself built with SSL support. I think that the GDAL and curl packages in most distros meet this requirement, but if you are compiling GDAL and curl from source you'll need to ensure something like

./configure --with-ssl

for curl and

./configure --with-curl

for GDAL.
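One quick sanity check for the curl half of this (assuming `curl-config` from the same curl build is on your PATH; this is a generic diagnostic, not something from the thread):

```shell
# Lists compiled-in features; SSL should appear if curl was built with SSL support
curl-config --features | grep -i ssl
```

If nothing is printed, the curl that GDAL links against cannot sign HTTPS requests and cloud storage access will fail.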

Hope this helps!

On Sat, Dec 8, 2018 at 3:40 PM <madhav@...> wrote:

Hi 

I forked and am trying to add support to rasterio and rio for handling files on Google Cloud Storage. I made changes to path.py and added gs:// style URLs to the remote schemes, similar to the AWS S3 support. I tested it locally on macOS, and the command below works:

rio info gs://pdd-stac/disasters/hurricane-harvey/0831/20170831_172754_101c_3b_Visual.tif

But when I package rasterio in a Docker container and try the same command through the container, I get the following error. Any ideas what I may be doing wrong?

ERROR 6: CPLRSASHA256Sign() not implemented: GDAL must be built against libcrypto++ or libcrypto (openssl)


--
Sean Gillies


rasterio Google Cloud Storage (GCS) access

Madhav Desetty
 

Hi 

I forked and am trying to add support to rasterio and rio for handling files on Google Cloud Storage. I made changes to path.py and added gs:// style URLs to the remote schemes, similar to the AWS S3 support. I tested it locally on macOS, and the command below works:

rio info gs://pdd-stac/disasters/hurricane-harvey/0831/20170831_172754_101c_3b_Visual.tif

But when I package rasterio in a Docker container and try the same command through the container, I get the following error. Any ideas what I may be doing wrong?

ERROR 6: CPLRSASHA256Sign() not implemented: GDAL must be built against libcrypto++ or libcrypto (openssl)


Re: Best way of doing concurrent window processing

Sean Gillies
 

Hi Pablo,

I really appreciate your analysis of the concurrency example! Things have changed quite a bit in the newest versions of Python since it was written and it needs an update.

While researching the non-laziness of ThreadPoolExecutor.map I discovered a few interesting tickets in the Python tracker. One https://bugs.python.org/issue34168 has a suggestion from Tim Peters for chunking the input and using a combination of c.f.ThreadPoolExecutor.submit and c.f.as_completed as in this example: https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor-example. The chunk size (not to be confused with c.f.ThreadPoolExecutor.map's chunksize) would set the upper limit on memory consumption. This could be worth a try. I'm going to try it soon myself.
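A rasterio-free sketch of that chunking idea (the `compute` function, `chunk_size`, and `num_workers` names are placeholders, not from the thread): at most `chunk_size` results are pending at any moment, which bounds memory.

```python
import concurrent.futures

def compute(window_data):
    # Placeholder for the real per-window raster computation
    return window_data * window_data

def process_chunked(windows, chunk_size=4, num_workers=2):
    """Submit at most chunk_size tasks at a time, bounding memory use."""
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
        for start in range(0, len(windows), chunk_size):
            chunk = windows[start:start + chunk_size]
            futures = {executor.submit(compute, w): w for w in chunk}
            # Collect this chunk's results before submitting the next chunk
            for fut in concurrent.futures.as_completed(futures):
                results[futures[fut]] = fut.result()
    return results
```

Note this is distinct from `ThreadPoolExecutor.map`'s `chunksize` argument, which only batches work items and does not make the input iterable lazy.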

I feel like introducing locks is a step back from the nice c.f. abstraction to raw threading. However, an example that used threading only could be nice for users that are still on Python 2.7.

Yours,

On Tue, Dec 4, 2018 at 1:26 PM Pablo Sanfilippo <sanfilippopablo@...> wrote:

I've been looking for the best way of processing large raster files while achieving both maximum processing parallelism/speed and a low memory footprint. To achieve a low memory footprint, it is enough to iterate over windows and read, process, and write in the same loop:

with rasterio.open(infile, 'r') as src:
    with rasterio.open(outfile, 'w', **src.profile) as dst:
        for ij, window in src.block_windows():
            array = src.read(window=window)
            result = compute(array)
            dst.write(result, window=window)

This has a low memory footprint: it has only one window in memory at a time. But it doesn't have concurrency.

So I looked at the concurrency examples (the one using asyncio coroutines and the one using ThreadPoolExecutor), which look like this:

with rasterio.open(infile, 'r') as src:
    with rasterio.open(outfile, "w", **src.profile) as dst:
        windows = [window for ij, window in dst.block_windows()]
        data_gen = (src.read(window=window) for window in windows)

        with concurrent.futures.ThreadPoolExecutor(
            max_workers=num_workers
        ) as executor:
            for window, result in zip(windows, executor.map(compute, data_gen)):
                dst.write(result, window=window)

Now we have computing concurrency. The comment on the original example made it seem like read -> compute -> write was being done in a streaming manner, overlapping all those operations:

We map the compute() function over the raster data generator, zip the resulting iterator with the windows list, and as pairs come back we write data to the destination dataset.

Which is not true. A verbose test showed that computing was being overlapped with reading and also (later) with writing, but reading and writing weren't being overlapped with each other, meaning that the whole file was in memory at some point. Monitoring of memory usage confirmed this.

This made more sense after reading the documentation of ThreadPoolExecutor.map: the iterables are collected immediately rather than lazily. So it means that it will consume the data_gen iterator eagerly, putting all the data in memory if necessary.

The example that uses asyncio coroutines shows the same behavior.

My next idea was to put the read, compute, and write operations all together in the function passed to ThreadPoolExecutor.map so they can be overlapped too, like so:

with rasterio.open(infile) as src:
    with rasterio.open(outfile, "w", **src.profile) as dst:
        windows = [window for ij, window in dst.block_windows()]

        def process(window):
            src_array = src.read(window=window)
            result = compute(src_array)
            dst.write(result, window=window)

        with concurrent.futures.ThreadPoolExecutor(
            max_workers=num_workers
        ) as executor:
            executor.map(process, windows)

This obviously caused a race condition, because reading and writing are not thread-safe operations, so the output was filled with GDAL read errors.

So I fixed that by using locks for the read and write operations:

with rasterio.open(infile) as src:
    with rasterio.open(outfile, "w", **src.profile) as dst:
        windows = [window for ij, window in dst.block_windows()]

        read_lock = threading.Lock()
        write_lock = threading.Lock()

        def process(window):
            with read_lock:
                src_array = src.read(window=window)
            result = compute(src_array)
            with write_lock:
                dst.write(result, window=window)

        with concurrent.futures.ThreadPoolExecutor(
            max_workers=num_workers
        ) as executor:
            executor.map(process, windows)

That fixed the race conditions. Computing is now done in parallel, reading and writing are done in a streaming way, and the maximum memory footprint is window_array_size * num_threads.

My questions are:

  • Is this the right way to solve this problem? Am I missing something?
  • If this is the right way, would you accept a PR with a topic write up about this (maybe enhancements in the concurrency topic)?



--
Sean Gillies


Re: Best way of doing concurrent window processing

Matthew Perry
 

Pablo,

I think your approach would make a great example and addition to the concurrency docs. Being able to do input, output, and computation in parallel is a huge benefit. The locks seem effective at avoiding race conditions, and I get a decent speedup (~3x on 4 cores) for a few test cases. So while I don't think there is, or ever will be, a single "right" way to solve concurrency, this is certainly a very viable option.

The only potential issue I see is that the `compute` function is required to release the GIL to get the full benefit. Most numpy array methods do this, but if you've got additional calculations in pure Python or call non-GIL-aware C functions, there's a chance that compute will block. If blocking computation significantly outweighs IO, you might not see much performance benefit, if any. Those users would be better served by some form of multiprocessing.

It's hard to know whether arbitrary Python code will block on the GIL, which makes it tricky to predict the performance characteristics of this technique. I think a note in the docs about the compute function and the GIL would be sufficient to warn users of the pitfalls.

- Matt


Best way of doing concurrent window processing

Pablo Sanfilippo
 

I've been looking for the best way of processing large raster files while achieving both maximum processing parallelism/speed and a low memory footprint. To achieve a low memory footprint, it is enough to iterate over windows and read, process, and write in the same loop:

with rasterio.open(infile, 'r') as src:
    with rasterio.open(outfile, 'w', **src.profile) as dst:
        for ij, window in src.block_windows():
            array = src.read(window=window)
            result = compute(array)
            dst.write(result, window=window)

This has a low memory footprint: it has only one window in memory at a time. But it doesn't have concurrency.

So I looked at the concurrency examples (the one using asyncio coroutines and the one using ThreadPoolExecutor), which look like this:

with rasterio.open(infile, 'r') as src:
    with rasterio.open(outfile, "w", **src.profile) as dst:
        windows = [window for ij, window in dst.block_windows()]
        data_gen = (src.read(window=window) for window in windows)

        with concurrent.futures.ThreadPoolExecutor(
            max_workers=num_workers
        ) as executor:
            for window, result in zip(windows, executor.map(compute, data_gen)):
                dst.write(result, window=window)

Now we have computing concurrency. The comment on the original example made it seem like read -> compute -> write was being done in a streaming manner, overlapping all those operations:

We map the compute() function over the raster data generator, zip the resulting iterator with the windows list, and as pairs come back we write data to the destination dataset.

Which is not true. A verbose test showed that computing was being overlapped with reading and also (later) with writing, but reading and writing weren't being overlapped with each other, meaning that the whole file was in memory at some point. Monitoring of memory usage confirmed this.

This made more sense after reading the documentation of ThreadPoolExecutor.map: the iterables are collected immediately rather than lazily. So it means that it will consume the data_gen iterator eagerly, putting all the data in memory if necessary.

The example that uses asyncio coroutines shows the same behavior.
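This eagerness is easy to demonstrate without rasterio at all: the generator is fully drained inside the `map()` call itself, before a single result is consumed (a minimal sketch; the names here are illustrative, not from the thread):

```python
import concurrent.futures

consumed = []

def windows():
    # Stand-in for data_gen: record each "read" as it happens
    for i in range(5):
        consumed.append(i)
        yield i

with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
    results = executor.map(lambda x: x * 2, windows())
    # map() has already drained the generator before we touch `results`
    eagerly_consumed = list(consumed)
    doubled = list(results)
```

Here `eagerly_consumed` is already `[0, 1, 2, 3, 4]` immediately after `map()` returns, which is exactly why all the window data ends up in memory at once.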

My next idea was to put the read, compute, and write operations all together in the function passed to ThreadPoolExecutor.map so they can be overlapped too, like so:

with rasterio.open(infile) as src:
    with rasterio.open(outfile, "w", **src.profile) as dst:
        windows = [window for ij, window in dst.block_windows()]

        def process(window):
            src_array = src.read(window=window)
            result = compute(src_array)
            dst.write(result, window=window)

        with concurrent.futures.ThreadPoolExecutor(
            max_workers=num_workers
        ) as executor:
            executor.map(process, windows)

This obviously caused a race condition, because reading and writing are not thread-safe operations, so the output was filled with GDAL read errors.

So I fixed that by using locks for the read and write operations:

with rasterio.open(infile) as src:
    with rasterio.open(outfile, "w", **src.profile) as dst:
        windows = [window for ij, window in dst.block_windows()]

        read_lock = threading.Lock()
        write_lock = threading.Lock()

        def process(window):
            with read_lock:
                src_array = src.read(window=window)
            result = compute(src_array)
            with write_lock:
                dst.write(result, window=window)

        with concurrent.futures.ThreadPoolExecutor(
            max_workers=num_workers
        ) as executor:
            executor.map(process, windows)

That fixed the race conditions. Computing is now done in parallel, reading and writing are done in a streaming way, and the maximum memory footprint is window_array_size * num_threads.

My questions are:

  • Is this the right way to solve this problem? Am I missing something?
  • If this is the right way, would you accept a PR with a topic write up about this (maybe enhancements in the concurrency topic)?
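The locking pattern above can be exercised without rasterio by substituting dicts for the datasets (a generic sketch under that substitution; `process_windows` and its arguments are illustrative names):

```python
import concurrent.futures
import threading

def process_windows(src, compute, num_workers=4):
    """Locked read/compute/write, with dicts standing in for rasterio datasets."""
    dst = {}
    read_lock = threading.Lock()
    write_lock = threading.Lock()

    def process(window):
        with read_lock:
            data = src[window]        # serialized "read"
        result = compute(data)        # runs concurrently across threads
        with write_lock:
            dst[window] = result      # serialized "write"

    with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
        # list() forces iteration so any worker exception propagates here
        list(executor.map(process, src))
    return dst
```

Only the compute step runs concurrently; reads and writes are serialized by their locks, mirroring the rasterio version's memory bound of one window per worker thread.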


Re: GDALRasterizeGeometries - Function Signature

jdcasta@...
 

Yeah, I thought so. FYI, the code you linked is the same one I embedded above. It looks like it really is just relying on the in-memory raster already having a configured transform associated with it. I did the same to get it to work, like so:

        GDALDatasetH memDS = GDALCreate( GDALGetDriverByName("MEM"), proc_num_str, cols, rows, 0, GDT_Byte, NULL );
        // Set transform
        reinterpret_cast<GDALDataset*>(memDS)->SetGeoTransform(transform);

Here I set the in-memory raster dataset to use the specified transform (double[6]). Now I can call GDALRasterizeGeometries with NULL for pfnTransformer and pTransformArg and it is handled appropriately.

Thanks!


Re: GDALRasterizeGeometries - Function Signature

Sean Gillies
 

You're right, the transform that Rasterio passes to GDALRasterizeGeometries is not used. Worse, it looks like there is a risk of memory corruption. I'm going to change Rasterio to pass NULL instead.

When GDALRasterizeGeometries is called without a transformer, the following code is executed: https://github.com/OSGeo/gdal/blob/master/gdal/alg/gdalrasterize.cpp#L686-L710. Therein you can see how a transformer is constructed and configured.


On Mon, Dec 3, 2018 at 8:52 PM <jdcasta@...> wrote:
Hi, this is a little bit of a strange question about how rasterio uses the GDALRasterizeGeometries function. This C function is called in _features.pyx here:

with InMemoryRaster(image=image, transform=transform) as mem:
    exc_wrap_int(
        GDALRasterizeGeometries(
            mem.handle(), 1, mem.band_ids, num_geoms, geoms, NULL,
            mem.gdal_transform, pixel_values, options, NULL, NULL))

My question has to do with the bolded arguments, 6 & 7. The Python function docstring expresses the notion that the affine matrix, transform, is used to convert between the shapes' coordinate system and the pixels provided in image. However, looking at the C++ source code for this function, I see this declaration:


CPLErr GDALRasterizeGeometries ( GDALDatasetH  hDS,
    int  nBandCount,
    int *  panBandList,
    int  nGeomCount,
    OGRGeometryH *  pahGeometries,
    GDALTransformerFunc  pfnTransformer,
    void *  pTransformArg,
    double *  padfGeomBurnValue,
    char **  papszOptions,
    GDALProgressFunc  pfnProgress,
    void *  pProgressArg 
  )

As you can see, it seems rasterio is passing NULL to pfnTransformer and the affine transform to pTransformArg. This pTransformArg is not really meant to be used for the affine matrix, but as an argument to a valid pfnTransformer.

I *think* everything still works because of the code below, which checks the arguments. If it sees that pfnTransformer is NULL, it will automatically try to get the transform from the raster dataset. But I have no idea how an in-memory dataset would have that information, so I keep thinking this theory must be wrong.
if( pfnTransformer == nullptr )
{
    bNeedToFreeTransformer = true;

    char** papszTransformerOptions = nullptr;
    double adfGeoTransform[6] = { 0.0 };
    if( poDS->GetGeoTransform( adfGeoTransform ) != CE_None &&
        poDS->GetGCPCount() == 0 &&
        poDS->GetMetadata("RPC") == nullptr )
    {
        papszTransformerOptions = CSLSetNameValue(
            papszTransformerOptions, "DST_METHOD", "NO_GEOTRANSFORM");
    }

    pTransformArg =
        GDALCreateGenImgProjTransformer2( nullptr, hDS,
                                          papszTransformerOptions );
    CSLDestroy( papszTransformerOptions );

    pfnTransformer = GDALGenImgProjTransform;
    if( pTransformArg == nullptr )
    {
        return CE_Failure;
    }
}


In the end, I am just trying to figure out how in the world you are passing just the affine transform into GDALRasterizeGeometries as pTransformArg.

PS. I'm used to GitHub formatting, so this has been a little different to type up!
In the end I was just trying to write some C++ code similar to rasterio's for rasterizing features (in memory).



--
Sean Gillies


GDALRasterizeGeometries - Function Signature

jdcasta@...
 

Hi, this is a little bit of a strange question about how rasterio uses the GDALRasterizeGeometries function. This C function is called in _features.pyx here:

with InMemoryRaster(image=image, transform=transform) as mem:
    exc_wrap_int(
        GDALRasterizeGeometries(
            mem.handle(), 1, mem.band_ids, num_geoms, geoms, NULL,
            mem.gdal_transform, pixel_values, options, NULL, NULL))

My question has to do with the bolded arguments, 6 & 7. The Python function docstring expresses the notion that the affine matrix, transform, is used to convert between the shapes' coordinate system and the pixels provided in image. However, looking at the C++ source code for this function, I see this declaration:


CPLErr GDALRasterizeGeometries ( GDALDatasetH  hDS,
    int  nBandCount,
    int *  panBandList,
    int  nGeomCount,
    OGRGeometryH *  pahGeometries,
    GDALTransformerFunc  pfnTransformer,
    void *  pTransformArg,
    double *  padfGeomBurnValue,
    char **  papszOptions,
    GDALProgressFunc  pfnProgress,
    void *  pProgressArg 
  )

As you can see, it seems rasterio is passing NULL to pfnTransformer and the affine transform to pTransformArg. This pTransformArg is not really meant to be used for the affine matrix, but as an argument to a valid pfnTransformer.

I *think* everything still works because of the code below, which checks the arguments. If it sees that pfnTransformer is NULL, it will automatically try to get the transform from the raster dataset. But I have no idea how an in-memory dataset would have that information, so I keep thinking this theory must be wrong.
if( pfnTransformer == nullptr )
{
    bNeedToFreeTransformer = true;

    char** papszTransformerOptions = nullptr;
    double adfGeoTransform[6] = { 0.0 };
    if( poDS->GetGeoTransform( adfGeoTransform ) != CE_None &&
        poDS->GetGCPCount() == 0 &&
        poDS->GetMetadata("RPC") == nullptr )
    {
        papszTransformerOptions = CSLSetNameValue(
            papszTransformerOptions, "DST_METHOD", "NO_GEOTRANSFORM");
    }

    pTransformArg =
        GDALCreateGenImgProjTransformer2( nullptr, hDS,
                                          papszTransformerOptions );
    CSLDestroy( papszTransformerOptions );

    pfnTransformer = GDALGenImgProjTransform;
    if( pTransformArg == nullptr )
    {
        return CE_Failure;
    }
}


In the end, I am just trying to figure out how in the world you are passing just the affine transform into GDALRasterizeGeometries as pTransformArg.

PS. I'm used to GitHub formatting, so this has been a little different to type up!
In the end I was just trying to write some C++ code similar to rasterio's for rasterizing features (in memory).


Re: Rasterio result different than gdal_calc

anand@...
 

Hi Sean,

Sure, I will try the changes you mentioned. Here is the gdal_calc expression I used:

gdal_calc.py -A T1_B5.TIF --A_band=1 -B T1_B4.TIF --B_band=1 --outfile=ndvi_gal_1.TIF --calc='(A.astype(float)-B)/(A.astype(float)+B)' --type='Float32'
--co='COMPRESS=LZW'


Re: Rasterio result different than gdal_calc

Sean Gillies
 

Hi,

Can you also include the expressions that you're using with gdal_calc? Without them we can only guess at the cause of the difference in the results. I don't see any big problems with the way you're using Rasterio. I have a couple comments below:

On Wed, Nov 21, 2018 at 3:35 PM anand via Groups.Io <anand=geospoc.com@groups.io> wrote:
I am trying to recreate gdal_calc for a simple calculation using the code below.

import numpy
import rasterio

def merge_bands(calculationType, band1Image, band2Image, outputFilePath):
    # Read raster bands directly to Numpy arrays.
    # We handle the connections with "with"
    with rasterio.open(band1Image) as dsRed:
        bandRed = dsRed.read(1)

You could convert this to a float32 now instead of doing it twice in a statement below.

 
    with rasterio.open(band2Image) as dsNIR:
        bandNIR = dsNIR.read(1)

Same comment.

 
    # Allow division by zero
    numpy.seterr(divide='ignore', invalid='ignore')

Unless you're doing this in gdal_calc too, this could be a cause of differences. I think it would be better to add the keyword argument `masked=True` when you read the source bands. This will give you masked arrays that are safer to work with when calculating the NDVI.
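For illustration with plain numpy (no rasterio; the three-pixel bands are made up), masked arrays keep nodata pixels masked instead of letting them become inf/nan:

```python
import numpy as np

# Hypothetical 3-pixel NIR and red bands; the middle pixel is nodata
nir = np.ma.masked_array([100.0, 0.0, 200.0], mask=[False, True, False])
red = np.ma.masked_array([50.0, 0.0, 100.0], mask=[False, True, False])

# The masked pixel never participates in the division
ndvi = (nir - red) / (nir + red)
```

The valid pixels give 50/150 and 100/300, while the nodata pixel stays masked rather than skewing the result range.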

    ndvi = (bandNIR.astype(float) - bandRed.astype(float)) / (bandNIR.astype(float) + bandRed.astype(float))

If you didn't specify `astype(float)` here, Numpy would convert to float64 as soon as it performed the division operation, and would return a float64 ndarray.
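That promotion rule is quick to check with plain numpy (the uint16 arrays here are just an illustration):

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.uint16)
b = np.array([2, 4, 8], dtype=np.uint16)

# True division promotes integer inputs to float64
quotient = a / b
```

So the explicit `astype(float)` mainly guards against integer subtraction overflow in the numerator, not against integer division.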
 
    kwargs = dsRed.meta
    kwargs.update(dtype=rasterio.float32, count=1, compress='lzw')
    with rasterio.open(outputFilePath, 'w', **kwargs) as dst:
        dst.write_band(1, ndvi.astype(rasterio.float32))

But the result of this script is different than gdal_calc:

gdal_calc result range: -0.1 to +0.09
rasterio result range: 0.02 to 0.1

Like I said, we need to see the gdal_calc expression. I suspect nodata pixels are involved.

--
Sean Gillies


Rasterio result different than gdal_calc

anand@...
 

I am trying to recreate gdal_calc for a simple calculation using the code below.

import numpy
import rasterio

def merge_bands(calculationType, band1Image, band2Image, outputFilePath):
    # Read raster bands directly to Numpy arrays.
    # We handle the connections with "with"
    with rasterio.open(band1Image) as dsRed:
        bandRed = dsRed.read(1)

    with rasterio.open(band2Image) as dsNIR:
        bandNIR = dsNIR.read(1)

    # Allow division by zero
    numpy.seterr(divide='ignore', invalid='ignore')
    ndvi = (bandNIR.astype(float) - bandRed.astype(float)) / (bandNIR.astype(float) + bandRed.astype(float))
    kwargs = dsRed.meta
    kwargs.update(dtype=rasterio.float32, count=1, compress='lzw')
    with rasterio.open(outputFilePath, 'w', **kwargs) as dst:
        dst.write_band(1, ndvi.astype(rasterio.float32))

But the result of this script is different than gdal_calc:

gdal_calc result range: -0.1 to +0.09
rasterio result range: 0.02 to 0.1
 


Re: Unclear how to implement image alignment based on GCPs

Sean Gillies
 

Hi,

On Tue, Nov 6, 2018 at 10:04 PM <mmweene15@...> wrote:
Hi, thanks for your response. Let me provide some background. I have two sets of images that capture the same scene but were taken using separate cameras. This causes some degree of GPS discrepancy between the two sets of images, and the task is to align them based on GCPs generated either manually or automatically (irrelevant for this problem). Maybe I have completely misunderstood the purpose of the `reproject` function. I have included a sample of my GCPs below:

What an interesting problem. The reproject() function is mainly for cartographic reprojection, but it might be useful in your case.
 

`GroundControlPoint(row=array(357), col=array(682), x=array(22.67637871), y=array(-33.94166626), z=0.0, id='45e0b74c-54a4-48c8-b876-1dfcc184c6c8')`

The row and column reference the position in the aligned image that corresponds to the x and y coordinates in the misaligned image. Perhaps my understanding of how to construct this is wrong as well? The documentation doesn't seem to cover this case all that clearly.

The row, col, x, and y for a GroundControlPoint need to be floats, not arrays of floats. I'm sorry that the documentation of this class is inadequate!
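If the values come out of your pipeline as numpy 0-d arrays, `float()` coerces them cleanly (a generic sketch; GroundControlPoint itself is not constructed here):

```python
import numpy as np

# 0-d arrays like those in the GCP sample above
row, col = np.array(357), np.array(682)
x, y = np.array(22.67637871), np.array(-33.94166626)

# Coerce to plain Python floats before building a GroundControlPoint
row, col, x, y = (float(v) for v in (row, col, x, y))
```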


Thanks for your help.

You're welcome! I think that adjusting the GCPs may help, but may not be entirely enough.

--
Sean Gillies


Re: Bug: 'JPEG2000' is not a valid Compression

Sean Gillies
 

Hi, 

This is indeed a bug. I would be much obliged if you reported it in the Rasterio issue tracker.

I haven't used NITF in a long time. I see in https://www.gdal.org/frmt_nitf.html that there's a lack of symmetry in the compression creation option name ("IC"), values ("C3", "M3", "C8"), and what gets stored in the metadata ("COMPRESSION" and "JPEG2000"). This will complicate things a bit, more than in the GeoTIFF situation where I mostly work.
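A hypothetical sketch of a tolerant lookup (not Rasterio's actual fix): extend the enum with the driver-reported value, and fall back to the raw tag string when an unknown value appears:

```python
from enum import Enum

class Compression(Enum):
    # Abbreviated stand-in for rasterio.enums.Compression
    jpeg = "JPEG"
    lzw = "LZW"
    jpeg2000 = "JPEG2000"  # the value the NITF driver reports

def compression_from_tag(value):
    """Return the enum member if known, otherwise the raw string."""
    try:
        return Compression(value)
    except ValueError:
        return value
```

Until a fix lands, reading `src.tags(ns='IMAGE_STRUCTURE')` directly, as in the report above, avoids the enum lookup entirely.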


On Wed, Nov 7, 2018 at 12:17 PM <nicholas.maxwell@...> wrote:
Is this the right place to report a bug? I can open an issue on github if you like.

I'm reading a NITF and getting this error. Should 'JPEG2000' be added to the Compression enum?

In[14]: src.tags(ns='IMAGE_STRUCTURE')
Out[14]: {'COMPRESSION': 'JPEG2000'}

In[15]: src.compression
Traceback (most recent call last):
  File "rasterio/_base.pyx", line 840, in rasterio._base.DatasetBase.compression.__get__
  File "/lego/lib/python3.7/enum.py", line 307, in __call__
    return cls.__new__(cls, value)
  File "/lego/lib/python3.7/enum.py", line 555, in __new__
    return cls._missing_(value)
  File "/lego/lib/python3.7/enum.py", line 568, in _missing_
    raise ValueError("%r is not a valid %s" % (value, cls.__name__))
ValueError: 'JPEG2000' is not a valid Compression

thanks



--
Sean Gillies


Bug: 'JPEG2000' is not a valid Compression

nicholas.maxwell@...
 

Is this the right place to report a bug? I can open an issue on github if you like.

I'm reading a NITF and getting this error. Should 'JPEG2000' be added to the Compression enum?

In[14]: src.tags(ns='IMAGE_STRUCTURE')
Out[14]: {'COMPRESSION': 'JPEG2000'}

In[15]: src.compression
Traceback (most recent call last):
  File "rasterio/_base.pyx", line 840, in rasterio._base.DatasetBase.compression.__get__
  File "/lego/lib/python3.7/enum.py", line 307, in __call__
    return cls.__new__(cls, value)
  File "/lego/lib/python3.7/enum.py", line 555, in __new__
    return cls._missing_(value)
  File "/lego/lib/python3.7/enum.py", line 568, in _missing_
    raise ValueError("%r is not a valid %s" % (value, cls.__name__))
ValueError: 'JPEG2000' is not a valid Compression

thanks


Re: Make _boundless_vrt_doc public

Guy Doulberg
 

Hi Sean

I can't find the dev-group to discuss this issue, where can I find it?

Thanks, Guy


Re: Unclear how to implement image alignment based on GCPs

mmweene15@...
 

Hi, thanks for your response. Let me provide some background. I have two sets of images that capture the same scene but were taken using separate cameras. This causes some degree of GPS discrepancy between the two sets of images, and the task is to align them based on GCPs generated either manually or automatically (irrelevant for this problem). Maybe I have completely misunderstood the purpose of the `reproject` function. I have included a sample of my GCPs below:

`GroundControlPoint(row=array(357), col=array(682), x=array(22.67637871), y=array(-33.94166626), z=0.0, id='45e0b74c-54a4-48c8-b876-1dfcc184c6c8')`

The row and column reference the position in the aligned image that corresponds to the x and y coordinates in the misaligned image. Perhaps my understanding of how to construct this is wrong as well? The documentation doesn't seem to cover this case all that clearly.

Thanks for your help.


Re: Unclear how to implement image alignment based on GCPs

Sean Gillies
 

Hi,

I'm having trouble understanding the intent of the code. It looks like you are calling reproject() with source and destination arrays of the same shape, with the same coordinate reference system on each side of the reprojection. I'm not sure what we should expect in this situation; we are not currently testing for it. Can you write back and show your GCPs?


Unclear how to implement image alignment based on GCPs

mmweene15@...
 

I am trying to replicate the gdal functionality for performing image alignment based on GCPs. The equivalent gdal commands are the following:

`gdal_translate -of GTiff -gcp GCP1 -gcp GCP2... src_raster dst_raster`
`gdalwarp -s_srs S_SRS -t_srs T_SRS -tps dst_raster dst_raster2`

This would create a new raster, `dst_raster2` that has been reprojected according to the specified GCPs. It is unclear, based on the documentation, how to achieve this using the rasterio.warp module. I have tried using the rasterio.warp.reproject function to no avail. I've included a snippet of code below. The resulting image does not match the image produced by gdal. Any guidance would be appreciated. Note that the variables are descriptively named.

with rasterio.open(target_image_path, 'r') as f:
    target_profile = f.profile
    target_image = image_tools.rasterio_read_bands(f)
    print(target_profile)

destination = np.empty(target_image.shape, dtype=target_profile['dtype'])
resolution = (target_profile['transform'].a, target_profile['transform'].e)
transform, width, height = rasterio_warp.calculate_default_transform(
    src_crs,
    dst_crs,
    gcps=gcps,
    resolution=resolution,
    height=target_profile['height'],
    width=target_profile['width'])

rasterio_warp.reproject(
    source=target_image,
    destination=destination,
    gcps=gcps,
    src_crs=src_crs,
    dst_crs=src_crs,
    # src_transform=target_profile['transform'],
    dst_transform=target_profile['transform'],
    resampling=getattr(enums.Resampling, resampling_method),
    num_threads=num_threads)


Re: Make _boundless_vrt_doc public

Sean Gillies
 

Hi Denis,

I'm not prepared to make it public yet. I'm not confident that the signature of the method is finished and I believe we should have a discussion on the dev group about VRTs and how much support we'll have for them in Rasterio's public API.

For now, I recommend copying it into your own project.


On Wed, Oct 31, 2018 at 9:39 AM Denis Rykov <rykovd@...> wrote:
Hi,

We are going to use rasterio.vrt._boundless_vrt_doc to make an XML VRT out of a dataset in our library.
But we are a little afraid because the name of this function suggests that it is a private one.
Are there any reasons for this? What do you think about making this function public?



--
Sean Gillies


Make _boundless_vrt_doc public

Denis Rykov
 

Hi,

We are going to use rasterio.vrt._boundless_vrt_doc to make an XML VRT out of a dataset in our library.
But we are a little afraid because the name of this function suggests that it is a private one.
Are there any reasons for this? What do you think about making this function public?