problem with window sizes when parallelizing a funciton
javier lopatin
Hi all,
I'm currently using rasterio to process time series analysis of data. Because the raster tiles are too big, I'm trying to implement a windowed function with parallel processing. I tried your tutorial on concurrent processing (https://github.com/mapbox/rasterio/blob/master/docs/topics/concurrency.rst) and it works beautify with my own functions. I'm just wondering why it is only working with window sizes of 128 (profile.update(blockxsize=128,blockysize=128, tiled=True)), just like in the example. If I change these values to e.g. 200 or 500 it does not work anymore. I'm currently trying with a 3,000 X 3,000 raster size. Because loops are slow, I assume that increasing a bit the window size could be helpful. The error message that I receive if I change blockxsize is: Traceback (most recent call last):
File "TSA.py", line 252, in <module>
main(args.inputImage, args.outputImage, args.j)
File "TSA.py", line 225, in main
parallel_process(infile, outfile, n_jobs)
File "TSA.py", line 187, in parallel_process
with rasterio.open(outfile, "w", **profile) as dst:
File "/home/javier/miniconda3/lib/python3.6/site-packages/rasterio/env.py", line 398, in wrapper
return f(*args, **kwds)
File "/home/javier/miniconda3/lib/python3.6/site-packages/rasterio/__init__.py", line 226, in open
**kwargs)
File "rasterio/_io.pyx", line 1129, in rasterio._io.DatasetWriterBase.__init__
File "rasterio/_err.pyx", line 194, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_AppDefinedError: _TIFFVSetField:/home/javier/Documents/SF_delta/Sentinel/TSA/X-004_Y-001/2015-2019_001-365_LEVEL4_TSA_SEN2L_EVI_C0_S0_FAVG_TY_C95T_FBY_TSA.tif: Bad value 500 for "TileWidth" tag
The used function is below (see the whole script at https://github.com/JavierLopatin/Python-Remote-Sensing-Scripts/blob/master/TSA.py): def parallel_process(infile, outfile, n_jobs):
"""
Process infile block-by-block with parallel processing
and write to a new file.
"""
from tqdm import tqdm # progress bar
with rasterio.Env():
with rasterio.open(infile) as src:
# Create a destination dataset based on source params. The
# destination will be tiled, and we'll process the tiles
# concurrently.
profile = src.profile
profile.update(blockxsize=128, blockysize=128,
count=6, dtype='float64', tiled=True)
with rasterio.open(outfile, "w", **profile) as dst:
# Materialize a list of destination block windows
# that we will use in several statements below.
windows = [window for ij, window in dst.block_windows()]
# This generator comprehension gives us raster data
# arrays for each window. Later we will zip a mapping
# of it with the windows list to get (window, result)
# pairs.
data_gen = (src.read(window=window) for window in windows)
with concurrent.futures.ProcessPoolExecutor(
max_workers=n_jobs
) as executor:
# We map the TSA() function over the raster
# data generator, zip the resulting iterator with
# the windows list, and as pairs come back we
# write data to the destination dataset.
for window, result in zip(
tqdm(windows), executor.map(TSA, data_gen)
):
dst.write(result, window=window)Hope you guys can help. Cheers, Javier
|
|
Re: Python rasterio for saveing GeoTIFF files and read in ArcGIS or QGIS
Gabriel Cotlier
Dear Sean,
toggle quoted messageShow quoted text
Thank you very much for your help. I appreciate it. Regards Gabriel
On Monday, May 20, 2019, Sean Gillies <sean.gillies@...> wrote:
|
|
Re: Python rasterio for saveing GeoTIFF files and read in ArcGIS or QGIS
Sean Gillies
Thank you for providing a complete example of your code and the ArcMap error message. Without those details, it can be hard to provide help. When you open the files for writing, there is one thing missing: the *transform* keyword argument that determines the spatial extent of the dataset. I think that in your case, you will have success if you copy the value from the source datasets: filename = "20180308_133037_1024_3B_AnalyticMS.tif" # Copy the transform and crs objects from the source dataset.
with rasterio.open(filename) as src: dst_transform = src.transform dst_crs = src.crs # Pass the copied values to rasterio.open(). new_dataset = rasterio.open(
'ReflectanceB1.tif',
'w',
driver='GTiff',
height=band_blue_reflectance.shape[0],
width=band_blue_reflectance.shape[1],
count=1,
dtype=band_blue_reflectance.dtype,
crs=dst_crs, transform=dst_transform
)
new_dataset.write(band_blue_reflectance, 1)
new_dataset.close()
On Sat, May 18, 2019 at 12:39 PM <gabiklm01@...> wrote:
-- Sean Gillies
|
|
Python rasterio for saveing GeoTIFF files and read in ArcGIS or QGIS
Gabriel Cotlier
After following the steps in the tutorial of the Planet website (https://developers.planet.com/tutorials/convert-planetscope-imagery-from-radiance-to-reflectance/), I want to save each individual band of a RapidEye image Level 3B as a separate GeoTIFF layer file to be able to open it in any other GIS-software such as QGIS, ArcGIS other. I could not succeed in saving each band separately as GeoTIFF layer file. Here is the code:
```
#!/usr/bin/env python
# coding: utf-8
import rasterio
import numpy as np
filename = "20180308_133037_1024_3B_AnalyticMS.tif"
# Load red and NIR bands - note all PlanetScope 4-band images have band order BGRN
with rasterio.open(filename) as src:
band_blue_radiance = src.read(1)
with rasterio.open(filename) as src:
band_green_radiance = src.read(2)
with rasterio.open(filename) as src:
band_red_radiance = src.read(3)
with rasterio.open(filename) as src:
band_nir_radiance = src.read(4)
from xml.dom import minidom
xmldoc = minidom.parse("20180308_133037_1024_3B_AnalyticMS_metadata.xml")
nodes = xmldoc.getElementsByTagName("ps:bandSpecificMetadata")
# XML parser refers to bands by numbers 1-4
coeffs = {}
for node in nodes:
bn = node.getElementsByTagName("ps:bandNumber")[0].firstChild.data
if bn in ['1', '2', '3', '4']:
i = int(bn)
value = node.getElementsByTagName("ps:reflectanceCoefficient")[0].firstChild.data
coeffs[i] = float(value)
print "Conversion coefficients:", coeffs
# Multiply the Digital Number (DN) values in each band by the TOA reflectance coefficients
band_blue_reflectance = band_blue_radiance * coeffs[1]
band_green_reflectance = band_green_radiance * coeffs[2]
band_red_reflectance = band_red_radiance * coeffs[3]
band_nir_reflectance = band_nir_radiance * coeffs[4]
import numpy as np
print "Red band radiance is from {} to {}".format(np.amin(band_red_radiance), np.amax(band_red_radiance))
print "Red band reflectance is from {} to {}".format(np.amin(band_red_reflectance), np.amax(band_red_reflectance))
print "Before Scaling, red band reflectance is from {} to {}".format(np.amin(band_red_reflectance), np.amax(band_red_reflectance))
# Here we include a fixed scaling factor. This is common practice.
scale = 10000
blue_ref_scaled = scale * band_blue_reflectance
green_ref_scaled = scale * band_green_reflectance
red_ref_scaled = scale * band_red_reflectance
nir_ref_scaled = scale * band_nir_reflectance
print "After Scaling, red band reflectance is from {} to {}".format(np.amin(red_ref_scaled), np.amax(red_ref_scaled))
```
Here I tried unsuccessfully to save each individual band as GeotiFF file.
I have tried, any idea how to do the save of each band correctly to further open in QGIS orArGIS for calculating NDVI there?
```
dst_crs='EPSG:32720'
## saving bands individualy
new_dataset = rasterio.open(
'ReflectanceB1.tif',
'w',
driver='GTiff',
height=band_blue_reflectance.shape[0],
width=band_blue_reflectance.shape[1],
count=1,
dtype=band_blue_reflectance.dtype,
crs=dst_crs,
)
new_dataset.write(band_blue_reflectance, 1)
new_dataset.close()
new_dataset = rasterio.open(
'ReflectanceB2.tif',
'w',
driver='GTiff',
height=band_red_reflectance.shape[0],
width=band_red_reflectance.shape[1],
count=1,
dtype=band_red_reflectance.dtype,
crs=dst_crs,
)
new_dataset.write(band_red_reflectance, 1)
new_dataset.close()
new_dataset = rasterio.open(
'ReflectanceB3.tif',
'w',
driver='GTiff',
height=band_red_reflectance.shape[0],
width=band_red_reflectance.shape[1],
count=1,
dtype=band_red_reflectance.dtype,
crs=dst_crs,
)
new_dataset.write(band_red_reflectance, 1)
new_dataset.close()
new_dataset = rasterio.open(
'ReflectanceB4.tif',
'w',
driver='GTiff',
height=band_red_reflectance.shape[0],
width=band_red_reflectance.shape[1],
count=1,
dtype=band_red_reflectance.dtype,
crs=dst_crs,
)
new_dataset.write(band_red_reflectance, 1)
new_dataset.close()
```
I'm using as input image "20180308_133037_1024_3B_AnalyticMS.tif" instead of "20180308_133037_1024_3B_AnalyticMS_SR.tif" don't know if is the problem
I'm getting resulting GeoTIFF file but when I open it in ArcMap I get this error
The GeoTIFF band are there and can be seen at ArcMap screen... but the error appear anyway... might be related to coordinate systems to be defined possibly while saving in rasterio? And need to have it correctly projected for clipping with shapefiles.
|
|
save
Gabriel Cotlier
After following the steps in the tutorial of the Planet website (https://developers.planet.com/tutorials/convert-planetscope-imagery-from-radiance-to-reflectance/), I want to save each individual band of a RapidEye image Level 3B as a separate GeoTIFF layer file to be able to open it in any other GIS-software such as QGIS, ArcGIS other. I could not succeed in saving each band separately as GeoTIFF layer file. Here is the code:
```
#!/usr/bin/env python
# coding: utf-8
import rasterio
import numpy as np
filename = "20180308_133037_1024_3B_AnalyticMS.tif"
# Load red and NIR bands - note all PlanetScope 4-band images have band order BGRN
with rasterio.open(filename) as src:
band_blue_radiance = src.read(1)
with rasterio.open(filename) as src:
band_green_radiance = src.read(2)
with rasterio.open(filename) as src:
band_red_radiance = src.read(3)
with rasterio.open(filename) as src:
band_nir_radiance = src.read(4)
from xml.dom import minidom
xmldoc = minidom.parse("20180308_133037_1024_3B_AnalyticMS_metadata.xml")
nodes = xmldoc.getElementsByTagName("ps:bandSpecificMetadata")
# XML parser refers to bands by numbers 1-4
coeffs = {}
for node in nodes:
bn = node.getElementsByTagName("ps:bandNumber")[0].firstChild.data
if bn in ['1', '2', '3', '4']:
i = int(bn)
value = node.getElementsByTagName("ps:reflectanceCoefficient")[0].firstChild.data
coeffs[i] = float(value)
print "Conversion coefficients:", coeffs
# Multiply the Digital Number (DN) values in each band by the TOA reflectance coefficients
band_blue_reflectance = band_blue_radiance * coeffs[1]
band_green_reflectance = band_green_radiance * coeffs[2]
band_red_reflectance = band_red_radiance * coeffs[3]
band_nir_reflectance = band_nir_radiance * coeffs[4]
import numpy as np
print "Red band radiance is from {} to {}".format(np.amin(band_red_radiance), np.amax(band_red_radiance))
print "Red band reflectance is from {} to {}".format(np.amin(band_red_reflectance), np.amax(band_red_reflectance))
print "Before Scaling, red band reflectance is from {} to {}".format(np.amin(band_red_reflectance), np.amax(band_red_reflectance))
# Here we include a fixed scaling factor. This is common practice.
scale = 10000
blue_ref_scaled = scale * band_blue_reflectance
green_ref_scaled = scale * band_green_reflectance
red_ref_scaled = scale * band_red_reflectance
nir_ref_scaled = scale * band_nir_reflectance
print "After Scaling, red band reflectance is from {} to {}".format(np.amin(red_ref_scaled), np.amax(red_ref_scaled))
```
Here I tried unsuccessfully to save each individual band as GeotiFF file.
I have tried, any idea how to do the save of each band correctly to further open in QGIS orArGIS for calculating NDVI there?
```
dst_crs='EPSG:32720'
## saving bands individualy
new_dataset = rasterio.open(
'ReflectanceB1.tif',
'w',
driver='GTiff',
height=band_blue_reflectance.shape[0],
width=band_blue_reflectance.shape[1],
count=1,
dtype=band_blue_reflectance.dtype,
crs=dst_crs,
)
new_dataset.write(band_blue_reflectance, 1)
new_dataset.close()
new_dataset = rasterio.open(
'ReflectanceB2.tif',
'w',
driver='GTiff',
height=band_red_reflectance.shape[0],
width=band_red_reflectance.shape[1],
count=1,
dtype=band_red_reflectance.dtype,
crs=dst_crs,
)
new_dataset.write(band_red_reflectance, 1)
new_dataset.close()
new_dataset = rasterio.open(
'ReflectanceB3.tif',
'w',
driver='GTiff',
height=band_red_reflectance.shape[0],
width=band_red_reflectance.shape[1],
count=1,
dtype=band_red_reflectance.dtype,
crs=dst_crs,
)
new_dataset.write(band_red_reflectance, 1)
new_dataset.close()
new_dataset = rasterio.open(
'ReflectanceB4.tif',
'w',
driver='GTiff',
height=band_red_reflectance.shape[0],
width=band_red_reflectance.shape[1],
count=1,
dtype=band_red_reflectance.dtype,
crs=dst_crs,
)
new_dataset.write(band_red_reflectance, 1)
new_dataset.close()
```
I'm using as input image "20180308_133037_1024_3B_AnalyticMS.tif" instead of "20180308_133037_1024_3B_AnalyticMS_SR.tif" don't know if is the problem
I'm getting resulting GeoTIFF file but when I open it in ArcMap I get this error
[![enter image description here][1]][1]
[1]: https://i.stack.imgur.com/zJ73V.jpg
The GeoTIFF band are there and can be seen at ArcMap screen... but the error appear anyway... might be related to coordinate systems to be defined possibly while saving in rasterio? And need to have it correctly projected for clipping with shapefiles.
|
|
Rasterio 1.0.23
Sean Gillies
Hi all, Rasterio 1.0.23 is on PyPI now. It has a number of important bug fixes: - rio-calc output is divided into chunks of user-specified size to constrain the amount of memory used (#1668). - Attempts to set attributes of datasets opened in "r" mode now raise a custom DatasetAttributeError. This exception derives from both RasterioError and NotImplementedError, which maintains backwards compatibility (#1676). - Block sizes are no longer guarded when creating untiled datasets (#1689). - CRS objects are now hashable and equivalent CRS objects have the same hash value (#1684). - Allow AWS regions to be specified no matter the signing of requests (#1670). - Add links to API documentation from the Python quickstart guide. - Use "CRS.from_epsg({})" instead of "CRS.from_dict(init='epsg:{}')" as the representation for CRS objects that are completely described by an EPSG code. - Use GDAL's string parsing to get metadata item keys and values, which accommodates uncommon "KEY:VALUE" forms. If you're a rio-calc user, be sure to update the snuggs package as well. 1.4.6 has several very important bug fixes. Thank you for the excellent bug reports, Sean Gillies
|
|
Re: Can't read in a Landsat geotiff and use multiple threads to write out different windows at the same time
Thanks Sean, very helpful and makes sense.
On Tue, May 7, 2019 at 9:30 AM Sean Gillies <sean.gillies@...> wrote:
--
Ryan Avery Graduate Student, WAVES Lab Department of Geography University of California, Santa Barbara
|
|
Re: Can't read in a Landsat geotiff and use multiple threads to write out different windows at the same time
Sean Gillies
The VRT_SHARED_SOURCE option only affects the connection pool within the VRT driver code and that's not relevant to your code as far as I can see. A single opened rasterio dataset cannot be safely used by multiple threads, concurrently. There is more about this limitation at https://trac.osgeo.org/gdal/wiki/FAQMiscellaneous#IstheGDALlibrarythread-safe. Each of your threads will need exclusive use of a dataset object. The simplest way to achieve this is to call rasterio.open(..., sharing=False) in new threads to get a new dataset and then close it before the thread is joined. Another way is to create a suitably-sized pool of dataset objects (also using the sharing=False option) within your application and then assign these to your threads on a as-needed basis. I hope this helps,
On Mon, May 6, 2019 at 7:02 PM <ravery@...> wrote:
-- Sean Gillies
|
|
Can't read in a Landsat geotiff and use multiple threads to write out different windows at the same time
I moved this github issue over here since it seems like more of a usage/help question. Below is the issue with a reproducible example if you have a Landsat tif. The program fails at different windows when writing out windows with threads. This issue is related to this issue https://github.com/mapnik/node-mapnik/issues/437#issuecomment-103806098
|
|
Re: NameError: name 'Window' is not defined
quinsen.joel@...
Thanks for both your help! I see the import statement in the docs now, cheers!
-Quin
|
|
Re: NameError: name 'Window' is not defined
Sean Gillies
Thanks for the help, rickD! In https://github.com/mapbox/rasterio/commit/eebf54b74fa0f718cb8baaa44aa20616e405b4e4 I added the missing Window class import. Read the Docs should be updated in a couple minutes. I think this should help.
On Mon, May 6, 2019 at 1:38 PM <debboutr@...> wrote:
--
Sean Gillies
|
|
Re: NameError: name 'Window' is not defined
debboutr@...
you just need the class in your namespace, try: I would agree it's not easy from the docs to know where to go and find this.
~rickD
|
|
Re: NameError: name 'Window' is not defined
Sean Gillies
Hi Quin, The Window class is defined in the rasterio.windows module and you must explicitly import it into your program's code like this: from rasterio.windows import Window Example code in our docs might obscure this need, if you see any docs that don't show import of the Window class, let me know and I'll fix them.
On Mon, May 6, 2019 at 1:23 PM <quinsen.joel@...> wrote:
--
Sean Gillies
|
|
NameError: name 'Window' is not defined
quinsen.joel@...
Hello. I'm new to this forum, so let me know if there are any problems with my question here.
I am trying to use rasterio's windowed reading ability and I am hoping to use the following structure as shown on rasterios documentation page (https://rasterio.readthedocs.io/en/latest/topics/windowed-rw.html): ```
with rasterio.open('file.tif') as src:
window = src.read(1, window=Window(0, 0, new_height, new_width))
```
However, when I try to do this I get:
```NameError: name 'Window' is not defined```
Shouldn't Window() be imported with rasterio? Is this a problem with the version of rasterio I have downloaded or am I missing something really easy here?
```
print (rasterio.__version__)
1.0.7
```
Thank you for your guidance, Quin
|
|
Re: WarpedVRT and resampling ?
vincent.sarago@...
Hi Sean, sorry for not being really clear.
> Can you explain what processes produced those images? What program produced the blurry one and what program produced the non-blurry one? Which resampling algorithms were used? Those two image were produced by `rio-tiler`. One on linux (blurry) the other on Mac OS (both with wheels, but also tried with gdal sources). The resampling algorithm that produced blurry lines are non-nearest: cubic, bilinear, cubic spline. > if you use the WarpedVRT, you should define its extents so that you don't need to use any resampling at all when you read from it. Also, I don't fully understand the edge effects when you resample in read(). in https://github.com/cogeotiff/rio-tiler/blob/master/rio_tiler/utils.py#L296-L321, what we do first is to calculated the bounds of the VRT (mercator tiles) and then do (decimated) reads. The interesting part is the bilinear resampling introduce the blurry lines only when using it on the `WarpedVRT` params and not on the `read`. here is a gist showing how to reproduce it: https://gist.github.com/vincentsarago/0b229741c3261a9a447c7ab5dbd9eb05
|
|
Re: WarpedVRT and resampling ?
Sean Gillies
Hi Vincent, Can you explain what processes produced those images? What program produced the blurry one and what program produced the non-blurry one? Which resampling algorithms were used? Yes, you can introduce resampling in two places. I think it is not a good idea to do so: if you use the WarpedVRT, you should define its extents so that you don't need to use any resampling at all when you read from it. Also, I don't fully understand the edge effects when you resample in read().
On Fri, Apr 19, 2019 at 9:37 PM <vincent.sarago@...> wrote: This was previously discussed on multiple Issues and PR: -- Sean Gillies
|
|
WarpedVRT and resampling ?
vincent.sarago@...
This was previously discussed on multiple Issues and PR:
- https://github.com/mapbox/rasterio/issues/1206 - https://github.com/mapbox/rasterio/pull/1238 - https://github.com/mapbox/rasterio/pull/1239 While working with https://github.com/cogeotiff/rio-tiler on different data format (Int16, Float ...) I've encountered some strange behavior when using non-nearest resampling algorithm. ## Contexte When working with `WarpedVRT` you can choose the resampling algorithm twice: ``` with WarpedVRT(src, resampling=enums.Resampling.nearest) as vrt: arr = vrt.read(out_shape=(1, 512, 512, resampling=enums.Resampling.nearest)) ``` Sean try to explain this in https://github.com/mapbox/rasterio/pull/1238#issuecomment-353123776 rio-tiler code: https://github.com/cogeotiff/rio-tiler/blob/master/rio_tiler/utils.py#L296-L321 ## Behavior On the images attached we can see that there are `blurry` lines on the bottom. Those lines appear on the bottom of the tile returned by `rio-tiler` (bottom of the WarpedVRT returned value). Observation: - Only visible when working on Linux (https://github.com/RemotePixel/amazonlinux-gdal/tree/gdal2.4.0, or with rasterio wheels) - Only with Int16 data with negative NoData - Not visible when using gdalwarp directly - Visible only when not using `resampling=enums.Resampling.nearest` in WarpedVRT params I'm not 100% to understand what is going on and if I should or should not use `enums.Resampling.nearest` as default resampling in WarpedVRT params so I'm looking for guidance here. Thanks, Vincent
|
|
Re: tempfile.NamedTemporaryFile behaving as /vsimem and eating all the machine memory
Sean Gillies
Vincent. At https://github.com/mapbox/rasterio/blob/master/rasterio/__init__.py#L191, a big GeoTIFF is created in RAM. Then at https://github.com/mapbox/rasterio/blob/master/rasterio/__init__.py#L199 that GeoTIFF is read into memory *again* so that it can be written to the Python file object. There will be two copies in memory. It's terribly inefficient, but I don't want to spend the time to optimize this case when I should be documenting the limitations instead.
On Tue, Apr 16, 2019 at 12:47 PM <vincent.sarago@...> wrote: Thanks Sean this is really helpful and love the `temp.name` solution. --
Sean Gillies
|
|
Re: tempfile.NamedTemporaryFile behaving as /vsimem and eating all the machine memory
vincent.sarago@...
Thanks Sean this is really helpful and love the `temp.name` solution.
About the second point, do you have any idea why `/vsimem` driver need so much memory when exiting/closing ? Should I raise this to the gdal list?
|
|
Re: tempfile.NamedTemporaryFile behaving as /vsimem and eating all the machine memory
Sean Gillies
Hi Vincent, This is expected (if not well-documented) behavior. tempfile.NamedTemporaryFile() returns an open Python file object, not a filename. GDAL can't use a Python file object, so in that case rasterio.open reads all the bytes from the file object, copies them to the vsimem filesystem, and opens that vsimem file. I think what you want do do is pass the name of the temp file object to GDAL. Like this: with tempfile.NamedTemporaryFile() as temp: with rasterio.open(temp.name) as dataset: print(dataset) No copy in the vsimem filesystem will be made.
On Tue, Apr 16, 2019 at 6:55 AM <vincent.sarago@...> wrote: While working on https://github.com/cogeotiff/rio-cogeo/pull/75 we noticed strange behaviors with `vsimem` driver (this could be a GDAL but TBH). --
Sean Gillies
|
|