Speed up reading rasters
Carlos García Rodríguez
Hello, I need to read many huge datasets, and read speed is very important to avoid a bottleneck.
I have to read a TIFF file that has 20 bands, using a window of 224 x 224. This is what I am doing now, and it takes approximately 0.8 seconds:

    with rasterio.open('./sentinel.tif') as src:
        sentinel1_1 = src.read(window=window)

What I realized is that if I read only one of the bands, the time required is approximately the same, but when reading a TIFF that has only one band, the time is 10 times shorter. Can I do something to speed it up? Maybe read the bands in parallel? I don't really know. I appreciate your help. Thank you.
Even Rouault
On Sunday, 26 April 2020 03:04:56 CEST carlogarro@... wrote:
> Hello, I need to read many huge datasets, and read speed is very
> important to avoid a bottleneck. I have to read a TIFF file that has 20
> bands, using a window of 224 x 224. This is what I am doing now, and it
> takes approximately 0.8 seconds:
>
>     with rasterio.open('./sentinel.tif') as src:
>         sentinel1_1 = src.read(window=window)
>
> What I realized is that if I read only one of the bands, the time required
> is approximately the same, but when reading a TIFF that has only one band,
> the time is 10 times shorter.
>
> Can I do something to speed it up? Maybe read the bands in parallel? I don't
> really know.
If you have control over how the TIFF file is created, make sure it uses band interleaving instead of pixel interleaving.
For example, with gdal_translate this can be done with -co INTERLEAVE=BAND
If the file is not tiled, adding tiling might also help.
I see in https://rasterio.readthedocs.io/en/latest/topics/profiles.html a pure rasterio way of creating such a file.
-- Spatialys - Geospatial professional services http://www.spatialys.com
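The interleaving and tiling advice above could be sketched in rasterio roughly as follows. This is a minimal sketch under assumptions, not the method actually used in this thread: the helper names `tiled_band_profile` and `rewrite_tiled` and the 256 block size are made up for illustration, while `interleave`, `tiled`, `blockxsize` and `blockysize` are the GTiff creation options that correspond to the gdal_translate `-co` flags.

```python
def tiled_band_profile(profile, blocksize=256):
    """Return a copy of a rasterio profile set up for band-interleaved tiles.

    Hypothetical helper: `profile` is any mapping such as `src.profile`.
    """
    out = dict(profile)
    out.update(
        driver="GTiff",
        interleave="band",     # like gdal_translate -co INTERLEAVE=BAND
        tiled=True,            # like gdal_translate -co TILED=YES
        blockxsize=blocksize,  # GeoTIFF tile sizes must be multiples of 16
        blockysize=blocksize,
    )
    return out


def rewrite_tiled(src_path, dst_path, blocksize=256):
    """Copy a raster into a tiled, band-interleaved GeoTIFF (hypothetical helper)."""
    # rasterio is a third-party dependency; it is imported here so that the
    # pure-Python helper above can be used without it.
    import rasterio

    with rasterio.open(src_path) as src:
        profile = tiled_band_profile(src.profile, blocksize)
        with rasterio.open(dst_path, "w", **profile) as dst:
            # One band at a time keeps memory use to a single band; for very
            # large rasters a block-by-block copy would be gentler still.
            for band in range(1, src.count + 1):
                dst.write(src.read(band), band)
```

Something like `rewrite_tiled('./sentinel.tif', './sentinel2_tiled.tif')` would then produce a file with the layout described above; whether it is faster depends on the access pattern.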
Carlos García Rodríguez
Hello, thank you so much for your recommendation, it sped things up 5x. Very useful. Now I am having a problem that I do not understand.
I have the following script, where I access 10 random tiles of my raster. train_data is an array of shape [4822, 2] holding pixel positions in the raster.

    import time
    import numpy as np
    import rasterio
    from rasterio.windows import Window

    for i in range(10):
        idx = np.random.randint(4822)
        x_idx = train_data[idx][1]
        y_idx = train_data[idx][0]
        # Window takes (col_off, row_off, width, height)
        window = Window(y_idx, x_idx, 224, 224)
        start_time = time.time()
        with rasterio.open('./sentinel2_tiled.tif') as src:
            sentinel2 = src.read(window=window)
        end_time = (time.time() - start_time)  # elapsed seconds for this read

I do not understand why the times for loading a window are so different, as can be seen in the following image. Do you have an explanation? Thank you once more!
Sean Gillies
Hi,
On Mon, Apr 27, 2020 at 2:36 AM <carlogarro@...> wrote:
> Hello, thank you so much for your recommendation, it sped things up 5x. Very useful. Now I am having a problem that I do not understand.
I can't say for sure about the time differences because I don't know much about your data files or your computer. However, know this: GDAL's I/O system caches blocks of raster data in memory, the size of the cache is generally 5% of your computer's memory, and windowed reads may or may not be served directly from the cache depending on their size and adjacency to previously read data.
-- Sean Gillies
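As a hedged sketch of how the block cache described above could be enlarged from rasterio: GDAL_CACHEMAX is the GDAL configuration option that controls the block cache, and rasterio.Env is one way to set it for a block of code. The function name `timed_reads` and the 1024 value are assumptions for illustration; small integer values are commonly interpreted as megabytes, but check the GDAL documentation for your version. The sketch also opens the dataset once and reuses it, so that blocks cached by one read can serve later reads.

```python
import time


def timed_reads(path, windows, cache_mb=1024):
    """Read each window from one open dataset and return the elapsed seconds.

    Hypothetical helper; `cache_mb` is passed to GDAL as GDAL_CACHEMAX.
    """
    # rasterio is a third-party dependency, imported lazily in this sketch.
    import rasterio

    timings = []
    with rasterio.Env(GDAL_CACHEMAX=cache_mb):
        # Open once: reopening the file for every window adds overhead to
        # each measurement.
        with rasterio.open(path) as src:
            for window in windows:
                start = time.perf_counter()
                src.read(window=window)
                timings.append(time.perf_counter() - start)
    return timings
```

A larger cache only helps when later reads touch blocks that earlier reads already loaded, so random windows spread over a huge raster may see little benefit.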
Carlos García Rodríguez
So, do you think it would be a good idea to increase the cache size? If so, how do I do it? I have plenty of RAM, so that should not be a problem. On the other hand, I checked whether there is any relation between tile proximity and read time and did not find one. You can see the position of each tile in the image.
On Mon, 27 Apr 2020 at 17:20, Sean Gillies <sean.gillies@...> wrote:
Carlos García Rodríguez
I would also like to add that the first tiled read is not necessarily slow...
On Mon, 27 Apr 2020 at 20:06, Carlos García Rodríguez via groups.io <carlogarro=gmail.com@groups.io> wrote:
Carlos García Rodríguez
I have already fixed it. One of the main problems was that the tiled TIFF was configured with a block size of 256 while I was reading windows of size 224. That mismatch is where the disparity in read times came from.
Also, I found it faster to read pixel-interleaved rather than band-interleaved data, but that depends on the application. Thank you all for your help!
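The effect of the 224 versus 256 mismatch can be illustrated with a little arithmetic. The sketch below is plain Python and does not touch any file; the function names are hypothetical, and it assumes square blocks laid out on a regular grid starting at the top-left corner of the raster, which is how GeoTIFF tiles are arranged.

```python
def blocks_touched(offset, size, blocksize=256):
    """Number of blocks along one axis that the span [offset, offset + size) overlaps."""
    first = offset // blocksize
    last = (offset + size - 1) // blocksize
    return last - first + 1


def tiles_read(col_off, row_off, width=224, height=224, blocksize=256):
    """Number of tiles (per band) that must be decoded to serve one window."""
    return (blocks_touched(col_off, width, blocksize)
            * blocks_touched(row_off, height, blocksize))


def snap_to_grid(offset, blocksize=256):
    """Round an offset down to the start of the block that contains it."""
    return (offset // blocksize) * blocksize


# The same 224 x 224 window costs a different number of tiles depending on
# where it falls relative to the 256 x 256 block grid.
for col_off, row_off in [(0, 0), (256, 512), (100, 0), (200, 200)]:
    print((col_off, row_off), tiles_read(col_off, row_off))
```

For these four offsets the window overlaps 1, 1, 2 and 4 tiles respectively, which is one plausible reason for read times that vary by a factor of several. Reading 256 x 256 windows at offsets passed through `snap_to_grid` always touches exactly one tile per band.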