Re: Request for comment: rasterio wheels on PyPI and GDAL 3.1


Howard Butler
 



On Sep 13, 2020, at 4:21 PM, Sean Gillies <sean.gillies@...> wrote:

Hi all,

The rasterio 1.1.5 wheels on PyPI include GDAL 2.4.4 and a patched PROJ 4.9.3. Neither of these versions is supported anymore. Recently I have heard from users and contributors who would like to see GDAL 3.1.x in the PyPI wheels soon, to benefit from new features. I have also heard from those who would rather stick with 2.4.4 for a while yet because of the increased latency they have noticed with the new dependency on proj.db in PROJ versions 6 and greater. I'm sympathetic to both arguments. Rasterio depends on some new GDAL bug fixes to really shine and be its best. On the other hand, I really don't know how best to distribute GDAL 3.1 and PROJ 7.1, particularly when it comes to including PROJ data in the wheels (as we have) or relying on the PROJ CDN.

I'd love to hear comments on whether rasterio wheels should switch over to GDAL 3.1 and PROJ 7.1, especially those based on experience with the latest PROJ in production in distributed and "serverless" systems.

As an instigator of the CDN and a frequent grid shifter because of our work in the geodetic and lidar domains, I've been very happy with PROJ's CDN approach in Lambda scenarios. You can find my Dockerfile at https://github.com/PDAL/lambda and my latest public layer with GDAL 3.1.3 and PROJ 7.1.1 is at arn:aws:lambda:us-east-1:163178234892:layer:pdal:35

As a library packager, I agree the PROJ data is a burden. Python's wheel approach makes that burden even more pronounced. I like Alan's pyproj approach of pointing users at how to fetch the data if they need it, or letting them flip on the network bit if they're ok with that. It seems like a good compromise, and a packaging approach that is not much different from the pre-6.0 days when most people ignored the grids entirely.
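For anyone wondering what "flipping on the network bit" might look like in practice, here is a minimal sketch. It assumes PROJ 7 or later, which reads the PROJ_NETWORK environment variable, and it optionally calls pyproj.network.set_network_enabled, which is only available in pyproj releases that ship a network module. The helper names proj_network_env and enable_network_in_process are hypothetical, made up for illustration; they are not part of rasterio, pyproj, or PROJ.

```python
import os


def proj_network_env(enable, base_env=None):
    """Return a copy of an environment mapping with PROJ_NETWORK adjusted.

    Hypothetical helper: PROJ 7+ treats PROJ_NETWORK=ON as a request to
    fetch grids from the CDN. When disabling, the variable is removed so
    that PROJ falls back to its own configured default.
    """
    env = dict(os.environ if base_env is None else base_env)
    if enable:
        env["PROJ_NETWORK"] = "ON"
    else:
        env.pop("PROJ_NETWORK", None)
    return env


def enable_network_in_process():
    """Turn on network access for the current process.

    Sets the environment variable (which only affects PROJ contexts
    created afterwards) and, if a pyproj with a network module is
    installed, also flips pyproj's global setting. Returns True when
    pyproj was found and updated, False otherwise.
    """
    os.environ["PROJ_NETWORK"] = "ON"
    try:
        from pyproj.network import set_network_enabled
    except ImportError:
        # Assumption: pyproj is missing or too old to have this module.
        return False
    set_network_enabled(active=True)
    return True
```

The first helper is handy for building the environment of a subprocess or a Lambda function configuration; the second is meant to be called early, before any transformations are created.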

The latency complaint? Are you doing hundreds of lookups per second (MapServer)? There may be more performance to get there, but someone is going to have to spend *a lot* of time to get it. I think the proj.db situation (one actual database across all the projects) is gobs better than before (at least three csv "databases" with different-but-same information in them and indeterminate code paths for calculation of a definition).

Howard
