Topics

Switch from PROJ.4 format to WKT format for CRS

Alan Snow
 

So, I have recently learned that the WKT string representation is a better choice to go with instead of the PROJ.4 string. The reason being that exporting to the PROJ.4 string can cause information about the projection to be lost. The PROJ.4 source code has a warning about this here.

However, the rasterio/fiona projects both store the PROJ.4 string representation. What are your thoughts on defaulting to the WKT representation of a projection instead? This will be a big change for users and the functionality of the CRS class. However, it does not mean that the proj.4 string support/initialization will go away. It just means that the projection won't be stored based on a proj.4 string and it could mean that the proj.4 dict structure would eventually go away.

Alan Snow
 

Another option would be to preserve the OSRSpatialReference object as an property on the class and allow it to be created with WKT or PROJ.4 but not export to PROJ.4 and before re-creating it. This way, the base OSRSpatialReference obect will contain all of the information originally provided. This could be a non-breaking change as it would allow the same functionality to exist in a re-organized fashion. But, it would require a refactor of the CRS class.

@cratcliff
 

I agree. The same issue applies to GeoPandas which is also based on fiona and I have raised it there with no success.

In my case ESRI produced coordinate systems causes an issue and as a result a simple read then write of a tiff file will apply the wrong coordinate system.

Take an Australian example ( EPSG: 28354 ) using GDAL I get the following: Take note the two wkt's are different
raster = gdal.Open(src_tif)
srs=osr.SpatialReference(wkt=raster.GetProjection())

print 'projcs: ', srs.GetAttrValue('projcs')
projcs: GDA_1994_MGA_Zone_54

print 'geocs: ', srs.GetAttrValue('geogcs')
geocs: GCS_GDA_1994

print '\nProj4: ', srs.ExportToProj4()
Proj4: +proj=utm +zone=54 +south +ellps=GRS80 +units=m +no_defs

print '\nwkt: ', srs.ExportToWkt()
wkt: PROJCS["GDA_1994_MGA_Zone_54",GEOGCS["GCS_GDA_1994",DATUM["GDA_1994",SPHEROID["GRS_1980",6378137,298.257222101]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",141],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",10000000],UNIT["metre",1,AUTHORITY["EPSG","9001"]]]

# make a copy of the wkt for later
img_gdal_wkt = srs.ExportToWkt()
Same input file using RasterIO I get
with rasterio.open(src_tif) as src:
print src.crs
+ellps=GRS80 +no_defs +proj=utm +south +units=m +zone=54

print src.crs.wkt PROJCS["UTM Zone 54, Southern Hemisphere",GEOGCS["GRS 1980(IUGG, 1980)",DATUM["unknown",SPHEROID["GRS80",6378137,298.257222101]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",141],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",10000000],UNIT["Meter",1]
So when I use the profile or meta as a base to create a new file, the coordinate system changes from GDA_1994_MGA_Zone_54 to UTM Zone 54, Southern Hemisphere when I want it to match the original.

Although you can access the WKT via rasterio's crs.wkt it seems to be generated from the Proj4 string not from the WKT of the input image. Maybe writing the actual image WKT instead of deriving it from the Proj4 is a short term workaround as it can then be used to update/specify the CRS.  I've proven this by extracting the WKT using gdal and applying it when writing a new file using rasterio. Accessing the new file via GDAL proves it has written correctly, but accessing via rasterio shows the incorrect WKT.

with rasterio.open(src_tif) as src:   
out_meta = src.meta.copy()

# update the crs with the correct GDA_1994_MGA_Zone_54 wkt string
out_meta['crs'] = img_gdal_wkt

with rasterio.open(r'C:\data\temp\wkt_test.tif', 'w', **out_meta) as dst:
dst.write(src.read())

# Accessing the output image via GDAL displays the correct wkt string
raster = gdal.Open(r'C:\data\temp\wkt_test.tif')
print raster.GetProjectionRef()
PROJCS["GDA_1994_MGA_Zone_54",GEOGCS["GCS_GDA_1994",DATUM["GDA_1994",SPHEROID["GRS_1980",6378137,298.257222101]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",141],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",10000000],UNIT["metre",1,AUTHORITY["EPSG","9001"]]]

# Accessing it via rasterio produces the incorrect wkt string
with rasterio.open(r'C:\data\temp\wkt_test.tif') as dataset:
print dataset.crs
+ellps=GRS80 +no_defs +proj=utm +south +units=m +zone=54

print dataset.crs.wkt # rasterio reports the wrong crs.wkt
PROJCS["UTM Zone 54, Southern Hemisphere",GEOGCS["GRS 1980(IUGG, 1980)",DATUM["unknown",SPHEROID["GRS80",6378137,298.257222101]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",141],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",10000000],UNIT["Meter",1]]

As a result I have to try and determine the epsg number and use that instead, but even that is difficult as the example wkt is an ESRI definition that doesn't contain the EPSG number.

Sean Gillies
 

Hi Christina,

Thank you for the explanation and examples. I definitely see your point and agree that a switch to WKT as a canonical representation (as in GDAL/OGR version 2.4 and earlier) is overdue. I don't think we'll have major problems doing this in a backwards compatible way, and I can make the time to do it. This work will smooth things out in advance of work that will need to be done to align with changes coming in GDAL 2.5, when WKT is no longer the canonical representation of spatial references at all: https://trac.osgeo.org/gdal/wiki/rfc73_proj6_wkt2_srsbarn.

On Wed, Jan 9, 2019 at 10:20 PM <christina.ratcliff@...> wrote:
I agree. The same issue applies to GeoPandas which is also based on fiona and I have raised it there with no success.

In my case ESRI produced coordinate systems causes an issue and as a result a simple read then write of a tiff file will apply the wrong coordinate system.

Take an Australian example ( EPSG: 28354 ) using GDAL I get the following: Take note the two wkt's are different
raster = gdal.Open(src_tif)
srs=osr.SpatialReference(wkt=raster.GetProjection())

print 'projcs: ', srs.GetAttrValue('projcs')
projcs: GDA_1994_MGA_Zone_54

print 'geocs: ', srs.GetAttrValue('geogcs')
geocs: GCS_GDA_1994

print '\nProj4: ', srs.ExportToProj4()
Proj4: +proj=utm +zone=54 +south +ellps=GRS80 +units=m +no_defs

print '\nwkt: ', srs.ExportToWkt()
wkt: PROJCS["GDA_1994_MGA_Zone_54",GEOGCS["GCS_GDA_1994",DATUM["GDA_1994",SPHEROID["GRS_1980",6378137,298.257222101]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",141],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",10000000],UNIT["metre",1,AUTHORITY["EPSG","9001"]]]

# make a copy of the wkt for later
img_gdal_wkt = srs.ExportToWkt()
Same input file using RasterIO I get
with rasterio.open(src_tif) as src:
print src.crs
+ellps=GRS80 +no_defs +proj=utm +south +units=m +zone=54

print src.crs.wkt PROJCS["UTM Zone 54, Southern Hemisphere",GEOGCS["GRS 1980(IUGG, 1980)",DATUM["unknown",SPHEROID["GRS80",6378137,298.257222101]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",141],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",10000000],UNIT["Meter",1]
So when I use the profile or meta as a base to create a new file, the coordinate system changes from GDA_1994_MGA_Zone_54 to UTM Zone 54, Southern Hemisphere when I want it to match the original.

Although you can access the WKT via rasterio's crs.wkt it seems to be generated from the Proj4 string not from the WKT of the input image. Maybe writing the actual image WKT instead of deriving it from the Proj4 is a short term workaround as it can then be used to update/specify the CRS.  I've proven this by extracting the WKT using gdal and applying it when writing a new file using rasterio. Accessing the new file via GDAL proves it has written correctly, but accessing via rasterio shows the incorrect WKT.

with rasterio.open(src_tif) as src:   
out_meta = src.meta.copy()

# update the crs with the correct GDA_1994_MGA_Zone_54 wkt string
out_meta['crs'] = img_gdal_wkt

with rasterio.open(r'C:\data\temp\wkt_test.tif', 'w', **out_meta) as dst:
dst.write(src.read())

# Accessing the output image via GDAL displays the correct wkt string
raster = gdal.Open(r'C:\data\temp\wkt_test.tif')
print raster.GetProjectionRef()
PROJCS["GDA_1994_MGA_Zone_54",GEOGCS["GCS_GDA_1994",DATUM["GDA_1994",SPHEROID["GRS_1980",6378137,298.257222101]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",141],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",10000000],UNIT["metre",1,AUTHORITY["EPSG","9001"]]]

# Accessing it via rasterio produces the incorrect wkt string
with rasterio.open(r'C:\data\temp\wkt_test.tif') as dataset:
print dataset.crs
+ellps=GRS80 +no_defs +proj=utm +south +units=m +zone=54

print dataset.crs.wkt # rasterio reports the wrong crs.wkt
PROJCS["UTM Zone 54, Southern Hemisphere",GEOGCS["GRS 1980(IUGG, 1980)",DATUM["unknown",SPHEROID["GRS80",6378137,298.257222101]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",141],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",10000000],UNIT["Meter",1]]

As a result I have to try and determine the epsg number and use that instead, but even that is difficult as the example wkt is an ESRI definition that doesn't contain the EPSG number.



--
Sean Gillies

Sean Gillies
 

Hi Alan,

Yes, this option seems good, and future-aware, too, as you know from looking into the changes in PROJ 5/6 and GDAL 2.5.

The class has been through some iterations already and I think another one is fine :)

On Wed, Jan 9, 2019 at 8:20 PM Alan Snow <alansnow21@...> wrote:
Another option would be to preserve the OSRSpatialReference object as an property on the class and allow it to be created with WKT or PROJ.4 but not export to PROJ.4 and before re-creating it. This way, the base OSRSpatialReference obect will contain all of the information originally provided. This could be a non-breaking change as it would allow the same functionality to exist in a re-organized fashion. But, it would require a refactor of the CRS class.
_._,_._

--
Sean Gillies

Alan Snow
 

Hi Sean,

Sounds like a good plan. Hopefully it won't be a breaking change (fingers crossed). From the link you sent, it seems some internal behavior will be changing, so the breaking part may be unavoidable between versions of GDAL, so maybe the transition can be eased. I am wondering with this change if it would be a good idea to revisit the idea of potentially creating a new CRS package that could be shared by both rasterio & fiona.

Sean Gillies
 

Hi Alan,

On Thu, Jan 10, 2019 at 9:30 PM Alan Snow <alansnow21@...> wrote:
Hi Sean,

Sounds like a good plan. Hopefully it won't be a breaking change (fingers crossed). From the link you sent, it seems some internal behavior will be changing, so the breaking part may be unavoidable between versions of GDAL, so maybe the transition can be eased. I am wondering with this change if it would be a good idea to revisit the idea of potentially creating a new CRS package that could be shared by both rasterio & fiona.

Making a standalone CRS package is an attractive idea, but there are two things that tell me to hesitate. The rule of three says refactor only when you have 3 separate uses, and Rasterio and Fiona only make 2. The other thing is the fact that this CRS package would link GDAL while Rasterio and Fiona would also continue to link GDAL. I think this would probably increase the size of the DLL Hellmouth by a power of 2. If users get rasterio and crs from different places (Conda vs PyPI, say) they won't be able to start. This already happens quite a bit between Rasterio and Shapely, which share a dependency on GEOS.

--
Sean Gillies

Alan Snow
 

Ah, valid points that I didn't think about. I definitely agree with your reason to not move forward with that as there are more challenges besides the show stoppers you already mentioned. Another option could be to use the functionality from the new pyproj.CRS class in development to have consistency with the CRS class without having to copy too much to fiona. However, that also may have its challenges as proj.4 6.0.0 would be required as well as an additional dependency of pyproj to rasterio. I guess that is not an entirely winning scenario either. Well, I guess any path forward will have its challenges, just need to pick the one you want to deal with :).

Sean Gillies
 

Hi all, particularly Alan and Christina,

I'd love your comments on https://github.com/mapbox/rasterio/pull/1597, in which I believe I have solved the problem in a way that doesn't strictly change Rasterio's API. There could be some minor side effects and I'd like to get a sense for how widespread they may be.

Thanks!


On Fri, Jan 11, 2019 at 10:44 AM Alan Snow <alansnow21@...> wrote:
Ah, valid points that I didn't think about. I definitely agree with your reason to not move forward with that as there are more challenges besides the show stoppers you already mentioned. Another option could be to use the functionality from the new pyproj.CRS class in development to have consistency with the CRS class without having to copy too much to fiona. However, that also may have its challenges as proj.4 6.0.0 would be required as well as an additional dependency of pyproj to rasterio. I guess that is not an entirely winning scenario either. Well, I guess any path forward will have its challenges, just need to pick the one you want to deal with :).



--
Sean Gillies

@cratcliff
 

Hi Sean,

I've had a look at the code changes and it looks as if you have fixed my issue. Unfortunately I'm a windows user and rely on the gohlke wheel builds or conda to install RasterIO into my python environment unless you or Alan know of another way to do this. As a result I can't test these changes using my scripts to help you find the side effects.

Thanks for your prompt efforts at addressing this issue.

Cheers,
Christina

Joris Van den Bossche
 

Op vr 11 jan. 2019 om 16:12 schreef Sean Gillies via Groups.Io <sean=mapbox.com@groups.io>:
Hi Alan,

On Thu, Jan 10, 2019 at 9:30 PM Alan Snow <alansnow21@...> wrote:
Hi Sean,

Sounds like a good plan. Hopefully it won't be a breaking change (fingers crossed). From the link you sent, it seems some internal behavior will be changing, so the breaking part may be unavoidable between versions of GDAL, so maybe the transition can be eased. I am wondering with this change if it would be a good idea to revisit the idea of potentially creating a new CRS package that could be shared by both rasterio & fiona.

Making a standalone CRS package is an attractive idea, but there are two things that tell me to hesitate. The rule of three says refactor only when you have 3 separate uses, and Rasterio and Fiona only make 2.

As a GeoPandas developer, I would also be interested in a shared CRS class / package (so that would make 3 :)).

It's one of my longer term goals for GeoPandas to have better and more user-friendly CRS support (e.g. to not loose information as mentioned earlier in this thread, to be able to robustly compared CRSses, ...), which in practice would mean creating a CRS class instead of using a simple proj4 dict as we have now. And in that light, I have already been wondering about how this would fit into the ecosystem, or how it would be possible to avoid that each package starts to have its own CRS class (rasterio, cartopy, maybe fiona and then geopandas in the future, ...). Or at least to think about good interoperability.

However, reading the thread here, what I am looking for (for GeoPandas) might actually be the future pyproj.CRS ? (https://github.com/jswhit/pyproj/pull/155) That might be sufficient for our needs in GeoPandas.

One question (and excuse my ignorance, I am not yet that familiar with all the GDAL SRS / pyproj details): from reading the thread, I seem to understand that proj/pyproj is not sufficient for rasterio/fiona's use case, as Sean mentioned that such a standalone CRS package would have to link to GDAL. But I thought that one of the goalds of the gdal barn work was to more proj-related functionality from GDAL to proj4 and use proj4 in GDAL?
Or is it just because rasterio/fiona already need GDAL anyway, that it would be easier to use the proj-related functionality through GDAL as well?
But as Alan notes, having the shared CRS class in pyproj also adds that as an explicit dependency for fiona/rasterio (although proj itself is already a dependency through GDAL, pyproj would of course be an additional package that needs to be correctly build against the same proj version).

Joris

The other thing is the fact that this CRS package would link GDAL while Rasterio and Fiona would also continue to link GDAL. I think this would probably increase the size of the DLL Hellmouth by a power of 2. If users get rasterio and crs from different places (Conda vs PyPI, say) they won't be able to start. This already happens quite a bit between Rasterio and Shapely, which share a dependency on GEOS.

--
Sean Gillies

Op vr 11 jan. 2019 om 18:44 schreef Alan Snow <alansnow21@...>:
Ah, valid points that I didn't think about. I definitely agree with your reason to not move forward with that as there are more challenges besides the show stoppers you already mentioned. Another option could be to use the functionality from the new pyproj.CRS class in development to have consistency with the CRS class without having to copy too much to fiona. However, that also may have its challenges as proj.4 6.0.0 would be required as well as an additional dependency of pyproj to rasterio. I guess that is not an entirely winning scenario either. Well, I guess any path forward will have its challenges, just need to pick the one you want to deal with :).

 

Alan Snow
 

> I have already been wondering about how this would fit into the ecosystem, or how it would be possible to avoid that each package starts to have its own CRS class

I have seen various threads where users get thrown off by the differences between CRS classes used in the different libraries. In my opinion, it would be benefit to the community for the python geospatial libraries to use the same CRS class (or as you mentioned, it would be good to have interoperability). That was my hope/motivation behind adding the pyproj.CRS class.


> I seem to understand that proj/pyproj is not sufficient for rasterio/fiona's use case, as Sean mentioned that such a standalone CRS package would have to link to GDAL. But I thought that one of the goals of the gdal barn work was to more proj-related functionality from GDAL to proj4 and use proj4 in GDAL? ...

You are correct that the crs/proj related functionality is being moved into the PROJ.4 library; it should be in the next release 6.0.0 in march sometime. As such, the `pyproj.CRS` class could be used in the rasterio/fiona/geopandas libraries in the future.


However, the new version of PROJ.4 6.0.0 has some breaking changes in the API (so I am not sure if it will work with previous versions of GDAL). Also, version 6.0.0 adds the use of sqlite3 to store projection information in a database, which means the old and new versions will have compatibility issues with the PROJ_DATA directory.


This will likely be okay for users that:

  1. Use wheels only for installing the geospatial packages

  2. Use conda/conda-forge to install the packages

  3. Use the most recent version of GDAL/PROJ.4


However, it could potentially cause conflicts with users that are pinned to old versions of GDAL and PROJ.4 when attempting to use the new version of pyproj. This will become less of an issue as the new versions of GDAL/PROJ.4 are adopted mainstream.

In my head, the solution to addressing this issue could happen in a two phased approach:

 

Phase 1: Continue to support previous version of GDAL (currently in progress)


Since rasterio/fiona still supports versions all the way back to GDAL 1.11, adding the pyproj.CRS class has the potential to cause issues for users of an older version of GDAL/PROJ.4. One way around the incompatibility issue is to have duplicate the `rasterio.CRS` class in fiona using the GDAL bindings. This will allow for backwards compatibility with older versions of GDAL and a common CRS class among two of the packages (more interoperability between CRS classes). Additionally, it will allow geopandas the ability to use a class-based CRS class that can take advantage of the fixes addressed in this issue.


Phase 2: Drop support for previous versions of GDAL in rasterio/fiona 2.x (optional - yet to be considered)

In this phase, the development cycle would probably need to split into 1.x and 2.x for rasterio/fiona. There would be continued support previous versions of GDAL in rasterio/fiona 1.x. However, in the 2.x versions, only the new GDAL/PROJ.4 versions would be supported and the `pyproj.CRS` class could be used (this could also be a good time to drop Python 2 support). This could potentially enable using features in the new version(s) of GDAL that did not exist previously.

To ease the transition, it could be useful to create alpha releases in rasterio/fiona 2.x (similar to the ones that existed before the official releases of rasterio 1.0) for early adopters. Once the new version of PROJ.4/GDAL becomes mainstream, officially releasing the 2.x versions would be a smoother transition.


[end of phases]


However, there are probably other approaches that could work just as well. And, the `pyproj.CRS` class does not necessarily have to be used in rasterio. With only applying Phase 1 and a bit of collaboration, interoperability would be attainable between the python geospatial libraries. Either way, I am looking forward to the gdalbarn changes, `pyproj.CRS`, the updated `rasterio.CRS`, the usage of a `CRS` class in fiona & geopandas, and their being interoperable!


Hopefully that helped to begin to answer your question :). I look forward to reading y'alls thoughts on the subject.

Alan Snow
 

I have a version of pyproj that "should" support existing alongside an older version of PROJ.4 without conflicts. A link to the wheel is here if you want to give it a try and see if it works for you.