Rasterio and pip 19


Sean Gillies
 

Hi all,

I was tripped up by the release of pip 19 and want to share what I learned so that you can avoid the same bruises.

Rasterio has had a pyproject.toml file since version 1.0. This file specifies Rasterio's build requirements: the packages you must have to build working modules from source. These include Cython, because Rasterio is largely written in the Cython language, and Numpy, because the C code that Cython generates needs Numpy's C headers.

When pip 10 was released, it began to use this pyproject.toml file and you may have noticed that the old two-step build of Rasterio

  pip install numpy cython
  pip install --no-binary rasterio rasterio

wasn't necessary anymore. Pip would check the pyproject.toml file and install numpy and cython before moving on to Rasterio's own setup script. Note well that pip 10 would install numpy and cython into the site-packages folder of the currently active Python environment. Pip 19 changes this.

With pip 19, the default is to create a new temporary environment and install numpy and cython into it. And because I didn't specify versions in pyproject.toml, this means the latest numpy (1.16.0 as I type this). Extension modules compiled using the 1.16.0 headers aren't compatible with older versions of numpy. Importing them will result in a ValueError and a message about mismatched struct sizes.

Rasterio's wheel-building system satisfies the build-time dependencies in a different way, but the new temporary environment snuck in anyway and the extension modules were built against numpy 1.16.0. When we then tested the wheels with versions of numpy that *should* have been compatible, the tests failed. It was rather perplexing until I reminded myself to check the pip change log. The problem was easy enough to solve: we pass the --no-build-isolation option to pip install and our builds go on as they did with earlier pip versions.

You'll see the same thing if you build wheels for production and then try to use them with an older version of numpy.

I'm hesitant to pin numpy in pyproject.toml because the right version sort of depends on your Python version. However, there's a potential stumbling block here that needs to be addressed. If anybody has more experience and wisdom to share about the new features of pip, I'm all ears.

--
Sean Gillies


Joris Van den Bossche
 

On Fri, Jan 25, 2019 at 22:45, Sean Gillies via Groups.Io <sean=mapbox.com@groups.io> wrote:
I'm hesitant to pin numpy in pyproject.toml because the right version sort of depends on your Python version. However, there's a potential stumbling block here that needs to be addressed. If anybody has more experience and wisdom to share about the new features of pip, I'm all ears.

You can (nowadays) specify the numpy version depending on the python version in pyproject.toml, which should make this possible. Something like:
    "numpy==1.8.2; python_version<='3.4'",
    "numpy==1.9.3; python_version=='3.5'",
    "numpy==1.12.1; python_version=='3.6'",
    "numpy==1.13.1; python_version>='3.7'",
In principle we should always pin the numpy version to the oldest supported version (if using build isolation), as otherwise using the built package in an environment with another numpy version breaks, as you explained.
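
For context, here is a rough sketch of where requirement strings like these would sit in a pyproject.toml. The numpy lines are the ones from the example above; the setuptools, wheel, and cython entries are my assumptions about a typical Cython project and are not copied from Rasterio's or pandas' actual files.

```toml
# Sketch only: the first three entries are assumed, not taken from a real project.
[build-system]
requires = [
    "setuptools",
    "wheel",
    "cython",
    # One numpy pin per Python version, selected by environment markers.
    "numpy==1.8.2; python_version<='3.4'",
    "numpy==1.9.3; python_version=='3.5'",
    "numpy==1.12.1; python_version=='3.6'",
    "numpy==1.13.1; python_version>='3.7'",
]
```

With build isolation on, pip would install only the numpy line whose marker matches the interpreter doing the build.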

Last year we added a pyproject.toml to pandas, but it caused a lot of problems (https://github.com/pandas-dev/pandas/issues/20775). One was that the python-dependent numpy pin shown above was not possible yet, because environment markers in pyproject.toml files were not supported at that time. Another was that build dependencies had to be installed as wheels instead of from source, which can cause problems on platforms for which there are no wheels; I think this has also been fixed in pip in the meantime. So in the end we decided to remove it again for the next release.
But it seems we should maybe look into adding it again.

Joris

 


Sean Gillies
 

Thanks, Joris! I'm going to capture this in a rasterio ticket.

On Fri, Jan 25, 2019 at 3:28 PM Joris Van den Bossche <jorisvandenbossche@...> wrote:

--
Sean Gillies