Dear pip, we have to talk (part 2 of 3)

Majic

In the previous post in this series I tried to describe all the different issues I had/have with the Python packaging ecosystem, in a slightly poetic (annoying) way. This part gets a bit more technical, and shows how to manage your dependencies using pip-tools.

But before that, let's do a small recap of the standard tools for installing Python packages.

If you have worked or are still working with Python (as a dev, a sysadmin, or both), at some point you have realised that the Python package you are using (or need) is either outdated or missing altogether in your distribution of choice.

Given the fast-paced nature of Python package development, it is nearly impossible for most distributions to keep up.

Luckily, it is (usually) fairly easy to grab a new Python package using pip - a dedicated Python package manager. This makes package installation as easy as:

sudo pip install package_name

The above example uses sudo on purpose, in order to emphasize that a basic pip install targets the system-wide Python installation and therefore needs root privileges.

The first problem is that doing something like this could result in system files getting trampled over. The second problem is that you affect the entire system, even when you need the package just for a very specific project. Last but not least, it is a very dangerous thing to do from a security perspective. For now, we'll deal with the first two aspects, while security will be covered in more detail in part 3 of the series.

An immediate improvement for the above process is to have pip install packages in a dedicated directory, preferably available just to your own user. This can be done easily with something along the lines of:

mkdir ~/python/
pip install -t ~/python/ package_name

While the -t option does install the package in a dedicated directory, it does not make it readily available to the Python interpreter. The easiest way to fix that is to point the PYTHONPATH environment variable to the directory:

export PYTHONPATH="$HOME/python"
python -c "import module_name"
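
To see the effect of the variable in isolation, here is a minimal sketch in Python. It drops a throwaway module into a temporary directory (standing in for the pip install -t target) and checks whether a fresh interpreter can import it with and without PYTHONPATH set. The module name demo_pkg_module and the helper can_import are made up for illustration.

```python
import os
import subprocess
import sys
import tempfile


def can_import(module_name, pythonpath=None):
    """Return True if a fresh interpreter can import module_name.

    pythonpath, when given, is exported as PYTHONPATH for the child only.
    """
    env = dict(os.environ)
    # Start from a clean slate so an inherited PYTHONPATH cannot interfere.
    env.pop("PYTHONPATH", None)
    if pythonpath is not None:
        env["PYTHONPATH"] = pythonpath
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        env=env,
        capture_output=True,
    )
    return result.returncode == 0


with tempfile.TemporaryDirectory() as target:
    # Hypothetical stand-in for a package installed with `pip install -t`.
    with open(os.path.join(target, "demo_pkg_module.py"), "w") as handle:
        handle.write("VALUE = 42\n")

    found_without = can_import("demo_pkg_module")
    found_with = can_import("demo_pkg_module", pythonpath=target)

print("importable without PYTHONPATH:", found_without)
print("importable with PYTHONPATH:", found_with)
```

The variable only affects processes that inherit it, which is why the export has to happen in the same shell session (or in your shell profile).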

The next stumbling block is having different versions of Python packages installed at the same time, for use with different projects. Luckily, this is easily solved using virtualenv, which provides us with (amongst other things):

Environment isolation (each virtual environment has its own, separate set of packages).
Ability to have multiple versions of the same package available in different contexts (for different projects).
Ability to easily try out different packages and different package versions without affecting existing work.

A common workflow for using virtualenv is:

# Create virtual environment.
virtualenv ~/virtualenv

# Activate virtual environment so packages are installed to and used from it.
source ~/virtualenv/bin/activate

# After it is activated, install packages using pip.
pip install package_name

# Use package modules in one way or another.
python -c "import module_name"

# Deactivate the virtual environment.
deactivate
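
The isolation can also be observed from Python itself. The sketch below is an assumption-laden illustration: it uses the standard library venv module as a stand-in for virtualenv (a related but different tool), creates an environment in a temporary directory, and asks that environment's interpreter for its sys.prefix. The helper names are made up.

```python
import os
import subprocess
import tempfile
import venv


def create_environment(path):
    """Create a virtual environment at path and return its interpreter."""
    # with_pip=False keeps the sketch quick; a real environment would
    # normally want pip available inside it.
    venv.create(path, with_pip=False)
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return os.path.join(path, bin_dir, exe)


def interpreter_prefix(python):
    """Ask an interpreter which installation prefix it is running from."""
    result = subprocess.run(
        [python, "-c", "import sys; print(sys.prefix)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as workdir:
    env_dir = os.path.join(workdir, "env")
    env_python = create_environment(env_dir)
    # realpath smooths over symlinked temporary directories.
    env_prefix = os.path.realpath(interpreter_prefix(env_python))
    expected_prefix = os.path.realpath(env_dir)

print("environment uses its own prefix:", env_prefix == expected_prefix)
```

Activating an environment in the shell mostly amounts to putting that environment's interpreter first on PATH, so packages get installed to and imported from its own prefix.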

Building on top of virtualenv is the convenience tool called virtualenvwrapper - a set of scripts that make it much easier to manage virtual environments. These scripts provide:

Set of commands for managing and working with multiple virtual environments.
Centralised storage of all virtual environments under one directory (~/.virtualenvs/ by default).
Support for bash, ksh, and zsh.

Following the initial example, we can improve the workflow:

# Create a named virtual environment.
mkvirtualenv myenv

# Activate virtual environment so packages are installed to and used from
# it. Can be run from anywhere.
workon myenv

# After it is activated, install packages using pip.
pip install package_name

# Use package modules in one way or another.
python -c "import module_name"

# Deactivate the virtual environment.
deactivate

Before moving on, let's also mention that pip provides the ability to pin packages to a specific version, and to specify packages that should be installed via a specially formatted requirements file. Both of these features work with and without Python virtual environments.

For example, you could install a very specific version of the Django framework with:

pip install django==1.10.7

You could also create a file with the (pinned) package listed in it:

# requirements.txt
django==1.10.7

And then install all packages listed in it with:

pip install -r requirements.txt

Since nobody likes all the typing, one can also use pip to produce a listing of packages installed in the virtual environment for pinning purposes:

pip freeze > requirements.txt
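
For a rough idea of what ends up in that file, the following sketch builds a similar name==version listing for the current interpreter using the standard library importlib.metadata module. This is not how pip freeze is implemented (pip handles editable installs, URLs, and other cases that are ignored here), and the function name frozen_requirements is made up.

```python
from importlib import metadata


def frozen_requirements():
    """Return sorted 'name==version' lines for installed distributions."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        # Skip distributions with incomplete metadata.
        if name and version:
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


for line in frozen_requirements():
    print(line)
```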

In addition to exact matches, more recent versions of pip also come with cool things like approximate matches for installing the latest patch release (django~=1.10.0), or the ability to specify minimum/maximum versions (django>1.8 and django<1.11).

In spite of all this goodness (especially with pip freeze and pip install -r), one of the big issues with Python virtual environments is keeping packages up to date. pip does provide the ability to upgrade a package, via pip install --upgrade, but:
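
To make the approximate match more concrete, here is a simplified sketch of the check behind ~= for plain numeric versions. It only handles versions like 1.10.7 (no pre-releases, epochs, or other suffixes), so treat it as an illustration of the rule rather than a replacement for pip's own version handling. The function names are made up.

```python
def parse_version(text):
    """Turn '1.10.7' into (1, 10, 7). Only plain numeric versions work."""
    return tuple(int(part) for part in text.split("."))


def satisfies_compatible_release(candidate, base):
    """Approximate the check behind `~=base` for plain numeric versions."""
    base_parts = parse_version(base)
    if len(base_parts) < 2:
        raise ValueError("~= needs at least two version components")
    cand_parts = parse_version(candidate)
    # Pad the candidate so that '1.10' compares like '1.10.0'.
    cand_parts += (0,) * (len(base_parts) - len(cand_parts))
    # Everything except the last component of the base must match exactly.
    prefix = base_parts[:-1]
    return cand_parts >= base_parts and cand_parts[:len(prefix)] == prefix


for version in ("1.9.9", "1.10.0", "1.10.7", "1.11.0"):
    print(version, satisfies_compatible_release(version, "1.10.0"))
```

In other words, django~=1.10.0 accepts any 1.10.x release at or above 1.10.0, but neither 1.9.x nor 1.11.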

You essentially need to specify the full list of packages you want to upgrade.
pip will not correctly resolve dependencies - you might end up with a newer dependency than what you need. This is true for both upgrading and merely checking (pip list --outdated) for available updates.
pip will not prune unused packages, e.g. packages that were required by a previous version of another package but are no longer used by the newer version.

Because of all of these limitations, up until recently my usual process for doing and checking for upgrades was:

Create a separate Python virtual environment.
Install packages that are direct requirements of a project (letting pip resolve the dependencies).
Run pip freeze to populate an updated requirements file.
Switch back to the original environment and install updated packages via pip install -r requirements.txt.

Unfortunately, this is quite a tedious process, and it still does not solve the issue of pruning unused packages.

Luckily, there is a better way to take care of all this with the help of pip-tools.

So, what is pip-tools? Essentially, pip-tools makes it possible to:

Provide list of packages you need (without dependencies).
Produce up-to-date version of (pinned) requirements file.
Synchronize Python virtual environment with requirements file, removing, installing, and upgrading dependencies as necessary to make it identical to what is listed in the requirements file.
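
Conceptually, the synchronisation step boils down to comparing what is installed against what is pinned. The sketch below illustrates that idea with plain dictionaries. It is not how pip-sync is implemented, and the package names and versions in the sample data are made up for illustration.

```python
def plan_sync(installed, pinned):
    """Compare installed packages against pinned requirements.

    Both arguments map package names to version strings. Returns what
    to install, what to change to another version, and what to remove.
    """
    to_install = {
        name: version
        for name, version in pinned.items()
        if name not in installed
    }
    to_change = {
        name: version
        for name, version in pinned.items()
        if name in installed and installed[name] != version
    }
    to_remove = sorted(name for name in installed if name not in pinned)
    return to_install, to_change, to_remove


# Illustrative data only.
installed = {"django": "1.10.6", "leftover-package": "0.1"}
pinned = {"django": "1.10.7", "django-contrib-comments": "1.8.0"}

to_install, to_change, to_remove = plan_sync(installed, pinned)
print("install:", to_install)
print("change:", to_change)
print("remove:", to_remove)
```

The removal part is what plain pip install -r requirements.txt does not give you.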

The basic workflow with pip-tools is (all within a virtual environment):

Create your input file, listing only the packages you are directly interested in installing (without dependencies):
```
# requirements.in
django
django-contrib-comments
```

Convert this input file into a pinned requirements file:

# This will produce requirements.txt.
pip-compile requirements.in

Synchronise your virtual environment so it matches the requirements file:
```
pip-sync requirements.txt
```

Every once in a while, upgrade the requirements file and synchronise the environment again:

# Produce updated requirements.txt
pip-compile --upgrade requirements.in

# Apply the changes.
pip-sync requirements.txt

As you can see, the process is quite straightforward, and solves most of the issues listed. pip-compile can also work with includes, and even supports pinning within the input files themselves (for example if you want to stick to an LTS version of Django or some other package):

# base.in
django~=1.10.0

# requirements.in
-r base.in
django-contrib-comments
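
To show how the include ends up contributing to the list of direct requirements, here is a small sketch that reads an input file and follows -r lines relative to the including file. It is a simplified illustration, not pip-compile's parser: it only understands the "-r path" form and treats everything after a # as a comment. The function name read_requirements is made up.

```python
import os
import tempfile


def read_requirements(path, seen=None):
    """Return requirement lines from path, following `-r other` includes."""
    seen = set() if seen is None else seen
    path = os.path.abspath(path)
    if path in seen:
        # Guard against include loops.
        return []
    seen.add(path)

    lines = []
    with open(path) as handle:
        for raw in handle:
            # Simplification: everything after '#' is treated as a comment.
            line = raw.split("#", 1)[0].strip()
            if not line:
                continue
            if line.startswith("-r "):
                included = os.path.join(os.path.dirname(path), line[3:].strip())
                lines.extend(read_requirements(included, seen))
            else:
                lines.append(line)
    return lines


with tempfile.TemporaryDirectory() as workdir:
    with open(os.path.join(workdir, "base.in"), "w") as handle:
        handle.write("# base.in\ndjango~=1.10.0\n")
    with open(os.path.join(workdir, "requirements.in"), "w") as handle:
        handle.write("# requirements.in\n-r base.in\ndjango-contrib-comments\n")

    direct = read_requirements(os.path.join(workdir, "requirements.in"))

print(direct)
```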

The only feature missing at the moment is the ability to check if the generated requirements file is up-to-date or not.

This is somewhat easy to script around by producing a new requirements file and doing some diffing (with some sorting and lower-casing involved). You may be interested to have a look at one of the scripts used for my own infrastructure needs.

As a final note, I cannot emphasize enough how useful pip-tools has proven to be, and I would really advise you to try it out and understand how it works. Once you get used to it, you will never want to work with Python virtual environments and packages in any other way.
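
The diffing part could look roughly like the sketch below. It is not the script mentioned above, just a minimal illustration of the same idea: normalise both files (drop comments, lower-case, sort) and compare. It assumes the freshly compiled requirements text is already at hand, and the function names are made up.

```python
def normalise(requirements_text):
    """Return sorted, lower-cased requirement lines, ignoring comments."""
    lines = set()
    for raw in requirements_text.splitlines():
        # Simplification: everything after '#' is treated as a comment.
        line = raw.split("#", 1)[0].strip().lower()
        if line:
            lines.add(line)
    return sorted(lines)


def outdated_pins(current_text, fresh_text):
    """Return (stale, new) lines that differ between the two files."""
    current = set(normalise(current_text))
    fresh = set(normalise(fresh_text))
    return sorted(current - fresh), sorted(fresh - current)


# Illustrative content only.
current_text = "# pinned requirements\nDjango==1.10.6\n"
fresh_text = "django==1.10.7\n"

stale, new = outdated_pins(current_text, fresh_text)
print("stale:", stale)
print("new:", new)
```

If both lists come back empty, the existing requirements file matches what a fresh compile would produce.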

April 2017

October 2017