Fri 06 January 2023

Think before you Pip

The Python Package Index (PyPI) is a hugely successful platform for sharing open source libraries and applications. With such a large audience, it's attractive to bad actors too, and there are some risks you must be aware of, whether you use it on your personal laptop or at work. In this article we'll summarize what can go wrong, and then provide some ideas on using PyPI more safely.

What are the risks?

The reason to be mindful of pip install is that when you type pip install foo, you're fetching and running somebody else's code from the internet, and that person may be a bad actor. Make sure your system is protected before you install a package, not after: customizable build system hooks mean that malicious code can be executed as soon as the install process begins, before you have imported anything.

The most common attacks in recent years have been:

  • stealing passwords and login details from your local PC. See the W4SP Stealer for a leaked example of what can be done.
  • using your CPU to perform energy-intensive calculations, usually cryptocurrency mining.

There may be other attacks too which security researchers aren't aware of yet. Remember that if you run Pip with admin permissions (e.g. sudo pip) then attackers can even access other user accounts and corrupt the operating system.

Even if you are installing a well-known package from a trusted maintainer, there are things to be aware of:

  • a developer may have their login compromised by an attacker, who then publishes malicious code.
  • a developer may do something you don't expect, such as the 2022 node-ipc protest.
  • you may hit a PyPI instance other than the one you expect. This happened at the end of 2022 with nightly PyTorch builds.
  • you may typo the name and fall for a typo-squatting attack.
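One cheap defence against typo-squatting is to compare a requested name against the packages you actually intend to use before installing anything. The sketch below is only an illustration: the KNOWN_PACKAGES list, the check_name helper and the 0.8 similarity cutoff are all assumptions made for this example, not part of Pip or PyPI.

```python
import difflib
import re
from typing import Optional, Sequence

# Hypothetical allowlist: the packages your project really depends on.
KNOWN_PACKAGES = ["requests", "numpy", "pandas", "urllib3"]


def normalize(name: str) -> str:
    """Normalize a project name: lower case, with runs of - _ . folded to '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def check_name(requested: str, known: Sequence[str] = KNOWN_PACKAGES) -> Optional[str]:
    """Return the known package that `requested` closely resembles, or None.

    An exact match returns None (nothing suspicious), and so does a name
    that is not similar to anything on the list.
    """
    name = normalize(requested)
    known_normalized = [normalize(k) for k in known]
    if name in known_normalized:
        return None
    # cutoff=0.8 is an arbitrary threshold chosen for this sketch.
    matches = difflib.get_close_matches(name, known_normalized, n=1, cutoff=0.8)
    return matches[0] if matches else None
```

If check_name("reqeusts") returns "requests", that is a hint to stop and re-read what you typed. A name that is simply absent from the list is not flagged, so this catches slips of the finger rather than deliberate choices.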

Why doesn't PyPI prevent this happening?

PyPI has a security team who do amazing work taking down malicious packages and ensuring the site is as safe as they can make it. The scale of the site coupled with its open access philosophy means a small team of volunteers simply can't catch every problem as it happens.

The site is managed by the Python Software Foundation, a charity which welcomes help to improve PyPI from companies using it. A list of specific fundable projects can be found here.

If you do see a security issue on PyPI such as typo-squatting, instructions for reporting it are here.

What can I do to stay safe?

  1. Consider using distro packages

A Linux distribution such as Debian, Ubuntu, Fedora or SUSE has a closed package repository, where only trusted maintainers can publish new code. This greatly reduces the risk of attack compared to PyPI. Good distributions have a security team that will alert you to known issues in packages.

Note that some distributions have paid security teams, while others rely on volunteers working on a best-effort basis. Make sure you know who is maintaining your distro and consider paying for commercial support.

The tradeoff with using distribution packages is that the extra step between you and the package developer means it takes more time for new versions to become available in the distro, and not everything from PyPI is going to be available. Some people use a mix of distro packages and PyPI for this reason.

  2. Use a sandbox

Malicious code can only steal data if you give it access. Set up a container or VM for development and only share the directory containing your current project. You can then pip install from PyPI knowing that the secrets stored elsewhere on your machine are out of reach.

Here's an example of starting a tiny container for development with only /home/sam/MyProject/ available in the container at /src:

podman run -i -t --mount=type=bind,source=/home/sam/MyProject,destination=/src alpine:latest /bin/sh
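If you start sandboxes often, it can help to wrap that command in a small script so the mount options are never mistyped. This is a minimal sketch: the sandbox_command and run_sandbox helpers are names invented for this example, and the flags are exactly those of the podman command above.

```python
import subprocess
from pathlib import Path
from typing import List


def sandbox_command(project_dir: str, image: str = "alpine:latest") -> List[str]:
    """Build the podman arguments that expose only project_dir, at /src."""
    source = Path(project_dir).resolve()
    return [
        "podman", "run", "-i", "-t",
        f"--mount=type=bind,source={source},destination=/src",
        image, "/bin/sh",
    ]


def run_sandbox(project_dir: str) -> None:
    """Start an interactive shell in the container (requires podman)."""
    subprocess.run(sandbox_command(project_dir), check=True)
```

Passing the arguments as a list, rather than as one shell string, means a project path containing spaces is handed to podman intact. The image argument can be swapped for one that already ships Python.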

Note that virtualenv and venv do not provide any sandboxing. Running pip install in a virtualenv is just as dangerous as running it outside the virtualenv.

Integration tools such as Yocto and Buildroot do not provide sandboxing, while BuildStream does careful sandboxing to ensure that the software being built cannot access the internet or home directory. Note that Yocto and Buildroot DO still check source hashes (see below).

  3. Mirroring

There are many good reasons that companies and even large open source projects should be mirroring the source code of their dependencies. One motivation is that you are insulated from unexpected changes on the public PyPI server, such as an attacker compromising a developer's login and publishing a malicious release of a popular package. You're also safe from outages: what would happen to your project if GitHub.com went down for a day?

Codethink published a whitepaper on mirroring which you can request here.

  4. Check the hash of what you receive

Pip's package resolver looks at version numbers, which are created by humans and are easy to fake. If you want to be certain that you're installing the correct package, you need to calculate a cryptographic hash based on the actual contents of the package, and check that at install time.

Pip has had an optional hash-checking mode since version 8.0. Specifying the hashes in requirements.txt increases safety and reliability because when you create a new environment, you'll get the exact same packages every time. If an attacker modifies an existing package on PyPI, Pip will notice that the cryptographic hashes do not match and will report an error.
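To make the idea concrete, here is a rough sketch of what hash checking boils down to: compute the SHA-256 digest of a downloaded file and compare it with the value you recorded earlier. The helper names are made up for this example, and foo is a placeholder package name. The name==version --hash=sha256:... line format is the one Pip reads from requirements.txt.

```python
import hashlib


def sha256_of_file(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """True if the file's digest equals the digest recorded earlier."""
    return sha256_of_file(path) == expected_hex.lower()


def requirement_line(name: str, version: str, path: str) -> str:
    """Format a pinned requirement with its hash for requirements.txt."""
    return f"{name}=={version} --hash=sha256:{sha256_of_file(path)}"
```

With lines like these in requirements.txt, running pip install --require-hashes -r requirements.txt makes Pip refuse any download whose digest differs.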

Calculating the hashes is boring to do manually. Pipenv provides a new workflow built around Pip and virtualenv which includes hash checking by default. Alternatively, you can use a tool like pip-compile (part of pip-tools) to calculate the hashes.

If you're integrating Python packages into a larger system, you'll probably have your own integration tool. At Codethink we often work with BuildStream, BitBake and Buildroot, all of which check the cryptographic hash of the packages during the integration process. BuildStream provides a track command that speeds up the workflow of integrating and updating packages.

If you're importing packages directly from the upstream project's Git repo, reference an exact commit (fae0123) rather than a branch (main). A branch can move to point at different code, whereas a commit ID is itself a cryptographic hash of the content. If you're importing source or binary packages from PyPI, use the view hashes link to see the checksum:

[Image: list of package hashes provided by the PyPI website]
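The same checksums can be read by a script. The sketch below assumes the layout of PyPI's JSON API as we understand it: a urls list whose entries carry a filename and a digests mapping. Neither the URL pattern nor the helper names come from this article, so treat them as assumptions and check the PyPI documentation before relying on them.

```python
import json
import urllib.request
from typing import Dict


def release_digests(payload: dict) -> Dict[str, str]:
    """Map each file name in a release to its SHA-256 digest."""
    return {
        entry["filename"]: entry["digests"]["sha256"]
        for entry in payload.get("urls", [])
    }


def fetch_release(name: str, version: str) -> dict:
    """Fetch release metadata from PyPI (needs network access)."""
    # Assumed URL pattern for PyPI's JSON API.
    url = f"https://pypi.org/pypi/{name}/{version}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)
```

Calling release_digests(fetch_release("foo", "1.0")) would return one digest per uploaded file, ready to paste into requirements.txt. Bear in mind that the digests come from the same server as the packages, so record them once and keep them under version control rather than fetching them at every install.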

  5. Code review

The reality is nobody has time to manually review all of the code on PyPI, and that's why we left this option until last. You should still allocate some time to manually reviewing the code of your dependencies. The adage says "given enough eyeballs, all bugs are shallow", so the packages with the fewest eyes on them are the ones where a problem is most likely to go unnoticed. Try finding your least popular (least widely used) dependency and start there.

Security issues in more widely used packages will usually be found quickly by other people, and added to the PyPA Advisory Database. You can use a tool such as pip-audit to automatically check if your Python dependencies have known vulnerabilities.
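Both manual review and automated auditing start from knowing exactly what is installed. As a rough starting point, the standard library can list every distribution in the current environment. The installed_packages helper is a name chosen for this sketch, and it assumes Python 3.8 or newer for importlib.metadata.

```python
from importlib import metadata
from typing import List, Tuple


def installed_packages() -> List[Tuple[str, str]]:
    """Return sorted (name, version) pairs for the current environment."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:
            # Skip entries with broken or missing metadata.
            continue
        version = dist.metadata.get("Version", "unknown")
        found.add((name.lower(), version))
    return sorted(found)
```

The resulting list includes transitive dependencies that you never asked for by name, and those are often the least-reviewed packages in a project.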

Conclusion

PyPI is a great, free resource. With a little care, and the advice above, you can keep using it safely.

Here we've focused on Pip and Python, but the same principles apply to other Python package managers, and to any other open-access package repository online: that includes RubyGems, npm, and crates.io.

Codethink are experts in supply chain security and are available for guidance and commercial project work. Get in touch for more details.
