The Python package archive PyPI is a hugely successful platform for sharing open source libraries and applications. With such a large audience, it's attractive for bad actors too, and there are some risks you must be aware of, whether you use it on your personal laptop or at work. In this article we'll summarize what can go wrong, and then provide some ideas on using PyPI more safely.
What are the risks?
The reason to be mindful of pip install is that when you type pip install foo, you're fetching and running somebody else's code from the internet, and that person may be a bad actor. Always check your system is safe before installing the package, not after: customizable build system hooks mean that malicious code can be executed as soon as the install process begins.
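One way to look before you leap is to download an sdist by hand (for example from the package's "Download files" page on PyPI) and inspect its build files before pip touches it. In pyproject.toml, the [build-system] table names the build backend that pip will run. A setup.py file is ordinary Python that setuptools executes. The sketch below only lists those files and does not extract or run anything. The function name is hypothetical, and finding these files is normal: they simply tell you which code to read first. Wheels skip this build step, although the code they install can still run later.

```python
import tarfile
import zipfile
from pathlib import PurePosixPath
from typing import List

# Files that control how an sdist is built; the build step can run code they point to.
BUILD_FILES = {"setup.py", "setup.cfg", "pyproject.toml"}


def find_build_files(sdist_path: str) -> List[str]:
    """List build-related files inside an sdist archive without extracting or running it."""
    if sdist_path.endswith(".zip"):
        with zipfile.ZipFile(sdist_path) as archive:
            names = archive.namelist()
    else:
        # Mode "r" (the default) detects gzip, bz2 and xz compression automatically.
        with tarfile.open(sdist_path) as archive:
            names = archive.getnames()
    return sorted(name for name in names if PurePosixPath(name).name in BUILD_FILES)
```

For example, find_build_files("somepackage-1.0.tar.gz") (a hypothetical file name) returns the paths of any setup.py, setup.cfg or pyproject.toml inside the archive.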
The most common attacks in recent years have been:
- stealing passwords and login details from your local PC. See the W4SP Stealer for a leaked example of what can be done.
- using your CPU to perform energy-intensive calculations, usually cryptocurrency mining.
There may be other attacks too which security researchers aren't aware of yet. Remember that if you run Pip with admin permissions (e.g. sudo pip), attackers can even access other user accounts and corrupt the operating system.
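If you script your environment setup, a cheap safeguard is to stop before pip ever runs as root. This is a minimal sketch for POSIX systems. The function names are hypothetical. The check only looks at the effective user ID, so it won't catch other ways of granting admin rights.

```python
import os


def running_as_root() -> bool:
    """Return True if this process runs with root's effective user ID (POSIX only)."""
    # os.geteuid does not exist on Windows; treat that case as "not root".
    geteuid = getattr(os, "geteuid", None)
    return geteuid is not None and geteuid() == 0


def refuse_if_root() -> None:
    """Stop a setup script before it calls pip with admin permissions."""
    if running_as_root():
        raise SystemExit("Refusing to run pip as root: use a normal account or a sandbox.")
```

Call refuse_if_root() at the top of a setup script, before any step that invokes pip.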
Even if you are installing a well-known package from a trusted maintainer, there are things to be aware of:
- a developer may have their login compromised by an attacker, who then publishes malicious code.
- a developer may do something you don't expect, such as the 2022 node-ipc protest.
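- you may receive a package from a different index than the one you expect. This happened at the end of 2022 with nightly PyTorch builds, when a malicious package on PyPI was installed in place of the one on PyTorch's own index.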
- you may mistype a package name and fall for a typo-squatting attack.
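Typo-squatting relies on names that are almost, but not quite, the one you meant. A rough guard is to compare the names you are about to install against a list of packages you already trust. First normalize the names the way PyPI does: PEP 503 comparison is case-insensitive and treats runs of "-", "_" and "." as equal. The sketch below uses difflib similarity as a crude heuristic. The function names, the trusted list and the 0.8 cutoff are assumptions to tune, and a clean result proves nothing.

```python
import difflib
import re
from typing import Dict, List


def normalize(name: str) -> str:
    """Normalize a project name as PyPI compares them (PEP 503)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def possible_typos(requested: List[str], trusted: List[str], cutoff: float = 0.8) -> Dict[str, str]:
    """Map each requested name that isn't trusted but closely resembles a trusted one."""
    trusted_names = [normalize(name) for name in trusted]
    suspects: Dict[str, str] = {}
    for name in requested:
        candidate = normalize(name)
        if candidate in trusted_names:
            continue  # exact (normalized) match with a name we already trust
        close = difflib.get_close_matches(candidate, trusted_names, n=1, cutoff=cutoff)
        if close:
            suspects[name] = close[0]
    return suspects
```

For example, possible_typos(["reqeusts", "numpy"], ["requests", "numpy"]) returns {"reqeusts": "requests"}.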
Why doesn't PyPI prevent this happening?
PyPI has a security team who do amazing work taking down malicious packages and ensuring the site is as safe as they can make it. The scale of the site coupled with its open access philosophy means a small team of volunteers simply can't catch every problem as it happens.
The site is managed by the Python Software Foundation, a charity which welcomes help to improve PyPI from companies using it. A list of specific fundable projects can be found here.
If you do see a security issue on PyPI such as typo-squatting, instructions for reporting it are here.
What can I do to stay safe?
- Consider using distro packages
A Linux distribution such as Debian, Ubuntu, Fedora or SUSE has a closed package repository, where only trusted maintainers can publish new code. This greatly reduces the risk of attack compared to PyPI. Good distributions have a security team that will alert you to known issues in packages.
Note that some distributions have paid security teams, while others are volunteers working on a best-effort basis. Make sure you know who is maintaining your distro and consider paying for commercial support.
The tradeoff with using distribution packages is that the extra step between you and the package developer means it takes more time for new versions to become available in the distro, and not everything from PyPI is going to be available. Some people use a mix of distro packages and PyPI for this reason.
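If you mix distro packages and PyPI, it helps to know which is which. Installers can record their name in an INSTALLER file in each package's metadata directory, and pip writes "pip" there. What a distribution's packaging tools write, if anything, varies, so treat any other value as "check by hand". This is a minimal sketch using importlib.metadata. The function name is hypothetical.

```python
from importlib import metadata
from typing import Dict, Optional


def installers_by_package() -> Dict[str, Optional[str]]:
    """Map each installed distribution to the tool named in its INSTALLER file, if any."""
    result: Dict[str, Optional[str]] = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if not name:
            continue  # skip broken metadata directories
        installer = dist.read_text("INSTALLER")  # None when the file is missing
        result[name] = installer.strip() if installer else None
    return result


if __name__ == "__main__":
    for name, installer in sorted(installers_by_package().items()):
        print(f"{name}: {installer or 'not recorded'}")
```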
- Use a sandbox
Malicious code can only steal data if you give it access. Set up a container or VM for development and share only the directory containing your current project; then you can pip install from PyPI knowing that your secrets are safe.
Here's an example of starting a tiny container for development with only /home/sam/MyProject/ available in the container at /src:
podman run -i -t --mount=type=bind,source=/home/sam/MyProject,destination=/src alpine:latest /bin/sh
Note that virtualenv and venv do not provide any sandboxing. Running pip install in a virtualenv is just as dangerous as running it outside the virtualenv.
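A quick way to convince yourself is to run this sketch inside an activated virtualenv. The environment changes which prefix and site-packages directory Python uses. Any code running there, including a package's build step, keeps your user account's normal file access. The function names are illustrative.

```python
import os
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """True inside a venv (or modern virtualenv): only sys.prefix changes."""
    return sys.prefix != sys.base_prefix


def home_is_readable() -> bool:
    """True if this process may read the user's home directory."""
    return os.access(Path.home(), os.R_OK)


if __name__ == "__main__":
    print("inside a virtualenv:", in_virtualenv())
    print("home directory readable:", home_is_readable())
```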
Integration tools such as Yocto and Buildroot do not provide sandboxing, while BuildStream does careful sandboxing to ensure that the software being built cannot access the internet or home directory. Note that Yocto and Buildroot DO still check source hashes (see below).
- Mirroring
There are many good reasons why companies and even large open source projects should mirror the source code of their dependencies. One is that you are insulated from unexpected changes on the public PyPI server, such as an attacker compromising a developer's login and publishing a malicious release of a popular package. You're also safe from outages: what would happen to your project if GitHub.com went down for a day?
Codethink published a whitepaper on mirroring which you can request here.
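Once a mirror exists, pip has to be told to use it. This is a minimal sketch that assumes a hypothetical internal mirror at pypi-mirror.example.internal, serving the standard "simple" index. It produces a pip.conf [global] section that sets index-url, which replaces the default https://pypi.org/simple. Use index-url rather than adding the mirror with extra-index-url. With extra indexes, pip gathers candidates from every index and picks the best matching version from any of them, which is how the PyTorch incident above happened. On Linux the per-user file is usually ~/.config/pip/pip.conf, and pip config list shows which settings pip is actually reading.

```python
import configparser
import io


def pip_conf_for_mirror(index_url: str) -> str:
    """Return pip.conf text that points pip at a single package index."""
    config = configparser.ConfigParser()
    # pip reads options from the [global] section; index-url replaces PyPI entirely.
    config["global"] = {"index-url": index_url}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical mirror URL; substitute your own.
    print(pip_conf_for_mirror("https://pypi-mirror.example.internal/simple/"), end="")
```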
- Check the hash of what you receive
Pip's package resolver looks at version numbers, which are created by humans and are easy to fake. If you want to be certain that you're installing the correct package, you need to calculate a cryptographic hash based on the actual contents of the package, and check that at install time.
Pip has had an optional hash-checking mode since version 8.0. Specifying the hashes in requirements.txt increases safety and reliability, because every time you create a new environment you'll get exactly the same packages. If an attacker modifies an existing package on PyPI, Pip will notice that the cryptographic hashes do not match and will report an error.
Calculating the hashes by hand is tedious. Pipenv provides a workflow built around Pip and virtualenv that includes hash checking by default. Alternatively, you can use a tool like pip-compile (part of pip-tools, run with its --generate-hashes option) to calculate the hashes.
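The hash pip checks is a SHA-256 digest of the exact archive file. This is a minimal sketch of what those tools compute for archives you have already vetted, and the function names are hypothetical. Pip's own pip hash command prints the same digest. Pip switches on hash-checking mode automatically once any requirement carries a --hash, or when you pass --require-hashes. From then on, every requirement, including dependencies, must be pinned with == and hashed.

```python
import hashlib


def sha256_of(path: str) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pinned_requirement(name: str, version: str, *archives: str) -> str:
    """Build a requirements.txt line pinning one version to the given archives' hashes."""
    hashes = " ".join(f"--hash=sha256:{sha256_of(path)}" for path in archives)
    return f"{name}=={version} {hashes}"
```

Passing several archives, such as an sdist and a wheel, adds one --hash option for each, and pip accepts any of them.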
If you're integrating Python packages into a larger system, you'll probably have your own integration tool. At Codethink we often work with BuildStream, BitBake and Buildroot, all of which check the cryptographic hash of the packages during the integration process. BuildStream provides a track command that speeds up the workflow of integrating and updating packages.
If you're importing packages directly from the upstream project's Git repo, reference an exact commit (fae0123) rather than a branch (main), as a commit ID is already a cryptographic hash. If you're importing source or binary packages from PyPI, use the "view hashes" link next to each file on the package's PyPI page to see the checksum.
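The same checksums are available from PyPI's JSON API at https://pypi.org/pypi/<project>/<version>/json. Each file in that response's "urls" list has a "digests" entry. The sketch below pulls out the SHA-256 values, and the function names are hypothetical. This only reports what PyPI says today, so record the hashes when you vet a release rather than re-fetching them at install time.

```python
import json
import urllib.request
from typing import Any, Dict


def fetch_release_json(project: str, version: str) -> Dict[str, Any]:
    """Download metadata for one release from PyPI's JSON API."""
    url = f"https://pypi.org/pypi/{project}/{version}/json"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def release_hashes(release: Dict[str, Any]) -> Dict[str, str]:
    """Map each file in the release (wheels and sdist) to its SHA-256 digest."""
    return {f["filename"]: f["digests"]["sha256"] for f in release["urls"]}
```

For example, release_hashes(fetch_release_json("requests", "2.31.0")) lists every file of that release with its digest.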
- Code review
The reality is that nobody has time to manually review all of the code on PyPI, and that's why we left this option until last. You should still allocate some time to manually reviewing the code of your dependencies. Based on the adage "given enough eyes, all bugs are shallow", try finding the least popular (least widely used) dependency and start there.
Security issues in more widely used packages will usually be found quickly by other people and added to the Python Packaging Advisory Database. You can use a tool such as pip-audit to check automatically whether your Python dependencies have known vulnerabilities.
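pip-audit does this for a whole environment. To show what such a check involves, here is a sketch that queries the OSV database about a single pinned package through its public query endpoint. OSV imports the Python Packaging Advisory Database. The function names are hypothetical, and in practice pip-audit is the simpler choice.

```python
import json
import urllib.request
from typing import Any, Dict, List

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def osv_query(name: str, version: str) -> Dict[str, Any]:
    """Build an OSV query for one version of a PyPI package."""
    return {"package": {"name": name, "ecosystem": "PyPI"}, "version": version}


def vulnerability_ids(response: Dict[str, Any]) -> List[str]:
    """Extract advisory IDs; OSV returns an empty object when nothing is known."""
    return [vuln["id"] for vuln in response.get("vulns", [])]


def check_package(name: str, version: str) -> List[str]:
    """POST the query to OSV and return the IDs of known vulnerabilities."""
    body = json.dumps(osv_query(name, version)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return vulnerability_ids(json.load(response))
```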
Conclusion
PyPI is a great, free resource which you can use with care following the advice above.
Here we've focused on Pip and Python, but the same principles apply to other Python package managers, and to any other open-access package repository online: that includes RubyGems, NPM, and crates.io.
Codethink are experts in supply chain security and are available for guidance and commercial project work. Get in touch for more details.