
Managing Python Environments: pyenv and uv Tutorial (Data Science Engineering Gap Part 1)

This is the second post in a series about bridging the gap from beginner programmer to advanced data science practitioner. The skills covered here aren’t programming concepts – they’re software engineering practices that enable you to build robust, maintainable systems.

How to Fix the “Works On My Machine” Problem in Python

You’ve written some Python code that works perfectly on your laptop. You share it with a colleague, and suddenly nothing runs. Or worse – you come back to your own project from last year, and it’s completely broken. Python has been updated, some packages followed the new version, others didn’t, and your carefully crafted solution is now a pile of import errors.

This isn’t a hypothetical scenario. It’s the daily reality of working with Python without proper environment management.

I’ve seen this play out in painful ways. A colleague once spent hours trying to figure out why a package was running slowly, only to discover that the original implementation used PyPy (a super-fast Python implementation), but nobody had documented this crucial detail. Another project mysteriously failed because one developer used conda’s Python, another used the system Python, and a third had installed vanilla Python from python.org. Same code, three different Python installations, three different sets of problems.

The fundamental issue: Python isn’t just Python. There are different versions (3.10, 3.11, 3.12), different implementations (CPython, PyPy, Jython), and countless package versions that may or may not work together. Without managing these variables explicitly, reproducibility becomes impossible.
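When you suspect a “same code, different Python” problem like the ones above, the first diagnostic step is to check which interpreter is actually running. A small sketch using only the standard library:

```python
import platform
import sys

# Which Python is executing this script, and from where?
print(platform.python_implementation())  # e.g. "CPython" or "PyPy"
print(platform.python_version())         # e.g. "3.11.7"
print(sys.executable)                    # full path to the interpreter binary
```

Running this in each of the three installations from the story above would have surfaced the mismatch immediately.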

Why Python Environment Management Matters for Data Scientists

Environment management isn’t about being fastidious or following best practices for their own sake. It’s about:

Reproducibility – If you can’t recreate your environment, you can’t share your work, deploy to production, or even guarantee it will run tomorrow.

Collaboration – Your colleagues need to run your code without spending hours debugging version conflicts.

Maintenance – Being kind to your future self. When you return to a project months later, you need to know exactly which versions of everything it requires.

Production deployment – Code that works on your laptop but fails in production is worse than no code at all.

This is where the transition from beginner to advanced practitioner really begins. It’s not about writing better algorithms – it’s about building systems that actually work reliably.

Quick Reference: Python Environment Setup

Essential Commands:

# 1. Install and set Python version
pyenv install 3.11.7
pyenv local 3.11.7

# 2. Initialize project
uv init my-project
cd my-project

# 3. Add dependencies
uv add pandas numpy
uv add --dev pytest ruff

# 4. Install all dependencies
uv sync

# 5. Run your code
uv run python main.py

Files Created:

pyproject.toml – your project manifest, listing declared dependencies

uv.lock – the lockfile pinning exact versions of every package

.venv/ – the virtual environment directory (not committed to version control)

New to these concepts? Read on for the full explanation of why and how this workflow solves the “works on my machine” problem.

Choosing Python Environment Management Tools: pyenv and uv

There are many tools out there for managing Python environments and dependencies: pip, virtualenv, pipenv, poetry, conda, and probably others I’m forgetting. I’ve used most of them over the years. Each has strengths and weaknesses, and new tools will inevitably emerge.

For this series, we’ll use two tools that work excellently together:

pyenv – A lightweight Python version manager that lets you install and switch between different Python versions. Think of it as your Python version switcher.

uv – A fast, modern Python package and project manager that handles virtual environments and dependency management. It’s written in Rust and offers the best user experience currently available. It’s 10-100x faster than pip, simple to use, and handles both virtual environments and dependency management in one unified tool. More importantly, it makes reproducibility easy by default rather than an afterthought.

Will these be the best tools forever? Probably not. But the principles we’re covering – version pinning, virtual environments, dependency locking – are universal. Learn them with pyenv and uv today, and you’ll be able to adapt to whatever tools come next.

Part 1: Managing Python Versions with pyenv

Before we even talk about packages, we need to talk about Python itself. Different projects often require different Python versions. Maybe you’re maintaining legacy code that needs Python 3.9, while your new project uses features from Python 3.12. Or your production environment is locked to a specific Python version, and you need to match it locally.

Why Multiple Python Versions Matter

You might encounter situations like:

Maintaining legacy code that requires Python 3.9 while a new project uses Python 3.12 features

Matching a production environment that is locked to a specific Python version

Trying an alternative implementation like PyPy for performance-critical code

Without a Python version manager, you’re stuck with whatever version came with your system, or you’re manually downloading and installing different versions, which quickly becomes a mess.

Installing pyenv

pyenv is a lightweight Python version manager. Installation varies by platform – on macOS you can use Homebrew; on Linux, a shell script; on Windows, we recommend using Windows Subsystem for Linux (WSL2), then following the Linux installation instructions. Check the official pyenv documentation for detailed instructions for your platform.

For quick reference:

# macOS
brew install pyenv

# Linux
curl https://pyenv.run | bash

# Windows - use WSL2, then follow Linux instructions

After installation, you’ll need to add pyenv to your shell configuration (the documentation provides specific instructions for your shell).

Essential pyenv Commands

Once pyenv is installed, you have control over Python versions:

# See all available Python versions you can install
pyenv install --list

# Install a specific Python version
pyenv install 3.11.7

# See which Python versions you have installed
pyenv versions

# Set the global default Python version
pyenv global 3.11.7

# Set a Python version for a specific project directory
pyenv local 3.11.7

Note: pyenv can install not just standard CPython versions, but also alternative implementations like PyPy and even conda distributions (miniconda, anaconda). This flexibility means you can use pyenv regardless of which Python ecosystem you prefer.

The pyenv local command is particularly powerful – it creates a .python-version file in your project directory. Anyone with pyenv who enters that directory will automatically use the correct Python version. This is your first step toward reproducibility.

Part 2: Virtual Environments and Package Management with uv

Now that you can manage Python versions, you need to manage packages. This is where things get complicated with traditional tools, but uv makes it straightforward.

Why uv?

uv is a modern, all-in-one Python package and project manager that replaces multiple traditional tools:

pip – package installation

virtualenv – virtual environment creation

pipenv / poetry – dependency declaration, locking, and project management

Note: PyPy support is still in development, so if you need PyPy for performance-critical code, check the current compatibility status in the uv documentation.

The Modern uv Workflow

Here’s how you work with uv for a new project:

# Initialize a new project
uv init my-analysis-project
cd my-analysis-project

# Add your main dependencies
uv add pandas numpy scikit-learn

# Add development dependencies (testing, linting, formatting)
uv add --dev pytest ruff

# Sync your environment (install everything)
uv sync

# Run your code in the project environment
uv run python main.py

That’s it. No manual virtual environment creation, no separate requirements files, no confusion about what’s installed where.

Understanding the Files uv Creates

When you work with uv, you’ll see these files in your project:

pyproject.toml – Your project manifest. This is the human-readable file where your dependencies are declared:

[project]
name = "my-analysis-project"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    "pandas>=2.0.0",
    "numpy>=1.24.0",
    "scikit-learn>=1.3.0",
]

[dependency-groups]
dev = [
    "pytest>=8.0.0",
    "ruff>=0.1.0",
]

uv.lock – The lockfile with exact versions of every package and all their dependencies. This file is automatically generated and updated by uv. You commit this to version control to ensure everyone gets identical environments.

.venv/ – The virtual environment directory. This is where packages are actually installed. You don’t commit this to version control – it’s recreated from the lockfile.

Dev Dependencies: The Right Way

Not all dependencies are created equal. Your code needs pandas and numpy to run, but it doesn’t need pytest or ruff to run – those are only needed during development.

This distinction matters for several reasons:

Leaner production installs – deployments skip the testing and linting tools they never use

Clarity – anyone reading pyproject.toml can see what the code needs to run versus what you need to develop it

Fewer conflicts – fewer packages in production means fewer version conflicts and a smaller surface for problems

With uv, you add development dependencies using the --dev flag:

# Add testing framework
uv add --dev pytest

# Add code formatter and linter
uv add --dev ruff

# Add type checker
uv add --dev mypy

These go into the [dependency-groups] section of pyproject.toml. By default, uv sync installs everything including dev dependencies. For production, you’d use:

uv sync --no-dev

You can also create custom dependency groups for better organization:

# Add to a custom "lint" group
uv add --group lint ruff

# Add to a custom "docs" group
uv add --group docs sphinx
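After commands like these, the groups appear alongside dev in pyproject.toml. A hypothetical result (version specifiers are illustrative):

```toml
[dependency-groups]
dev = ["pytest>=8.0.0", "ruff>=0.1.0"]
lint = ["ruff>=0.1.0"]
docs = ["sphinx>=7.0.0"]
```

You can then install a specific group on demand, keeping each environment as small as the task requires.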

A Note on uv pip install

You might see references to uv pip install in documentation or blog posts. This exists primarily for compatibility with pip workflows – it mimics pip’s behavior for people transitioning from pip.

We recommend using uv add instead. Here’s why:

uv add records the dependency in pyproject.toml, so your manifest stays in sync with what’s actually installed

uv add updates uv.lock, keeping the environment reproducible for everyone

uv pip install modifies the environment without recording anything, recreating exactly the drift this workflow is designed to prevent

The only time you might use uv pip install is when you’re explicitly trying to maintain compatibility with old pip-based workflows. For new projects, stick with uv add.

Putting It Together: Complete Workflow

Let’s walk through setting up a complete data science project from scratch:

# Set the Python version for this project
pyenv local 3.11.7

# Initialize the uv project
uv init housing-price-analysis
cd housing-price-analysis

# Add data science dependencies
uv add pandas numpy scikit-learn matplotlib

# Add development tools
uv add --dev pytest ruff jupyterlab

# Sync everything (creates .venv and installs packages)
uv sync

# Run your analysis
uv run python analyze_housing.py

# Or start Jupyter
uv run jupyter lab

Now, when you share this project with a colleague:

# They clone your repo
git clone your-repo

# They have pyenv, so the .python-version file sets the right Python
cd housing-price-analysis

# One command installs everything exactly as you had it
uv sync

# They can immediately run your code
uv run python analyze_housing.py

No hunting for the right Python version. No confusion about which packages are needed. No “works on my machine” problems. Just reproducible, shareable code.

IDE Integration: Automating the Setup

Remember in Part 0 when we talked about choosing your IDE? Here’s where that investment pays off even more. Modern IDEs can automate much of this environment setup, making the workflow we just covered nearly effortless.

PyCharm allows you to configure uv as your default project tool. When you create a new project in PyCharm, you can:

Select uv as the environment type for the project

Choose which installed Python interpreter the environment should use

Let the IDE create the .venv and install dependencies from your lockfile automatically

VS Code with the Python extension also supports this workflow, allowing you to select uv for environment management and choose your Python interpreter from the available versions.

The exact setup steps vary between IDEs and change with updates, so consult your IDE’s documentation for current instructions. But the principle is the same: your IDE can handle the mechanics once you’ve configured it properly.

This doesn’t replace understanding what’s happening under the hood – but once you know the concepts, having your IDE handle the mechanics saves time and reduces errors. You’re not blindly clicking buttons; you understand that the IDE is creating a .venv, installing from uv.lock, and setting the correct Python interpreter. The automation serves you rather than mystifying you.

The Bigger Picture

In Part 0, we covered choosing the right IDE for your development work. Now you’ve set up proper environment management with pyenv and uv. These aren’t separate concerns – they work together to create a professional development workflow.

Environment management feels like overhead when you’re starting out. It’s tempting to just pip install things globally and get on with your analysis. But this is exactly the transition from beginner to advanced practitioner – recognizing that the time spent setting up proper environments pays back exponentially in reduced debugging, easier collaboration, and maintainable systems.

When you return to your project from last year, you’ll thank yourself for having a locked environment. When your colleague can run your code on the first try, they’ll appreciate your professionalism. When you deploy to production and it actually works, you’ll understand why this matters.

This isn’t about following rules for their own sake. It’s about building systems that work reliably, not just once on your machine, but everywhere, for everyone, every time.

What’s Next?

We’ve laid the foundation with Python versions and virtual environments. But throwing all your code into a single directory isn’t sustainable. In the next post, we’ll cover project structure – how to organize your code, where things should live, and why templates like cookiecutter can save you from reinventing the wheel on every project.

The environment is stable. Now let’s build something on top of it.


Coming up next: “Part 2 – Project Structure and Templates: Organizing Code That Scales”
