Background

When setting up an environment with flash-attn, installation errors are common because flash-attn has to be compiled and built locally. Moreover, according to vLLM GPU Installation, the vLLM team is actively adopting the uv package manager. This post uses the latest uv to present a definitive recipe for installing flash-attn, PyTorch, and related packages on Linux. The test environment is Ubuntu 22.04.5 under WSL2, with the following CUDA setup:

$ nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 572.83                 Driver Version: 572.83         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler_0

Installation

According to the official documentation, Installation | uv, uv can be installed in several ways, e.g. via pip, Homebrew, or winget. Here is the standalone installer for Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh
# or
# wget -qO- https://astral.sh/uv/install.sh | sh

Alternatively, download the installer from a GitHub release URL:

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/astral-sh/uv/releases/download/0.8.15/uv-installer.sh | sh

A proxy prefix such as https://hk.gh-proxy.com/ can also be prepended to the GitHub release URL for faster downloads:

curl --proto '=https' --tlsv1.2 -LsSf https://hk.gh-proxy.com/https://github.com/astral-sh/uv/releases/download/0.8.15/uv-installer.sh | sh

According to the official documentation, Environment variables | uv, the UV_INSTALL_DIR environment variable controls where uv is installed; the default is ~/.local/bin.
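
For example, to install uv into a custom directory (the path below is purely illustrative):

export UV_INSTALL_DIR="$HOME/.uv/bin"
curl -LsSf https://astral.sh/uv/install.sh | sh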

Next, verify the installation:

uv -V
# uv 0.8.15

A uv installed this way can update itself with:

uv self update

See Installer options | uv for more installation methods.

Python Versions

The project.requires-python setting declares which Python versions the project supports:

[project]
requires-python = ">=3.11"

uv also lets you configure which Python installations take priority. A machine may have a conda-installed Python, the system Python, and a uv-managed Python all at once. Here we tell uv to prefer its own managed Python:

[tool.uv]
python-preference = "managed"

In fact, managed is already the default. However, managed Python builds are downloaded from astral-sh/python-build-standalone by default, so we can point uv at an accelerated mirror of the GitHub releases:

[tool.uv]
python-install-mirror = "https://hk.gh-proxy.com/https://github.com/astral-sh/python-build-standalone/releases/download"

Alternatively, set tool.uv.python-preference = "system" to give the system Python higher priority.
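
Whichever preference you choose, uv can list the interpreters it discovers and install a managed one on demand:

uv python list
uv python install 3.11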

Development Dependencies

According to Managing dependencies | uv, we can define development dependencies that stay local and are never published to PyPI or other indexes with the project's requirements. For example, we can define three groups, dev, lint, and format:

[dependency-groups]
dev = [{ include-group = "lint" }, { include-group = "format" }]
lint = ["ruff"]
format = ["black", "isort"]

uv sync installs the dev group automatically; pass --no-dev to skip it.
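
For example, to skip the dev group, or to sync one group on top of the defaults:

uv sync --no-dev
uv sync --group lint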

These development tools are then configured in the same pyproject.toml; a typical configuration:

[tool.black]
line-length = 88

[tool.isort]
profile = "black"
line_length = 88
py_version = 311
extra_standard_library = ["typing_extensions"]

[tool.ruff]
line-length = 88

[tool.ruff.format]
docstring-code-format = true
quote-style = "double"
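
Once the groups are synced, the tools run inside the project environment via uv run:

uv run ruff check .
uv run black .
uv run isort .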

Platform Constraint

According to Configuring projects | uv, constraining the target platforms simplifies dependency resolution. Two settings are relevant:

  • required-environments lists environments the project must support; here, for example, the project must support x86_64 Linux (see the sketch after this list)
  • environments lists the only environments the project needs to support; here, the project only needs to support Linux
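
A minimal sketch of the first option, requiring that resolution always cover x86_64 Linux (marker syntax as in the uv docs):

[tool.uv]
required-environments = ["sys_platform == 'linux' and platform_machine == 'x86_64'"]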

In practice, since most deep-learning environments run on x86_64 Linux, dependency resolution can be simplified further with:

[tool.uv]
environments = ["sys_platform == 'linux' and platform_machine == 'x86_64'"]

Examples

This section walks through installing and configuring libraries for LLM training and inference, RLHF frameworks in particular, using concrete cases to demonstrate uv's core features. First, a summary of each library's pain points at install time and the matching uv solution:

Packages        | Requirements                                               | Solutions
----------------|------------------------------------------------------------|--------------------------------------------------------------------
PyTorch         | separate cpu/cuda builds, each with its own index URL      | [tool.uv.sources], [[tool.uv.index]]
Flash Attention | compiled and built locally                                 | [tool.uv.extra-build-dependencies], [tool.uv.extra-build-variables]
VeRL            | installed in editable mode                                 | [tool.uv.sources]
OpenRLHF        | unreasonable upstream pins that may need manual overrides  | [[tool.uv.dependency-metadata]]

Complete examples for everything below are on GitHub: Magnicord/llm-env-templates

PyTorch

Before installing flash-attn, let's first work out the torch configuration. torch ships in CUDA and CPU variants, and a plain pip install torch resolves from PyPI, whose wheels target a fixed accelerator (CPU-only on Windows and macOS, a bundled CUDA build on Linux) that may not match your machine. PyTorch therefore hosts its own index URLs:
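
https://download.pytorch.org/whl/cpu
https://download.pytorch.org/whl/cu128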

The suffix of each CUDA index URL is the version name, which you can adjust to your machine's CUDA version, for example:
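
https://download.pytorch.org/whl/cu126
https://download.pytorch.org/whl/cu118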

pytorch/RELEASE.md at main · pytorch/pytorch provides the torch/CUDA version-compatibility table.

So how are indexes configured in uv? First, initialize a project named torch-demo:

uv init torch-demo
# Initialized project `torch-demo` at `/path/to/torch-demo`

uv creates the following files in the project directory:

├── .gitignore
├── .python-version
├── README.md
├── main.py
└── pyproject.toml

See Creating projects | uv for more init options.

All of our configuration happens in pyproject.toml, whose default contents are:

[project]
name = "torch-demo"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.11"
dependencies = []

project.dependencies plays the role of requirements.txt: every mandatory dependency is listed there.
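
Rather than editing the list by hand, uv add appends a dependency and updates the lockfile in one step (the package here is purely illustrative):

uv add "numpy>=2.0"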

Using uv with PyTorch | uv documents the PyTorch setup in detail. First, declare the torch index URLs:

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
# url = "https://mirrors.nju.edu.cn/pytorch/whl/cpu"
explicit = true

[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
# url = "https://mirrors.nju.edu.cn/pytorch/whl/cu128"
explicit = true

[[tool.uv.index]]
name = "pytorch-cu126"
url = "https://download.pytorch.org/whl/cu126"
# url = "https://mirrors.nju.edu.cn/pytorch/whl/cu126"
explicit = true

[[tool.uv.index]]
name = "pytorch-cu124"
url = "https://download.pytorch.org/whl/cu124"
# url = "https://mirrors.nju.edu.cn/pytorch/whl/cu124"
explicit = true

[[tool.uv.index]]
name = "pytorch-cu118"
url = "https://download.pytorch.org/whl/cu118"
# url = "https://mirrors.nju.edu.cn/pytorch/whl/cu118"
explicit = true

Here explicit = true declares an index to uv that is only used when explicitly requested, so it is safe to declare many index URLs. We keep only the official PyTorch indexes and comment out the Nanjing University (NJU) mirror, because in practice that mirror is not especially fast and some wheels are missing from it. Another common PyTorch mirror is the Aliyun mirror. If we configure it as:

[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://mirrors.aliyun.com/pytorch-wheels/cu128"
explicit = true

then running uv sync fails with:

× No solution found when resolving dependencies for split (markers: platform_machine == 'x86_64' and
│ sys_platform == 'linux'):
╰─▶ Because there is no version of torch{sys_platform == 'linux'}==2.8.0 and your project depends on torch{sys_platform == 'linux'}==2.8.0, we can conclude that your project's requirements are unsatisfiable.

That is because the Aliyun mirror is not a PyPI-style index; see Package indexes | uv. Indeed, even with plain pip you must point at the Aliyun mirror via -f (--find-links) rather than --extra-index-url or -i (--index-url). To use it with uv, set format = "flat":

[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://mirrors.aliyun.com/pytorch-wheels/cu128"
explicit = true
format = "flat"

Next, we can optionally make the Tsinghua mirror the default PyPI index:

[[tool.uv.index]]
url = "https://pypi.tuna.tsinghua.edu.cn/simple"
default = true

Then pin torch and friends to the indexes declared above:

[tool.uv.sources]
torch = [
  { index = "pytorch-cpu", marker = "sys_platform != 'linux'" },
  { index = "pytorch-cu128", marker = "sys_platform == 'linux'" },
]
torchvision = [
  { index = "pytorch-cpu", marker = "sys_platform != 'linux'" },
  { index = "pytorch-cu128", marker = "sys_platform == 'linux'" },
]
torchaudio = [
  { index = "pytorch-cpu", marker = "sys_platform != 'linux'" },
  { index = "pytorch-cu128", marker = "sys_platform == 'linux'" },
]

On Linux the pytorch-cu128 index is used; everywhere else, pytorch-cpu. Setting up the environment on a Linux machine therefore installs the CUDA build of PyTorch by default. Note that this only selects the index for torch, torchvision, and torchaudio; it does not make them project dependencies. Those still go in project.dependencies:

[project]
dependencies = ["torch==2.8.0"]

Finally, sync the environment:

uv sync

This generates the uv.lock lockfile, which records resolved versions, indexes, and the full dependency graph. When the lockfile exists, commands such as uv sync and uv lock prefer the versions recorded in it.
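
To confirm the CUDA build was actually selected, and to deliberately move past the locked versions later:

uv run python -c "import torch; print(torch.__version__, torch.version.cuda)"
uv lock --upgrade-package torch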

Flash Attention

With torch configured, we can move on to flash-attn. Thanks to changes shipped in uv 0.8.4 (see the related issue in the uv repository), the flash-attn setup has become remarkably simple.

Here is the complete configuration up front:

[tool.uv.extra-build-dependencies]
flash-attn = [{ requirement = "torch", match-runtime = true }]

[tool.uv.extra-build-variables]
flash-attn = { FLASH_ATTENTION_SKIP_CUDA_BUILD = "TRUE" }

Here [tool.uv.extra-build-dependencies] makes torch available in flash-attn's isolated build environment, with match-runtime = true pinning the build-time torch to the same version installed at runtime, while FLASH_ATTENTION_SKIP_CUDA_BUILD = "TRUE" tells flash-attn's build to skip compiling the CUDA kernels from source. Then add flash-attn to project.dependencies:

[project]
dependencies = ["torch==2.8.0", "flash-attn==2.8.3"]

Finally, sync the environment:

uv sync
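
A quick smoke test confirms the install succeeded:

uv run python -c "import flash_attn; print(flash_attn.__version__)"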

VeRL

Because RLHF work often requires modifying the framework itself, verl Installation gives the standard pip editable-mode install:

git clone https://github.com/volcengine/verl.git
cd verl
pip install --no-deps -e .

To keep things simple, assume verl is checked out in the project root:

├── verl
├── .gitignore
├── .python-version
├── README.md
└── pyproject.toml

Also assume torch and flash-attn are already configured as described above.

Managing dependencies | uv gives the command for an editable install:

uv add --editable ./verl

By default this command registers verl as a workspace member, adding the following to pyproject.toml:

[project]
dependencies = [
  # ...
  "verl",
  # ...
]

[tool.uv.sources]
verl = { workspace = true }

[tool.uv.workspace]
members = ["verl"]

Running this command, however, produces the error:

× No solution found when resolving dependencies for split (markers: sys_platform == 'linux'):
╰─▶ Because verl[sglang] depends on torch{sys_platform == 'linux'}==2.7.1 and verl-demo depends
on torch{sys_platform == 'linux'}==2.8.0, we can conclude that verl-demo and verl[sglang]
are incompatible.
And because your workspace requires verl[sglang] and verl-demo, we can conclude that your
workspace's requirements are unsatisfiable.

In workspace mode uv resolves every extra dependency of verl, and the error above comes from the sglang extra. Inspecting verl/setup.py confirms that verl[sglang] depends on torch==2.7.1, matching the error.

One fix, of course, is to downgrade torch in project.dependencies from 2.8.0 to 2.7.1. That solution is imperfect, though:

  1. If the project only uses vllm as its inference engine, we should not have to care about sglang's dependencies at all
  2. Even after resolving the verl[sglang] conflict, what happens when some other extra conflicts and cannot be resolved?

Instead, we write the editable-install configuration by hand:

[project]
dependencies = [
  # ...
  "verl",
  # ...
]

[tool.uv.sources]
verl = { path = "verl", editable = true }

Finally, sync the environment:

uv sync
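
As a sanity check, the module should now resolve to the local checkout rather than a copy in site-packages:

uv run python -c "import verl; print(verl.__file__)"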

Note that the latest vllm==0.10.1.1 does not yet support the latest torch==2.8.0, so if you need vllm, downgrade to torch==2.7.1.

OpenRLHF

OpenRLHF, like verl, is usually installed in editable mode:

[project]
dependencies = [
  # ...
  "openrlhf",
  # ...
]

[tool.uv.sources]
openrlhf = { path = "OpenRLHF", editable = true }

OpenRLHF's core dependencies pin transformers==4.55.2, while transformers has since moved on to 4.56.0. Suppose we want a model added only in the latest transformers: declaring transformers==4.56.0 while also depending on openrlhf then conflicts. If we are confident that transformers==4.56.0 is compatible with OpenRLHF, we can override OpenRLHF's dependency metadata by hand. The rough procedure:

  1. Locate the dependency declaration of the project to be patched (usually setup.py, sometimes pyproject.toml or similar)
  2. In setup.py, find the name, version, install_requires, and extras_require arguments passed to setup()
  3. Map them to name, version, requires-dist, and provides-extras in a [[tool.uv.dependency-metadata]] entry of your own pyproject.toml
  4. Edit the dependencies as needed

The converted result:

[[tool.uv.dependency-metadata]]
name = "openrlhf"
version = "0.8.11"
requires-dist = [
  # install_requires
  "accelerate",
  "bitsandbytes",
  "datasets",
  "deepspeed==0.17.5",
  "einops",
  "flash-attn==2.8.3",
  "grpcio>=1.74.0",
  "isort",
  "jsonlines",
  "loralib",
  "optimum",
  "optree>=0.13.0",
  "packaging",
  "peft",
  "pynvml>=12.0.0",
  "ray[default]==2.48.0",
  "tensorboard",
  "torch",
  "torchdata",
  "torchmetrics",
  "tqdm",
  "transformers==4.56.0", # upgraded from 4.55.2
  "transformers_stream_generator",
  "wandb",
  "wheel",
  # extras_require
  "vllm==0.10.1.1; extra == 'vllm'",
  "vllm>0.10.1.1; extra == 'vllm_latest'",
  "ring_flash_attn; extra == 'ring'",
  "liger_kernel; extra == 'liger'",
]
provides-extras = ["vllm", "vllm_latest", "ring", "liger"]

The problem above came from a stale pin; the opposite problem is a pin that is too new. Suppose the server's hardware supports at most CUDA 12.4: vllm must then be downgraded to <=0.8.5.post1, which in turn forces torch<=2.6.0. A minimal dependency configuration:

[project]
name = "openrlhf-cu124-demo"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
  "torch==2.6.0",
  "transformers==4.56.0",
  "vllm==0.8.5.post1",
  "openrlhf",
]

Running uv sync now fails with at least:

× No solution found when resolving dependencies for split (markers: platform_machine == 'x86_64' and sys_platform ==
│ 'linux'):
╰─▶ Because only openrlhf==0.8.11 is available and openrlhf==0.8.11 depends on ray[default]==2.48.0, we can conclude that all versions of openrlhf depend on ray[default]==2.48.0.
And because ray[default]==2.48.0 depends on opentelemetry-sdk>=1.30.0 and vllm==0.8.5.post1 depends on opentelemetry-sdk>=1.26.0,<1.27.0, we can conclude that all versions of openrlhf and vllm==0.8.5.post1 are incompatible.
And because your project depends on openrlhf and vllm==0.8.5.post1, we can conclude that your project's requirements are unsatisfiable.

This is because OpenRLHF's pin ray[default]==2.48.0 conflicts with vllm==0.8.5.post1. On top of that, OpenRLHF pins the newest flash-attn==2.8.3, and flash-attn 2.8.0 and above does not play well with torch==2.6.0+cu124. In testing this does not break dependency resolution, but for the environment to actually work, downgrade to at least flash-attn==2.7.4.post1. The configuration then looks roughly like this:

[project]
name = "openrlhf-cu124-demo"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
  "torch==2.6.0",
  "transformers==4.56.0",
  "flash-attn==2.7.4.post1",
  "vllm==0.8.5.post1",
  "openrlhf",
]

[tool.uv.extra-build-dependencies]
flash-attn = [{ requirement = "torch", match-runtime = true }]

[tool.uv.extra-build-variables]
flash-attn = { FLASH_ATTENTION_SKIP_CUDA_BUILD = "TRUE" }

[[tool.uv.dependency-metadata]]
name = "openrlhf"
version = "0.8.11"
requires-dist = [
  # install_requires
  "accelerate",
  "bitsandbytes",
  "datasets",
  "deepspeed==0.17.5",
  "einops",
  # downgrade flash-attn for torch==2.6.0+cu124
  # "flash-attn==2.8.3",
  "flash-attn==2.7.4.post1",
  "grpcio>=1.74.0",
  "isort",
  "jsonlines",
  "loralib",
  "optimum",
  "optree>=0.13.0",
  "packaging",
  "peft",
  "pynvml>=12.0.0",
  # downgrade ray[default] for compatibility with vllm==0.8.5.post1
  # "ray[default]==2.48.0",
  "ray[default]==2.47.1",
  "tensorboard",
  "torch",
  "torchdata",
  "torchmetrics",
  "tqdm",
  "transformers==4.56.0", # upgraded from 4.55.2
  "transformers_stream_generator",
  "wandb",
  "wheel",
  # extras_require
  "vllm==0.10.1.1; extra == 'vllm'",
  "vllm>0.10.1.1; extra == 'vllm_latest'",
  "ring_flash_attn; extra == 'ring'",
  "liger_kernel; extra == 'liger'",
]
provides-extras = ["vllm", "vllm_latest", "ring", "liger"]
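
As with the earlier examples, finish by syncing the environment:

uv sync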