Background

When setting up an environment with flash-attn, installation errors are common because flash-attn has to be compiled and built locally. Moreover, according to vLLM GPU Installation, the vLLM team is actively adopting the uv package manager. This post uses the latest uv to present a definitive recipe for installing flash-attn, PyTorch, and related packages on Linux. The test environment is Ubuntu 22.04.5 under WSL2, with the following CUDA setup:

$ nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 572.83                 Driver Version: 572.83         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler_0

Installation

According to the official documentation, Installation | uv, uv can be installed in several ways, e.g. via pip, Homebrew, or winget. Here is the standalone installer for Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh
# or
# wget -qO- https://astral.sh/uv/install.sh | sh

Alternatively, download the installer from a GitHub release URL:

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/astral-sh/uv/releases/download/0.8.15/uv-installer.sh | sh

A proxy prefix such as https://hk.gh-proxy.com/ can also be prepended to the GitHub release URL for faster downloads:

curl --proto '=https' --tlsv1.2 -LsSf https://hk.gh-proxy.com/https://github.com/astral-sh/uv/releases/download/0.8.15/uv-installer.sh | sh

According to the official documentation, Environment variables | uv, the UV_INSTALL_DIR environment variable controls where uv is installed; the default is ~/.local/bin.
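
For example, to install uv into a custom directory (the path below is purely illustrative):

export UV_INSTALL_DIR="$HOME/.uv/bin"
curl -LsSf https://astral.sh/uv/install.sh | sh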

Next, verify the installation:

uv -V
# uv 0.8.15

A uv installed this way can update itself with:

uv self update

See Installer options | uv for more installation methods.

Python Versions

The project.requires-python setting declares which Python versions the project supports:

[project]
requires-python = ">=3.11"

uv also lets you configure which Python installations take priority. A machine may have a conda-installed Python, the system Python, and a uv-managed Python all at once. Here we tell uv to prefer its own managed Python:

[tool.uv]
python-preference = "managed"

In fact, managed is already the default. However, managed Python builds are downloaded from astral-sh/python-build-standalone by default, so we can point uv at an accelerated mirror of the GitHub releases:

[tool.uv]
python-install-mirror = "https://hk.gh-proxy.com/https://github.com/astral-sh/python-build-standalone/releases/download"

Alternatively, set tool.uv.python-preference = "system" to give the system Python higher priority.
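
Whichever preference you choose, uv can list the interpreters it discovers and install a managed one on demand:

uv python list
uv python install 3.11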

Development Dependencies

According to Managing dependencies | uv, we can define development dependencies that stay local and are never published to PyPI or other indexes with the project's requirements. For example, we can define three groups, dev, lint, and format:

[dependency-groups]
dev = [{ include-group = "lint" }, { include-group = "format" }]
lint = ["ruff"]
format = ["black", "isort"]

uv sync installs the dev group automatically; pass --no-dev to skip it.
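
For example, to skip the dev group, or to sync one group on top of the defaults:

uv sync --no-dev
uv sync --group lint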

These development tools are then configured in the same pyproject.toml; a typical configuration:

[tool.black]
line-length = 88

[tool.isort]
profile = "black"
line_length = 88
py_version = 311
extra_standard_library = ["typing_extensions"]

[tool.ruff]
line-length = 88

[tool.ruff.format]
docstring-code-format = true
quote-style = "double"
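
Once the groups are synced, the tools run inside the project environment via uv run:

uv run ruff check .
uv run black .
uv run isort .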

Platform Constraint

According to Configuring projects | uv, constraining the target platforms simplifies dependency resolution. Two settings are relevant:

  • required-environments lists environments the project must support; here, for example, the project must support x86_64 Linux (see the sketch after this list)
  • environments lists the only environments the project needs to support; here, the project only needs to support Linux
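
A minimal sketch of the first option, requiring that resolution always cover x86_64 Linux (marker syntax as in the uv docs):

[tool.uv]
required-environments = ["sys_platform == 'linux' and platform_machine == 'x86_64'"]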

In practice, since most deep-learning environments run on x86_64 Linux, dependency resolution can be simplified further with:

[tool.uv]
environments = ["sys_platform == 'linux' and platform_machine == 'x86_64'"]

Examples

This section walks through installing and configuring libraries for LLM training and inference, RLHF frameworks in particular, using concrete cases to demonstrate uv's core features. First, a summary of each library's pain points at install time and the matching uv solution:

Packages        | Requirements                                               | Solutions
----------------|------------------------------------------------------------|--------------------------------------------------------------------
PyTorch         | separate cpu/cuda builds, each with its own index URL      | [tool.uv.sources], [[tool.uv.index]]
Flash Attention | compiled and built locally                                 | [tool.uv.extra-build-dependencies], [tool.uv.extra-build-variables]
VeRL            | installed in editable mode                                 | [tool.uv.sources]
OpenRLHF        | unreasonable upstream pins that may need manual overrides  | [[tool.uv.dependency-metadata]]

Complete examples for everything below are on GitHub: Magnicord/llm-env-templates

PyTorch

Before installing flash-attn, let's first work out the torch configuration. torch ships in CUDA and CPU variants, and a plain pip install torch resolves from PyPI, whose wheels target a fixed accelerator (CPU-only on Windows and macOS, a bundled CUDA build on Linux) that may not match your machine. PyTorch therefore hosts its own index URLs:
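
https://download.pytorch.org/whl/cpu
https://download.pytorch.org/whl/cu128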

The suffix of each CUDA index URL is the version name, which you can adjust to your machine's CUDA version, for example:
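
https://download.pytorch.org/whl/cu126
https://download.pytorch.org/whl/cu118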

pytorch/RELEASE.md at main · pytorch/pytorch provides the torch/CUDA version-compatibility table.

So how are indexes configured in uv? First, initialize a project named torch-demo:

uv init torch-demo
# Initialized project `torch-demo` at `/path/to/torch-demo`

uv creates the following files in the project directory:

├── .gitignore
├── .python-version
├── README.md
├── main.py
└── pyproject.toml

See Creating projects | uv for more init options.

All of our configuration happens in pyproject.toml, whose default contents are:

[project]
name = "torch-demo"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.11"
dependencies = []

project.dependencies plays the role of requirements.txt: every mandatory dependency is listed there.
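
Rather than editing the list by hand, uv add appends a dependency and updates the lockfile in one step (the package here is purely illustrative):

uv add "numpy>=2.0"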

Using uv with PyTorch | uv documents the PyTorch setup in detail. First, declare the torch index URLs:

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
# url = "https://mirrors.nju.edu.cn/pytorch/whl/cpu"
explicit = true

[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
# url = "https://mirrors.nju.edu.cn/pytorch/whl/cu128"
explicit = true

[[tool.uv.index]]
name = "pytorch-cu126"
url = "https://download.pytorch.org/whl/cu126"
# url = "https://mirrors.nju.edu.cn/pytorch/whl/cu126"
explicit = true

[[tool.uv.index]]
name = "pytorch-cu124"
url = "https://download.pytorch.org/whl/cu124"
# url = "https://mirrors.nju.edu.cn/pytorch/whl/cu124"
explicit = true

[[tool.uv.index]]
name = "pytorch-cu118"
url = "https://download.pytorch.org/whl/cu118"
# url = "https://mirrors.nju.edu.cn/pytorch/whl/cu118"
explicit = true

Here explicit = true declares an index to uv that is only used when explicitly requested, so it is safe to declare many index URLs. We keep only the official PyTorch indexes and comment out the Nanjing University (NJU) mirror, because in practice that mirror is not especially fast and some wheels are missing from it. Another common PyTorch mirror is the Aliyun mirror. If we configure it as:

[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://mirrors.aliyun.com/pytorch-wheels/cu128"
explicit = true

then running uv sync fails with:

× No solution found when resolving dependencies for split (markers: platform_machine == 'x86_64' and
│ sys_platform == 'linux'):
╰─▶ Because there is no version of torch{sys_platform == 'linux'}==2.8.0 and your project depends on torch{sys_platform == 'linux'}==2.8.0, we can conclude that your project's requirements are unsatisfiable.

That is because the Aliyun mirror is not a PyPI-style index; see Package indexes | uv. Indeed, even with plain pip you must point at the Aliyun mirror via -f (--find-links) rather than --extra-index-url or -i (--index-url). To use it with uv, set format = "flat":

[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://mirrors.aliyun.com/pytorch-wheels/cu128"
explicit = true
format = "flat"

Next, we can optionally make the Tsinghua mirror the default PyPI index:

[[tool.uv.index]]
url = "https://pypi.tuna.tsinghua.edu.cn/simple"
default = true

Then pin torch and friends to the indexes declared above:

[tool.uv.sources]
torch = [
  { index = "pytorch-cpu", marker = "sys_platform != 'linux'" },
  { index = "pytorch-cu128", marker = "sys_platform == 'linux'" },
]
torchvision = [
  { index = "pytorch-cpu", marker = "sys_platform != 'linux'" },
  { index = "pytorch-cu128", marker = "sys_platform == 'linux'" },
]
torchaudio = [
  { index = "pytorch-cpu", marker = "sys_platform != 'linux'" },
  { index = "pytorch-cu128", marker = "sys_platform == 'linux'" },
]

On Linux the pytorch-cu128 index is used; everywhere else, pytorch-cpu. Setting up the environment on a Linux machine therefore installs the CUDA build of PyTorch by default. Note that this only selects the index for torch, torchvision, and torchaudio; it does not make them project dependencies. Those still go in project.dependencies:

[project]
dependencies = ["torch==2.8.0"]

Finally, sync the environment:

uv sync

This generates the uv.lock lockfile, which records resolved versions, indexes, and the full dependency graph. When the lockfile exists, commands such as uv sync and uv lock prefer the versions recorded in it.
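
To confirm the CUDA build was actually selected, and to deliberately move past the locked versions later:

uv run python -c "import torch; print(torch.__version__, torch.version.cuda)"
uv lock --upgrade-package torch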

Flash Attention

With torch configured, we can move on to flash-attn. Thanks to changes shipped in uv 0.8.4 (see the related issue in the uv repository), the flash-attn setup has become remarkably simple.

Here is the complete configuration up front:

[tool.uv.extra-build-dependencies]
flash-attn = [{ requirement = "torch", match-runtime = true }]

[tool.uv.extra-build-variables]
flash-attn = { FLASH_ATTENTION_SKIP_CUDA_BUILD = "TRUE" }

Here [tool.uv.extra-build-dependencies] makes torch available in flash-attn's isolated build environment, with match-runtime = true pinning the build-time torch to the same version installed at runtime, while FLASH_ATTENTION_SKIP_CUDA_BUILD = "TRUE" tells flash-attn's build to skip compiling the CUDA kernels from source. Then add flash-attn to project.dependencies:

[project]
dependencies = ["torch==2.8.0", "flash-attn==2.8.3"]

Finally, sync the environment:

uv sync
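
A quick smoke test confirms the install succeeded:

uv run python -c "import flash_attn; print(flash_attn.__version__)"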

VeRL

Because RLHF work often requires modifying the framework itself, verl Installation gives the standard pip editable-mode install:

git clone https://github.com/volcengine/verl.git
cd verl
pip install --no-deps -e .

To keep things simple, assume verl is checked out in the project root:

├── verl
├── .gitignore
├── .python-version
├── README.md
└── pyproject.toml

Also assume torch and flash-attn are already configured as described above.

Managing dependencies | uv gives the command for an editable install:

uv add --editable ./verl

By default this command registers verl as a workspace member, adding the following to pyproject.toml:

[project]
dependencies = [
  # ...
  "verl",
  # ...
]

[tool.uv.sources]
verl = { workspace = true }

[tool.uv.workspace]
members = ["verl"]

Running this command, however, produces the error:

× No solution found when resolving dependencies for split (markers: sys_platform == 'linux'):
╰─▶ Because verl[sglang] depends on torch{sys_platform == 'linux'}==2.7.1 and verl-demo depends
on torch{sys_platform == 'linux'}==2.8.0, we can conclude that verl-demo and verl[sglang]
are incompatible.
And because your workspace requires verl[sglang] and verl-demo, we can conclude that your
workspace's requirements are unsatisfiable.

In workspace mode uv resolves every extra dependency of verl, and the error above comes from the sglang extra. Inspecting verl/setup.py confirms that verl[sglang] depends on torch==2.7.1, matching the error.

One fix, of course, is to downgrade torch in project.dependencies from 2.8.0 to 2.7.1. That solution is imperfect, though:

  1. If the project only uses vllm as its inference engine, we should not have to care about sglang's dependencies at all
  2. Even after resolving the verl[sglang] conflict, what happens when some other extra conflicts and cannot be resolved?

Instead, we write the editable-install configuration by hand:

[project]
dependencies = [
  # ...
  "verl",
  # ...
]

[tool.uv.sources]
verl = { path = "verl", editable = true }

Finally, sync the environment:

uv sync
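
As a sanity check, the module should now resolve to the local checkout rather than a copy in site-packages:

uv run python -c "import verl; print(verl.__file__)"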

Note that the latest vllm==0.10.1.1 does not yet support the latest torch==2.8.0, so if you need vllm, downgrade to torch==2.7.1.

OpenRLHF

OpenRLHF, like verl, is usually installed in editable mode:

[project]
dependencies = [
  # ...
  "openrlhf",
  # ...
]

[tool.uv.sources]
openrlhf = { path = "OpenRLHF", editable = true }

OpenRLHF's core dependencies pin transformers==4.55.2, while transformers has since moved on to 4.56.0. Suppose we want a model added only in the latest transformers: declaring transformers==4.56.0 while also depending on openrlhf then conflicts. If we are confident that transformers==4.56.0 is compatible with OpenRLHF, we can override OpenRLHF's dependency metadata by hand. The rough procedure:

  1. Locate the dependency declaration of the project to be patched (usually setup.py, sometimes pyproject.toml or similar)
  2. In setup.py, find the name, version, install_requires, and extras_require arguments passed to setup()
  3. Map them to name, version, requires-dist, and provides-extras in a [[tool.uv.dependency-metadata]] entry of your own pyproject.toml
  4. Edit the dependencies as needed

The converted result:

[[tool.uv.dependency-metadata]]
name = "openrlhf"
version = "0.8.11"
requires-dist = [
  # install_requires
  "accelerate",
  "bitsandbytes",
  "datasets",
  "deepspeed==0.17.5",
  "einops",
  "flash-attn==2.8.3",
  "grpcio>=1.74.0",
  "isort",
  "jsonlines",
  "loralib",
  "optimum",
  "optree>=0.13.0",
  "packaging",
  "peft",
  "pynvml>=12.0.0",
  "ray[default]==2.48.0",
  "tensorboard",
  "torch",
  "torchdata",
  "torchmetrics",
  "tqdm",
  "transformers==4.56.0", # upgraded from 4.55.2
  "transformers_stream_generator",
  "wandb",
  "wheel",
  # extras_require
  "vllm==0.10.1.1; extra == 'vllm'",
  "vllm>0.10.1.1; extra == 'vllm_latest'",
  "ring_flash_attn; extra == 'ring'",
  "liger_kernel; extra == 'liger'",
]
provides-extras = ["vllm", "vllm_latest", "ring", "liger"]

The problem above came from a stale pin; the opposite problem is a pin that is too new. Suppose the server's hardware supports at most CUDA 12.4: vllm must then be downgraded to <=0.8.5.post1, which in turn forces torch<=2.6.0. A minimal dependency configuration:

[project]
name = "openrlhf-cu124-demo"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
  "torch==2.6.0",
  "transformers==4.56.0",
  "vllm==0.8.5.post1",
  "openrlhf",
]

Running uv sync now fails with at least:

× No solution found when resolving dependencies for split (markers: platform_machine == 'x86_64' and sys_platform ==
│ 'linux'):
╰─▶ Because only openrlhf==0.8.11 is available and openrlhf==0.8.11 depends on ray[default]==2.48.0, we can conclude that all versions of openrlhf depend on ray[default]==2.48.0.
And because ray[default]==2.48.0 depends on opentelemetry-sdk>=1.30.0 and vllm==0.8.5.post1 depends on opentelemetry-sdk>=1.26.0,<1.27.0, we can conclude that all versions of openrlhf and vllm==0.8.5.post1 are incompatible.
And because your project depends on openrlhf and vllm==0.8.5.post1, we can conclude that your project's requirements are unsatisfiable.

This is because OpenRLHF's pin ray[default]==2.48.0 conflicts with vllm==0.8.5.post1. On top of that, OpenRLHF pins the newest flash-attn==2.8.3, and flash-attn 2.8.0 and above does not play well with torch==2.6.0+cu124. In testing this does not break dependency resolution, but for the environment to actually work, downgrade to at least flash-attn==2.7.4.post1. The configuration then looks roughly like this:

[project]
name = "openrlhf-cu124-demo"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
  "torch==2.6.0",
  "transformers==4.56.0",
  "flash-attn==2.7.4.post1",
  "vllm==0.8.5.post1",
  "openrlhf",
]

[tool.uv.extra-build-dependencies]
flash-attn = [{ requirement = "torch", match-runtime = true }]

[tool.uv.extra-build-variables]
flash-attn = { FLASH_ATTENTION_SKIP_CUDA_BUILD = "TRUE" }

[[tool.uv.dependency-metadata]]
name = "openrlhf"
version = "0.8.11"
requires-dist = [
  # install_requires
  "accelerate",
  "bitsandbytes",
  "datasets",
  "deepspeed==0.17.5",
  "einops",
  # downgrade flash-attn for torch==2.6.0+cu124
  # "flash-attn==2.8.3",
  "flash-attn==2.7.4.post1",
  "grpcio>=1.74.0",
  "isort",
  "jsonlines",
  "loralib",
  "optimum",
  "optree>=0.13.0",
  "packaging",
  "peft",
  "pynvml>=12.0.0",
  # downgrade ray[default] for compatibility with vllm==0.8.5.post1
  # "ray[default]==2.48.0",
  "ray[default]==2.47.1",
  "tensorboard",
  "torch",
  "torchdata",
  "torchmetrics",
  "tqdm",
  "transformers==4.56.0", # upgraded from 4.55.2
  "transformers_stream_generator",
  "wandb",
  "wheel",
  # extras_require
  "vllm==0.10.1.1; extra == 'vllm'",
  "vllm>0.10.1.1; extra == 'vllm_latest'",
  "ring_flash_attn; extra == 'ring'",
  "liger_kernel; extra == 'liger'",
]
provides-extras = ["vllm", "vllm_latest", "ring", "liger"]
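
As with the earlier examples, finish by syncing the environment:

uv sync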