GPU users: onnxruntime (CPU) overwrites onnxruntime-gpu binaries when both are installed by pip/uv #608

@michelkluger

Summary

When fastembed is used in a GPU environment alongside onnxruntime-gpu, the CUDAExecutionProvider silently disappears at runtime because pip/uv end up installing both onnxruntime (CPU) and onnxruntime-gpu. Both packages install into the same site-packages/onnxruntime/ directory, so whichever is installed last wins. In practice the CPU build's onnxruntime_pybind11_state.so overwrites the GPU build's, which strips CUDAExecutionProvider from the list of available providers.

Root cause

fastembed declares a hard dependency on onnxruntime > 1.20.0 (by name). Until onnxruntime-gpu ~1.19.x, the GPU wheel declared Provides-Dist: onnxruntime in its metadata, which instructed pip/uv that onnxruntime-gpu satisfies any onnxruntime requirement. This metadata is absent from onnxruntime-gpu >= 1.20.0 (confirmed in 1.24.2). As a result:

  1. uv resolves onnxruntime > 1.20.0 → installs onnxruntime==1.24.2 (CPU build)
  2. User also has onnxruntime-gpu==1.24.2 in their project requirements
  3. Both install to site-packages/onnxruntime/; the CPU onnxruntime_pybind11_state.so overwrites the GPU one
  4. ort.get_available_providers() returns ['CPUExecutionProvider'] instead of ['CUDAExecutionProvider', 'CPUExecutionProvider']

The failure mode is silent — no import error, no warning, just no GPU acceleration.
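
Because nothing fails at import time, one way to surface the conflict is to ask the installer metadata directly. The following is a minimal sketch, not part of fastembed or onnxruntime: the helper names and the ORT_DISTRIBUTIONS tuple are hypothetical, and it only uses the standard-library importlib.metadata API to check whether more than one onnxruntime variant is installed in the current environment.

```python
import re
from importlib import metadata

# Distributions that unpack into the same site-packages/onnxruntime/ directory.
# This tuple is an assumption for illustration; extend it for other variants.
ORT_DISTRIBUTIONS = ("onnxruntime", "onnxruntime-gpu")


def normalize(name):
    """Normalize a distribution name (case-insensitive; '-', '_', '.' are equivalent)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def conflicting_ort_distributions(installed_names):
    """Return the onnxruntime variants found in installed_names if more than one is present."""
    installed = {normalize(n) for n in installed_names}
    found = [d for d in ORT_DISTRIBUTIONS if d in installed]
    return found if len(found) > 1 else []


def installed_distribution_names():
    """Names of all distributions visible to the current interpreter."""
    names = (dist.metadata["Name"] for dist in metadata.distributions())
    return [n for n in names if n]


def assert_single_ort_build():
    """Raise if both the CPU and GPU distributions are installed side by side."""
    clash = conflicting_ort_distributions(installed_distribution_names())
    if clash:
        raise RuntimeError(f"Conflicting onnxruntime builds installed: {clash}")
```

Calling assert_single_ort_build() at application start-up (or in a Docker RUN step) would turn the silent overwrite into an explicit error. It cannot tell which build's binaries actually ended up on disk, only that both distributions are recorded as installed.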

Minimal repro (Docker)

FROM ghcr.io/astral-sh/uv:python3.13-bookworm-slim
# Project that depends on fastembed + onnxruntime-gpu
RUN uv pip install fastembed onnxruntime-gpu
RUN python -c "import onnxruntime as ort; print(ort.get_available_providers())"
# Output: ['CPUExecutionProvider']  <-- CUDAExecutionProvider is gone

libonnxruntime_providers_cuda.so is still present and all of its .so dependencies resolve correctly, but it links against Provider_GetHost from libonnxruntime_providers_shared.so, which is only exported by the GPU pybind11 build, not the CPU one. As a result, dlopen of the CUDA provider fails silently.
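
Until this is fixed upstream, a small guard can make the missing provider a hard failure instead of a silent CPU fallback. This is a sketch under the assumption that the application knows it needs CUDA; require_provider and check_gpu_runtime are hypothetical helper names, and the only onnxruntime call used is get_available_providers().

```python
def require_provider(available, wanted="CUDAExecutionProvider"):
    """Raise if `wanted` is missing from the provider names in `available`."""
    if wanted not in available:
        raise RuntimeError(
            f"{wanted} is not available (got {list(available)}); "
            "the CPU onnxruntime build may have overwritten onnxruntime-gpu."
        )
    return wanted


def check_gpu_runtime():
    """Query the installed onnxruntime and fail loudly if CUDA is missing."""
    # Imported lazily so require_provider() can be used and tested without onnxruntime.
    import onnxruntime as ort

    return require_provider(ort.get_available_providers())
```

Running check_gpu_runtime() as the last step of an image build, or at service start-up, would have caught the repro above.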

Workaround

Reinstall onnxruntime-gpu after uv sync to restore the GPU binaries:

RUN uv sync --frozen --no-dev
RUN uv pip install --python .venv/bin/python --reinstall "onnxruntime-gpu[cuda,cudnn]==1.24.2"

This is fragile (order-dependent, easy to get wrong) and can't be expressed cleanly in pyproject.toml.

Suggested fixes

Option A — fastembed side (preferred): Add a gpu extra that replaces the onnxruntime dep with onnxruntime-gpu:

[project]
# Remove the direct onnxruntime requirement from dependencies, or make it conditional

[project.optional-dependencies]
gpu = ["onnxruntime-gpu"]

Also guard the import with try/except so that either package works. GPU users can then run pip install "fastembed[gpu]" and get a coherent environment.
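
To illustrate what "either package works" could look like at runtime, here is a rough sketch of provider selection with an explicit warning on fallback. It is not fastembed's actual code: select_providers and make_session are hypothetical names, and the sketch assumes only that onnxruntime exposes get_available_providers() and that InferenceSession accepts a providers list.

```python
import warnings

DEFAULT_PREFERENCE = ("CUDAExecutionProvider", "CPUExecutionProvider")


def select_providers(available, preferred=DEFAULT_PREFERENCE):
    """Return the preferred providers that are actually available, in preference order."""
    chosen = [p for p in preferred if p in available]
    if not chosen:
        raise RuntimeError(f"None of {list(preferred)} are available (got {list(available)})")
    if chosen[0] != preferred[0]:
        # Make the fallback visible instead of silently running on a slower provider.
        warnings.warn(f"{preferred[0]} is unavailable; falling back to {chosen[0]}")
    return chosen


def make_session(model_path):
    """Create an inference session using whichever onnxruntime build is installed."""
    # Both distributions import as `onnxruntime`, so a single lazy import covers either one.
    import onnxruntime as ort

    providers = select_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

With this shape, a broken GPU install still runs on CPU but emits a warning, so the degradation is at least visible in logs.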

Option B — onnxruntime side: Restore Provides-Dist: onnxruntime in onnxruntime-gpu's wheel metadata so package managers treat them as interchangeable. This was present in onnxruntime-gpu <= 1.19.x. A related issue is tracked at microsoft/onnxruntime#22107.

Environment

  • fastembed==0.7.4, onnxruntime==1.24.2, onnxruntime-gpu==1.24.2
  • Python 3.13, uv 0.6.x
  • Docker image: ghcr.io/astral-sh/uv:python3.13-bookworm-slim
  • Host: NVIDIA RTX 4090 + RTX 5090, driver 570.x, nvidia-container-toolkit 1.17.6
