feat(gpu): route device selection through driver config#1716

Draft
elezar wants to merge 5 commits into pull-request/1156 from poc/gpu-device-driver-config-count

elezar (Member) commented Jun 3, 2026

Summary

Draft POC that routes driver-specific GPU device selection through the selected runtime's driver_config while keeping portable GPU presence and count in resource_requirements.gpu. It targets the updated shape of PR #1156 and leaves exact device-ID support out of the Kubernetes driver for now.
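
For readers skimming the PR, the split described above can be pictured roughly as follows. This is a minimal sketch, not the code in this branch: `SandboxRequest`, `GpuRequirements`, `select_driver_config`, and the idea that driver_config is a map keyed by driver name holding opaque payloads are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Assumed shape of the portable part (resource_requirements.gpu).
struct GpuRequirements {
    count: u32,
}

/// Hypothetical request type: one opaque config block per driver name.
/// The gateway never parses the block contents in this sketch.
struct SandboxRequest {
    gpu: GpuRequirements,
    driver_config: HashMap<String, String>,
}

/// Pick the block for the active driver and hand it over untouched.
/// Returns None when the request carries no block for that driver.
fn select_driver_config<'a>(req: &'a SandboxRequest, active_driver: &str) -> Option<&'a str> {
    req.driver_config.get(active_driver).map(String::as_str)
}
```

Keeping the block opaque at the gateway means only the driver that owns a field such as gpu_device_ids needs to understand it, which is the design intent stated in the summary.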

Related Issue

Related to #1156 and #1589.

Changes

  • Add per-driver gpu_device_ids handling for Docker, Podman, and VM driver config.
  • Allow --gpu-device to imply GPU intent and set the portable GPU count to match the requested device IDs.
  • Select the active driver config block in the gateway and pass it to the driver without interpreting nested fields.
  • Validate gpu_device_ids at the driver level: non-empty IDs require a non-zero GPU count, duplicates are rejected, and the number of unique IDs must equal resource_requirements.gpu.count.
  • Leave Kubernetes without exact device-id handling for this POC.
  • Update driver docs, sandbox docs, architecture notes, and focused tests.
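
The validation and count rules listed above can be sketched as below. This is an illustration under stated assumptions rather than the implementation in this PR: `GpuConfigError`, `validate_gpu_device_ids`, and `implied_gpu_count` are hypothetical names, device IDs are assumed to be strings, and an empty ID list is assumed to be valid for any count.

```rust
use std::collections::HashSet;

/// Hypothetical error type covering the three rejection cases.
#[derive(Debug, PartialEq)]
enum GpuConfigError {
    /// IDs were given but the portable GPU count is zero.
    IdsWithoutGpu,
    /// The same device ID appears more than once.
    DuplicateId(String),
    /// The number of unique IDs differs from the portable GPU count.
    CountMismatch { ids: usize, count: u32 },
}

/// Driver-level check of gpu_device_ids against resource_requirements.gpu.count.
/// Assumption: an empty ID list means "no exact selection" and always passes.
fn validate_gpu_device_ids(ids: &[String], gpu_count: u32) -> Result<(), GpuConfigError> {
    if ids.is_empty() {
        return Ok(());
    }
    if gpu_count == 0 {
        return Err(GpuConfigError::IdsWithoutGpu);
    }
    let mut seen = HashSet::new();
    for id in ids {
        // insert returns false when the value was already present.
        if !seen.insert(id.as_str()) {
            return Err(GpuConfigError::DuplicateId(id.clone()));
        }
    }
    if seen.len() != gpu_count as usize {
        return Err(GpuConfigError::CountMismatch { ids: seen.len(), count: gpu_count });
    }
    Ok(())
}

/// CLI-side helper: --gpu-device implies GPU intent, so the portable
/// count is taken from the number of requested device IDs.
fn implied_gpu_count(requested_ids: &[String]) -> u32 {
    requested_ids.len() as u32
}
```

In this sketch a duplicated ID on the command line still yields a count equal to the raw list length, and the driver-level check then rejects the request as a duplicate instead of silently deduplicating.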

Testing

  • mise run pre-commit passes
  • mise exec -- cargo test -p openshell-core -p openshell-driver-docker -p openshell-driver-podman -p openshell-driver-vm --lib gpu
  • mise run check
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

elezar added 4 commits June 3, 2026 12:57
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>

copy-pr-bot Bot commented Jun 3, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.


github-actions Bot commented Jun 3, 2026

Signed-off-by: Evan Lezar <elezar@nvidia.com>
elezar force-pushed the poc/gpu-device-driver-config-count branch from 69be336 to e106f02 on June 3, 2026 18:09
copy-pr-bot (Bot) force-pushed the pull-request/1156 branch from 009a6ee to 4c18100 on June 3, 2026 21:21