Added llama3.1-70b Benchmarking recipe on A3-Mega nodes by krishnakanthankam-qt · Pull Request #246 · AI-Hypercomputer/gpu-recipes

krishnakanthankam-qt · 2026-06-05T11:08:17Z

Description

Title

Add Llama 3.1 70B Recipe and Optimized Sequential Benchmarking

Summary

Introduces a recipe for serving and benchmarking Llama 3.1 70B on A3-Mega GKE node pools. This PR also updates the TensorRT-LLM launcher (trtllm-launcher.sh) so that multiple benchmark experiments can run sequentially within a single deployment.

Key Improvements

🔄 Sequential Multi-Experiment Support

Simplified CLI: The launcher now accepts comma-separated lists for --isl (input sequence length) and --osl (output sequence length).
Efficiency: Users can run a suite of tests (e.g., varying sequence lengths) in a single deployment.
Auto-pairing: The i-th --isl value is paired with the i-th --osl value, so each experiment runs with one matched input/output length.
Note: Because Helm's --set parser treats commas as value separators, commas in these list strings must be escaped with a backslash (\,).
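The pairing behavior described above can be sketched in Bash. This is a minimal illustration, not the launcher's actual code: the function name pair_isl_osl is hypothetical, it assumes the value reaches the script as a plain comma-separated string once Helm has unescaped `\,`, and rejecting lists of different lengths is an assumed policy.

```bash
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper, not taken from trtllm-launcher.sh):
# expand comma-separated ISL/OSL lists into index-matched experiment pairs.
set -euo pipefail

# Prints one "ISL OSL" pair per line; returns 1 if the list lengths differ.
pair_isl_osl() {
  local isl_list="$1" osl_list="$2"
  local -a isls osls
  # Split on commas. A single value yields a one-element array, which is
  # what keeps a plain single-value invocation working.
  IFS=',' read -r -a isls <<< "$isl_list"
  IFS=',' read -r -a osls <<< "$osl_list"

  if (( ${#isls[@]} != ${#osls[@]} )); then
    echo "error: --isl has ${#isls[@]} values but --osl has ${#osls[@]}" >&2
    return 1
  fi

  local i
  for (( i = 0; i < ${#isls[@]}; i++ )); do
    echo "${isls[i]} ${osls[i]}"
  done
}

# The values from this PR's verification run:
pair_isl_osl "2048,2048" "128,2048"
# 2048 128
# 2048 2048
```

A single value such as `pair_isl_osl "1024" "512"` produces exactly one pair, so the one-element case needs no separate code path.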

🦙 Llama 3.1 70B A3-Mega Recipe

Optimized Config: Includes llama3.1-70b.yaml configured with Tensor Parallelism (TP=8) and FP8 quantization.
Complete GKE Infrastructure: Helm templates for optimized serving and internal load balancing.
Documentation: Comprehensive README.md with verified benchmarking commands.
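As a rough sanity check on the TP=8 / FP8 combination (an estimate, not a measured figure), assume about 70 × 10^9 parameters stored at 1 byte each in FP8, split evenly across the eight H100 80GB GPUs of one A3-Mega node, and ignore KV cache, activations, and runtime overhead:

$$
\text{weights per GPU} \approx \frac{70 \times 10^{9}\ \text{params} \times 1\ \text{byte/param}}{8\ \text{GPUs}} = 8.75\ \text{GB}
$$

Holding the same weights in BF16 (2 bytes per parameter) would double this to about 17.5 GB per GPU; in both cases most of each GPU's 80 GB remains for the KV cache and runtime buffers.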

🛡️ Reliability & GKE Stability

Automated NCCL Fix: Added LD_PRELOAD detection for GKE-specific NCCL symbol mismatches, ensuring H100 nodes start reliably.
Robust Cleanup: Implemented an EXIT trap so that multi-gigabyte engine files and datasets are removed from local SSDs whenever the launcher exits, including after a failed benchmark.
Model Isolation: Downloads are isolated by model ID to prevent cache corruption on shared disks.
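The cleanup and isolation ideas above can be sketched in Bash. This is a minimal illustration under stated assumptions, not the recipe's code: the helper names model_cache_dir and run_with_cleanup, the `models/` directory layout, and the `org/name` to `org--name` mapping are all hypothetical, and this sketch sets the EXIT trap in a per-experiment subshell rather than once for the whole script.

```bash
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers, not the recipe's launcher code):
# per-model download directories plus an EXIT trap for scratch cleanup.
set -euo pipefail

# Map a model ID such as "org/name" to its own cache directory so that
# different models never write into the same place on a shared disk.
model_cache_dir() {
  local root="$1" model_id="$2"
  echo "${root}/models/${model_id//\//--}"
}

# Run one experiment in a subshell whose EXIT trap deletes its scratch
# directory (engines, datasets) whether the command succeeds or fails.
# The command's own exit status is preserved for the caller.
run_with_cleanup() {
  local work_dir="$1"
  shift
  (
    trap 'rm -rf -- "$work_dir"' EXIT
    mkdir -p "$work_dir"
    "$@"
  )
}

ssd_root="$(mktemp -d)"
echo "model cache: $(model_cache_dir "$ssd_root" "meta-llama/Llama-3.1-70B")"
# A failing step still leaves no scratch directory behind.
run_with_cleanup "$ssd_root/work/run1" false || echo "experiment failed; scratch removed"
```

Scoping the trap to a subshell keeps each experiment's cleanup independent, which fits a sequential multi-experiment run; a single script-level `trap cleanup EXIT` is the simpler alternative when there is only one scratch directory.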

Verification Results

Validated sequential ISL/OSL processing (e.g., --isl 2048,2048 --osl 128,2048).
Confirmed backward compatibility with single-value --isl and --osl arguments.

google-cla · 2026-06-05T11:08:27Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

krishnakanthankam-qt added 2 commits June 5, 2026 16:07

21ba9f6 Added new recipe for llama3.1-70b on A3-mega nodes
d4c2a9c modified trtllm-launcher.sh for backward compatibility

depksingh marked this pull request as draft June 5, 2026 11:14

krishnakanthankam-qt wants to merge 2 commits into AI-Hypercomputer:main from krishnakanthankam-qt:main
