Added llama3.1-70b Benchmarking recipe on A3-Mega nodes #246

Draft
krishnakanthankam-qt wants to merge 2 commits into AI-Hypercomputer:main from krishnakanthankam-qt:main

@krishnakanthankam-qt

Description

Title

Add Llama 3.1 70B Recipe and Optimized Sequential Benchmarking

Summary

Introduces a recipe for serving and benchmarking Llama 3.1 70B on A3-Mega GKE node pools. This PR also extends the TensorRT-LLM launcher so that multiple benchmark experiments can run sequentially within a single deployment.


Key Improvements

🔄 Sequential Multi-Experiment Support

  • Simplified CLI: The launcher now accepts comma-separated lists for the input sequence length (--isl) and output sequence length (--osl) flags.
  • Efficiency: Users can run a suite of tests (e.g., varying sequence lengths) in a single deployment.
  • Auto-pairing: Sequence lengths are processed as pairs, with each --isl value matched to the --osl value at the same position in its list.
  • Note: Due to Helm parsing logic, commas in these list strings must be escaped with a backslash (\,).
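
The pairing and escaping behavior described above can be illustrated with a minimal Python sketch. This is not the launcher's actual code: the function names are hypothetical, and raising an error when the two lists differ in length is an assumption, since the PR does not say how a mismatch is handled.

```python
def parse_lengths(value: str) -> list[int]:
    """Split a comma-separated string such as "2048,2048" into integers."""
    return [int(part) for part in value.split(",") if part.strip()]


def pair_experiments(isl: str, osl: str) -> list[tuple[int, int]]:
    """Pair the i-th input length with the i-th output length."""
    isl_values = parse_lengths(isl)
    osl_values = parse_lengths(osl)
    # Assumption: a length mismatch is treated as a usage error.
    if len(isl_values) != len(osl_values):
        raise ValueError(
            f"--isl has {len(isl_values)} values but --osl has {len(osl_values)}"
        )
    return list(zip(isl_values, osl_values))


def escape_for_helm_set(value: str) -> str:
    """Escape commas so that `helm --set` keeps the list as one string value."""
    return value.replace(",", "\\,")


# Values taken from the verification example in this PR description.
print(pair_experiments("2048,2048", "128,2048"))  # [(2048, 128), (2048, 2048)]
print(escape_for_helm_set("2048,2048"))           # 2048\,2048
```

A single value on each flag produces exactly one pair, which is how a sketch like this would stay backward compatible with single-value invocations.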

🦙 Llama 3.1 70B A3mega Recipe

  • Optimized Config: Includes llama3.1-70b.yaml configured with Tensor Parallelism (TP=8) and FP8 quantization.
  • Complete GKE Infrastructure: Helm templates for optimized serving and internal load balancing.
  • Documentation: Comprehensive README.md with verified benchmarking commands.

🛡️ Reliability & GKE Stability

  • Automated NCCL Fix: Added LD_PRELOAD detection for GKE-specific NCCL symbol mismatches, ensuring H100 nodes start reliably.
  • Robust Cleanup: Implemented an EXIT trap to ensure failed benchmarks automatically clean up multi-gigabyte engine files and datasets from local SSDs.
  • Model Isolation: Downloads are isolated by model ID to prevent cache corruption on shared disks.
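
The cleanup and isolation ideas above can be sketched in Python as follows. The PR describes a shell EXIT trap; this sketch swaps in a `try`/`finally` context manager as an analog, not the launcher's implementation. The directory layout, the `--` separator, and the model IDs are all hypothetical.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


def model_cache_dir(root: Path, model_id: str) -> Path:
    """Give each model ID its own download directory under a shared root."""
    # Hypothetical naming scheme: "org/name" becomes "org--name".
    return root / model_id.replace("/", "--")


@contextmanager
def benchmark_scratch(root: Path):
    """Yield a scratch directory for engines and datasets; always remove it."""
    scratch = Path(tempfile.mkdtemp(dir=root, prefix="bench-"))
    try:
        yield scratch
    finally:
        # Runs on success and on failure, like a shell `trap ... EXIT`.
        shutil.rmtree(scratch, ignore_errors=True)


root = Path(tempfile.mkdtemp())
try:
    with benchmark_scratch(root) as scratch:
        (scratch / "engine.bin").write_bytes(b"placeholder")
        raise RuntimeError("simulated benchmark failure")
except RuntimeError:
    pass

print(scratch.exists())  # False: the scratch directory was removed
print(model_cache_dir(root, "example-org/model-a").name)  # example-org--model-a
```

Keeping the per-model download directory separate from the scratch directory means a failed run discards engines and datasets without touching other models' cached files.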

Verification Results

  • Validated sequential ISL/OSL processing (e.g., --isl 2048,2048 --osl 128,2048).
  • Confirmed single-value backward compatibility.

@google-cla

google-cla Bot commented Jun 5, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

depksingh marked this pull request as draft June 5, 2026 11:14