perf: parallel multipart UploadPart #216

## Problem

`Client::PutObject` uploads parts strictly sequentially in [src/client.cc:497-636](https://github.com/minio/minio-cpp/blob/main/src/client.cc#L497-L636). A 10 GiB object with 5 MiB parts needs 2048 parts, which means 2048 sequential HTTP round-trips on a single TCP stream. Single-stream bandwidth is the cap, even though the path is now backed by a shared CURL handle pool (PR #215).
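
The part count in the example above follows from a ceiling division. A minimal sketch of that arithmetic; `PartCount`, `kMiB`, and `kGiB` are illustrative names, not minio-cpp identifiers:

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMiB = 1024ULL * 1024;
constexpr std::uint64_t kGiB = 1024ULL * kMiB;

// Hypothetical helper: number of parts needed for object_size bytes when
// every part except possibly the last holds part_size bytes.
inline std::uint64_t PartCount(std::uint64_t object_size, std::uint64_t part_size) {
  assert(part_size > 0);
  return (object_size + part_size - 1) / part_size;  // ceiling division
}
```

`PartCount(10 * kGiB, 5 * kMiB)` evaluates to 2048, and in the sequential loop each of those parts is one blocking round-trip.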

## Why now

PR #215 made the HTTP layer concurrency-safe (one-time `curl_global_init`, `CURLSH` with per-slot mutex, `region_map_` guarded by a `shared_mutex`). The transport is ready to be exercised from multiple threads; the upload pipeline isn't.

## Design tradeoffs (need input)

The current loop uses a single shared aligned buffer (allocated once at `client.cc:914`), reads sequentially from `args.stream` via `utils::ReadPart`, registers one buffer for the whole multipart RDMA upload, and uses a one-byte read-ahead to detect the last part. Parallelizing means:

- **N buffers** (page-aligned, each individually cuObj-registered for RDMA)
- **Producer/consumer split**: one thread drains the stream into the next free buffer; N threads post UploadParts
- **Inflight cap** as new public API (e.g., `PutObjectArgs::max_inflight_parts`)
- **Memory pressure** scales linearly with parallelism: buffer memory is `part_size` × N, so a large `part_size` with a high N can blow up RSS (e.g., 64 MiB parts × 16 in flight = 1 GiB of buffers)
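
To make the memory tradeoff concrete, here is a rough sketch of how an inflight cap could be derived from a memory budget. `PeakBufferBytes`, `ClampInflight`, and the idea of a caller-supplied byte budget are assumptions for illustration only; nothing like them exists in minio-cpp today:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMiB = 1024ULL * 1024;

// Hypothetical helper: bytes of part buffers held when `inflight` parts are
// buffered at the same time.
inline std::uint64_t PeakBufferBytes(std::uint64_t part_size, unsigned inflight) {
  return part_size * inflight;
}

// Hypothetical helper: the largest inflight count whose buffers fit in
// budget_bytes, never below 1 and never above the requested value.
inline unsigned ClampInflight(std::uint64_t part_size, unsigned requested,
                              std::uint64_t budget_bytes) {
  assert(part_size > 0 && requested > 0);
  const std::uint64_t fit = budget_bytes / part_size;
  return static_cast<unsigned>(
      std::max<std::uint64_t>(1, std::min<std::uint64_t>(fit, requested)));
}
```

For example, `PeakBufferBytes(64 * kMiB, 16)` is 1 GiB, while `ClampInflight(64 * kMiB, 16, 256 * kMiB)` returns 4. Whether the library should clamp silently or reject an oversized request is one of the open API questions.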

## Suggested approach

1. Add `PutObjectArgs::max_inflight_parts` (default 1 = current behavior)
2. Refactor the loop into a small producer (a single thread reading the stream) plus a bounded executor (N consumers, each calling `UploadPart`)
3. For RDMA: register N buffers up front via N `ScopedRDMARegistration` slots
4. Identify parts by part number rather than by completion order: parts may finish uploading in any order, as long as the part list sent to CompleteMultipartUpload is sorted by ascending part number
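
The producer/consumer shape in the steps above could look roughly like the following. This is a standalone sketch using only the standard library, not the proposed minio-cpp implementation: `PartResult`, `UploadFn`, and `UploadPartsParallel` are hypothetical names, the upload callback stands in for the real `UploadPart` call, and RDMA registration, retries, and abort-on-failure are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <istream>
#include <mutex>
#include <queue>
#include <sstream>  // std::istringstream, handy for exercising the sketch in memory
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are minio-cpp types.
struct PartResult {
  unsigned number;   // 1-based part number
  std::string etag;  // whatever the upload callback returned
};
// Uploads one part and returns its ETag. Assumed thread-safe and non-throwing.
using UploadFn = std::function<std::string(unsigned, const char*, std::size_t)>;

std::vector<PartResult> UploadPartsParallel(std::istream& in, std::size_t part_size,
                                            unsigned max_inflight, const UploadFn& upload) {
  assert(part_size > 0 && max_inflight > 0);
  struct Job { unsigned number; std::size_t slot; std::size_t size; };

  // One buffer per in-flight part; a real version would page-align and
  // RDMA-register each slot here.
  std::vector<std::vector<char>> buffers(max_inflight, std::vector<char>(part_size));
  std::queue<std::size_t> free_slots;
  for (std::size_t i = 0; i < max_inflight; ++i) free_slots.push(i);
  std::queue<Job> jobs;
  std::vector<PartResult> results;
  bool done = false;
  std::mutex mu;
  std::condition_variable cv;

  auto consumer = [&] {
    for (;;) {
      std::unique_lock<std::mutex> lock(mu);
      cv.wait(lock, [&] { return done || !jobs.empty(); });
      if (jobs.empty()) return;  // producer finished and queue drained
      const Job job = jobs.front();
      jobs.pop();
      lock.unlock();
      std::string etag = upload(job.number, buffers[job.slot].data(), job.size);
      lock.lock();
      results.push_back({job.number, std::move(etag)});
      free_slots.push(job.slot);  // hand the buffer back to the producer
      lock.unlock();
      cv.notify_all();
    }
  };
  std::vector<std::thread> consumers;
  for (unsigned i = 0; i < max_inflight; ++i) consumers.emplace_back(consumer);

  // Producer: the only thread that touches the stream, so reads stay sequential.
  for (unsigned number = 1;; ++number) {
    std::unique_lock<std::mutex> lock(mu);
    cv.wait(lock, [&] { return !free_slots.empty(); });
    const std::size_t slot = free_slots.front();
    free_slots.pop();
    lock.unlock();
    in.read(buffers[slot].data(), static_cast<std::streamsize>(part_size));
    const std::size_t got = static_cast<std::size_t>(in.gcount());
    if (got == 0) break;  // stream exhausted exactly on a part boundary
    lock.lock();
    jobs.push({number, slot, got});
    lock.unlock();
    cv.notify_all();
    if (got < part_size) break;  // short read: this was the last part
  }
  {
    std::lock_guard<std::mutex> lock(mu);
    done = true;
  }
  cv.notify_all();
  for (auto& t : consumers) t.join();

  // CompleteMultipartUpload expects parts in ascending part-number order.
  std::sort(results.begin(), results.end(),
            [](const PartResult& a, const PartResult& b) { return a.number < b.number; });
  return results;
}
```

With `max_inflight = 1` the sketch degenerates to one buffer and one consumer, so the producer cannot read the next part until the previous upload returns, which matches the proposed default of current behavior. It can be exercised without a server by passing a `std::istringstream` and a callback that just records the sizes it receives.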

## Impact

**Large.** Cited as the single biggest throughput unlock in the original audit alongside the now-landed handle pool. For large objects on fast networks, expect up to an N× improvement with N parts in flight, bounded by network and server limits.

## Roadmap

T1.2 from the Tier 1 modernization audit; was paused because of the API/memory design questions above. Related: PR #215.
