Bugfix refill by mihai145 · Pull Request #55 · spcl/rFaaS

mihai145 · 2026-03-24T13:39:25Z

Summary by CodeRabbit

New Features
- Refill behavior now accepts an optional threshold so callers can control when receive buffers are replenished.
Bug Fixes
- Corrected request-count tracking to apply updates to the intended connection.
- Receive buffer sizing now properly scales with the configured number of cores.
- Resource-manager receive handling now updates and refills work-credits at appropriate points.
Chores
- Increased maximum work-queue send capacity for higher throughput.

coderabbitai · 2026-03-24T13:39:41Z

📝 Walkthrough

Walkthrough

Changed rdmalib::RecvWorkCompletions::refill() to accept an optional int threshold = -1 and use it when deciding to post receives; updated callers (executor, client, manager) to pass thresholds and adjust request accounting; increased RDMA QP send WR capacity from 40 to 128; ensured executor receive buffer ≥ numcores.

Changes

Cohort / File(s)	Summary
RDMA receive queue API & impl `rdmalib/include/rdmalib/queue.hpp`, `rdmalib/lib/queue.cpp`	`RecvWorkCompletions::refill()` signature changed to `bool refill(int threshold = -1)`; implementation treats `-1` as fallback to internal `_refill_threshold` and posts receives when `_requests < threshold`.
RDMA QP configuration `rdmalib/lib/rdmalib.cpp`	Increased `_cfg.attr.cap.max_send_wr` from `40` to `128` in `RDMAActive` and `RDMAPassive` constructors.
Executor: caller updates & sizing `rfaas/include/rfaas/executor.hpp`, `rfaas/lib/executor.cpp`	Initial refill now calls `receive_wcs().refill(numcores)`; per-connection `update_requests(-1)` applied correctly to each `_connections[i]`; receive buffer size initialized to at least `numcores` (`std::max(dev.default_receive_buffer_size, (int16_t)numcores) + 1`).
Manager: resource message handler signature `server/executor_manager/manager.hpp`, `server/executor_manager/manager.cpp`	`Manager::_handle_res_mgr_message` now accepts `int update_requests` (default `0`); handler calls `receive_wcs().update_requests(update_requests)` before `receive_wcs().refill()`; polling loop sites updated to pass `-1`.
Client: lease path accounting `rfaas/include/rfaas/client.hpp`	After polling RM response, calls `receive_wcs().update_requests(-1)` then `receive_wcs().refill()` to adjust receive-work accounting immediately after each poll.

Sequence Diagram(s)

sequenceDiagram
    rect rgba(135,206,235,0.5)
    participant Client
    end
    rect rgba(144,238,144,0.5)
    participant Manager
    end
    rect rgba(255,182,193,0.5)
    participant RDMA_Lib
    end
    Client->>Manager: send request / wait for reply
    Manager->>RDMA_Lib: poll_wc() -> receives ibv_wc
    Manager->>RDMA_Lib: receive_wcs().update_requests(-1)
    Manager->>RDMA_Lib: receive_wcs().refill(threshold=-1)
    RDMA_Lib-->>Manager: posts receives as needed
    Manager-->>Client: propagate response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐇 I tweak a threshold, hop and hum,
Buffers grow where small ones were glum.
More send windows open wide and bright,
Counts adjusted, queues sleep light.
A rabbit's nibble makes the network run right.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title 'Bugfix refill' is vague and generic—it indicates a bug fix to something called 'refill' but provides minimal context about the scope or significance of the changes.	Consider using a more descriptive title that clarifies the specific issue being fixed, e.g., 'Fix receive work-queue refill accounting across RDMA connections' or 'Fix refill threshold handling and receive buffer sizing'.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch bugfix-refill

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rdmalib/lib/queue.cpp`:
- Around line 145-154: The replenish logic in RecvWorkCompletions::replenish
resets _requests to _rcv_buf_size even if post_batched_empty_recv(...) fails,
corrupting queue accounting; modify replenish to check the return/result of
post_batched_empty_recv(qty) and only set _requests = _rcv_buf_size when that
call succeeds, otherwise leave _requests unchanged (and propagate/log the
failure/return false) so receive-queue depletion is not hidden; locate this
behavior in RecvWorkCompletions::replenish and the helper
post_batched_empty_recv to add the success check and appropriate error handling.

In `@rfaas/lib/executor.cpp`:
- Line 41: The initializer that constructs _state uses a narrowing cast of
numcores to int16_t which can overflow; remove that cast and ensure the std::max
comparison is done in a sufficiently wide integer type (e.g. promote
dev.default_receive_buffer_size to int or use std::max<int>) so you compare two
ints and then add 1, then pass the result (converted if the _state constructor
expects a narrower type) — update the expression in the _state(...) initializer
in executor.cpp to perform the max in an int-sized type instead of casting
numcores to int16_t.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9d393705-3303-4e88-8d40-41614520face

📥 Commits

Reviewing files that changed from the base of the PR and between 4d57ce0 and ecf64e0.

📒 Files selected for processing (7)

rdmalib/include/rdmalib/queue.hpp
rdmalib/lib/queue.cpp
rdmalib/lib/rdmalib.cpp
rfaas/include/rfaas/executor.hpp
rfaas/lib/executor.cpp
server/executor_manager/manager.cpp
server/executor_manager/manager.hpp

coderabbitai · 2026-03-24T13:45:01Z

+  bool RecvWorkCompletions::replenish()
+  {
+    if (_requests < _rcv_buf_size) {
+      SPDLOG_DEBUG("Post {} requests to buffer at QP {}", _rcv_buf_size - _requests, fmt::ptr(this->qp()));
+      this->post_batched_empty_recv(_rcv_buf_size - _requests);
+      _requests = _rcv_buf_size;
+      return true;
+    }
+    return false;
+  }


⚠️ Potential issue | 🔴 Critical

Don’t mark requests as replenished when posting fails.

post_batched_empty_recv(...) can fail, but _requests is still reset to full. That corrupts queue accounting and can hide receive-queue depletion.

🔧 Proposed fix

bool RecvWorkCompletions::replenish() { - if (_requests < _rcv_buf_size) { - SPDLOG_DEBUG("Post {} requests to buffer at QP {}", _rcv_buf_size - _requests, fmt::ptr(this->qp())); - this->post_batched_empty_recv(_rcv_buf_size - _requests); - _requests = _rcv_buf_size; - return true; - } - return false; + if(_requests < 0) { + _requests = 0; + } + + if(_requests < _rcv_buf_size) { + const int missing = _rcv_buf_size - _requests; + SPDLOG_DEBUG("Post {} requests to buffer at QP {}", missing, fmt::ptr(this->qp())); + if(this->post_batched_empty_recv(missing) < 0) { + return false; + } + _requests = _rcv_buf_size; + return true; + } + return false; }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

bool RecvWorkCompletions::replenish()

{

if (_requests < _rcv_buf_size) {

SPDLOG_DEBUG("Post {} requests to buffer at QP {}", _rcv_buf_size - _requests, fmt::ptr(this->qp()));

this->post_batched_empty_recv(_rcv_buf_size - _requests);

_requests = _rcv_buf_size;

return true;

}

return false;

}

bool RecvWorkCompletions::replenish()

{

if(_requests < 0) {

_requests = 0;

}

if(_requests < _rcv_buf_size) {

const int missing = _rcv_buf_size - _requests;

SPDLOG_DEBUG("Post {} requests to buffer at QP {}", missing, fmt::ptr(this->qp()));

if(this->post_batched_empty_recv(missing) < 0) {

return false;

}

_requests = _rcv_buf_size;

return true;

}

return false;

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@rdmalib/lib/queue.cpp` around lines 145 - 154, The replenish logic in RecvWorkCompletions::replenish resets _requests to _rcv_buf_size even if post_batched_empty_recv(...) fails, corrupting queue accounting; modify replenish to check the return/result of post_batched_empty_recv(qty) and only set _requests = _rcv_buf_size when that call succeeds, otherwise leave _requests unchanged (and propagate/log the failure/return false) so receive-queue depletion is not hidden; locate this behavior in RecvWorkCompletions::replenish and the helper post_batched_empty_recv to add the success check and appropriate error handling.

coderabbitai · 2026-03-24T13:45:01Z


  executor::executor(const std::string& address, int port, int numcores, int memory, int lease_id, device_data & dev):
-    _state(dev.ip_address, dev.port, dev.default_receive_buffer_size + 1),
+    _state(dev.ip_address, dev.port, std::max(dev.default_receive_buffer_size, (int16_t)numcores) + 1),


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail # 1) Inspect device_data definition and default_receive_buffer_size type. rg -n "struct device_data|class device_data|default_receive_buffer_size" -C3 --type=cpp --type=hpp # 2) Inspect executor constructor signature and usage of numcores. rg -n "executor\\(const std::string& address, int port, int numcores" -C2 --type=cpp --type=hpp rg -n "std::max\\(.*numcores|\\(int16_t\\)numcores|_numcores" -C2 --type=cpp --type=hpp # 3) Check for any explicit input validation/range checks for numcores. rg -nP "\\bnumcores\\b.*(<=|>=|<|>|assert|expect_|throw)" -C2 --type=cpp --type=hpp

Repository: spcl/rFaaS

Length of output: 81

🏁 Script executed:

#!/bin/bash set -euo pipefail # 1) Inspect device_data definition and default_receive_buffer_size type. echo "=== Searching for device_data definition ===" rg -n "struct device_data|class device_data|default_receive_buffer_size" -C3 # 2) Inspect executor constructor signature and usage of numcores. echo "=== Searching for executor constructor ===" rg -n "executor\(const std::string& address, int port, int numcores" -C2 echo "=== Searching for numcores usage with std::max or int16_t cast ===" rg -n "std::max.*numcores|\(int16_t\)numcores|_numcores" -C2 # 3) Check for any explicit input validation/range checks for numcores. echo "=== Searching for numcores range checks ===" rg -n "numcores.*(<=|>=|<|>|assert)" -C2

Repository: spcl/rFaaS

Length of output: 15755

Remove explicit narrowing cast from buffer-size calculation.

Casting numcores (int) to int16_t causes integer overflow if numcores exceeds 32,767, producing an undersized receive buffer.

💡 Proposed fix

- _state(dev.ip_address, dev.port, std::max(dev.default_receive_buffer_size, (int16_t)numcores) + 1), + _state(dev.ip_address, dev.port, std::max<int>(dev.default_receive_buffer_size, numcores) + 1),

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

_state(dev.ip_address, dev.port, std::max(dev.default_receive_buffer_size, (int16_t)numcores) + 1),

_state(dev.ip_address, dev.port, std::max<int>(dev.default_receive_buffer_size, numcores) + 1),

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@rfaas/lib/executor.cpp` at line 41, The initializer that constructs _state uses a narrowing cast of numcores to int16_t which can overflow; remove that cast and ensure the std::max comparison is done in a sufficiently wide integer type (e.g. promote dev.default_receive_buffer_size to int or use std::max<int>) so you compare two ints and then add 1, then pass the result (converted if the _state constructor expects a narrower type) — update the expression in the _state(...) initializer in executor.cpp to perform the max in an int-sized type instead of casting numcores to int16_t.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

rfaas/include/rfaas/client.hpp (1)
45-51: ⚠️ Potential issue | 🔴 Critical

Incorrect decrement value causes _requests tracking inconsistency.

poll_wc() can return multiple completions (up to 32), as evidenced by the warning check on line 49. However, update_requests(-1) always decrements by 1, not by the actual response_count. This under-decrements _requests when N > 1 completions are returned, causing the refill logic to malfunction over time.

The correct call should use -response_count:
🐛 Proposed fix
       auto [responses, response_count] = _resource_mgr.connection().poll_wc(rdmalib::QueueType::RECV, true);
-      _resource_mgr.connection().receive_wcs().update_requests(-1);
+      _resource_mgr.connection().receive_wcs().update_requests(-response_count);
       _resource_mgr.connection().receive_wcs().refill();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rfaas/include/rfaas/client.hpp` around lines 45 - 51, The decrement of the
pending request counter is wrong: when poll_wc returns multiple completions you
must decrement by the actual response_count instead of 1; update the call to
receive_wcs().update_requests(...) to pass -response_count (using the
response_count variable returned from poll_wc) so the _requests tracking stays
consistent and refill() behaves correctly; locate this in the block that calls
_resource_mgr.connection().poll_wc(...),
_resource_mgr.connection().receive_wcs().update_requests(...), and
_resource_mgr.connection().receive_wcs().refill().

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@rfaas/include/rfaas/client.hpp`:
- Around line 45-51: The decrement of the pending request counter is wrong: when
poll_wc returns multiple completions you must decrement by the actual
response_count instead of 1; update the call to
receive_wcs().update_requests(...) to pass -response_count (using the
response_count variable returned from poll_wc) so the _requests tracking stays
consistent and refill() behaves correctly; locate this in the block that calls
_resource_mgr.connection().poll_wc(...),
_resource_mgr.connection().receive_wcs().update_requests(...), and
_resource_mgr.connection().receive_wcs().refill().

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 03633af2-75ae-445e-b9fa-97a0f7c4a45f

📥 Commits

Reviewing files that changed from the base of the PR and between 04527d5 and 5982753.

📒 Files selected for processing (1)

rfaas/include/rfaas/client.hpp

mihai145 added 3 commits March 24, 2026 10:39

correctly account for recv wrs

6f364bf

allow allocations with numcores > default_recv_buf_size

8a7b7a8

correctly refill exec mgr to res mgr recv wc

ecf64e0

coderabbitai Bot reviewed Mar 24, 2026

View reviewed changes

mihai145 added 2 commits March 24, 2026 23:07

reuse refill function instead of replenish

04527d5

refill client to RM connection

5982753

coderabbitai Bot reviewed Mar 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Bugfix refill#55

Bugfix refill#55
mihai145 wants to merge 5 commits into
masterfrom
bugfix-refill

mihai145 commented Mar 24, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Mar 24, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Mar 24, 2026

Uh oh!

coderabbitai Bot Mar 24, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	_state(dev.ip_address, dev.port, std::max(dev.default_receive_buffer_size, (int16_t)numcores) + 1),
	_state(dev.ip_address, dev.port, std::max<int>(dev.default_receive_buffer_size, numcores) + 1),

Uh oh!

Conversation

mihai145 commented Mar 24, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

mihai145 commented Mar 24, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 24, 2026 •

edited

Loading