Skip to content

Fix/kernel termination return#297

Open
guosran wants to merge 1 commit into
tancheng:masterfrom
guosran:fix/kernel-termination-return
Open

Fix/kernel termination return#297
guosran wants to merge 1 commit into
tancheng:masterfrom
guosran:fix/kernel-termination-return

Conversation

@guosran

@guosran guosran commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

Summary

Addresses #292 with a minimized dynamic-control termination fix. This PR only updates CtrlMemDynamicRTL.py and its focused test.

Changes

  • Mark the final real dynamic control word with is_last_ctrl.
  • Avoid relying on a synthetic timeout COMPLETE packet after the final step.
  • Add a concrete example as an inline RTL comment.

No new ports or interfaces are introduced.

@tancheng tancheng requested review from tancheng and yyan7223 June 22, 2026 06:43
Comment thread fu/single/CompRTL.py Outdated
@tancheng tancheng requested a review from ShangkunLi June 22, 2026 06:52
@guosran guosran force-pushed the fix/kernel-termination-return branch from 7726455 to 83e1c87 Compare June 23, 2026 20:41
@guosran guosran force-pushed the fix/kernel-termination-return branch from 83e1c87 to 6358200 Compare June 25, 2026 14:43
@guosran guosran changed the base branch from kernel-submit to master June 25, 2026 14:52
@guosran guosran marked this pull request as draft June 25, 2026 14:53
@guosran guosran force-pushed the fix/kernel-termination-return branch from 6358200 to 5904e1a Compare June 25, 2026 19:52
@guosran guosran marked this pull request as ready for review June 26, 2026 00:32
@guosran

guosran commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator Author

Concrete example:
Completion should come from the explicit RET path, not from a control-memory timeout.

ctrl steps:
ADD
MUL
RET value=42

Before:
ctrl memory could send COMPLETE when the step counter reached the end.

After:
CtrlMemDynamicRTL waits for the RET response from the element path and forwards that result to the controller.

(s.recv_pkt_from_controller_queue.send.msg.payload.cmd == CMD_CONFIG_GEP_STRIDE) | \
(s.recv_pkt_from_controller_queue.send.msg.payload.cmd == CMD_UPDATE_COUNTER_SHADOW_VALUE) | \
(s.recv_pkt_from_controller_queue.send.msg.payload.cmd == CMD_RESET_LEAF_COUNTER):
(s.recv_pkt_from_controller_queue.send.msg.payload.cmd == CMD_RESET_LEAF_COUNTER) | \

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those changes are unrelated to this PR, right?

s.send_pkt_to_controller.msg @= \
IntraCgraPktType(zext(s.tile_id, IntraPktTileIdType), num_tiles, 0, 0, 0, 0, 0, 0, 0, 0,
IntraCgraPktType(s.tile_id, num_tiles, 0, 0, 0, 0, 0, 0, 0, 0,
s.recv_from_element_queue.send.msg)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you make sure the result of OPT_RET is sent back to CPU? recv_from_element_queue may contain the results of any operations.

# Sets each FU's op code as NAH when prologue execution has not completed.
# As FU is supposed to do nothing during prologue.
if s.prologue_count_outport_fu != 0:
s.send_ctrl.msg.operation @= OPT_NAH

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it is removed?

# FlexibleFuRTL so the real control word remains visible to the
# crossbars and debug signals.
if ~s.send_ctrl.val:
s.send_ctrl.msg.operation @= OPT_START

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why OPT_START?

s.send_ctrl.msg.is_last_ctrl @= \
s.reg_file.rdata[0].is_last_ctrl | \
((s.total_ctrl_steps_val > 0) &
(s.times == s.total_ctrl_steps_val - TimeType(1)))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why we need ((s.total_ctrl_steps_val > 0) & (s.times == s.total_ctrl_steps_val - TimeType(1))) ?


@update
def update_send_ctrl_val():
def update_send_ctrl():

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

@yyan7223 yyan7223 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you using AI to help you with the PR?

@tancheng

Copy link
Copy Markdown
Owner

Are you using AI to help you with the PR?

Of coarse. And using AI is totally fine. And we should encourage that. @yyan7223 do you have any concern~?

But for this PR, AI might not aware of our original motivation, we indeed provided the total_steps to make the kernel be able to return COMPLETE for better debugging or for the systolic array case (where there is no RETURN instruction mapped).

@yyan7223

Copy link
Copy Markdown
Collaborator

Concrete example: Completion should come from the explicit RET path, not from a control-memory timeout.

ctrl steps: ADD MUL RET value=42

Before: ctrl memory could send COMPLETE when the step counter reached the end.

After: CtrlMemDynamicRTL waits for the RET response from the element path and forwards that result to the controller.

I agree with your proposal.
For implementation, I think we should simultaneously check whether the opcode of current send_ctrl is OPT_RET and the predicate of its execution result is 1.

@yyan7223

Copy link
Copy Markdown
Collaborator

Are you using AI to help you with the PR?

Of coarse. And using AI is totally fine. And we should encourage that. @yyan7223 do you have any concern~?

But for this PR, AI might not aware of our original motivation, we indeed provided the total_steps to make the kernel be able to return COMPLETE for better debugging or for the systolic array case (where there is no RETURN instruction mapped).

I'm totally fine with AI, I also use it for coding, and I check every line of code it generated carefully to make sure it is reasonable.

@tancheng

Copy link
Copy Markdown
Owner

Concrete example: Completion should come from the explicit RET path, not from a control-memory timeout.
ctrl steps: ADD MUL RET value=42
Before: ctrl memory could send COMPLETE when the step counter reached the end.
After: CtrlMemDynamicRTL waits for the RET response from the element path and forwards that result to the controller.

I agree with your proposal. For implementation, I think we should simultaneously check whether the opcode of current send_ctrl is OPT_RET and the predicate of its execution result is 1.

hmm, I suggest we keep the original solution as well. i.e., it can also send COMPLETE if total_ctrl_steps_val is reached. It already check whether it is non-zero:

elif ((s.total_ctrl_steps_val > 0

I thought including this total_ctrl_steps_val won't introduce any bug..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants