qkc/cluster: add pyquarkchain-compatible cluster protocol wire library #19
Draft
iteyelmp wants to merge 12 commits into
This PR implements the cluster protocol wire library for the Go slave, providing byte-compatible communication with the Python master and other slaves. It addresses issue #5.
What's included
Protocol layer (cluster package)
Frame codec (frame.go)
- [4B len] [12B metadata] [1B opcode] [8B rpc_id] [payload]
Master connection (master_conn.go)
Dispatcher (master_dispatcher.go)
- cluster_peer_id == 0 → MasterConn.Handle() (cluster RPC, master commands)
- cluster_peer_id != 0 → PeerConn.Dispatch() (virtual P2P, external peer traffic)
Peer virtual connection (peer_conn.go)
Xshard direct TCP (xshard_conn.go, xshard_pool.go)
- Handshake waits for the peer's PING (Python wait_until_ping_received); remote id and full_shard_id_list are recorded on the conn by recordPing
SlaveRPC (slave_rpc.go)
- NewSlaveRPC(cfg) → RegisterHandlers() → Serve()
- *Slave is completely hidden from callers
Message structs (messages.go)
Protocol constants (protocol.go)
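The frame layout listed under the frame codec can be illustrated with a small round-trip sketch. This is not the code in frame.go: the `Frame` type, `Encode`, and `Decode` are hypothetical names, and three details are assumptions the description does not state: that the 12-byte metadata is Branch (4 bytes) followed by ClusterPeerID (8 bytes), that all integers are big-endian, and that the length prefix counts payload bytes only.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// Assumed layout: [4B len][4B branch + 8B cluster_peer_id][1B opcode][8B rpc_id][payload].
// Field widths inside the metadata, the byte order, and the meaning of len
// (payload bytes only) are assumptions made for this sketch.
const headerSize = 4 + 12 + 1 + 8

// Frame is a hypothetical in-memory form of one wire frame.
type Frame struct {
	Branch        uint32
	ClusterPeerID uint64
	Op            byte
	RPCID         uint64
	Payload       []byte
}

// Encode lays the frame out in the order given in the PR description.
func Encode(f Frame) []byte {
	buf := make([]byte, headerSize+len(f.Payload))
	binary.BigEndian.PutUint32(buf[0:4], uint32(len(f.Payload)))
	binary.BigEndian.PutUint32(buf[4:8], f.Branch)
	binary.BigEndian.PutUint64(buf[8:16], f.ClusterPeerID)
	buf[16] = f.Op
	binary.BigEndian.PutUint64(buf[17:25], f.RPCID)
	copy(buf[headerSize:], f.Payload)
	return buf
}

// Decode parses exactly one frame and rejects input whose size does not
// agree with the length prefix.
func Decode(b []byte) (Frame, error) {
	if len(b) < headerSize {
		return Frame{}, errors.New("short header")
	}
	n := int(binary.BigEndian.Uint32(b[0:4]))
	if len(b)-headerSize != n {
		return Frame{}, errors.New("length prefix does not match payload size")
	}
	return Frame{
		Branch:        binary.BigEndian.Uint32(b[4:8]),
		ClusterPeerID: binary.BigEndian.Uint64(b[8:16]),
		Op:            b[16],
		RPCID:         binary.BigEndian.Uint64(b[17:25]),
		Payload:       append([]byte(nil), b[headerSize:]...),
	}, nil
}

func main() {
	in := Frame{Branch: 1, ClusterPeerID: 7, Op: 3, RPCID: 42, Payload: []byte("hi")}
	out, err := Decode(Encode(in))
	fmt.Println(err, out.RPCID, bytes.Equal(in.Payload, out.Payload))
}
```

A fixed-size header makes it possible to read the first 25 bytes, learn the payload size, and then read the rest, which is how a streaming reader would typically consume such frames.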
Runtime flow
The slave goes through a fixed connection lifecycle that mirrors the Python implementation. Understanding this order is critical for wire compatibility debugging.
Step 1: Bootstrap — Master connects first
SlaveRPC.Serve() opens the listen socket and calls acceptLoop. The very first TCP connection accepted is treated as the master connection (this matches Python Slave.handle_new_connection, which sets self.master_conn on the first accept). All subsequent connections are treated as inbound Slave↔Slave xshard connections.
The master then sends PING with metadata = (Branch=0, ClusterPeerID=0) and PingRequest{ID, FullShardIDList}. The slave responds with PongResponse{ID, FullShardIDList}. This is the only Mode 1 frame the master initiates; afterwards the slave is considered "online".
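The bootstrap rule (first accepted connection is the master, every later one is an xshard peer) and the PING reply can be sketched as below. The names `acceptClassifier`, `next`, and `handlePing` are hypothetical and not taken from the PR. The `ID` type (a string here), the shard id element type, and the idea that the PONG carries the slave's own id and shard list are also assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// connRole says how an accepted connection is treated.
type connRole int

const (
	roleMaster connRole = iota // the first accepted connection
	roleXshard                 // every later connection (Slave<->Slave)
)

// acceptClassifier is a hypothetical helper; the PR's acceptLoop may
// track this state differently.
type acceptClassifier struct {
	mu         sync.Mutex
	masterSeen bool
}

// next returns the role for the next accepted connection.
func (c *acceptClassifier) next() connRole {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.masterSeen {
		c.masterSeen = true
		return roleMaster
	}
	return roleXshard
}

// PingRequest and PongResponse reuse the field names from the text;
// the field types are assumptions.
type PingRequest struct {
	ID              string
	FullShardIDList []uint32
}

type PongResponse struct {
	ID              string
	FullShardIDList []uint32
}

// handlePing answers a PING with the local slave's identity (assumed).
func handlePing(localID string, localShards []uint32, _ PingRequest) PongResponse {
	return PongResponse{ID: localID, FullShardIDList: localShards}
}

func main() {
	var c acceptClassifier
	for i := 0; i < 3; i++ {
		fmt.Println("connection", i, "is master:", c.next() == roleMaster)
	}
	fmt.Println(handlePing("S0", []uint32{1}, PingRequest{ID: "master"}).ID)
}
```

Guarding the flag with a mutex keeps the classification correct even if connections are accepted from more than one goroutine.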
Step 2: Master triggers inter-slave mesh — CONNECT_TO_SLAVES
The master sends CONNECT_TO_SLAVES_REQUEST with a SlaveInfoList of peers (id, host, port, full_shard_id_list). For each entry the slave:
- Skips the entry if its id is already in the connectedSlaveIDs set (matches Python slave_ids)
- Dials the peer over TCP (NewXshardConn)
- Sends PING carrying the local id and shard list and waits for PONG
- Verifies that PongResponse.ID and FullShardIDList match what the master advertised; a mismatch closes the conn with an error
- Indexes the conn under each fullShardID in xshardPool (key: FullShardID{ChainID, ShardID}, high 16 bits / low 16 bits)
- Adds the peer id to connectedSlaveIDs so the reciprocal inbound connection is not re-dialled
Step 3: Inbound xshard handshake - acceptLoop else branch
When another slave dials in (the reciprocal of Step 2), acceptLoop wraps the raw conn in an XshardConn, applies the same SLAVE_OP_RPC_MAP handlers, tracks it via xshardPool.TrackInbound, then spawns a goroutine that blocks on WaitUntilPingReceived before indexing:
- recordPing stores remoteID and remoteFullShardIDList on the conn and closes the pingReceived signal channel
- The goroutine then indexes the conn in xshardPool by every peer shard, and adds the remote id to connectedSlaveIDs
This two-sided handshake guarantees that xshardPool is only populated with verified peers, matching Python's SlaveConnectionManager flow.
Step 4: External peer attaches - CREATE_CLUSTER_PEER_CONNECTION
When an external P2P peer connects to the master, the master broadcasts CREATE_CLUSTER_PEER_CONNECTION_REQUEST to every slave. Wire compatibility note: the master sends this with metadata = ClusterMetadata(ROOT_BRANCH=0, cluster_peer_id=0); the real cluster_peer_id lives in the payload (CreateClusterPeerConnectionRequest.cluster_peer_id). The slave must deserialize the payload to read it; reading from frame.Meta.ClusterPeerID would always yield 0 and break all PeerConn routing. The slave then:
- Adds the id to the clusterPeerIDs set
- Creates a PeerConn backed by the same masterConn, applies the peer-shard CommandOp handlers, and registers it in Dispatcher.peerConns[cluster_peer_id][branch]
- Receives subsequent frames for this peer with metadata = (branch, cluster_peer_id), and Dispatcher.Dispatch routes them to the matching PeerConn
- Tears it all down when DESTROY_CLUSTER_PEER_CONNECTION_COMMAND (also carrying cluster_peer_id in the payload) arrives
Step 5: Ongoing traffic
After handshake the slave handles three traffic classes concurrently:
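- Master cluster RPC: cluster_peer_id == 0 → MasterConn.Handle
- Slave↔Slave xshard traffic: direct TCP conns looked up in xshardPool by FullShardID
- External peer (virtual P2P) traffic: cluster_peer_id != 0 → Dispatcher.Dispatch → PeerConn.HandleFrame
Design decisions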
- Slave (raw frames, opcode bytes) is internal; SlaveRPC (typed methods, serialization) is the public API
- MasterConn.Handle() produces RESPONSE = REQUEST + 1, so only REQUEST opcodes need handler registration
- Peer handlers start as stubs that return ErrNotImplemented with debug logging; the shim layer calls RegisterPeerHandler() to replace stubs with real implementations later
- Slave.xshardHandlers is applied to both outbound (in ConnectToSlaves) and inbound (in the acceptLoop else branch) XshardConns, matching Python's SLAVE_OP_RPC_MAP usage
Testing
35 tests across the three communication modes:
- TestE2E_TwoSlavesHandshake: two real Go Slaves talking to each other over TCP, verifying bidirectional PING handshake, bidirectional recordPing, bidirectional xshardPool indexing, and a real ADD_XSHARD_TX_LIST delivery from A to B
- TestE2E_PythonMasterPeerRouting: simulates Python master's exact broadcast_rpc wire behavior over real TCP; verifies that cluster_peer_id is read from the payload on CREATE_CLUSTER_PEER_CONNECTION, PeerConn creation, Dispatcher routing to PeerConn, and PeerConn removal on DESTROY
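The behavior that TestE2E_PythonMasterPeerRouting checks can be approximated in isolation with the sketch below. It is a simplified stand-in rather than the PR's Dispatcher: the method names `Create`, `Destroy`, and `Route` are hypothetical, a string stands in for the PeerConn type, and the payload encoding (cluster_peer_id as the first 8 bytes, big-endian) is an assumption.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Meta is the frame metadata pair described in the text.
type Meta struct {
	Branch        uint32
	ClusterPeerID uint64
}

// Dispatcher maps cluster_peer_id -> branch -> peer connection.
// A string label stands in for the PR's PeerConn type.
type Dispatcher struct {
	peerConns map[uint64]map[uint32]string
}

func NewDispatcher() *Dispatcher {
	return &Dispatcher{peerConns: make(map[uint64]map[uint32]string)}
}

// peerIDFromPayload reads cluster_peer_id from the request body.
// Assumption: it is the first field, an 8-byte big-endian integer.
func peerIDFromPayload(payload []byte) (uint64, error) {
	if len(payload) < 8 {
		return 0, errors.New("payload too short")
	}
	return binary.BigEndian.Uint64(payload[:8]), nil
}

// Create registers one peer connection per branch, keyed by the id in the
// payload. The frame metadata is ignored because the master sends it as (0, 0).
func (d *Dispatcher) Create(_ Meta, payload []byte, branches []uint32) error {
	id, err := peerIDFromPayload(payload)
	if err != nil {
		return err
	}
	conns := make(map[uint32]string, len(branches))
	for _, b := range branches {
		conns[b] = fmt.Sprintf("peer %d branch %d", id, b)
	}
	d.peerConns[id] = conns
	return nil
}

// Destroy removes every connection of the peer named in the payload.
func (d *Dispatcher) Destroy(payload []byte) error {
	id, err := peerIDFromPayload(payload)
	if err != nil {
		return err
	}
	delete(d.peerConns, id)
	return nil
}

// Route names the handler for a frame: id 0 goes to the master handler,
// any other id goes to the registered peer connection, if there is one.
func (d *Dispatcher) Route(m Meta) string {
	if m.ClusterPeerID == 0 {
		return "master"
	}
	if pc, ok := d.peerConns[m.ClusterPeerID][m.Branch]; ok {
		return pc
	}
	return "unknown peer"
}

func main() {
	d := NewDispatcher()
	payload := make([]byte, 8)
	binary.BigEndian.PutUint64(payload, 7)
	_ = d.Create(Meta{}, payload, []uint32{1})
	fmt.Println(d.Route(Meta{}), "|", d.Route(Meta{Branch: 1, ClusterPeerID: 7}))
}
```

If `Create` keyed the map by the metadata's ClusterPeerID instead, every peer would be stored under 0 and later frames carrying the real id would find no connection, which is the failure mode the wire compatibility note in Step 4 warns about.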