REACH is a pipeline-style framework for the Cloud and Edge Continuum. It transfers reinforcement learning (RL)-based microservice rescheduling policies from simulation to real Kubernetes deployments. The framework starts with application profiling, uses the profiling results to train an RL policy in a Cloud and Edge Continuum simulation environment, and then deploys the trained policy as an online decision-making service for live pod migration.
The core idea is to make the simulation environment reflect the target continuum deployment closely enough that a policy trained offline can be moved into the production control loop. On a training machine with 32 CPU cores, 236 GB of memory, and an NVIDIA L40S GPU, training a policy for a deployment of 28 pods on 12 computing nodes took less than 10 minutes.
REACH contains two main implementation components:
- `CEEnv/`: a Gymnasium-based cloud-edge RL environment for training and serving migration policies.
- `pod-migrator/`: a Kubernetes controller process that observes the live cluster, calls the policy service, and performs pod migration.
This repository accompanies the paper REACH: Reinforcement Learning for Adaptive Microservice Rescheduling in the Cloud-Edge Continuum.
The paper presents REACH as an RL-based microservice rescheduling framework for the cloud-edge continuum. It combines CEEnv, a simulation environment for training rescheduling policies, with a Kubernetes-based PodMigrator that applies learned decisions in a real deployment.
Paper link:
https://arxiv.org/abs/2510.06675
REACH follows a three-stage deployment pipeline:
- Profile the target microservice application on heterogeneous Cloud and Edge Continuum nodes.
- Train an RL rescheduling policy in CEEnv using the profiled execution times, workload graph, and node topology.
- Deploy the trained policy as a decision-making agent and connect it to pod-migrator, which applies the decisions in Kubernetes.
```text
Stage 1: Profile                 Stage 2: Train                    Stage 3: Deploy
----------------                 --------------                    ---------------

+-------------------+            +----------------------+          +----------------------+
| Target MSA on     |  profile   | CEEnv configuration  |  train   | Trained RL policy    |
| heterogeneous     | ---------> | - node topology      | -------> | Maskable PPO / DQN   |
| cloud-edge nodes  |            | - service resources  |          +----------+-----------+
+-------------------+            | - workload graph     |                     |
                                 | - execution times    |                     | serve as agent
                                 +----------+-----------+                     v
                                            |                      +----------------------+
                                            v                      | CEEnv policy server  |
                                 +----------------------+          | POST /get_action     |
                                 | CEEnv simulation     |          +----------+-----------+
                                 | latency + reward     |                     ^
                                 | model for training   |                     |
                                 +----------------------+                     | state
                                                                              | decision
                                                                 +------------+---------+
                                                                 | pod-migrator         |
                                                                 | monitor + reschedule |
                                                                 +------------+---------+
                                                                              |
                                                                              v
                                                                 +----------------------+
                                                                 | Kubernetes cluster   |
                                                                 | live pod migration   |
                                                                 +----------------------+
```
At runtime, the trained CEEnv model is served as an RL agent. pod-migrator builds the current Kubernetes state, including node resources and pod deployability, sends it to the CEEnv policy server, receives a concrete decision, and performs the pod migration. The decision payload returned by CEEnv contains pod_name, target_node, and is_stop. If is_stop is true, pod-migrator leaves the current placement unchanged. Otherwise, it launches a replacement pod on the target node and removes the old pod after the replacement becomes ready.
```text
REACH/
  CEEnv/             # RL environment, training scripts, Flask policy server
    scripts/         # CEEnv training, serving, and network helper scripts
  pod-migrator/      # Kubernetes migration controller written in Go
```
The following subsections describe each stage of the REACH pipeline in operational terms:
- how to turn application profiling data into workload configuration;
- how CEEnv uses that configuration to train an RL policy;
- how the trained policy is served as an agent and connected to Kubernetes through pod-migrator.
The profiling phase measures the average request execution time of each microservice on each node class or CPU type. This repository does not include a profiler. Use an external non-intrusive profiling tool, such as Envy, Jaguar, Jaeger-based tracing, or a similar production-safe profiler, to collect per-service execution times without changing application code.
After profiling, place the resulting average execution times into the CEEnv workload configuration. The execution times are stored in the call-pattern JSON files:
- `CEEnv/config/aggregator_parallel_call_patterns.json`
- `CEEnv/config/aggregator_sequential_call_patterns.json`
- `CEEnv/config/chain_call_patterns.json`
Each call entry contains an execution-time map keyed by CPU type:

```json
"execution-time": {
  "0": "7ms",
  "1": "10.5ms",
  "2": "14ms"
}
```

The keys correspond to the `cpu_type` values in `CEEnv/config/7-21/nodes.json`. Update these values with the profiled average execution time of each microservice on each node type.
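Profilers usually report plain numbers, while the call-pattern files store strings with an `ms` suffix. The helper below is a minimal sketch for converting between the two. It assumes your profiler gives average execution times in milliseconds, keyed by integer `cpu_type`. `format_ms`, `parse_ms`, and `build_execution_time` are hypothetical names, not part of CEEnv.

```python
def format_ms(ms: float) -> str:
    """Render milliseconds the way the call-pattern files do, e.g. 7.0 -> "7ms"."""
    if ms < 0:
        raise ValueError("execution time must be non-negative")
    # Keep up to three decimals, then drop trailing zeros and a dangling dot.
    text = f"{ms:.3f}".rstrip("0").rstrip(".")
    return f"{text}ms"


def parse_ms(value: str) -> float:
    """Inverse of format_ms: "10.5ms" -> 10.5."""
    if not value.endswith("ms"):
        raise ValueError(f"expected a value ending in 'ms', got {value!r}")
    return float(value[:-2])


def build_execution_time(profiled: dict[int, float]) -> dict[str, str]:
    """Turn {cpu_type: avg_ms} into the string-keyed "execution-time" object."""
    return {str(cpu_type): format_ms(ms) for cpu_type, ms in sorted(profiled.items())}
```

For example, `build_execution_time({0: 7.0, 1: 10.5, 2: 14.0})` produces the same map as the example entry above. Run `json.tool` on the edited file afterwards, as shown next.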
Validate edited JSON files before training:

```bash
cd CEEnv
python -m json.tool config/aggregator_sequential_call_patterns.json >/dev/null
python -m json.tool config/7-21/nodes.json >/dev/null
python -m json.tool config/7-21/services.json >/dev/null
```

CEEnv models end-to-end microservice latency from the workload profile and the cloud-edge topology. Given a microservice invocation graph, CEEnv traverses the call graph depth-first (DFS) and recursively aggregates:
- profiled execution time on heterogeneous CPU types;
- inter-service network latency from the node-layer model;
- service placement and replica information;
- parallel or sequential external-call structure.
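The recursion can be illustrated with a deliberately simplified model. This is a sketch, not CEEnv's implementation. It assumes one replica per service and a single network latency per caller/callee layer pair. It also assumes that a service's own execution time is added to the cost of its downstream calls. `Call` and `end_to_end_latency` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    """One node of a hypothetical invocation graph."""
    service: str
    parallel: bool = False  # True: this service issues its external calls in parallel
    children: list["Call"] = field(default_factory=list)


def end_to_end_latency(call, placement, node_cpu, node_layer, exec_ms, net_ms):
    """Depth-first aggregation of latency over the call graph.

    placement:  service -> node name
    node_cpu:   node -> cpu_type (the keys of the "execution-time" maps)
    node_layer: node -> "client" | "edge" | "cloud"
    exec_ms:    service -> {cpu_type: profiled execution time in ms}
    net_ms:     (caller layer, callee layer) -> network latency in ms
    """
    node = placement[call.service]
    own = exec_ms[call.service][node_cpu[node]]  # execution time on this node's CPU type
    branch_costs = []
    for child in call.children:
        hop = net_ms[(node_layer[node], node_layer[placement[child.service]])]
        branch_costs.append(hop + end_to_end_latency(
            child, placement, node_cpu, node_layer, exec_ms, net_ms))
    if not branch_costs:
        return own
    # Parallel calls wait for the slowest branch; sequential calls add up.
    return own + (max(branch_costs) if call.parallel else sum(branch_costs))
```

Consider a frontend on an edge node that calls one service on the same edge node and another on a cloud node. The parallel variant pays only for the slower cloud branch, while the sequential variant pays for both. Moving either pod changes both the CPU-type lookup and the network hop.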
Parallel, sequential, and chain-style workload variants are represented by separate call-pattern files. During training, the RL reward is driven primarily by end-to-end latency changes, so the model learns pod migration actions that reduce application latency under cloud-edge dynamics.
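The function below illustrates a latency-driven reward by scoring one migration step on the relative change in end-to-end latency. It is a hypothetical sketch. The exact reward used by CEEnv may contain additional terms that are not described here.

```python
def latency_reward(prev_latency_ms: float, new_latency_ms: float) -> float:
    """Relative latency improvement caused by one migration step.

    Positive when the migration reduced end-to-end latency, negative when it
    made latency worse, and zero when latency is unchanged.
    """
    if prev_latency_ms <= 0:
        raise ValueError("previous latency must be positive")
    return (prev_latency_ms - new_latency_ms) / prev_latency_ms
```

Normalizing by the previous latency keeps rewards on a comparable scale across workloads whose absolute latencies differ.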
CEEnv/config/ contains one runnable cluster example and reusable workload definitions:
```text
CEEnv/config/
  7-21/
    nodes.json       # representative node and network configuration
    services.json    # service resource and replica configuration for the example
  aggregator_parallel_call_patterns.json
  aggregator_sequential_call_patterns.json
  chain_call_patterns.json
  services_full.json
```
The `7-21` directory is the example configuration used by the default commands. Its name follows `<node-count>-<pod-count>`, so `7-21` means seven nodes and up to twenty-one pods.
`nodes.json` describes the simulated cluster:
- `node_type`: node families, CPU type identifiers, and bandwidth capacity.
- `latency`: latency ranges between layers such as `client`, `edge`, and `cloud`.
- `cluster_setup`: how many nodes of each type exist in each layer, with CPU, memory, and bandwidth-utilization ranges.
`services.json` describes the workload services used by the example:
- `cpu-requests` and `memory-requests`: resource demand per pod.
- `max-replica`: maximum number of pod replicas for the service.
- `type`: service category used by the environment.
- `layer`: optional placement constraint. For example, `client` pins the client service to the client layer.
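The check below is a minimal sketch of how resource requests and the optional `layer` constraint could decide whether a pod fits on a node. It assumes that free CPU and memory are already known for each node and that requests and capacities use matching units. `NodeState`, `ServiceSpec`, `can_host`, and `deployable_nodes` are hypothetical, not CEEnv code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeState:
    name: str
    layer: str          # "client", "edge" or "cloud"
    free_cpu: float     # assumed to use the same unit as cpu-requests
    free_memory: float  # assumed to use the same unit as memory-requests


@dataclass
class ServiceSpec:
    name: str
    cpu_request: float
    memory_request: float
    layer: Optional[str] = None  # optional placement constraint


def can_host(service: ServiceSpec, node: NodeState) -> bool:
    """A node qualifies if it satisfies the layer constraint and has enough free resources."""
    if service.layer is not None and service.layer != node.layer:
        return False
    return (service.cpu_request <= node.free_cpu
            and service.memory_request <= node.free_memory)


def deployable_nodes(service: ServiceSpec, nodes: list[NodeState]) -> list[str]:
    """Names of the nodes that one more pod of `service` could be placed on."""
    return [node.name for node in nodes if can_host(service, node)]
```

With `layer` set to `client`, only client-layer nodes qualify, whatever their free capacity.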
The root call-pattern files define endpoint call paths, request rate (`rps`), QoS target (`qos`), data size, and execution time per CPU type:
- `aggregator_parallel_call_patterns.json`
- `aggregator_sequential_call_patterns.json`
- `chain_call_patterns.json`
services_full.json is kept as a richer workload reference with external-service metadata.
Use Python 3.12 or a compatible Python 3 environment.
```bash
cd CEEnv
python -m venv .venv
. .venv/bin/activate
pip install -r requirements
```

For GPU training, install the PyTorch build that matches your CUDA runtime before installing or adjusting requirements.
All CEEnv shell entry points are collected under CEEnv/scripts/:
```text
CEEnv/scripts/
  train_policy.sh    # train a CEEnv policy with environment-variable overrides
  serve_policy.sh    # start the CEEnv policy server
  init.sh            # initialization helper
  network/           # optional network setup and traffic-control helpers
```
The scripts automatically switch to the CEEnv project root before running Python commands.
Masked PPO is the primary training path:
```bash
cd CEEnv
python trainingmask.py \
  --nodes 7 \
  --pods 21 \
  --pattern aggregator_sequential \
  --tag local \
  --total_timesteps 100000 \
  --device cpu \
  --cpu_num 0
```

The helper script exposes the same parameters through environment variables:
```bash
TOTAL_TIMESTEPS=1000000 \
DEVICE=cuda \
CPU_NUM=8 \
PATTERN=aggregator_sequential \
TAG=local \
./CEEnv/scripts/train_policy.sh
```

Models are written under `CEEnv/models/` and logs under `CEEnv/logs/`. Git ignores both directories.
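Masking removes infeasible migrations before the policy samples an action. The encoding below is a hypothetical illustration. It assumes a flat discrete action space with one action per (pod, target node) pair, plus a final stop action that maps to `is_stop: true`. CEEnv's actual action layout may differ.

```python
def build_action_mask(deployable: list[list[bool]]) -> list[bool]:
    """Flatten a pods x nodes feasibility matrix and append the stop action.

    deployable[i][j] is True when pod i may be moved to node j.
    """
    mask = [ok for row in deployable for ok in row]
    mask.append(True)  # the stop (no-op) action is always allowed
    return mask


def decode_action(action: int, pod_names: list[str], node_names: list[str]) -> dict:
    """Map a flat action index back to a pod_name/target_node/is_stop decision."""
    n_nodes = len(node_names)
    stop_index = len(pod_names) * n_nodes
    if action == stop_index:
        return {"pod_name": "", "target_node": "", "is_stop": True}
    if not 0 <= action < stop_index:
        raise ValueError(f"action {action} is outside [0, {stop_index}]")
    pod_idx, node_idx = divmod(action, n_nodes)
    return {"pod_name": pod_names[pod_idx], "target_node": node_names[node_idx], "is_stop": False}
```

The stop action is kept always valid so that the agent can choose to leave the current placement unchanged. If the training script is built on sb3-contrib's `MaskablePPO`, a boolean mask of this shape is what the environment would expose through an `action_masks()` method.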
To test a trained model:

```bash
cd CEEnv
python testing.py \
  --pattern aggregator_sequential \
  --models_tpl "ppo-{pods}pods-{nodes}nodes-aggregator_sequential-local" \
  --configs 7-21
```

After training in CEEnv, the learned rescheduling policy is integrated into Kubernetes through pod-migrator. Kubernetes does not natively reschedule running pods to new nodes, so pod-migrator implements the rescheduling workflow:
- continuously monitor cluster and application state;
- send the current state to the CEEnv policy server;
- do nothing when the policy returns an idle action (`is_stop: true`);
- otherwise, launch replacement pods on the target nodes;
- terminate old pods only after new pods become ready, reducing service disruption.
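The make-before-break step can be sketched with injected callables that stand in for the Kubernetes operations. This is a hypothetical Python illustration, not the Go code in `pod-migrator/pkg/migrator/migrator.go`. The names `create_replacement`, `is_ready`, and `delete_pod` are placeholders. `max_retries` is assumed to play a role similar to `MAX_POD_WAITING_RETRIES`, and the rollback on timeout is an added assumption.

```python
import time
from typing import Callable


def migrate_pod(
    old_pod: str,
    target_node: str,
    create_replacement: Callable[[str, str], str],
    is_ready: Callable[[str], bool],
    delete_pod: Callable[[str], None],
    max_retries: int = 100,
    poll_seconds: float = 1.0,
) -> str:
    """Make-before-break: delete the old pod only after its replacement is ready."""
    new_pod = create_replacement(old_pod, target_node)
    for _ in range(max_retries):
        if is_ready(new_pod):
            delete_pod(old_pod)  # the replacement is serving, so the old pod can go
            return new_pod
        time.sleep(poll_seconds)
    # The replacement never became ready: remove it and keep the old pod serving.
    delete_pod(new_pod)
    raise TimeoutError(f"{new_pod} was not ready after {max_retries} checks")
```

Because the old pod keeps serving until the readiness check passes, a failed or slow replacement never leaves the service without a running instance.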
Start the scheduling decision server from the trained model:
```bash
cd CEEnv
python server.py \
  --modelname ppo \
  --pattern aggregator_sequential \
  --tag local
```

Or use the script wrapper:
```bash
MODEL_NAME=ppo \
PATTERN=aggregator_sequential \
TAG=local \
./CEEnv/scripts/serve_policy.sh
```

The server expects a trained model at:
```text
CEEnv/models/<modelname>-21pods-7nodes-<pattern>-<tag>/best_model.zip
```
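The directory name combines the model name, the pod and node counts of the example configuration, the workload pattern, and the training tag. A small helper like the one below can check that the file exists before you start the server. It is hypothetical and assumes the `<node-count>-<pod-count>` naming of the `7-21` config directory.

```python
from pathlib import Path


def parse_config_name(name: str) -> tuple[int, int]:
    """Split a config directory name such as "7-21" into (node_count, pod_count)."""
    nodes, pods = name.split("-")
    return int(nodes), int(pods)


def model_path(modelname: str, pattern: str, tag: str, config: str = "7-21",
               root: Path = Path("CEEnv/models")) -> Path:
    """Build <root>/<modelname>-<pods>pods-<nodes>nodes-<pattern>-<tag>/best_model.zip."""
    nodes, pods = parse_config_name(config)
    return root / f"{modelname}-{pods}pods-{nodes}nodes-{pattern}-{tag}" / "best_model.zip"
```

For example, `model_path("ppo", "aggregator_sequential", "local").is_file()` reports whether the default training run has produced a model that the server can load.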
The server exposes:
```text
POST http://localhost:5000/get_action
```
The server treats the trained model as an agent. It receives the current state from pod-migrator, builds a testbed observation, runs model.predict(...), and returns the resulting decision. The request body must include cluster_state and pod_deployable; the response contains pod_name, target_node, and is_stop.
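You can smoke-test the server without running pod-migrator by using a small Python client built on the standard library. The endpoint and field names below come from this section. The contents of `cluster_state` and `pod_deployable` must match whatever pod-migrator actually sends, which is not specified here, so any payload you pass is an assumption. The function names are hypothetical.

```python
import json
import urllib.request

POLICY_URL = "http://localhost:5000/get_action"
REQUIRED_FIELDS = {"pod_name", "target_node", "is_stop"}


def build_request(cluster_state, pod_deployable, url=POLICY_URL):
    """Encode the scheduling state as a JSON POST request."""
    body = json.dumps({"cluster_state": cluster_state, "pod_deployable": pod_deployable})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_decision(raw: bytes) -> dict:
    """Decode the server's reply and check that the documented fields are present."""
    decision = json.loads(raw)
    if not isinstance(decision, dict):
        raise ValueError("decision must be a JSON object")
    missing = REQUIRED_FIELDS - set(decision)
    if missing:
        raise ValueError(f"decision is missing fields: {sorted(missing)}")
    return decision


def get_action(cluster_state, pod_deployable, url=POLICY_URL, timeout=5.0) -> dict:
    """Send one request to a running policy server and return its decision."""
    request = build_request(cluster_state, pod_deployable, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_decision(response.read())
```

With the server running, `get_action(state, deployable)` returns the decision dictionary. If `is_stop` is true, no migration should follow.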
pod-migrator is the online decision-making and migration component. It runs as a continuous control loop: it queries live cluster metrics from Prometheus, reads Kubernetes state through the Kubernetes API, sends the current scheduling state to the CEEnv policy server, and applies the returned migration decision. Before starting it, make sure the target Kubernetes environment and observability endpoints are already working.
Prerequisites:
- Go 1.22+
- A working Kubernetes cluster with the target microservice application already deployed
- Access to that cluster through `KUBECONFIG`
- Prometheus endpoint for cluster resource and application metrics
- Jaeger endpoint for trace latency data
- A running CEEnv policy server at `http://localhost:5000/get_action`
- Kubernetes labels that identify the target application pods, for example `APP_LABEL=deployment=mubench`
Install dependencies:
```bash
cd pod-migrator
go mod download
```

Create a local `.env` from the example:
```bash
cd pod-migrator
cp .env.example .env
```

Set these values for your cluster:
```bash
KUBECONFIG=/path/to/kubeconfig
PROMETHEUS_ADDR=http://prometheus.example:9090
APP_LABEL=deployment=mubench
GATEWAY_SERVICE=client
NAMESPACE=default
JAEGER_URL=http://jaeger.example/api/traces
SERVICE_NAME=machine-learning.default
LOOKBACK=1h
LIMIT=20
POLLING_PERIOD=1s
ALPHA=0.3
QOS_THRESHOLD=250
MAX_POD_WAITING_RETRIES=100
MAX_STEP=15
ENTRY_SERVICE_URL=http://client.default
```
.env is ignored by Git.
Start CEEnv first, then run:
```bash
cd pod-migrator
./run_app.sh
```

Or run directly:

```bash
cd pod-migrator
go run . --mode=monitoring --strategy=rl --output=pod-migrator.csv
```

The available modes are `monitoring`, `test`, `pod_distribution`, `nodefailed`, and `autoscaling`. Run `go run . --help` to list all flags.
The implemented integration path is:
- `pod-migrator/request.go` sends `POST http://localhost:5000/get_action`.
- `CEEnv/server.py` loads the trained model and calls `model.predict(...)`.
- `CEEnv/server.py` returns `pod_name`, `target_node`, and `is_stop`.
- `pod-migrator/main.go` and `pod-migrator/rl.go` pass non-idle decisions into `MigratePod`.
- `pod-migrator/pkg/migrator/migrator.go` performs the Kubernetes-side migration.
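The same path, reduced to its control structure, can be sketched in Python (the real controller is written in Go). The callables stand in for the Prometheus and Kubernetes observation, the HTTP call to the policy server, and `MigratePod`. They and the bounded `max_steps` loop are assumptions for illustration. The `max_steps` and `polling_period` parameters only loosely mirror `MAX_STEP` and `POLLING_PERIOD`.

```python
import time
from typing import Callable


def control_loop(
    observe: Callable[[], tuple],
    get_action: Callable[[object, object], dict],
    migrate: Callable[[str, str], None],
    max_steps: int = 15,
    polling_period: float = 1.0,
) -> int:
    """Run up to max_steps observe/decide/act rounds and return the number of migrations."""
    migrations = 0
    for _ in range(max_steps):
        # Stand-in for reading metrics and Kubernetes state.
        cluster_state, pod_deployable = observe()
        # Stand-in for POST /get_action.
        decision = get_action(cluster_state, pod_deployable)
        if not decision["is_stop"]:
            # Non-idle decision: hand it to the migration step.
            migrate(decision["pod_name"], decision["target_node"])
            migrations += 1
        # An idle decision (is_stop) leaves the current placement unchanged.
        time.sleep(polling_period)
    return migrations
```

Keeping the observation, decision, and migration steps behind separate callables mirrors the split between `request.go`, `server.py`, and `migrator.go`. This separation lets you test each step in isolation.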
CSV and log outputs are treated as experiment artifacts and ignored by Git. Use the --output flag when you need a specific path for case-study data.
```bibtex
@misc{bai2025reachreinforcementlearningadaptive,
  title={REACH: Reinforcement Learning for Adaptive Microservice Rescheduling in the Cloud-Edge Continuum},
  author={Xu Bai and Muhammed Tawfiqul Islam and Rajkumar Buyya and Adel N. Toosi},
  year={2025},
  eprint={2510.06675},
  archivePrefix={arXiv},
  primaryClass={cs.DC},
  url={https://arxiv.org/abs/2510.06675},
}
```

Contact:
- Xu Bai: baixu.must@gmail.com