REACH

REACH is a pipeline-style framework designed for the Cloud and Edge Continuum. It transfers microservice rescheduling policies based on reinforcement learning (RL) from simulation to real Kubernetes deployments. The framework starts from application profiling, uses the profiling results to train an RL policy in a Cloud and Edge Continuum simulation environment, and then deploys the trained policy as an online decision-making service for live pod migration.

The core idea is to make the simulation environment reflect the target continuum deployment closely enough that a policy trained offline can be moved into the production control loop. On a training machine with 32 CPU cores, 236 GB of memory, and an NVIDIA L40S GPU, training a policy for a deployment of 28 pods and 12 computing nodes took less than 10 minutes.

REACH contains two main implementation components:

CEEnv/: a Gymnasium-based cloud-edge RL environment for training and serving migration policies.
pod-migrator/: a Kubernetes controller process that observes the live cluster, calls the policy service, and performs pod migration.

Associated Paper

This repository accompanies the paper REACH: Reinforcement Learning for Adaptive Microservice Rescheduling in the Cloud-Edge Continuum.

The paper presents REACH as an RL-based microservice rescheduling framework for the cloud-edge continuum. It combines CEEnv, a simulation environment for training rescheduling policies, with a Kubernetes-based PodMigrator that applies learned decisions in a real deployment.

Paper link:

https://arxiv.org/abs/2510.06675

Framework Overview

REACH follows a three-stage deployment pipeline:

Profile the target microservice application on heterogeneous Cloud and Edge Continuum nodes.
Train an RL rescheduling policy in CEEnv using the profiled execution times, workload graph, and node topology.
Deploy the trained policy as a decision-making agent and connect it to pod-migrator, which applies the decisions in Kubernetes.

  Stage 1: Profile                 Stage 2: Train                    Stage 3: Deploy
  ----------------                 --------------                    ----------------

  +-------------------+            +----------------------+          +----------------------+
  | Target MSA on     |  profile   | CEEnv configuration  |  train   | Trained RL policy    |
  | heterogeneous     | ---------> | - node topology      | -------> | Maskable PPO / DQN   |
  | cloud-edge nodes  |            | - service resources  |          +----------+-----------+
  +-------------------+            | - workload graph     |                     |
                                   | - execution times    |                     | serve as agent
                                   +----------+-----------+                     v
                                              |                       +----------------------+
                                              v                       | CEEnv policy server  |
                                   +----------------------+          | POST /get_action     |
                                   | CEEnv simulation     |          +----------+-----------+
                                   | latency + reward     |                     ^
                                   | model for training   |                     |
                                   +----------------------+                     | state
                                                                                | decision
                                                                   +------------+---------+
                                                                   | pod-migrator         |
                                                                   | monitor + reschedule |
                                                                   +------------+---------+
                                                                                |
                                                                                v
                                                                   +----------------------+
                                                                   | Kubernetes cluster   |
                                                                   | live pod migration   |
                                                                   +----------------------+

At runtime, the trained CEEnv model is served as an RL agent. pod-migrator builds the current Kubernetes state, including node resources and pod deployability, sends it to the CEEnv policy server, receives a concrete decision, and performs the pod migration. The decision payload returned by CEEnv contains pod_name, target_node, and is_stop. If is_stop is true, pod-migrator leaves the current placement unchanged. Otherwise, it launches a replacement pod on the target node and removes the old pod after the replacement becomes ready.

Repository Layout

REACH/
  CEEnv/          # RL environment, training scripts, Flask policy server
    scripts/      # CEEnv training, serving, and network helper scripts
  pod-migrator/   # Kubernetes migration controller written in Go

Workload And Deployment Pipeline

The following subsections describe each stage of the REACH pipeline in operational terms:

how to turn application profiling data into workload configuration;
how CEEnv uses that configuration to train an RL policy;
how the trained policy is served as an agent and connected to Kubernetes through pod-migrator.

1. Microservice Application Profiles

The profiling phase measures the average request execution time of each microservice on each node class or CPU type. This repository does not include a profiler. Use an external non-intrusive profiling tool, such as Envy, Jaguar, Jaeger-based tracing, or a similar production-safe profiler, to collect per-service execution times without changing application code.

After profiling, place the resulting average execution times into the CEEnv workload configuration. The execution times are stored in the call-pattern JSON files:

CEEnv/config/aggregator_parallel_call_patterns.json
CEEnv/config/aggregator_sequential_call_patterns.json
CEEnv/config/chain_call_patterns.json

Each call entry contains an execution-time map keyed by CPU type:

"execution-time": {
  "0": "7ms",
  "1": "10.5ms",
  "2": "14ms"
}

The keys correspond to cpu_type values in CEEnv/config/7-21/nodes.json. Update these values with the profiled average execution time for each microservice on each node type.
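
As a rough illustration of how such a map can be read, the following Python sketch converts the duration strings into milliseconds keyed by integer CPU type. The helper name parse_execution_times and the accepted unit suffixes (us, ms, s) are assumptions made for this example; only the ms form appears in the configuration shown above, and CEEnv's own loader may work differently.

```python
import re

# Assumption: durations are written as "<number><unit>"; only "ms" is
# confirmed by the example configuration, "us" and "s" are illustrative.
_DURATION = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(us|ms|s)\s*$")
_TO_MS = {"us": 0.001, "ms": 1.0, "s": 1000.0}


def parse_execution_times(raw):
    """Convert an execution-time map such as {"0": "7ms"} to {0: 7.0} (milliseconds)."""
    parsed = {}
    for cpu_type, text in raw.items():
        match = _DURATION.match(text)
        if match is None:
            raise ValueError(f"unrecognised duration {text!r} for cpu_type {cpu_type}")
        value, unit = match.groups()
        parsed[int(cpu_type)] = float(value) * _TO_MS[unit]
    return parsed


example = {"0": "7ms", "1": "10.5ms", "2": "14ms"}
print(parse_execution_times(example))  # {0: 7.0, 1: 10.5, 2: 14.0}
```

A check like this can catch a malformed value (for example a missing unit) before a long training run starts.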

Validate edited JSON files before training:

cd CEEnv
python -m json.tool config/aggregator_sequential_call_patterns.json >/dev/null
python -m json.tool config/7-21/nodes.json >/dev/null
python -m json.tool config/7-21/services.json >/dev/null

2. CEEnv Simulation And Policy Training

CEEnv models end-to-end microservice latency from the workload profile and cloud-edge topology. Given a microservice invocation graph, CEEnv traverses the call graph with DFS and recursively aggregates:

profiled execution time on heterogeneous CPU types;
inter-service network latency from the node-layer model;
service placement and replica information;
parallel or sequential external-call structure.

Parallel, sequential, and chain-style workload variants are represented by separate call-pattern files. During training, the RL reward is driven primarily by end-to-end latency changes, so the model learns pod migration actions that reduce application latency under cloud-edge dynamics.
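
The recursive aggregation described above can be sketched in Python as follows. This is a simplified model, not CEEnv's implementation: the graph layout (a mode plus a calls list per service), the function name end_to_end_latency, the net_ms callback, and the one-way treatment of network latency are all assumptions chosen for illustration, and the call graph is assumed to be acyclic. Parallel calls contribute the slowest branch, while sequential calls contribute the sum of all branches.

```python
def end_to_end_latency(service, graph, exec_ms, placement, net_ms):
    """Depth-first latency of `service` plus everything it calls (milliseconds).

    graph      -- hypothetical map: service -> {"mode": ..., "calls": [...]}
    exec_ms    -- profiled execution time of each service on its current node
    placement  -- service -> node name
    net_ms     -- callback returning network latency between two nodes
    """
    own = exec_ms[service]
    entry = graph.get(service)
    if not entry or not entry["calls"]:
        return own  # leaf service: only its own execution time

    branches = []
    for callee in entry["calls"]:
        hop = net_ms(placement[service], placement[callee])
        branches.append(hop + end_to_end_latency(callee, graph, exec_ms, placement, net_ms))

    # Parallel calls wait for the slowest branch; sequential calls add up.
    downstream = max(branches) if entry["mode"] == "parallel" else sum(branches)
    return own + downstream


graph = {"gateway": {"mode": "parallel", "calls": ["a", "b"]}}
exec_ms = {"gateway": 2.0, "a": 7.0, "b": 14.0}
placement = {"gateway": "edge-1", "a": "edge-1", "b": "cloud-1"}


def net_ms(src, dst):
    return 0.0 if src == dst else 20.0  # illustrative edge-to-cloud latency


print(end_to_end_latency("gateway", graph, exec_ms, placement, net_ms))  # 36.0
```

In this toy setup, moving service b from cloud-1 to edge-1 would remove the 20 ms hop and lower the result to 16.0, which is the kind of latency change the reward is described as tracking.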

CEEnv Configuration

CEEnv/config/ contains one runnable cluster example and reusable workload definitions:

CEEnv/config/
  7-21/
    nodes.json       # representative node and network configuration
    services.json    # service resource and replica configuration for the example
  aggregator_parallel_call_patterns.json
  aggregator_sequential_call_patterns.json
  chain_call_patterns.json
  services_full.json

The 7-21 directory is the example configuration used by the default commands. The directory name follows <node-count>-<pod-count>, so 7-21 means seven nodes and up to twenty-one pods.

nodes.json describes the simulated cluster:

node_type: node families, CPU type identifiers, and bandwidth capacity.
latency: latency ranges between layers such as client, edge, and cloud.
cluster_setup: how many nodes of each type exist in each layer, with CPU, memory, and bandwidth-utilization ranges.

services.json describes the workload services used by the example:

cpu-requests and memory-requests: resource demand per pod.
max-replica: maximum number of pod replicas for the service.
type: service category used by the environment.
layer: optional placement constraint. For example, client pins the client service to the client layer.

The root call-pattern files define endpoint call paths, request rate (rps), QoS target (qos), data size, and execution time per CPU type:

aggregator_parallel_call_patterns.json
aggregator_sequential_call_patterns.json
chain_call_patterns.json

services_full.json is kept as a richer workload reference with external-service metadata.

Install CEEnv Dependencies

Use Python 3.12 or a compatible Python 3 environment.

cd CEEnv
python -m venv .venv
. .venv/bin/activate
pip install -r requirements

For GPU training, install the PyTorch build that matches your CUDA runtime before installing or adjusting requirements.

CEEnv Scripts

All CEEnv shell entry points are collected under CEEnv/scripts/:

CEEnv/scripts/
  train_policy.sh      # train a CEEnv policy with environment-variable overrides
  serve_policy.sh      # start the CEEnv policy server
  init.sh              # initialization helper
  network/             # optional network setup and traffic-control helpers

The scripts automatically switch to the CEEnv project root before running Python commands.

Train A Policy

Maskable PPO is the primary training path:

cd CEEnv
python trainingmask.py \
  --nodes 7 \
  --pods 21 \
  --pattern aggregator_sequential \
  --tag local \
  --total_timesteps 100000 \
  --device cpu \
  --cpu_num 0

The helper script exposes the same values through environment variables:

TOTAL_TIMESTEPS=1000000 \
DEVICE=cuda \
CPU_NUM=8 \
PATTERN=aggregator_sequential \
TAG=local \
./CEEnv/scripts/train_policy.sh

Models are written under CEEnv/models/ and logs under CEEnv/logs/; both are ignored by Git.

Evaluate A Policy

cd CEEnv
python testing.py \
  --pattern aggregator_sequential \
  --models_tpl "ppo-{pods}pods-{nodes}nodes-aggregator_sequential-local" \
  --configs 7-21

3. Integrating With Kubernetes

After training in CEEnv, the learned rescheduling policy is integrated into Kubernetes through pod-migrator. Kubernetes does not natively reschedule running pods to new nodes, so pod-migrator implements the rescheduling workflow:

continuously monitor cluster and application state;
send the current state to the CEEnv policy server;
do nothing when the policy returns an idle action (is_stop: true);
otherwise launch replacement pods on target nodes;
terminate old pods only after new pods become ready, reducing service disruption.
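
The ordering in this workflow (start the replacement first, remove the old pod only once the replacement is ready) can be sketched in Python as below. pod-migrator itself is written in Go, so this is only an illustration of the control flow: the launch, is_ready, and delete callbacks, the return strings, the retry limit, and the rollback on timeout are assumptions rather than the project's actual behavior.

```python
import time


def apply_decision(decision, launch, is_ready, delete, max_retries=100, wait=time.sleep):
    """Apply one policy decision using make-before-break ordering.

    launch, is_ready and delete are hypothetical callbacks standing in for
    the Kubernetes operations that pod-migrator performs.
    """
    if decision["is_stop"]:
        return "idle"  # idle action: keep the current placement

    replacement = launch(decision["pod_name"], decision["target_node"])
    for _ in range(max_retries):
        if is_ready(replacement):
            delete(decision["pod_name"])  # the old pod is removed only now
            return "migrated"
        wait(1.0)

    # Assumption: if the replacement never becomes ready, remove it
    # and leave the old pod serving traffic.
    delete(replacement)
    return "aborted"


# Dry run with in-memory fakes instead of a real cluster.
events = []
readiness = iter([False, True])
result = apply_decision(
    {"pod_name": "s1-old", "target_node": "edge-2", "is_stop": False},
    launch=lambda pod, node: events.append(("launch", node)) or "s1-new",
    is_ready=lambda pod: next(readiness),
    delete=lambda pod: events.append(("delete", pod)),
    wait=lambda seconds: None,
)
print(result, events)  # migrated [('launch', 'edge-2'), ('delete', 's1-old')]
```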

Serve The CEEnv Policy

Start the scheduling decision server from the trained model:

cd CEEnv
python server.py \
  --modelname ppo \
  --pattern aggregator_sequential \
  --tag local

Or use the script wrapper:

MODEL_NAME=ppo \
PATTERN=aggregator_sequential \
TAG=local \
./CEEnv/scripts/serve_policy.sh

The server expects a trained model at:

CEEnv/models/<modelname>-21pods-7nodes-<pattern>-<tag>/best_model.zip
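
To check which model path a given combination of flags implies, a small Python sketch such as the following may help. It assumes the naming rule is exactly the template above combined with the <node-count>-<pod-count> directory convention; the function names are hypothetical and are not part of CEEnv.

```python
from pathlib import PurePosixPath


def parse_config_name(name):
    """Split a '<node-count>-<pod-count>' directory name such as '7-21'."""
    nodes_text, pods_text = name.split("-", 1)
    return int(nodes_text), int(pods_text)


def expected_model_path(modelname, pattern, tag, config="7-21", root="CEEnv/models"):
    """Build the model path implied by the naming template (illustrative only)."""
    nodes, pods = parse_config_name(config)
    run_dir = f"{modelname}-{pods}pods-{nodes}nodes-{pattern}-{tag}"
    return PurePosixPath(root) / run_dir / "best_model.zip"


print(expected_model_path("ppo", "aggregator_sequential", "local"))
# CEEnv/models/ppo-21pods-7nodes-aggregator_sequential-local/best_model.zip
```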

The server exposes:

POST http://localhost:5000/get_action

The server treats the trained model as an agent. It receives the current state from pod-migrator, builds a testbed observation, runs model.predict(...), and returns the resulting decision. The request body must include cluster_state and pod_deployable; the response contains pod_name, target_node, and is_stop.
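
For manual testing without pod-migrator, a minimal Python client might look like the sketch below. The top-level field names come from the description above, but the internal structure of cluster_state and pod_deployable is not documented here, so the sketch treats both as opaque JSON values supplied by the caller. The helper names and the JSON content type are assumptions.

```python
import json
import urllib.request

POLICY_URL = "http://localhost:5000/get_action"
REQUIRED_FIELDS = {"pod_name", "target_node", "is_stop"}


def build_request(cluster_state, pod_deployable, url=POLICY_URL):
    """Wrap the two required fields in a JSON POST request."""
    body = json.dumps(
        {"cluster_state": cluster_state, "pod_deployable": pod_deployable}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def parse_decision(raw):
    """Decode a response body and check that the documented fields are present."""
    decision = json.loads(raw)
    missing = REQUIRED_FIELDS - decision.keys()
    if missing:
        raise ValueError(f"decision is missing fields: {sorted(missing)}")
    return decision


def get_action(cluster_state, pod_deployable, timeout=5.0):
    """Send the current state to a running policy server and return its decision."""
    request = build_request(cluster_state, pod_deployable)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_decision(response.read())
```

Calling get_action(...) requires a running policy server; build_request and parse_decision can be exercised offline.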

Configure PodMigrator

pod-migrator is the online decision-making and migration component. It runs as a continuous control loop: it queries live cluster metrics from Prometheus, reads Kubernetes state through the Kubernetes API, sends the current scheduling state to the CEEnv policy server, and applies the returned migration decision. Before starting it, make sure the target Kubernetes environment and observability endpoints are already working.

Prerequisites:

Go 1.22+
A working Kubernetes cluster with the target microservice application already deployed
Access to that cluster through KUBECONFIG
Prometheus endpoint for cluster resource and application metrics
Jaeger endpoint for trace latency data
A running CEEnv policy server at http://localhost:5000/get_action
Kubernetes labels that identify the target application pods, for example APP_LABEL=deployment=mubench

Install dependencies:

cd pod-migrator
go mod download

Create a local .env from the example:

cd pod-migrator
cp .env.example .env

Set these values for your cluster:

KUBECONFIG=/path/to/kubeconfig
PROMETHEUS_ADDR=http://prometheus.example:9090
APP_LABEL=deployment=mubench
GATEWAY_SERVICE=client
NAMESPACE=default
JAEGER_URL=http://jaeger.example/api/traces
SERVICE_NAME=machine-learning.default
LOOKBACK=1h
LIMIT=20
POLLING_PERIOD=1s
ALPHA=0.3
QOS_THRESHOLD=250
MAX_POD_WAITING_RETRIES=100
MAX_STEP=15
ENTRY_SERVICE_URL=http://client.default

.env is ignored by Git.

Run PodMigrator

Start CEEnv first, then run:

cd pod-migrator
./run_app.sh

Or run directly:

cd pod-migrator
go run . --mode=monitoring --strategy=rl --output=pod-migrator.csv

Useful modes are monitoring, test, pod_distribution, nodefailed, and autoscaling. Use go run . --help to inspect flags.

The implemented integration path is:

pod-migrator/request.go sends POST http://localhost:5000/get_action.
CEEnv/server.py loads the trained model and calls model.predict(...).
CEEnv/server.py returns pod_name, target_node, and is_stop.
pod-migrator/main.go and pod-migrator/rl.go pass non-idle decisions into MigratePod.
pod-migrator/pkg/migrator/migrator.go performs the Kubernetes-side migration.

CSV and log outputs are treated as experiment artifacts and ignored by Git. Use the --output flag when you need a specific path for case-study data.

Cite Us

@misc{bai2025reachreinforcementlearningadaptive,
      title={REACH: Reinforcement Learning for Adaptive Microservice Rescheduling in the Cloud-Edge Continuum}, 
      author={Xu Bai and Muhammed Tawfiqul Islam and Rajkumar Buyya and Adel N. Toosi},
      year={2025},
      eprint={2510.06675},
      archivePrefix={arXiv},
      primaryClass={cs.DC},
      url={https://arxiv.org/abs/2510.06675}, 
}

Contact:

Xu Bai: baixu.must@gmail.com
