From dffbeb276f74572bfc344a92300ff61c23c3bf67 Mon Sep 17 00:00:00 2001
From: Wu Sheng
Date: Fri, 19 Jun 2026 20:44:50 +0800
Subject: [PATCH 1/2] Clear 3 security alerts: protobuf e2e fixture CVE + histogram count narrowing

- Dependabot CVE-2026-0994: bump the Airflow e2e mock's pinned protobuf
  4.25.8 -> 5.29.6 (no 4.x patch exists) and opentelemetry-proto
  1.24.0 -> 1.28.0 (its protobuf<5.0 cap was the blocker). CI-only test
  fixture, never shipped; grpcio/flask unchanged.
- CodeQL java/implicit-cast-in-compound-assignment: widen the cumulative
  `count` accumulator from int to long in
  Sum/AvgHistogramPercentileFunction. `count += value` silently narrowed
  a long bucket-count sum back to int; `total` was already long.

Verified: Sum/AvgHistogramPercentileFunctionTest pass (12/12);
checkstyle + license clean.
---
 docs/en/changes/changes.md                                 | 1 +
 .../meter/function/avg/AvgHistogramPercentileFunction.java | 2 +-
 .../meter/function/sum/SumHistogramPercentileFunction.java | 2 +-
 test/e2e-v2/cases/airflow/mock/requirements-replay.txt     | 4 ++--
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/docs/en/changes/changes.md b/docs/en/changes/changes.md
index 19944574a644..140114b8b766 100644
--- a/docs/en/changes/changes.md
+++ b/docs/en/changes/changes.md
@@ -304,6 +304,7 @@
 * Bump Apache Curator `4.3.0` → `5.9.0` and Apache ZooKeeper `3.5.7` → `3.9.5` together to clear CVE-2023-44981 (the bundled ZooKeeper jar carried it; OAP is a ZooKeeper client only, so the server-side bug was never reachable, but the jar tripped Dependabot). The cluster-zookeeper and configuration-zookeeper plugins use only stable Curator APIs, so no source changes were required. Operator-facing change: the supported ZooKeeper server version is now 3.6+ (Curator 5.x uses ZooKeeper persistent watches, added in server 3.6.0); older servers (3.5.x, 3.4.x) are no longer supported.
 * Migrate the Consul cluster and configuration client from the abandoned `com.orbitz.consul:consul-client` `1.5.3` to the maintained fork `org.kiwiproject:consul-client` `0.9.0` to clear the okhttp CVE the old client carried (CVE-2021-0341; the old client pinned okhttp `3.14.9`, fixed in okhttp `4.9.2+`), so the BOM now pins okhttp to `4.12.0`. The fork's `0.9.x` line is the last one built for JDK 11 (which SkyWalking still targets); `1.0.0+` is compiled to JDK 17 bytecode, so the migration stays on `0.9.0`. The cluster-consul and configuration-consul plugins use only stable Consul client APIs, so the change is a package rename (`com.orbitz.consul` → `org.kiwiproject.consul`); okhttp is pulled only by the Consul plugins (the fabric8 Kubernetes client excludes its okhttp transport), so no other module is affected.
 * Bump test-scope assertj-core `3.20.2` → `3.27.7` to clear CVE-2026-24400 (XXE in `isXmlEqualTo`, not used by any test).
+* Clear three security alerts: bump the Airflow e2e mock's pinned `protobuf` `4.25.8` → `5.29.6` (and `opentelemetry-proto` `1.24.0` → `1.28.0`, whose `protobuf<5.0` cap was the blocker) to clear CVE-2026-0994 — a CI-only test fixture, never shipped; and widen the cumulative `count` accumulator from `int` to `long` in `SumHistogramPercentileFunction` / `AvgHistogramPercentileFunction` to clear the CodeQL `implicit-cast-in-compound-assignment` alerts (`count += value` silently narrowed a `long` bucket-count sum back to `int`, while `total` was already `long`).
 * Fix: continuous profiling policy validation now rejects a threshold / count of `0` to match the error messages and rover's `value >= threshold` trigger semantics (a `0` threshold would always trigger). CPU percent and HTTP error rate are tightened from `[0-100]` to `(0-100]`.
 * Fix wrong BanyanDB resource options in record data.
 * Align the default BanyanDB stage `segmentInterval` values so each coarser stage is an integer multiple of the finer one (`records` cold `3` → `4`, `metricsMinute` cold `5` → `6`, `metricsHour` warm `7` → `10` and cold `15` → `20`), keeping hot → warm → cold lifecycle migration on the cheap whole-segment fast path.
diff --git a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/meter/function/avg/AvgHistogramPercentileFunction.java b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/meter/function/avg/AvgHistogramPercentileFunction.java
index c018a2e7c12a..fd3002e03a2b 100644
--- a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/meter/function/avg/AvgHistogramPercentileFunction.java
+++ b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/meter/function/avg/AvgHistogramPercentileFunction.java
@@ -248,7 +248,7 @@ public void calculate() {
                 roofs[i] = Math.round(total * ranks.get(i) * 1.0f / 100);
             }
 
-            int count = 0;
+            long count = 0;
             final List<String> sortedKeys = subDataset.sortedKeys(Comparator.comparingLong(Long::parseLong));
 
             int loopIndex = 0;
diff --git a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/meter/function/sum/SumHistogramPercentileFunction.java b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/meter/function/sum/SumHistogramPercentileFunction.java
index 5d94a5f55f09..b7435970593a 100644
--- a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/meter/function/sum/SumHistogramPercentileFunction.java
+++ b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/meter/function/sum/SumHistogramPercentileFunction.java
@@ -214,7 +214,7 @@ public void calculate() {
                 roofs[i] = Math.round(total * ranks.get(i) * 1.0f / 100);
             }
 
-            int count = 0;
+            long count = 0;
             final List<String> sortedKeys =
                 subDataset.sortedKeys(Comparator.comparingLong(Long::parseLong));
             int loopIndex = 0;
diff --git a/test/e2e-v2/cases/airflow/mock/requirements-replay.txt b/test/e2e-v2/cases/airflow/mock/requirements-replay.txt
index 2e302eda6988..8446cc888d55 100644
--- a/test/e2e-v2/cases/airflow/mock/requirements-replay.txt
+++ b/test/e2e-v2/cases/airflow/mock/requirements-replay.txt
@@ -1,4 +1,4 @@
 flask==3.1.3
 grpcio==1.62.2
-protobuf==4.25.8
-opentelemetry-proto==1.24.0
+protobuf==5.29.6
+opentelemetry-proto==1.28.0

From b28d5e16ef7beb5b9a3f7f8dfa5a89c8b61ce6c3 Mon Sep 17 00:00:00 2001
From: Wu Sheng
Date: Fri, 19 Jun 2026 23:16:00 +0800
Subject: [PATCH 2/2] Fix airflow e2e mock: bump grpcio to 1.63.2 for opentelemetry-proto 1.28.0 stubs

opentelemetry-proto 1.28.0's generated gRPC stubs are produced by
grpcio-tools >=1.63 and call channel.unary_unary(_registered_method=True),
which grpcio 1.62.2 does not accept -> MetricsServiceStub(channel) raised
"TypeError: Channel.unary_unary() got an unexpected keyword argument
'_registered_method'", so the OTLP replay mock could not build the stub
or send metrics. Bump grpcio 1.62.2 -> 1.63.2.

Reproduced + verified in python:3.11-slim against the real mock-data:
json_format.Parse (protobuf 5.29.6) + MetricsServiceStub construction +
stub.Export all work (Export reaches the wire; UNAVAILABLE only because
no server is listening).

The earlier commit's claim that grpcio was unchanged was wrong.
---
 docs/en/changes/changes.md                             | 2 +-
 test/e2e-v2/cases/airflow/mock/requirements-replay.txt | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/en/changes/changes.md b/docs/en/changes/changes.md
index 140114b8b766..a91eefb0679e 100644
--- a/docs/en/changes/changes.md
+++ b/docs/en/changes/changes.md
@@ -304,7 +304,7 @@
 * Bump Apache Curator `4.3.0` → `5.9.0` and Apache ZooKeeper `3.5.7` → `3.9.5` together to clear CVE-2023-44981 (the bundled ZooKeeper jar carried it; OAP is a ZooKeeper client only, so the server-side bug was never reachable, but the jar tripped Dependabot). The cluster-zookeeper and configuration-zookeeper plugins use only stable Curator APIs, so no source changes were required. Operator-facing change: the supported ZooKeeper server version is now 3.6+ (Curator 5.x uses ZooKeeper persistent watches, added in server 3.6.0); older servers (3.5.x, 3.4.x) are no longer supported.
 * Migrate the Consul cluster and configuration client from the abandoned `com.orbitz.consul:consul-client` `1.5.3` to the maintained fork `org.kiwiproject:consul-client` `0.9.0` to clear the okhttp CVE the old client carried (CVE-2021-0341; the old client pinned okhttp `3.14.9`, fixed in okhttp `4.9.2+`), so the BOM now pins okhttp to `4.12.0`. The fork's `0.9.x` line is the last one built for JDK 11 (which SkyWalking still targets); `1.0.0+` is compiled to JDK 17 bytecode, so the migration stays on `0.9.0`. The cluster-consul and configuration-consul plugins use only stable Consul client APIs, so the change is a package rename (`com.orbitz.consul` → `org.kiwiproject.consul`); okhttp is pulled only by the Consul plugins (the fabric8 Kubernetes client excludes its okhttp transport), so no other module is affected.
 * Bump test-scope assertj-core `3.20.2` → `3.27.7` to clear CVE-2026-24400 (XXE in `isXmlEqualTo`, not used by any test).
-* Clear three security alerts: bump the Airflow e2e mock's pinned `protobuf` `4.25.8` → `5.29.6` (and `opentelemetry-proto` `1.24.0` → `1.28.0`, whose `protobuf<5.0` cap was the blocker) to clear CVE-2026-0994 — a CI-only test fixture, never shipped; and widen the cumulative `count` accumulator from `int` to `long` in `SumHistogramPercentileFunction` / `AvgHistogramPercentileFunction` to clear the CodeQL `implicit-cast-in-compound-assignment` alerts (`count += value` silently narrowed a `long` bucket-count sum back to `int`, while `total` was already `long`).
+* Clear three security alerts: bump the Airflow e2e mock's pinned `protobuf` `4.25.8` → `5.29.6` (with `opentelemetry-proto` `1.24.0` → `1.28.0`, whose `protobuf<5.0` cap was the blocker, and `grpcio` `1.62.2` → `1.63.2`, required because `opentelemetry-proto` `1.28.0`'s gRPC stubs call `unary_unary(_registered_method=...)`) to clear CVE-2026-0994 — a CI-only test fixture, never shipped; and widen the cumulative `count` accumulator from `int` to `long` in `SumHistogramPercentileFunction` / `AvgHistogramPercentileFunction` to clear the CodeQL `implicit-cast-in-compound-assignment` alerts (`count += value` silently narrowed a `long` bucket-count sum back to `int`, while `total` was already `long`).
 * Fix: continuous profiling policy validation now rejects a threshold / count of `0` to match the error messages and rover's `value >= threshold` trigger semantics (a `0` threshold would always trigger). CPU percent and HTTP error rate are tightened from `[0-100]` to `(0-100]`.
 * Fix wrong BanyanDB resource options in record data.
 * Align the default BanyanDB stage `segmentInterval` values so each coarser stage is an integer multiple of the finer one (`records` cold `3` → `4`, `metricsMinute` cold `5` → `6`, `metricsHour` warm `7` → `10` and cold `15` → `20`), keeping hot → warm → cold lifecycle migration on the cheap whole-segment fast path.
diff --git a/test/e2e-v2/cases/airflow/mock/requirements-replay.txt b/test/e2e-v2/cases/airflow/mock/requirements-replay.txt
index 8446cc888d55..8754b425e308 100644
--- a/test/e2e-v2/cases/airflow/mock/requirements-replay.txt
+++ b/test/e2e-v2/cases/airflow/mock/requirements-replay.txt
@@ -1,4 +1,4 @@
 flask==3.1.3
-grpcio==1.62.2
+grpcio==1.63.2
 protobuf==5.29.6
 opentelemetry-proto==1.28.0
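
Reviewer note, illustration only and not part of the patch: a minimal sketch of the narrowing that the CodeQL `implicit-cast-in-compound-assignment` alert in PATCH 1/2 refers to. The class and method names below are hypothetical and do not exist in SkyWalking; the sketch only assumes that the bucket counts being accumulated are `long`, as the commit message states. In Java, `count += value` with an `int` left-hand side compiles without a cast and behaves like `count = (int) (count + value)`, so the sum wraps once it passes `Integer.MAX_VALUE`.

```java
// Hypothetical demo class; not SkyWalking code.
final class CompoundNarrowingDemo {

    // Pre-patch shape: an int accumulator fed long bucket counts.
    static int sumIntoInt(long[] bucketCounts) {
        int count = 0;
        for (long value : bucketCounts) {
            // Compiles without a cast: equivalent to count = (int) (count + value),
            // so the high bits of the long sum are silently discarded.
            count += value;
        }
        return count;
    }

    // Post-patch shape: a long accumulator, no narrowing.
    static long sumIntoLong(long[] bucketCounts) {
        long count = 0;
        for (long value : bucketCounts) {
            count += value;
        }
        return count;
    }

    public static void main(String[] args) {
        long[] bucketCounts = {Integer.MAX_VALUE, 1L};
        System.out.println(sumIntoInt(bucketCounts));  // -2147483648 (wrapped)
        System.out.println(sumIntoLong(bucketCounts)); // 2147483648
    }
}
```

The plain assignment form `count = count + value` would be rejected by the compiler for an `int` accumulator, which is why the compound form is the one CodeQL flags.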