Merged
1 change: 1 addition & 0 deletions docs/en/changes/changes.md
@@ -3,6 +3,7 @@
#### Project

* Extend the `GET /inspect/entities` admin API to inspect a metric persisted by **any** OAP, even one this node does not define locally. When the metric is unknown to the local registry, the caller supplies `valueColumn` + `valueType` and the storage backend resolves the physical index/table/group from its own running config (no DB schema/table-metadata read): ES uses the merged `metrics-all` index + `metric_table` discriminator, JDBC probes the node's function tables by the `table_name` discriminator, and BanyanDB synthesizes a read-only measure schema. Scope is no longer required — the `entity_id` is decoded structurally (service / 2nd-level / relations) with a generic `name` leaf. Locally-defined metrics keep the exact field names, scope, and `mqeEntity` as before.
* Add the `POST /inspect/values` admin API — read the value series of a metric persisted by **another** OAP (one this node does not define locally) by supplying its `{valueColumn, valueType}`. The real MQE engine runs over a request-scoped `InspectQueryContext` overlay (provide-if-absent — the local catalog always wins) that makes the foreign metric look registered to every read path: `ValueColumnMetadata` resolves its value column / type / scope, and the storage location registries resolve where it lives (`MetadataRegistry` synthesizes a BanyanDB measure schema, `IndexController` resolves the ES `metrics-all` index, `TableHelper` probes the JDBC function tables), so the read returns the native MQE `ExpressionResult` with no per-DAO special-casing. Admin-only (a forced read this OAP cannot validate); not mirrored onto the public REST / GraphQL surface. See the [Inspect API](../setup/backend/admin-api/inspect.md).
* Remove the always-on alarm-to-event conversion (`EventHookCallback`). A triggered alarm is no longer synthesized into the events pipeline as an `Alarm`/`AlarmRecovery` event; events now originate only from real event sources (agents, SkyWalking CLI, Kubernetes Event Exporter). Alarms remain available through the alarm store (`getAlarm`/`queryAlarms`) and the configured alarm hooks. This drops a documented "Known Event" and removes 1-2 synthetic event records per alarm fire.
* **New `queryAlarms` GraphQL query — entity / layer / rule filters for alarms.** Adds
a comprehensive alarm query API alongside the legacy `getAlarm`. The new
85 changes: 79 additions & 6 deletions docs/en/setup/backend/admin-api/inspect.md
@@ -1,17 +1,18 @@
# Inspect API

The Inspect API lives on `admin-server` and exposes two browse endpoints that
let operators answer two questions without writing exploratory MQE:
The Inspect API lives on `admin-server` and exposes three endpoints that
let operators answer three questions without writing exploratory MQE:

1. *Which metrics has OAP registered, and at what downsampling?*
2. *For metric `X` in time range `T`, which entities currently hold values?*
1. *Which metrics has OAP registered, and at what downsampling?* — `GET /inspect/metrics`
2. *For metric `X` in time range `T`, which entities currently hold values?* — `GET /inspect/entities`
3. *For metric `X` + entity `E`, what are the values?* — `POST /inspect/values`

For a locally-defined metric, the output of (2) carries a ready-to-paste
`mqeEntity` payload, so the follow-up MQE call against the public GraphQL
`execExpression` mutation is copy-paste from the inspect response. A metric
persisted by **another OAP** that this node does not define can also be
inspected with caller-supplied metadata (see
[Foreign metrics](#foreign-metrics-not-defined-on-this-oap)).
inspected with caller-supplied metadata — both its entities (2) and its values
(3) — see [Foreign metrics](#foreign-metrics-not-defined-on-this-oap).

## Enabling

@@ -227,6 +228,73 @@ curl 'http://oap-admin:17128/inspect/entities?metric=meter_custom_x&valueColumn=
}
```

### `POST /inspect/values`

Reads the **values** of a metric this OAP does not define locally, by running the
real MQE engine over caller-supplied metadata. Where `GET /inspect/entities`
answers *which entities hold values*, this answers *what those values are* — the
native MQE `ExpressionResult` (the same shape the UI renders for a catalog metric),
for a metric that is otherwise foreign to this node.

Because it trusts caller-supplied metadata and forces a read of a metric this OAP
cannot validate, it is **admin-only** (it never mirrors onto the public REST /
GraphQL surface) and takes a request **body**: an MQE expression plus one metadata
entry per foreign metric the expression references.

Request body (`application/json`):

| Field | Required | Description |
|-------|----------|-------------|
| `expression` | yes | The MQE expression to evaluate — a single foreign metric name, or an expression combining foreign and/or catalog metrics. |
| `entity` | yes | The MQE query entity; its `scope` binds every foreign metric. e.g. `{ "scope": "Service", "serviceName": "X", "normal": true }` (use `serviceInstanceName` / `endpointName` for the deeper scopes). |
| `start` / `end` | yes | Time range, same format as [`/inspect/entities`](#get-inspectentities). |
| `step` | yes | One of `MINUTE` / `HOUR` / `DAY`. |
| `foreignMetrics` | yes | One entry per metric in `expression` that this OAP does not define: `{ "name": "...", "valueColumn": "value", "valueType": "LONG" }`. `valueColumn` is the post-override physical column; `valueType` is one of `LONG` / `INT` / `DOUBLE` / `LABELED`. A locally-defined metric must **not** be listed here (query it via the public GraphQL `execExpression`). |
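
A caller can pre-check a `foreignMetrics` entry before sending it. The sketch below mirrors the two rules this change enforces on the server (a bare-identifier `valueColumn` and one of the four accepted `valueType` values). The class `ForeignMetricCheck` and its `check` method are hypothetical names used only for illustration; they are not part of SkyWalking.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical client-side helper for illustration; not part of SkyWalking.
final class ForeignMetricCheck {
    // Same bare-identifier rule the handler in this change applies to valueColumn.
    private static final Pattern VALUE_COLUMN = Pattern.compile("^[A-Za-z_][A-Za-z0-9_]*$");
    // The value types the handler accepts for a foreign metric.
    private static final Set<String> VALUE_TYPES = Set.of("LONG", "INT", "DOUBLE", "LABELED");

    /** Returns null when the entry looks acceptable, otherwise a short reason. */
    static String check(String name, String valueColumn, String valueType) {
        if (name == null || name.isBlank()) {
            return "name is required";
        }
        if (valueColumn == null || !VALUE_COLUMN.matcher(valueColumn).matches()) {
            return "valueColumn is invalid";
        }
        if (valueType == null || !VALUE_TYPES.contains(valueType.toUpperCase(Locale.ROOT))) {
            return "valueType is invalid";
        }
        return null;
    }

    public static void main(String[] args) {
        // A well-formed entry, then one whose column is not a bare identifier.
        System.out.println(check("meter_custom_x", "value", "LONG"));
        System.out.println(check("meter_custom_x", "value; DROP TABLE t", "LONG"));
    }
}
```

The check cannot tell whether a metric is defined locally on the target OAP; that rule is only enforceable server-side.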

The metadata is overlaid **provide-if-absent** (the local catalog always wins) onto
the same registries the engine already consults, so a foreign metric looks registered
for the duration of the request: `ValueColumnMetadata` resolves its value column / type
/ scope, and the storage location registries resolve its index / table / measure exactly
as described in [Foreign metrics](#foreign-metrics-not-defined-on-this-oap). The overlay
is request-scoped to the calling thread and removed when the read completes; the public
query path never sets it.
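
The provide-if-absent, request-scoped behaviour described above can be pictured with a small sketch. It assumes a plain `ThreadLocal` map as the overlay; `OverlayRegistry`, `valueColumnOf`, and `withOverlay` are hypothetical names, and the real `InspectQueryContext` may be structured differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch; class and method names are illustrative, not SkyWalking's.
final class OverlayRegistry {
    private final Map<String, String> catalog = new HashMap<>();
    // Per-thread overlay: only visible to the thread that installed it.
    private final ThreadLocal<Map<String, String>> overlay = new ThreadLocal<>();

    void register(String metric, String valueColumn) {
        catalog.put(metric, valueColumn);
    }

    /** The catalog is consulted first, so an overlay entry never shadows a local definition. */
    Optional<String> valueColumnOf(String metric) {
        String local = catalog.get(metric);
        if (local != null) {
            return Optional.of(local);
        }
        Map<String, String> scoped = overlay.get();
        return scoped == null ? Optional.empty() : Optional.ofNullable(scoped.get(metric));
    }

    /** Runs the read with the overlay installed and always removes it afterwards. */
    <T> T withOverlay(Map<String, String> foreign, Supplier<T> read) {
        overlay.set(foreign);
        try {
            return read.get();
        } finally {
            overlay.remove();
        }
    }

    public static void main(String[] args) {
        OverlayRegistry registry = new OverlayRegistry();
        registry.register("service_cpm", "value");
        Map<String, String> foreign = Map.of("meter_custom_x", "value");
        String during = registry.withOverlay(foreign,
            () -> registry.valueColumnOf("meter_custom_x").orElse("unknown"));
        System.out.println("during request: " + during);
        System.out.println("after request: "
            + registry.valueColumnOf("meter_custom_x").orElse("unknown"));
    }
}
```

Removing the overlay in a `finally` block is what keeps a foreign entry from leaking into a later request served by the same thread.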

Only scalar (`LONG` / `INT` / `DOUBLE`) and labeled (best-effort) value series are
supported. An expression that resolves to `top_n` / records / heatmaps needs a local
model and surfaces as an error. Under ES `logicSharding=true` a foreign value read is
unsupported (the physical index derives from the metric's stream class), returning `500`.

Example — read the value series of `meter_custom_x`, defined on another OAP:

```bash
curl -X POST 'http://oap-admin:17128/inspect/values' \
  -H 'Content-Type: application/json' \
  -d '{
    "expression": "meter_custom_x",
    "entity": { "scope": "Service", "serviceName": "payment", "normal": true },
    "start": "2026-05-10 1230", "end": "2026-05-10 1240", "step": "MINUTE",
    "foreignMetrics": [
      { "name": "meter_custom_x", "valueColumn": "value", "valueType": "LONG" }
    ]
  }'
```
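
The same request can be assembled from Java. This is a minimal sketch using the JDK's `java.net.http` API; `InspectValuesRequestSketch` and its methods are hypothetical names, the metric and service names are assumed to need no JSON escaping, and the `yyyy-MM-dd HHmm` pattern is the MINUTE-step format named in the handler's error message.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical client-side helper for illustration; not part of SkyWalking.
final class InspectValuesRequestSketch {
    // MINUTE-step format: yyyy-MM-dd HHmm.
    private static final DateTimeFormatter MINUTE = DateTimeFormatter.ofPattern("yyyy-MM-dd HHmm");

    /** Builds the JSON body; inputs are assumed to contain no characters that need escaping. */
    static String body(String metric, String service, LocalDateTime start, LocalDateTime end) {
        return String.format(
            "{\"expression\":\"%s\","
                + "\"entity\":{\"scope\":\"Service\",\"serviceName\":\"%s\",\"normal\":true},"
                + "\"start\":\"%s\",\"end\":\"%s\",\"step\":\"MINUTE\","
                + "\"foreignMetrics\":[{\"name\":\"%s\",\"valueColumn\":\"value\",\"valueType\":\"LONG\"}]}",
            metric, service, MINUTE.format(start), MINUTE.format(end), metric);
    }

    /** Builds (but does not send) the POST request against the admin server. */
    static HttpRequest request(String baseUrl, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/inspect/values"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
    }

    public static void main(String[] args) {
        String json = body("meter_custom_x", "payment",
            LocalDateTime.of(2026, 5, 10, 12, 30), LocalDateTime.of(2026, 5, 10, 12, 40));
        HttpRequest req = request("http://oap-admin:17128", json);
        System.out.println(req.method() + " " + req.uri());
        System.out.println(json);
        // Sending is left out here; java.net.http.HttpClient#send would perform the call.
    }
}
```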

```json
{
  "type": "TIME_SERIES_VALUES",
  "results": [
    {
      "metric": { "labels": [] },
      "values": [
        { "id": "1778416200000", "value": "42" },
        { "id": "1778416260000", "value": "42" }
      ]
    }
  ],
  "error": null
}
```
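
When post-processing such a response, the `id` fields in this example read naturally as epoch milliseconds, one per time bucket. That interpretation is an assumption drawn from the sample values, not something this document states. A small sketch under that assumption (`SeriesIds` is a hypothetical name):

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; assumes each "id" is an epoch-millisecond timestamp.
final class SeriesIds {
    static Instant toInstant(String id) {
        return Instant.ofEpochMilli(Long.parseLong(id));
    }

    public static void main(String[] args) {
        // The two points from the sample response, in order.
        Map<String, String> values = new LinkedHashMap<>();
        values.put("1778416200000", "42");
        values.put("1778416260000", "42");
        values.forEach((id, value) -> System.out.println(toInstant(id) + " -> " + value));
    }
}
```

`Instant` prints in UTC; how the bucket boundaries relate to the `start` / `end` strings in the request depends on the OAP server's time zone.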

## Discovering the OAP REST URL for the MQE follow-up

To keep the surface minimal, the inspect API does not introduce a separate
@@ -248,6 +316,11 @@ session start is enough.
| 400 | `{"error":"metric type SAMPLED_RECORD is out of scope for /inspect/entities"}` | Metric is `SAMPLED_RECORD`. |
| 400 | `{"error":"process scope is out of scope"}` | Scope is `Process` / `ProcessRelation`. |
| 400 | `{"error":"limit must be between 1 and 300"}` | `limit` out of range. |
| 400 | `{"error":"foreignMetrics is required; a locally-defined metric should be queried via the public GraphQL execExpression"}` | `POST /inspect/values` body had no `foreignMetrics`. |
| 400 | `{"error":"metric foo is defined locally; query it via the GraphQL execExpression and drop it from foreignMetrics"}` | A `foreignMetrics` entry names a metric this OAP already defines. |
| 400 | `{"error":"valueColumn is invalid: …"}` | A `foreignMetrics` `valueColumn` is not a bare identifier. |
| 400 | `{"error":"<MQE error>"}` | `POST /inspect/values` expression resolved to an unsupported shape (e.g. `top_n` / record / heatmap) for a foreign metric. |
| 500 | `{"error":"<storage error>"}` | A wrong `valueColumn` / `valueType`, or ES `logicSharding=true`, surfaced at the storage layer during a value read. |

## Limits

3 changes: 3 additions & 0 deletions docs/en/setup/backend/admin-api/readme.md
@@ -86,6 +86,9 @@ Common operations:
- `GET /inspect/metrics` — metric catalog with type / scope / supported downsamplings.
- `GET /inspect/entities?metric=&start=&end=&step=` — capped (≤300) list of
entities holding values, decoded into MQE-ready form.
- `POST /inspect/values` — read the value series of a metric this OAP does not
define locally (foreign metric), by supplying its `{valueColumn, valueType}`;
returns the native MQE result.

Operator reference: [Inspect API](inspect.md).

6 changes: 6 additions & 0 deletions oap-server/server-admin/inspect/pom.xml
@@ -39,6 +39,12 @@
            <artifactId>admin-server</artifactId>
            <version>${project.version}</version>
        </dependency>
        <!-- MQEExecutor: run the MQE engine synchronously for the foreign-metric value path. -->
        <dependency>
            <groupId>org.apache.skywalking</groupId>
            <artifactId>query-graphql-plugin</artifactId>
            <version>${project.version}</version>
        </dependency>
        <dependency>
            <groupId>org.junit.jupiter</groupId>
            <artifactId>junit-jupiter</artifactId>
@@ -18,11 +18,15 @@

package org.apache.skywalking.oap.server.admin.inspect.handler;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.linecorp.armeria.common.HttpData;
import com.linecorp.armeria.common.HttpResponse;
import com.linecorp.armeria.common.HttpStatus;
import com.linecorp.armeria.common.MediaType;
import com.linecorp.armeria.server.annotation.Blocking;
import com.linecorp.armeria.server.annotation.Get;
import com.linecorp.armeria.server.annotation.Param;
import com.linecorp.armeria.server.annotation.Post;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
@@ -36,7 +40,9 @@
import java.util.regex.PatternSyntaxException;
import java.util.stream.Collectors;
import lombok.extern.slf4j.Slf4j;
import org.apache.skywalking.oap.query.graphql.mqe.rt.MQEExecutor;
import org.apache.skywalking.oap.server.admin.inspect.decoder.EntityDecoder;
import org.apache.skywalking.oap.server.admin.inspect.request.InspectValuesRequest;
import org.apache.skywalking.oap.server.admin.inspect.response.EntitiesResponse;
import org.apache.skywalking.oap.server.admin.inspect.response.EntityRow;
import org.apache.skywalking.oap.server.admin.inspect.response.ErrorResponse;
@@ -51,10 +57,12 @@
import org.apache.skywalking.oap.server.core.query.enumeration.Scope;
import org.apache.skywalking.oap.server.core.query.enumeration.Step;
import org.apache.skywalking.oap.server.core.query.input.Duration;
import org.apache.skywalking.oap.server.core.query.mqe.ExpressionResult;
import org.apache.skywalking.oap.server.core.query.type.Service;
import org.apache.skywalking.oap.server.core.source.DefaultScopeDefine;
import org.apache.skywalking.oap.server.core.storage.StorageModule;
import org.apache.skywalking.oap.server.core.storage.annotation.Column;
import org.apache.skywalking.oap.server.core.storage.annotation.ForeignMetricMeta;
import org.apache.skywalking.oap.server.core.storage.annotation.ValueColumnMetadata;
import org.apache.skywalking.oap.server.core.storage.model.IModelManager;
import org.apache.skywalking.oap.server.core.storage.model.Model;
@@ -77,6 +85,9 @@ public class InspectRestHandler {
    /** Value types a caller may declare for a foreign (locally-undefined) metric. */
    private static final Set<String> ACCEPTED_FOREIGN_VALUE_TYPES =
        Set.of("LONG", "INT", "DOUBLE", "LABELED");
    /** A value column is interpolated into JDBC SQL on the read path; restrict it to a bare identifier. */
    private static final Pattern VALUE_COLUMN_PATTERN = Pattern.compile("^[A-Za-z_][A-Za-z0-9_]*$");
    private static final ObjectMapper VALUES_MAPPER = new ObjectMapper();

    private final ModuleManager moduleManager;

@@ -398,6 +409,109 @@ private HttpResponse listForeignEntities(final String metric,
        return HttpResponse.ofJson(MediaType.JSON_UTF_8, body);
    }

    /**
     * Read the VALUES of metric(s) persisted by another OAP that this node does not define. The body
     * carries an MQE expression plus, in {@code foreignMetrics}, the metadata for each foreign metric
     * it references (value column + type). The same MQE engine the public GraphQL surface uses is run
     * synchronously with that metadata overlaid PROVIDE-IF-ABSENT (the catalog always wins), returning
     * the native {@code ExpressionResult}. Marked {@code @Blocking}: the eval + storage read are
     * synchronous and must not run on the event loop. Only scalar (LONG/INT/DOUBLE) and labeled
     * (best-effort) value series are supported; {@code top_n} and record/heatmap shapes need a local
     * model and surface as an error.
     */
    @Blocking
    @Post("/inspect/values")
    public HttpResponse listValues(final HttpData requestBody) {
        final InspectValuesRequest req;
        try {
            req = VALUES_MAPPER.readValue(requestBody.toStringUtf8(), InspectValuesRequest.class);
        } catch (Exception e) {
            return error(HttpStatus.BAD_REQUEST, "invalid request body: " + e.getMessage());
        }
        if (req.getExpression() == null || req.getExpression().isBlank()) {
            return error(HttpStatus.BAD_REQUEST, "expression is required");
        }
        if (req.getEntity() == null || req.getEntity().getScope() == null) {
            return error(HttpStatus.BAD_REQUEST, "entity (with a scope) is required");
        }
        // The scope alone is not enough: the entity must carry the name fields its scope needs
        // (e.g. serviceName + normal for Service), or buildId() yields a bogus id that the read
        // silently misses; surface that as a 400 instead of an empty 200.
        if (!req.getEntity().isValid()) {
            return error(HttpStatus.BAD_REQUEST,
                "entity is missing required fields for scope " + req.getEntity().getScope()
                    + " (Service needs serviceName + normal; ServiceInstance/Endpoint also need "
                    + "serviceInstanceName / endpointName)");
        }
        if (req.getForeignMetrics() == null || req.getForeignMetrics().isEmpty()) {
            return error(HttpStatus.BAD_REQUEST,
                "foreignMetrics is required; a locally-defined metric should be queried via the public "
                    + "GraphQL execExpression");
        }

        final Step step;
        try {
            // Locale.ROOT keeps the upper-casing independent of the JVM's default locale.
            step = Step.valueOf(String.valueOf(req.getStep()).toUpperCase(java.util.Locale.ROOT));
        } catch (Exception e) {
            return error(HttpStatus.BAD_REQUEST,
                "step must be one of MINUTE / HOUR / DAY (got " + req.getStep() + ")");
        }
        if (step == Step.SECOND) {
            return error(HttpStatus.BAD_REQUEST, "step must be one of MINUTE / HOUR / DAY (got SECOND)");
        }

        final int scopeId = req.getEntity().getScope().getScopeId();
        final List<ForeignMetricMeta> foreign = new ArrayList<>();
        for (final InspectValuesRequest.ForeignMetricInput fm : req.getForeignMetrics()) {
            if (fm.getName() == null || fm.getName().isBlank()) {
                return error(HttpStatus.BAD_REQUEST, "each foreignMetrics entry needs a name");
            }
            if (fm.getValueColumn() == null || !VALUE_COLUMN_PATTERN.matcher(fm.getValueColumn()).matches()) {
                return error(HttpStatus.BAD_REQUEST, "valueColumn is invalid: " + fm.getValueColumn());
            }
            final String type = fm.getValueType() == null
                ? "" : fm.getValueType().toUpperCase(java.util.Locale.ROOT);
            if (!ACCEPTED_FOREIGN_VALUE_TYPES.contains(type)) {
                return error(HttpStatus.BAD_REQUEST,
                    "valueType must be one of LONG / INT / DOUBLE / LABELED (got " + fm.getValueType() + ")");
            }
            if (ValueColumnMetadata.INSTANCE.readValueColumnDefinition(fm.getName()).isPresent()) {
                return error(HttpStatus.BAD_REQUEST,
                    "metric " + fm.getName() + " is defined locally; query it via the GraphQL "
                        + "execExpression and drop it from foreignMetrics");
            }
            foreign.add(new ForeignMetricMeta(fm.getName(), fm.getValueColumn(), type, scopeId, 0));
        }

        final Duration duration = new Duration();
        duration.setStart(req.getStart());
        duration.setEnd(req.getEnd());
        duration.setStep(step);
        try {
            duration.getStartTimeBucket();
            duration.getEndTimeBucket();
        } catch (IllegalArgumentException | UnexpectedException e) {
            return error(HttpStatus.BAD_REQUEST,
                "start / end must follow the step's date format (DAY: yyyy-MM-dd, HOUR: yyyy-MM-dd HH, "
                    + "MINUTE: yyyy-MM-dd HHmm): " + e.getMessage());
        }

        final ExpressionResult result;
        try {
            result = new MQEExecutor(moduleManager)
                .execute(req.getExpression(), req.getEntity(), duration, foreign);
        } catch (Exception e) {
            // Optimistic read: a foreign top_n / record shape, or a wrong valueColumn/valueType,
            // surfaces here rather than as garbage.
            log.warn("inspect values execute failed for expression={}", req.getExpression(), e);
            return error(HttpStatus.INTERNAL_SERVER_ERROR, e.getMessage());
        }
        if (result.getError() != null) {
            // e.g. an unsupported shape resolved to UNKNOWN; never put that on the wire as a 200.
            return error(HttpStatus.BAD_REQUEST, result.getError());
        }
        return HttpResponse.ofJson(MediaType.JSON_UTF_8, result);
    }

    /**
     * Mirror of the {@code /inspect/entities} type acceptance set. Kept in one place so
     * the {@code mqeQueryable=true} filter on {@code /inspect/metrics} and the actual