claude_cowork: add Claude Cowork OTel integration #19943
✅ Elastic Docs Style Checker (Vale)
No issues found on modified lines! The Vale linter checks documentation changes against the Elastic Docs style guide. To use Vale locally or report issues, refer to the Elastic style guide for Vale.
TL;DR
The failed Buildkite job ran against commit
Remediation
Investigation details
Root Cause
owner:
  github: elastic/security-service-integrations
  type: elastic
but that same failed commit did not include a
Evidence
Verification
| "service.name": "cowork", | ||
| "service.version": "1.15962.1", | ||
| "session.id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", | ||
| "source": "config", |
Severity: 🔵 Low confidence: low path: packages/claude_cowork/data_stream/events/_dev/test/pipeline/test-tool-decision.json:19
Dashboards aggregate on claude_cowork.events.decision_source, but tool_decision events emit the permission source as source; if the real attribute is source, the permission-source panels will render empty. Verify the vendor attribute name and align the dashboards or the fixtures.
Details
The permission-decisions, security-overview, session-timeline and tool-call-analysis dashboards bucket on claude_cowork.events.decision_source (with values user_temporary/user_permanent). However the only observable data for tool_decision events — this pipeline test fixture and the docker sample generator — emits the permission source under the attribute source (value config), which the pipeline namespaces to claude_cowork.events.source. Nothing in the pipeline populates decision_source. Note that config is listed as a decision_source value in fields.yml, suggesting the fixture may use the wrong key. Since the true Cowork OTel attribute name cannot be confirmed from the checkout, this is flagged low-confidence — but if the emitted attribute is source, the decision-source dashboard panels will always be empty.
Recommendation:
Confirm the attribute name Cowork actually emits for permission source. If it is source, either point the dashboards at the produced field or normalize it in the pipeline; if it is decision_source, update the fixtures/generator so the tests exercise it. For example, if the real attribute is decision_source, update the tool_decision fixture:
{
  "decision": "accept",
  "decision_source": "config"
}
Alternatively, if the field is source, add a rename in the pipeline so dashboards resolve:
- rename:
    tag: rename_source_to_decision_source
    field: claude_cowork.events.source
    target_field: claude_cowork.events.decision_source
    ignore_missing: true
    if: ctx.claude_cowork?.events?.event?.name == 'tool_decision'
🤖 AI-Generated Review | Vera Review Bot | 📚 Knowledge base: integration-skills
⚠️ Automated review — verify suggestions before applying.
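The effect of the rename option can be sketched outside Elasticsearch. The Python below only illustrates the conditional rename suggested above; it is not the ingest processor itself. `get_path` and `rename_source_to_decision_source` are hypothetical helper names, and the document shape (`claude_cowork.events.event.name`) is assumed from the `if` condition in the suggestion.

```python
from typing import Any


def get_path(doc: dict, dotted: str) -> Any:
    """Return the value at a dotted path, or None if any segment is missing."""
    node: Any = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def rename_source_to_decision_source(doc: dict) -> dict:
    """Emulate the suggested rename for tool_decision events (mutates doc)."""
    events = get_path(doc, "claude_cowork.events")
    if not isinstance(events, dict):
        return doc
    is_tool_decision = get_path(events, "event.name") == "tool_decision"
    # ignore_missing: true -> do nothing when the source field is absent.
    # Assumption: documents that already carry decision_source are left
    # untouched here; the real rename processor would raise an error instead.
    if is_tool_decision and "source" in events and "decision_source" not in events:
        events["decision_source"] = events.pop("source")
    return doc
```

With the fixture's `source: config` value, a `tool_decision` document would end up with `decision_source: config`, which is the field the dashboards bucket on. Events of any other type keep `source` unchanged.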
| - configuration | ||
| type: | ||
| - change | ||
| mcp_server_connection: |
Severity: 🟡 Medium confidence: medium path: packages/claude_cowork/data_stream/events/elasticsearch/ingest_pipeline/default.yml:81
The mcp_server_connection event path has no pipeline test fixture, so its field names (server_name, status, transport_type, error_code) are unverified even though the MCP Server Access dashboard depends on them; add a test-mcp-server-connection fixture.
Details
The pipeline categorizes an mcp_server_connection event (network/connection), and the MCP Server Access dashboard renders panels on claude_cowork.events.server_name, claude_cowork.events.status, claude_cowork.events.transport_type, and claude_cowork.events.error_code. None of the 7 pipeline fixtures nor the 8-event system-test generator emit an mcp_server_connection event, so the attribute names those panels rely on are never exercised against the pipeline. This is the same class of alignment gap previously raised for the permission-source panels: if the real vendor attribute differs from what fields.yml assumes, these connection panels render empty and no test would catch it.
Recommendation:
Add a pipeline fixture that drives the connection path, e.g. test-mcp-server-connection.json:
{
  "events": [
    {
      "@timestamp": "2026-06-30T02:43:31.836681529Z",
      "attributes": {
        "event.name": "mcp_server_connection",
        "server_name": "workspace",
        "server_scope": "project",
        "transport_type": "stdio",
        "status": "connected",
        "session.id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
      },
      "event_name": "mcp_server_connection",
      "data_stream": { "type": "logs", "dataset": "claude_cowork.events.otel", "namespace": "default" }
    }
  ]
}
Commit the matching -expected.json so the connection field names are validated against the pipeline.
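One way to confirm that a fixture exercises every attribute the MCP Server Access panels depend on is a small set comparison. This is a sketch only: `unexercised_attributes` is a hypothetical helper, the fixture shape is assumed to match the JSON suggested above, and the attribute list is taken from the four fields named in this comment.

```python
# Attributes the MCP Server Access dashboard is said to aggregate on.
MCP_PANEL_ATTRIBUTES = {"server_name", "status", "transport_type", "error_code"}


def unexercised_attributes(fixture: dict, event_name: str, required: set) -> set:
    """Return required attribute names that no matching fixture event carries."""
    seen = set()
    for event in fixture.get("events", []):
        attributes = event.get("attributes", {})
        if attributes.get("event.name") == event_name:
            seen.update(attributes)
    return required - seen


suggested_fixture = {
    "events": [
        {
            "attributes": {
                "event.name": "mcp_server_connection",
                "server_name": "workspace",
                "server_scope": "project",
                "transport_type": "stdio",
                "status": "connected",
                "session.id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
            }
        }
    ]
}

gaps = unexercised_attributes(
    suggested_fixture, "mcp_server_connection", MCP_PANEL_ATTRIBUTES
)
```

For the suggested `connected` fixture, `gaps` contains only `error_code`, so a failed-connection event would likely be needed as well to cover that panel.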
| "includeIsRegex": false, | ||
| "excludeIsRegex": false | ||
| }, | ||
| "sourceField": "claude_cowork.events.to_mode" |
Severity: 🔵 Low confidence: medium path: packages/claude_cowork/kibana/dashboard/claude_cowork-security-overview.json:707
The Security Overview "Permission Mode Changes" panel aggregates on claude_cowork.events.to_mode, a field fields.yml documents as reserved/not currently emitted, so the panel always renders empty; drop the panel (or the field) until the event is emitted.
Details
This panel's terms aggregation uses sourceField: claude_cowork.events.to_mode. In fields.yml, to_mode is declared with the description "Permission mode after a change (reserved for future use)" and the field comment states it is "reserved; not currently emitted by Cowork." No categorized event or fixture populates it. As shipped, this visualization is permanently empty, which is a visible quality issue on a headline dashboard rather than a forward-compatibility benefit.
Recommendation:
Remove the empty panel until permission_mode_changed events are actually emitted, or keep only fields that carry data today. If retained, gate it behind a note. Minimal change is to delete the panel referencing the reserved field:
{
  "//": "Remove the 'Permission Mode Changes' Lens panel and its columns",
  "//2": "that reference sourceField: claude_cowork.events.to_mode until the",
  "//3": "permission_mode_changed event is emitted by Cowork."
}
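Panels that reference reserved fields can be found mechanically. The sketch below walks a decoded dashboard JSON document and collects every `sourceField` value that appears in a caller-supplied set of reserved field names. `source_fields` and `reserved_references` are hypothetical helpers, and the sketch assumes the dashboard file stores panel state as nested JSON objects (as the diff context above suggests) rather than as embedded JSON strings.

```python
import json


def source_fields(node):
    """Yield every 'sourceField' string found anywhere in a decoded JSON tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "sourceField" and isinstance(value, str):
                yield value
            else:
                yield from source_fields(value)
    elif isinstance(node, list):
        for item in node:
            yield from source_fields(item)


def reserved_references(dashboard_json: str, reserved: set) -> set:
    """Return the reserved field names that the dashboard aggregates on."""
    return set(source_fields(json.loads(dashboard_json))) & reserved
```

Calling `reserved_references(text, {"claude_cowork.events.to_mode"})` on the Security Overview dashboard text would be expected to return a non-empty set while the panel remains in place.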
5c34361 to 5987894
| - append: | ||
| tag: append_related_hosts | ||
| field: related.hosts | ||
| value: "{{{claude_cowork.events.server_name}}}" |
Severity: 🔵 Low confidence: low path: packages/claude_cowork/data_stream/events/elasticsearch/ingest_pipeline/default.yml:331
related.hosts is populated from claude_cowork.events.server_name, but an MCP server name is a service identifier, not a host identifier; consider dropping this enrichment or moving it to a more appropriate ECS field.
Details
The append_related_hosts processor copies claude_cowork.events.server_name into related.hosts. server_name is the MCP server name from connection events (e.g. 'workspace'), which is a logical service/component identifier rather than a hostname, FQDN, or host alias. ECS related.hosts is intended for host identifiers used to correlate events to machines; mixing MCP server names into it dilutes host-based correlation and pivots. related.user (email + id) is correctly populated, so the correlation story does not depend on this line.
Recommendation:
Either drop the related.hosts enrichment for MCP server names, or keep the value only in the namespaced field. If host correlation is genuinely desired for MCP connections, gate it on a field that actually carries a host identifier. Example removing the mis-bucketed enrichment:
# Remove the following processor; claude_cowork.events.server_name is retained
# in the namespaced field and does not belong in ECS related.hosts.
# - append:
#     tag: append_related_hosts
#     field: related.hosts
#     value: "{{{claude_cowork.events.server_name}}}"
#     allow_duplicates: false
#     if: ctx.claude_cowork?.events?.server_name != null
| |-------------|-------------| | ||
| | `events` | All Claude Cowork OTLP log events — tool executions, API requests, permission decisions, MCP connections, hooks, plugins, and session lifecycle. | | ||
|
|
||
| The integration processes these event types: |
Severity: 🔵 Low confidence: medium path: packages/claude_cowork/_dev/build/docs/README.md:23
The README 'event types' table lists only 7 of the 16 events the pipeline categorizes; add the missing types (api_error, api_refusal, api_retries_exhausted, auth, permission_mode_changed, mcp_server_connection, hook_registered, plugin_installed, skill_activated) so the collected scope is accurate.
Details
The ingest pipeline's ecs_categorization processor assigns event.category/event.type for 16 distinct event_name values, but the 'The integration processes these event types' table documents only 7 (tool_result, tool_decision, api_request, user_prompt, hook_execution_start, hook_execution_complete, plugin_loaded). For a security-focused integration, an auditor reading this table would not know that authentication events (auth), API error/refusal events, permission mode changes, and MCP server connections are also collected and categorized. The table understates the actual collection scope.
Recommendation:
Extend the table to cover the remaining event types the pipeline categorizes, for example:
| Event | Description | ECS category |
|-------|-------------|--------------|
| `api_error` | Failed API call to Anthropic. | `web` |
| `api_refusal` | API request refused. | `web` |
| `api_retries_exhausted` | API call gave up after retries. | `web` |
| `auth` | Authentication event. | `authentication` |
| `permission_mode_changed` | Session permission mode change. | `configuration` |
| `mcp_server_connection` | MCP server connection state change. | `network` |
| `hook_registered` | Hook registered for an event. | `process` |
| `plugin_installed` | Plugin installed. | `package` |
| `skill_activated` | Skill activated. | `process` |
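To keep the README table from drifting away from the pipeline again, the documented event names can be compared against the categorized ones. The sketch below is illustrative only: `documented_events` and `undocumented` are hypothetical helpers, and it assumes each event row starts with a backticked name in the first column, as in the table above. Other tables using the same convention (for example the data stream table with `events`) will also be picked up, which is harmless for this comparison.

```python
import re

# Matches a table row whose first cell is a backticked identifier.
ROW = re.compile(r"^\|\s*`([^`]+)`\s*\|")


def documented_events(markdown: str) -> set:
    """Collect backticked first-column names from Markdown table rows."""
    names = set()
    for line in markdown.splitlines():
        match = ROW.match(line.strip())
        if match:
            names.add(match.group(1))
    return names


def undocumented(categorized: set, markdown: str) -> set:
    """Return categorized event names that no README table row mentions."""
    return categorized - documented_events(markdown)
```

Passing the 16 event names the pipeline categorizes together with the README text should return an empty set once the table is complete.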
@efd6 would event.category:api be a better fit for the api events than 'web'?
Collect Claude Cowork agentic telemetry via OpenTelemetry. Derived from the claude_code package with adaptations for Cowork-specific schema: non-interactive terminal, dotted MCP field names (mcp_server.name, mcp_tool.name), no tool_parameters/tool_input, deployment_mode and workspace.host_paths attributes. Test samples were collected from live sessions and sanitised.
✅ All changelog entries have the correct PR link. |
💚 Build Succeeded
cc @efd6 |
| value: success | ||
| if: ctx.claude_cowork?.events?.success == 'true' | ||
| - set: | ||
| tag: set_event_outcome_failure |
Severity: 🟡 Medium confidence: high path: packages/claude_cowork/data_stream/events/elasticsearch/ingest_pipeline/default.yml:294
The pipeline's failure/error branches are untested: no fixture exercises event.outcome=failure (success='false', api_error, api_retries_exhausted) or the error/error_type/error_code fields the Tool Call Analysis dashboard aggregates on. Add at least one failure-path fixture (e.g. test-tool-result-failed and test-api-error).
Details
All seven pipeline fixtures are success-path only: every one sets success='true' or omits it, and none carries an api_error / api_refusal / api_retries_exhausted event or the error, error_type, or error_code attributes. As a result the set_event_outcome_failure processor (which fires on success=='false' or event.name in {api_error, api_retries_exhausted}), the web/error and web/denied categorization entries, and the set_event_reason enrichment are never exercised. The Tool Call Analysis dashboard has panels that aggregate on claude_cowork.events.error and claude_cowork.events.error_type, and the MCP Server Access dashboard aggregates on claude_cowork.events.error_code, so those field names are unverified by any test. The sibling claude_code package ships test-tool-result-failed and test-mcp-server-connection-failed fixtures for exactly this reason.
Recommendation:
Add a failure-path fixture so the failure-outcome branch and the error fields are covered, for example a tool_result with a failed outcome:
{
  "events": [
    {
      "@timestamp": "2026-06-30T02:43:31.836681529Z",
      "event_name": "tool_result",
      "attributes": {
        "event.name": "tool_result",
        "tool_name": "Bash",
        "success": "false",
        "error": "command failed",
        "error_type": "ShellError",
        "error_code": "ENOENT",
        "duration_ms": 12
      }
    }
  ]
}
and assert in the matching -expected.json that event.outcome resolves to "failure" and event.reason is copied from claude_cowork.events.error. Consider a second fixture for an api_error event to cover the web/error categorization path.
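The branches such a fixture would exercise can be written down as a small reference model. This Python sketch mirrors the conditions described in this comment (`success == 'true'` for success; `success == 'false'` or an `api_error` / `api_retries_exhausted` event for failure; `event.reason` copied from `error`). `outcome_and_reason` is a hypothetical name. The processor ordering, where failure is evaluated after success, and the unconditional copy of `error` are assumptions, not confirmed pipeline behavior.

```python
FAILURE_EVENTS = {"api_error", "api_retries_exhausted"}


def outcome_and_reason(events: dict) -> dict:
    """Model the expected event.outcome / event.reason for claude_cowork.events."""
    success = events.get("success")
    name = events.get("event", {}).get("name")
    result = {}
    # The pipeline compares against the string 'true', not a boolean.
    if success == "true":
        result["event.outcome"] = "success"
    # Assumed to run after the success branch, so failure wins on conflict.
    if success == "false" or name in FAILURE_EVENTS:
        result["event.outcome"] = "failure"
    if "error" in events:
        result["event.reason"] = events["error"]
    return result
```

A model like this makes the expected values for the `-expected.json` files explicit. It also shows that a boolean `True` for `success` would match neither branch, which is worth covering if Cowork ever emits non-string values.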
Review summary
Issues found across the latest commits [a476567](https://github.com/elastic/integrations/commit/a476567d491aa815e7f75bdc518c18730b81a3dd): 1 medium
Issues found across earlier commits [5987894](https://github.com/elastic/integrations/commit/5987894949d7c5ab8f124f90731eba2fd5061234): 1 low
Issues found across earlier commits [5c34361](https://github.com/elastic/integrations/commit/5c3436111cd3e07622802ece01a85ba4617b309c): 1 low
Issues found across earlier commits [d332a1d](https://github.com/elastic/integrations/commit/d332a1daf593c45da7c6b25a669d5ad40d3c78d8): 1 medium, 1 low
Issues found across earlier commits [a0dc396](https://github.com/elastic/integrations/commit/a0dc3969a557d520f5993b1f4230f128c09cee28): 1 low
Hello, I have a question regarding this and the Claude Code integration.
I think the main reason we are doing OTel -> ECS is specific business requirements. @jamiehynds, could you clarify here? As for the naming, I think it's because it uses OTel receivers as the input, but I'll let @efd6 clarify when he's back.
Hey @ishleenk17 - our goal with the Claude integration is to spare users from having to manually add component templates, ingest pipelines, etc. to normalize to ECS and stay aligned with the schema our security users rely on. Our InfoSec team published a blog post on onboarding Claude data via OTel, but several users have asked for a more seamless experience via an OOTB integration. We should absolutely consider our o11y users, who may want to onboard this data as native OTel. Is there a path to supporting that within the same integration, or would we need to split it between an integration and a content pack? |
Proposed commit message
Note
Again, sorry for the size.
Checklist
changelog.yml file.
Author's Checklist
How to test this PR locally
Related issues
Screenshots