logs rendering and rbac enhancements#1040
Merged
Merged
Conversation
And add the nodename as a log entry prefix.
The `o[mx] node|cluster|<obj> logs [-f]` commands now buffer 500ms of sse feeding before sort and print the lines older than 500ms.
* Use millisecond precision in the logreader renderer * Add a widget to better highlight log origin changes. Example: # ox svc1 logs --filter ORCHESTRATION_ID=e770998a-989c-43c4-9020-8797658d5736 2026-06-03T15:06:51.041570+02:00 INF ⣀⡇ dev2n1: daemon: imon: svc1: change global expect none -> stopped 2026-06-03T15:06:51.530054+02:00 INF ⣸⡀ dev2n3: daemon: imon: svc1: change global expect none -> stopped 2026-06-03T15:06:51.541098+02:00 INF ⣇⡀ dev2n2: daemon: imon: svc1: change global expect none -> stopped
* Add --selector and --rid to `om daemon ps` * Add the corresponding api handler query parameters
The context has a Namespace field, that the login must pass as the scope of the requested token.
The filtering now behaves like this (verified by go tests): Role Scope Grants Filtered Grants --- --- --- --- nil nil root root nil nil admin admin nil nil admin:ns1,admin:ns2,guest:ns3 admin:ns1,admin:ns2,guest:ns3 Role Scope Grants Filtered Grants --- --- --- --- admin ns2 root admin:ns2 admin ns2 admin admin:ns2 guest ns2 admin guest:ns2 admin ns2 admin:ns1,admin:ns2,guest:ns3 admin:ns2 admin ns3 admin:ns1,admin:ns2,guest:ns3 guest ns2 admin:ns1,admin:ns2,guest:ns3 guest:ns2 Role Scope Grants Filtered Grants --- --- --- --- root nil root root admin nil root admin root nil admin admin nil admin admin guest nil admin guest root nil admin:ns1,admin:ns2,guest:ns3 admin nil admin:ns1,admin:ns2,guest:ns3 admin:ns1,admin:ns2 guest nil admin:ns1,admin:ns2,guest:ns3 guest:ns3 Role Scope Grants Filtered Grants --- --- --- --- nil ns1 root admin:ns1 nil ns1 guest guest:ns1 nil ns1 admin:ns1,admin:ns2,guest:ns3 admin:ns1 nil ns3 admin:ns1,admin:ns2,guest:ns3 guest:ns3
The "admin" grant grants the "admin", "operator" and "guest" roles to all namespaces. The "operator" grant grants the "operator" and "guest" roles to all namespaces. The "guest" grant grants the "guest" role to all namespaces.
* Allow GET /api/node/info to rbac.RoleGuest, rbac.RoleOperator,
rbac.RoleAdmin, rbac.RoleRoot, rbac.RoleJoin, rbac.RoleLeave
* Add grants.AssertRoleOn(scope, role...) to wrap strict role X
check on scope and the broader grant X check in a single call.
e.g. grants.AssertRoleOn("ns1", RoleGuest) return true if
grants contain the "guest" grant but not "guest:ns1", where
grants.HasRoleOn(scope, roles...) returns false.
* Unexport the filterRole, filterScope, filterGrant rbac funcs.
Only used by the local FilterGrantStrings()
* Small optimization for rbac.HasRoleOn(scope, role...), no
functional change.
* Better FilterGrantStrings() func doc, moving examples in the
func doc.
* Move the resource.T.ApplyPGChain calls to resource.Start, out of the selected few drivers' Start() func. * Add backward compat "pg update" too
Declared but not included in the ccfg and node kw stores.
key.ParseStrict() codepath did not allow dots in the scope, breaking at least the "config validate" command on such keywords. Fix and add a test to avoid regression. Example: $ om svc1 config val LEVEL PATH DRIVER KEY KIND COMMENT warning svc1 disk.hp3par disk#0.array@n1.acme.com unknown keyword keyword does not exist: "disk#0.array@n1.acme.com"
And use "2 * sync_period" as the default.
Do not honor `<rid>.disable=false` in this case. With: [DEFAULT] disable = true [fs#1] disable = false Wrong behaviour before this patch: ├ disk#2 ..D../.. ├ disk#3 ..D../.. └ fs#1 ........ Correct behaviour: ├ disk#2 ..D../.. ├ disk#3 ..D../.. └ fs#1 ..D../..
Example: $ ox restarts instance pg update --node dev2n1 OBJECT NODE SID restarts dev2n1 33445950-9f96-4eb7-ac13-825a35d14d91 $ ox node logs --filter SID=33445950-9f96-4eb7-ac13-825a35d14d91 2026-06-05T18:19:42.604786+02:00 INF ⣇⡀ dev2n1: instance: restarts: >>> do pg_update [/usr/bin/om restarts instance pg update] (origin daemon/api, sid 33445950-9f96-4eb7-ac13-825a35d14d91) 2026-06-05T18:19:42.681723+02:00 INF ⣇⡀ dev2n1: instance: restarts: app#foo1: applied pg /opensvc.slice/opensvc-svc.restarts.slice/opensvc-svc.restarts-app.foo1.slice: cpu_shares=2002 2026-06-05T18:19:42.821157+02:00 INF ⣇⡀ dev2n1: instance: restarts: <<< done pg_update [/usr/bin/om restarts instance pg update] in 215.837943ms, instance status is now up
* add eid for per-pid unique identification * fix sid to be inherited by all om commands forked from a om command, or by remote exec via api * group orchestration, exec and session ids in the xsession pkg * rename the `requester_sid` query param to `session_id`
* Rename the ERROR column to STATUS to be less alarming
* Write "accepted" as a STATUS when the submit is accepted
* Write "error: <api problem detail>" when the submit errs
* Less verbose errors in imon.SetInstanceMonitor
* Factorize objectaction.DoAsync error handling code
Example:
root@dev2n1:~/dev/om3# bin/om svc* stop
OBJECT ORCHESTRATION_ID STATUS
svc11 00000000-0000-0000-0000-000000000000 error: orchestration 8d961f31-a28e-4237-a280-27c995e7ed49 (>stopped) already in progress
svc1 00000000-0000-0000-0000-000000000000 error: timeout publishing instance.MonitorUpdate{CandidateOrchestrationID=64140f28-9b1a-4740-bbe6-d392034e973d GlobalExpect=stopped}
svc0 9e8b59b2-be45-4f1b-8c44-333d5f90eaaf accepted
svc111 40035c3b-bcc2-49cc-bec7-30c6c649e7c7 accepted
Error: actions rejected: 2
* Set the ExecId and SessionId fields of the Exec, ExecSuccess, ExecFailed, Title messages. * Log the execs at info level if the title is not empty * Pass a title on freeze, unfreeze, drain actions
When "--node <peer>" is set, the action is submitted on the peer api and the session id is returned. --wait waits for the session id to terminate (ExecSuccess or ExecFailed eventa) until --time is out. --watch enters the monitoring loop.
core/resource/resource.go:1361:10: assignment copies lock value to t.log: github.com/opensvc/om3/v3/util/plog.Logger contains sync.RWMutex Store the log as a pointer to avoid copying the RWMutex.
util/pubsub/main.go:452:40: literal copies lock value from *b.log: github.com/opensvc/om3/v3/util/plog.Logger contains sync.RWMutex Store the log as a pointer to avoid copying the RWMutex.
drivers/resdiskhp3par/main.go:1310:41: github.com/opensvc/om3/v3/util/key.T struct literal uses unkeyed fields
drivers/resdiskhp3par/main.go:1311:41: github.com/opensvc/om3/v3/util/key.T struct literal uses unkeyed fields
core/object/actor.go:296:30: github.com/opensvc/om3/v3/util/key.T struct literal uses unkeyed fields
core/object/actor.go:300:28: github.com/opensvc/om3/v3/util/key.T struct literal uses unkeyed fields
core/object/node_stonith.go:30:36: github.com/opensvc/om3/v3/util/key.T struct literal uses unkeyed fields
Use `key.New(section, option)` instead of `key.T{section, option}`
drivers/resdiskhp3par/main.go:313:2: unreachable code drivers/resdiskhp3par/main.go:767:2: unreachable code drivers/resdiskhp3par/main.go:797:2: unreachable code drivers/resdiskhp3par/main.go:1295:2: unreachable code
drivers/rescontainerocibase/main.go:92:3: struct field tag `json:sysctl` not compatible with reflect.StructTag.Get: bad syntax for struct tag value core/object/node_collector.go:33:3: struct field text has json tag but is not exported
drivers/rescontainerocibase/executor.go:71:2: the cancel function is not used on all paths (possible context leak) drivers/rescontainerocibase/executor.go:74:3: this return statement may be reached without using the cancel var defined on line 71
The DEFAULT section listed keywords not allowed for the kind. e.g. The `l` keyword was listed in the `svc` documentation, ignoring its Usr,Sec restriction.
the "admin" grant permits to get the "admin:ns1" claim on auth.
The root, admin, operator and guest grants give visibility on all namespaces.
"array#a.b.c.api@n1.opensvc.com" must parse as: Section: "array#a.b.c" Option: "api@n1.opensvc.com"
And filter the dataset with the requester grants. The "guest" grant on a namespace is sufficient to read metrics of the actor objects' cgroups. The "root" grant is required to access the daemon metrics. This patch add a `reqh2.NewUDSClient() *http.Client` so the GET /api/node/name/<nodename>/metrics can proxy for GET /metrics using a UDS client with `.Do(req)` accessible (the client.New() provides only api.ClientWithResponse interface clients, which don't expose the `.Do(req)` functio).
Resources now store the logger as a pointer, so tests can trigger
plog.Logger calls with the resource log unallocated... causing
this kind of panics:
--- FAIL: TestT_Info (0.34s)
--- FAIL: TestT_Info/from_zero_app (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x88 pc=0xce9427]
goroutine 29 [running]:
testing.tRunner.func1.2({0x13c3ca0, 0x1d64290})
/opt/hostedtoolcache/go/1.24.13/x64/src/testing/testing.go:1734 +0x21c
testing.tRunner.func1()
/opt/hostedtoolcache/go/1.24.13/x64/src/testing/testing.go:1737 +0x35e
panic({0x13c3ca0?, 0x1d64290?})
/opt/hostedtoolcache/go/1.24.13/x64/src/runtime/panic.go:792 +0x132
github.com/opensvc/om3/v3/util/plog.(*Logger).Msgf(...)
/home/runner/work/om3/om3/util/plog/main.go:120
github.com/opensvc/om3/v3/util/plog.(*Logger).Levelf(0x0, 0xff, {0x152f927?, 0x15299c8?}, {0xc00019f430, 0x1, 0x1})
/home/runner/work/om3/om3/util/plog/main.go:144 +0x47
github.com/opensvc/om3/v3/util/plog.(*Logger).Tracef(...)
/home/runner/work/om3/om3/util/plog/main.go:132
github.com/opensvc/om3/v3/drivers/resapp.(*T).CmdArgs(0xc00019f8f0?, {0x169e288?, 0xc00024f880?}, {0x0?, 0x0?}, {0x15135ab?, 0x4?})
/home/runner/work/om3/om3/drivers/resapp/unix.go:292 +0x5ad
Test nilness to transparently ignore logging when the logger is
not allocated.
The register is done on manager Start, so test doing multiple
start-stop cycles caused panics like:
panic: duplicate metrics collector registration attempted
goroutine 475 [running]:
github.com/prometheus/client_golang/prometheus.(*Registry).MustRegister(0x48ab160, {0xc0012202a0?, 0xffffffffffffffff?, 0x112?})
/home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.22.0/prometheus/registry.go:406 +0x65
github.com/prometheus/client_golang/prometheus.MustRegister(...)
/home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.22.0/prometheus/registry.go:177
github.com/opensvc/om3/v3/daemon/pgmetrics.(*Manager).registerMetrics(0x39ae8e0?)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
This pull request resolves several identified issues and introduces enhancements to improve functionality and stability.
Key changes include:
Fixed RBAC-related grant issues for various roles and API scopes.
Clarify tracing with sessionID,ExecID...
Fix for unregistering Prometheus
pgmetriccollectors to avoid duplicate registration panics caused during multiple manager start-stop cycles.Enhanced testing for
plog.Loggerpointer nilness to improve resiliency and prevent panics.Filtered RBAC datasets when exposing
/metricsendpoint.Addressed several
go veterrors and code simplifications.Added support for
--wait,--watch, and--timefor remote actions.Clarified error reporting and updated documentation for visibility and usability.