Skip to content

logs rendering and rbac enhancements#1040

Merged
cgalibern merged 36 commits into
opensvc:mainfrom
cgalibern:cva-main-pr
Jun 13, 2026
Merged

logs rendering and rbac enhancements#1040
cgalibern merged 36 commits into
opensvc:mainfrom
cgalibern:cva-main-pr

Conversation

@cgalibern

Copy link
Copy Markdown
Contributor

Description

This pull request resolves several identified issues and introduces enhancements to improve functionality and stability.
Key changes include:

  • Fixed RBAC-related grant issues for various roles and API scopes.

  • Clarify tracing with sessionID,ExecID...

  • Fix for unregistering Prometheus pgmetric collectors to avoid duplicate registration panics caused during multiple manager start-stop cycles.

  • Enhanced testing for plog.Logger pointer nilness to improve resiliency and prevent panics.

  • Filtered RBAC datasets when exposing /metrics endpoint.

  • Addressed several go vet errors and code simplifications.

  • Added support for --wait, --watch, and --time for remote actions.

  • Clarified error reporting and updated documentation for visibility and usability.

cvaroqui and others added 30 commits June 11, 2026 08:09
And add the nodename as a log entry prefix.
The `o[mx] node|cluster|<obj> logs [-f]` commands now buffer
500ms of sse feeding before sort and print the lines older than
500ms.
* Use millisecond precision in the logreader renderer
* Add a widget to better highlight log origin changes.

Example:

	# ox svc1 logs --filter ORCHESTRATION_ID=e770998a-989c-43c4-9020-8797658d5736
	2026-06-03T15:06:51.041570+02:00 INF ⣀⡇ dev2n1: daemon: imon: svc1: change global expect none -> stopped
	2026-06-03T15:06:51.530054+02:00 INF ⣸⡀ dev2n3: daemon: imon: svc1: change global expect none -> stopped
	2026-06-03T15:06:51.541098+02:00 INF ⣇⡀ dev2n2: daemon: imon: svc1: change global expect none -> stopped
* Add --selector and --rid to `om daemon ps`
* Add the corresponding api handler query parameters
The context has a Namespace field, that the login must pass as the
scope of the requested token.
The filtering now behaves like this (verified by go tests):

	Role       Scope     Grants                           Filtered Grants
	---        ---       ---                              ---
	nil        nil       root                             root
	nil        nil       admin                            admin
	nil        nil       admin:ns1,admin:ns2,guest:ns3    admin:ns1,admin:ns2,guest:ns3

	Role       Scope     Grants                           Filtered Grants
	---        ---       ---                              ---
	admin      ns2       root                             admin:ns2
	admin      ns2       admin                            admin:ns2
	guest      ns2       admin                            guest:ns2
	admin      ns2       admin:ns1,admin:ns2,guest:ns3    admin:ns2
	admin      ns3       admin:ns1,admin:ns2,guest:ns3
	guest      ns2       admin:ns1,admin:ns2,guest:ns3    guest:ns2

	Role       Scope     Grants                           Filtered Grants
	---        ---       ---                              ---
	root       nil       root                             root
	admin      nil       root                             admin
	root       nil       admin
	admin      nil       admin                            admin
	guest      nil       admin                            guest
	root       nil       admin:ns1,admin:ns2,guest:ns3
	admin      nil       admin:ns1,admin:ns2,guest:ns3    admin:ns1,admin:ns2
	guest      nil       admin:ns1,admin:ns2,guest:ns3    guest:ns3

	Role       Scope     Grants                           Filtered Grants
	---        ---       ---                              ---
	nil        ns1       root                             admin:ns1
	nil        ns1       guest                            guest:ns1
	nil        ns1       admin:ns1,admin:ns2,guest:ns3    admin:ns1
	nil        ns3       admin:ns1,admin:ns2,guest:ns3    guest:ns3
The "admin" grant grants the "admin", "operator" and "guest" roles
to all namespaces.

The "operator" grant grants the "operator" and "guest" roles to
all namespaces.

The "guest" grant grants the "guest" role to all namespaces.
* Allow GET /api/node/info to rbac.RoleGuest, rbac.RoleOperator,
  rbac.RoleAdmin, rbac.RoleRoot, rbac.RoleJoin, rbac.RoleLeave

* Add grants.AssertRoleOn(scope, role...) to wrap strict role X
  check on scope and the broader grant X check in a single call.
  e.g. grants.AssertRoleOn("ns1", RoleGuest) return true if
  grants contain the "guest" grant but not "guest:ns1", where
  grants.HasRoleOn(scope, roles...) returns false.

* Unexport the filterRole, filterScope, filterGrant rbac funcs.
  Only used by the local FilterGrantStrings()

* Small optimization for rbac.HasRoleOn(scope, role...), no
  functional change.

* Better FilterGrantStrings() func doc, moving examples in the
  func doc.
* Move the resource.T.ApplyPGChain calls to resource.Start, out
  of the selected few drivers' Start() func.

* Add backward compat "pg update" too
Declared but not included in the ccfg and node kw stores.
key.ParseStrict() codepath did not allow dots in the scope,
breaking at least the "config validate" command on such keywords.

Fix and add a test to avoid regression.

Example:

	$ om svc1 config val
	LEVEL    PATH  DRIVER       KEY                        KIND             COMMENT
	warning  svc1  disk.hp3par  disk#0.array@n1.acme.com   unknown keyword  keyword does not exist: "disk#0.array@n1.acme.com"
And use "2 * sync_period" as the default.
Do not honor `<rid>.disable=false` in this case.

With:

	[DEFAULT]
	disable = true

	[fs#1]
	disable = false

Wrong behaviour before this patch:

	      ├ disk#2          ..D../..
	      ├ disk#3          ..D../..
	      └ fs#1            ........

Correct behaviour:

	      ├ disk#2          ..D../..
	      ├ disk#3          ..D../..
	      └ fs#1            ..D../..
Example:

	$ ox restarts instance pg update  --node dev2n1
	OBJECT    NODE    SID
	restarts  dev2n1  33445950-9f96-4eb7-ac13-825a35d14d91

	$ ox node logs --filter SID=33445950-9f96-4eb7-ac13-825a35d14d91
	2026-06-05T18:19:42.604786+02:00 INF ⣇⡀ dev2n1: instance: restarts: >>> do pg_update [/usr/bin/om restarts instance pg update] (origin daemon/api, sid 33445950-9f96-4eb7-ac13-825a35d14d91)
	2026-06-05T18:19:42.681723+02:00 INF ⣇⡀ dev2n1: instance: restarts: app#foo1: applied pg /opensvc.slice/opensvc-svc.restarts.slice/opensvc-svc.restarts-app.foo1.slice: cpu_shares=2002
	2026-06-05T18:19:42.821157+02:00 INF ⣇⡀ dev2n1: instance: restarts: <<< done pg_update [/usr/bin/om restarts instance pg update] in 215.837943ms, instance status is now up
* add eid for per-pid unique identification
* fix sid to be inherited by all om commands forked from a om
  command, or by remote exec via api
* group orchestration, exec and session ids in the xsession pkg
* rename the `requester_sid` query param to `session_id`
* Rename the ERROR column to STATUS to be less alarming
* Write "accepted" as a STATUS when the submit is accepted
* Write "error: <api problem detail>" when the submit errs
* Less verbose errors in imon.SetInstanceMonitor
* Factorize objectaction.DoAsync error handling code

Example:

	root@dev2n1:~/dev/om3# bin/om svc* stop
	OBJECT  ORCHESTRATION_ID                      STATUS
	svc11   00000000-0000-0000-0000-000000000000  error: orchestration 8d961f31-a28e-4237-a280-27c995e7ed49 (>stopped) already in progress
	svc1    00000000-0000-0000-0000-000000000000  error: timeout publishing instance.MonitorUpdate{CandidateOrchestrationID=64140f28-9b1a-4740-bbe6-d392034e973d GlobalExpect=stopped}
	svc0    9e8b59b2-be45-4f1b-8c44-333d5f90eaaf  accepted
	svc111  40035c3b-bcc2-49cc-bec7-30c6c649e7c7  accepted
	Error: actions rejected: 2
* Set the ExecId and SessionId fields of the Exec, ExecSuccess,
  ExecFailed, Title messages.
* Log the execs at info level if the title is not empty
* Pass a title on freeze, unfreeze, drain actions
When "--node <peer>" is set, the action is submitted on the
peer api and the session id is returned.

--wait waits for the session id to terminate (ExecSuccess or
ExecFailed eventa) until --time is out.

--watch enters the monitoring loop.
	core/resource/resource.go:1361:10: assignment copies lock value to t.log: github.com/opensvc/om3/v3/util/plog.Logger contains sync.RWMutex

Store the log as a pointer to avoid copying the RWMutex.
	util/pubsub/main.go:452:40: literal copies lock value from *b.log: github.com/opensvc/om3/v3/util/plog.Logger contains sync.RWMutex

Store the log as a pointer to avoid copying the RWMutex.
	drivers/resdiskhp3par/main.go:1310:41: github.com/opensvc/om3/v3/util/key.T struct literal uses unkeyed fields
	drivers/resdiskhp3par/main.go:1311:41: github.com/opensvc/om3/v3/util/key.T struct literal uses unkeyed fields
	core/object/actor.go:296:30: github.com/opensvc/om3/v3/util/key.T struct literal uses unkeyed fields
	core/object/actor.go:300:28: github.com/opensvc/om3/v3/util/key.T struct literal uses unkeyed fields
	core/object/node_stonith.go:30:36: github.com/opensvc/om3/v3/util/key.T struct literal uses unkeyed fields

Use `key.New(section, option)` instead of `key.T{section, option}`
	drivers/resdiskhp3par/main.go:313:2: unreachable code
	drivers/resdiskhp3par/main.go:767:2: unreachable code
	drivers/resdiskhp3par/main.go:797:2: unreachable code
	drivers/resdiskhp3par/main.go:1295:2: unreachable code
	drivers/rescontainerocibase/main.go:92:3: struct field tag `json:sysctl` not compatible with reflect.StructTag.Get: bad syntax for struct tag value
	core/object/node_collector.go:33:3: struct field text has json tag but is not exported
	drivers/rescontainerocibase/executor.go:71:2: the cancel function is not used on all paths (possible context leak)
	drivers/rescontainerocibase/executor.go:74:3: this return statement may be reached without using the cancel var defined on line 71
The DEFAULT section listed keywords not allowed for the kind.

e.g. The `l` keyword was listed in the `svc` documentation,
ignoring its Usr,Sec restriction.
the "admin" grant permits to get the "admin:ns1" claim on auth.
The root, admin, operator and guest grants give visibility on
all namespaces.
cvaroqui added 6 commits June 11, 2026 15:32
"array#a.b.c.api@n1.opensvc.com" must parse as:

	Section: "array#a.b.c"
	Option: "api@n1.opensvc.com"
And filter the dataset with the requester grants.

The "guest" grant on a namespace is sufficient to read metrics of
the actor objects' cgroups.

The "root" grant is required to access the daemon metrics.

This patch add a `reqh2.NewUDSClient() *http.Client` so the
GET /api/node/name/<nodename>/metrics can proxy for GET /metrics
using a UDS client with `.Do(req)` accessible (the client.New()
provides only api.ClientWithResponse interface clients, which
don't expose the `.Do(req)` functio).
Resources now store the logger as a pointer, so tests can trigger
plog.Logger calls with the resource log unallocated... causing
this kind of panics:

	--- FAIL: TestT_Info (0.34s)
	    --- FAIL: TestT_Info/from_zero_app (0.00s)
	panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	    panic: runtime error: invalid memory address or nil pointer dereference
	[signal SIGSEGV: segmentation violation code=0x1 addr=0x88 pc=0xce9427]

	goroutine 29 [running]:
	testing.tRunner.func1.2({0x13c3ca0, 0x1d64290})
	    /opt/hostedtoolcache/go/1.24.13/x64/src/testing/testing.go:1734 +0x21c
	testing.tRunner.func1()
	    /opt/hostedtoolcache/go/1.24.13/x64/src/testing/testing.go:1737 +0x35e
	panic({0x13c3ca0?, 0x1d64290?})
	    /opt/hostedtoolcache/go/1.24.13/x64/src/runtime/panic.go:792 +0x132
	github.com/opensvc/om3/v3/util/plog.(*Logger).Msgf(...)
	    /home/runner/work/om3/om3/util/plog/main.go:120
	github.com/opensvc/om3/v3/util/plog.(*Logger).Levelf(0x0, 0xff, {0x152f927?, 0x15299c8?}, {0xc00019f430, 0x1, 0x1})
	    /home/runner/work/om3/om3/util/plog/main.go:144 +0x47
	github.com/opensvc/om3/v3/util/plog.(*Logger).Tracef(...)
	    /home/runner/work/om3/om3/util/plog/main.go:132
	github.com/opensvc/om3/v3/drivers/resapp.(*T).CmdArgs(0xc00019f8f0?, {0x169e288?, 0xc00024f880?}, {0x0?, 0x0?}, {0x15135ab?, 0x4?})
	    /home/runner/work/om3/om3/drivers/resapp/unix.go:292 +0x5ad

Test nilness to transparently ignore logging when the logger is
not allocated.
The register is done on manager Start, so test doing multiple
start-stop cycles caused panics like:

	panic: duplicate metrics collector registration attempted

	goroutine 475 [running]:
	github.com/prometheus/client_golang/prometheus.(*Registry).MustRegister(0x48ab160, {0xc0012202a0?, 0xffffffffffffffff?, 0x112?})
		/home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.22.0/prometheus/registry.go:406 +0x65
	github.com/prometheus/client_golang/prometheus.MustRegister(...)
		/home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.22.0/prometheus/registry.go:177
	github.com/opensvc/om3/v3/daemon/pgmetrics.(*Manager).registerMetrics(0x39ae8e0?)
@cgalibern cgalibern merged commit 19ab3c7 into opensvc:main Jun 13, 2026
2 checks passed
@cgalibern cgalibern deleted the cva-main-pr branch June 13, 2026 08:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants