Fix: gate FortiOS deploy-config CLI fallback on rejected commands by a-v-popov · Pull Request #3466 · ipspace/netlab

a-v-popov · 2026-06-09T11:27:51Z

I assume, deploy-config should fail on a malformed config deployment
rather then silently apply it partially.

Problem

netsim/ansible/tasks/deploy-config/fortios.yml applies device configuration
through the config-script API (fortios_monitor: upload.system.config-script)
with an ansible.builtin.expect SSH fallback in a rescue block.

expect reports success whenever the SSH session closes cleanly — which it
does even when FortiOS rejected every command. So a malformed
configuration was silently accepted: the API call failed, the rescue pasted
the configuration over SSH, FortiOS rejected the bad lines, the session closed
with rc 0, and the play reported success. The device was left partially
configured (the valid lines applied) while netlab reported [CREATED] and
exited 0.

This was latent until pexpect was installed on the control host; without it
the rescue merely crashed on the missing import, which accidentally surfaced
the failure.

Fix

Gate the fallback on its own transcript: register the expect result and
failed_when it if the transcript contains a FortiOS error marker (or the SSH
command itself failed).

    register: fortios_cli_result
    vars:
      fos_config_error: '^(\d+: )?(Command fail\. Return code|Unknown action|command parse error)'
    failed_when: >-
      fortios_cli_result.rc != 0 or
      fortios_cli_result.stdout is search(fos_config_error, multiline=true)

Why gate the CLI transcript rather than the API result

The fallback exists (PR #2864) because the config-script API returns HTTP
500 even for valid configuration on some FortiOS releases (e.g. the 7.0.17
build used to build the boxes). A 500 is therefore not a reliable signal of
a bad configuration; treating it as fatal would regress those releases to a
hard failure. Scanning the CLI transcript instead keeps the workaround intact
— valid configuration applied via CLI produces no error markers — while still
catching genuine rejections regardless of FortiOS version.

Marker design

Case-sensitive. The strings are mixed-case (Command fail. Return code,
Unknown action, command parse error); a lowercase pattern would need
(?i), which only widens the false-positive surface.
Anchored to line start (^, multiline=true). The transcript echoes the
input configuration on prompt-prefixed lines (dut (ospf) # set …), so
anchoring stops an echoed value that merely contains the marker text from
tripping the check.
Optional \d+: prefix. FortiOS also emits errors as 8610: Unknown action; the optional numeric prefix lets such a line match at line start.
(A naive ^ anchor would have missed it — a false negative, the dangerous
direction.)
All three markers retained: Command fail. Return code -N accompanies most
rejections, but a bare Unknown action 0 can appear without it (a set
whose parent config already failed).

Test evidence (live, FortiOS v8.0.0, `netlab config`)

Scenario	Path exercised	Before	After
Malformed config	API 500 → CLI rescue	exit 0, `[CREATED]` (silent success)	exit 1, `failed_when_result: true`, `FatalError`
Valid config	API 200	exit 0	exit 0 (unchanged)
Valid config, API forced to fail	CLI rescue	n/a	exit 0 (workaround preserved, no false positive)

Regex also validated offline against the captured transcript: it matches all
four observed error forms (command parse error, Command fail. Return code,
bare Unknown action, prefixed 8610: Unknown action) and rejects benign
echoed descriptions containing the exact mixed-case marker text.

yamllint clean.

The deploy-config task applies configuration through the config-script API with an `ansible.builtin.expect` SSH fallback. `expect` reports success whenever the SSH session closes cleanly, which it does even when FortiOS rejected every command, so a malformed configuration was silently accepted: the play went green while the device was left partially configured. Gate the fallback by scanning its transcript for FortiOS error markers and failing the task when any are present. The check is deliberately on the CLI transcript rather than on the API result. The fallback exists because the config-script API returns HTTP 500 on some FortiOS releases (e.g. 7.0.17) even for valid configuration, so a 500 is not a reliable signal of a bad configuration; treating it as fatal would regress those releases to a hard failure. Scanning the transcript keeps that workaround intact (valid configuration applied via CLI produces no error markers) while still catching genuine rejections. The markers are anchored to the start of a line, because the transcript echoes the input configuration on prompt-prefixed lines; anchoring prevents an echoed value that merely contains the text from tripping the check. The optional numeric prefix tolerates FortiOS's `NNNN:` error form so a prefixed error at line start still matches. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

ipspace

Thanks a million!

ipspace approved these changes Jun 9, 2026

View reviewed changes

ipspace merged commit e88b43a into ipspace:dev Jun 9, 2026
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: gate FortiOS deploy-config CLI fallback on rejected commands#3466

Fix: gate FortiOS deploy-config CLI fallback on rejected commands#3466
ipspace merged 1 commit into
ipspace:devfrom
a-v-popov:fos-deploy-config-gate

a-v-popov commented Jun 9, 2026

Uh oh!

ipspace left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

a-v-popov commented Jun 9, 2026

Problem

Fix

Why gate the CLI transcript rather than the API result

Marker design

Test evidence (live, FortiOS v8.0.0, netlab config)

Uh oh!

ipspace left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Test evidence (live, FortiOS v8.0.0, `netlab config`)