Skip to content

scottpeterman/netlapse

Repository files navigation

Netlapse

An Oxidized replacement, network state time-lapse. Periodic structured snapshots of network device operational state with versioned diffs.

Netlapse — network state time-lapse

Why Netlapse?

Oxidized backs up configs as text blobs. It answers "what changed?" with git diff.

Netlapse captures any CLI output (configs, ARP tables, BGP state, routing tables, interface status), parses it into structured JSON, versions both artifacts, and answers "what changed?" with semantic diffs:

  • ARP diff: "12 MACs learned, 3 aged out, 2 moved interfaces"
  • BGP diff: "peer 10.0.0.1 went Established → Idle, prefix count dropped 4200 → 0"
  • Interface diff: "Gi0/3 MTU changed 9000 → 1500"
  • Config diff: config-aware side-by-side — intra-line word highlighting, fold-unchanged, and volatile-line masking, over input that's echo/prompt-stripped at collection so it never lights up on a phantom first line.

Two questions, kept separate. The text diff answers did the bytes change? — a line-aligned, config-aware view of the raw output. The semantic diff answers did the network change? — a record-level comparison of the parsed output that excludes volatile noise (age timers, counters) and reports only meaningful state. An LLDP or OSPF capture changes on every poll because a Dead Time or age field ticks; the text diff faithfully shows the timer moving, the semantic diff says "No changes detected," and both are correct for the question they answer. Which fields count as noise is data-driven, not hardcoded — see Semantic Diff. This is the thing Oxidized and RANCID can't do, because they only diff text.

One Search

Type one token — an IP, a MAC, an ASN, a hostname, a serial — and get everything that references it across the whole fleet, grouped by what it means. It's the structured version of the 2 AM grep: instead of greping one device at a time, you ask the question once and the answer comes back organized.

One Search — one token across inventory and every capture, grouped by capture type

A single search returns two answers:

  • Identity — which managed device this token is. A management or OOB address, a hostname, a serial, an asset tag resolves to the device that owns it, with a link straight to its detail page.
  • References — everywhere it's seen, grouped by capture type (config, ARP, MAC table, BGP, routes, version, …) then by device, with the matching lines highlighted. Open any hit to the full capture and copy it whole.

The matching is format-aware, because the value never appears the same way twice across a mixed fleet:

  • IP is octet-anchored — searching 10.7.255.35 will not drag in 10.7.255.350 or 210.7.255.35. The lines it returns are actually that address.
  • MAC matches every vendor representation from one keystroke — type d4af.f76c.45ad and it also finds d4:af:f7:6c:45:ad, d4-af-f7-6c-45-ad, and the bare d4aff76c45ad, wherever they're stored.
  • ASN matches with or without the AS prefix, so AS6169 and remote-as 6169 both surface.

Because it reads the captures Netlapse already stores, it has zero runtime coupling to collection — it's a query over the dual-artifact tree, not a second index to keep in sync. Trace a /31 link subnet to the device that advertises it, pivot from its ARP entry to the chassis MAC in show version, and confirm they're the same box — in two searches, with no topology tool and no correlation database underneath.

Semantic Diff

Two snapshots, one question that actually matters at 2 AM: did the network change, or did a timer just tick? Semantic diff answers it by comparing parsed records instead of raw text. It matches each record by its identity — an ARP entry by IP, a BGP session by neighbor, an OSPF adjacency by neighbor ID — and reports only meaningful state changes: a MAC relearned on a different interface, a peer dropped to Idle, a route's next-hop moved. Volatile fields that tick every poll — age timers, uptimes, counters — are excluded by default, so the diff stays quiet when nothing operationally significant happened and surfaces the one line that matters when something did

Netlapse — Semantic Diff

⚠️ Beta Version Notice

Netlapse is beta (v0.1.0). It can run in production, but expect rough edges and breaking changes before 1.0.

Authentication is minimal. The current backend is a small local user credential database — usernames with scrypt-hashed passwords, bootstrapped from a single admin account (set NETLAPSE_ADMIN_PASSWORD on first run). It's adequate for a trusted operator or a small team on a trusted network; it is not yet hardened for untrusted or multi-tenant exposure.

LDAP is not yet implemented. The provider seam is in place (auth.provider: ldap / ldap+local), but selecting it raises NotImplementedError. Use the default auth.provider: local until LDAP lands.

Data-Driven Collection

What Netlapse collects — and how each vendor's CLI says it — is defined in YAML files, not in code. They live as siblings of config.yaml (default ~/.netlapse/):

captures.yaml — the catalog of collectable artifacts. Each capture is a vendor-neutral intent (an ARP table, a BGP summary) plus the syntax to get it. command is either one string for everyone, or a map keyed by platform slug with a reserved default fallback:

captures:
  bgp-summary:
    capture_type: bgp
    description: BGP session state
    command:
      default: show ip bgp summary
      juniper_junos: show bgp summary
    interval: 900

jobs.yaml — bindings: which capture runs against which devices. A binding carries no command syntax and no interval (those come from the capture it references) — only the device selection:

jobs:
  bgp-summary:
    capture: bgp-summary
    name: BGP summary
    enabled: true
    filters:
      platform: cisco_ios,arista_eos,juniper_junos

Adding a vendor, a capture, or a whole new artifact type is a file edit and a restart — no coder in the loop:

# edit ~/.netlapse/captures.yaml + jobs.yaml, then:
python -m netlapse definitions validate    # offline, all-or-nothing check
python -m netlapse definitions sync         # project into the jobs table (or just restart)

The new capture type then propagates to the UI through the DB — it appears in the Capture Type selector on every job, with no template or schema work.

Files are authoritative for definitions; the DB owns runtime state. A sync reconciles the jobs table to the files — insert new, update changed, tombstone removed — but never touches the live enable/disable toggle, the schedule, or job history. Validation runs over the whole catalog at once and is fail-soft: a typo is reported alongside every other problem in one pass, and the sync is skipped entirely rather than half-applied — the daemon keeps running on the last-good definitions already in the DB.

The definitions CLI rounds it out: validate, sync, list, and adopt (which takes over jobs that predate the registry). See Design Decisions for why definitions are file-authoritative while platforms stayed in the DB.

Jobs editor showing a file-managed job — a banner notes the definition comes from the YAML catalog (edit captures.yaml/jobs.yaml and re-sync), with capture type, interval, default command, per-platform overrides, and site/role/platform/device filters

Project Status

v0.1.0 — ~15,300 lines across 43 source files. Structured parsing operational via tfsm-fire (1275 templates). Multi-vendor job management with per-platform command mapping, device-level CRUD with credential and SSH override controls, and integrated SSH auth diagnostics. 79 devices across 3 data centers collecting on schedule with live WebSocket progress.

Netlapse dashboard — device/site/job stat cards, per-site device counts, and the recent-collections table with capture type, interval, last run, coverage, and duration

Screenshots throughout are from the bundled demo lab (13 emulated devices across ENG/USA/WAN), not the production fleet — so device counts here won't match the figures above.

Layer Module Lines Status
API Oxidized compat (9 routes) 220 ✅ Wired to DCIM + storage + scheduler
API Native REST (27 routes) 687 ✅ Full CRUD + device edit + auth test + scheduler triggers + search
DCIM NetBox-aligned SQLite schema (v6) 1,439 ✅ Jobs, devices, history, CRUD, per-platform commands, auto-migration
DCIM SC2 map importer 1059 ✅ Topology → DCIM sync, idempotent, full hostname preservation, file-overridable role inference
Storage File backend (directory tree) 417 ✅ Dual artifacts (raw + parsed JSON), last-N rotation
Storage Git backend (versioned + trailers) 738 ✅ Tested
SSH Client (Paramiko wrapper) 817 ✅ Ported from SC2, key auth, legacy algorithm + pubkey signature handling
SSH Emulation shim (NetEmulate) 302 ✅ 82/82 devices verified
SSH Executor (DCIM → SSH → Snapshot) 602 ✅ Built, auth test, device-level legacy override
Vault Credential vault (Fernet/PBKDF2) 2,142 ✅ Ported from SC2, headless unlock, key + password auth
Vault DCIM bridge + credential resolution 350 ✅ Vault ↔ executor integration
Core Collection pipeline (DCIM→vault→SSH→disk) 502 ✅ End-to-end proven, \r\n normalization
Core CLI (sync-map, vault, collect) 232 ✅ Subcommand dispatch
Parser tfsm-fire engine (TextFSM auto-template) 232 ✅ Ported from SC2.5, thread-safe
Parser Parse engine (clean + score + enrich) 392 ✅ Output cleaning, filter cascade, vendor fallback
Parser Template database 296 templates ✅ Arista (48), Cisco IOS (143), Juniper (22), NX-OS, ASA
Scheduler Async job loop + WS broadcast + parsing 571 ✅ Running in production, inline tfsm-fire, per-platform command resolution
Web Dashboard (FastAPI + vanilla JS) 3,880 ✅ Live — 6 views, job CRUD, device edit + auth test, parsed data tables, capture type selector
netlapse/
├── __init__.py              (3)     Package version (0.1.0)
├── __main__.py              (287)   CLI: serve, sync-map, vault (init/add-ssh/delete/list/assign), collect
├── app.py                   (255)   FastAPI app, lifespan, vault unlock, parser init, scheduler start
├── parse_test.py            (305)   Single-device collect + parse diagnostic tool
├── api/
│   ├── native.py            (687)   27 native REST routes at /api/v1/
│   └── oxidized_compat.py   (220)   9 Oxidized-compatible routes at /
├── core/
│   └── collector.py         (502)   End-to-end collection pipeline, \r\n normalization
├── dcim/
│   ├── db_schema.py         (1439)  SQLite schema v10, views, queries, CRUD, device edit, auto-migration
│   └── map_importer.py      (1059)  SC2 topology map → DCIM sync, hostname.site preservation, roles.yaml inference
├── parser/
│   ├── __init__.py          (17)    Package exports: ParseEngine, ParseResult
│   ├── tfsm_fire.py         (232)   TextFSMAutoEngine — template matching + scoring (from SC2.5)
│   └── engine.py            (392)   ParseEngine — output cleaning, filter cascade, vendor fallback
├── scheduler/
│   └── __init__.py          (571)   ConnectionManager + Scheduler (poll loop, queue, WS broadcast, inline parsing, per-platform commands)
├── storage/
│   ├── backend.py           (280)   Abstract interface + Snapshot/DiffResult + shared diff/aligned helpers
│   ├── diff.py              (210)   Record (semantic) diff engine, DIFF_KEY_FIELDS, field exclusion
│   ├── config_diff.py       (230)   Line-aligned, config-aware text diff (side-by-side, intra-line)
│   ├── file_backend.py      (417)   Directory tree, last-N rotation, multi-type search, parsed JSON write-through
│   └── git_backend.py       (738)   Git versioning, commit trailers
├── ssh/
│   ├── emulation.py         (302)   NetEmulate shim (standalone, reusable)
│   ├── client.py            (817)   SSHClient with DCIM platform field mapping, legacy pubkey handling
│   └── executor.py          (602)   v_device_detail → SSHClient → Snapshot[], auth test with debug capture
├── vault/
│   ├── encryption.py        (339)   PBKDF2 key derivation + Fernet encryption
│   ├── models.py            (391)   SSH/SNMP credential dataclasses
│   ├── schema.py            (353)   Vault SQLite schema + DatabaseManager
│   ├── vault.py             (1059)  CredentialVault — CRUD, encrypt/decrypt
│   └── bridge.py            (350)   Headless unlock, DCIM↔vault integration
└── web/
    ├── __init__.py          (26)    Web router — serves / and mounts /static
    └── static/
        ├── index.html       (79)    SPA shell: sidebar, content area, module loader
        ├── css/
        │   ├── netlapse-light.css      CSS custom properties, full component library + forms + modal (light)
        │   └── netlapse-dark.css       Dark theme — same component classes, dark-mapped variables
        └── js/
            ├── app.js       (154)   Hash router, view lifecycle, health polling
            ├── api.js       (86)    All /api/v1 + Oxidized + native search + job CRUD + device edit + auth test
            ├── components.js(125)   Shared: badges, stat cards, formatters, icons (edit, toggle, trash)
            ├── ws.js        (136)   WebSocket manager with auto-reconnect
            └── views/
                ├── dashboard.js  (114)  Health stats, site grid, recent jobs
                ├── devices.js    (126)  Filterable inventory, URL param pre-filtering
                ├── device.js     (834)  Tabs incl. semantic + text diff (snapshot pair-select, field-exclusion chips, fold/mask controls), capture pills, parsed data, device edit, auth test
                ├── jobs.js       (592)  Job CRUD with per-platform command map editor
                ├── search.js     (331)  Multi-type regex search, detail modal, match nav
                └── collection.js (185)  Job selector, WebSocket progress, live log

Quick Start

Netlapse runs from the repo — there's no published package yet. Clone it, set up a virtualenv, and install the dependencies:

git clone https://github.com/scottpeterman/netlapse.git
cd netlapse
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

All commands below are run from the repo root with the virtualenv active, invoking the package as a module (python -m netlapse ...).

Import devices from Secure Cartography topology map

# Dry run — parse and report, no writes
python -m netlapse sync-map /path/to/map.json --dry-run

# Import — creates sites, assigns roles, detects platforms
python -m netlapse sync-map /path/to/map.json

# Exclude OOB management switches
python -m netlapse sync-map /path/to/map.json --exclude-prefix oob

The map importer supports two hostname conventions:

Convention Example Device Name Site
Dot-separated (datacenter/SP) border01.site1.company.com border01.site1 site1
Dot-separated (datacenter/SP) border1-01.site1 border1-01.site site1
Dash-prefixed (campus/enterprise) den-core-01 den-core-01 den

Device names preserve the operational hostname — domain suffixes are stripped but the site segment stays because it's part of the device identity (border01.site1 and border01.den2 are different devices).

Roles are inferred from hostname patterns (border* → router, tor* → leaf, -core- → core, -sw- → access). These defaults are common datacenter/SP and campus conventions, but they're not baked in: drop a roles.yaml next to captures.yaml (or pass --roles PATH) to map your own naming standard to roles — same edit-the-file, no-code workflow as captures. Absent that file, the built-ins apply. See Map Importer. Platforms are parsed from SC2's discovery strings (Arista DCS-7280SRA-48C6-F EOS 4.33.1.1Farista_eos, Cisco IOS-XE 17.03.06cisco_ios_xe, Juniper JUNOS 23.2R1-S2.5juniper_junos). Handles both short (Arista EOS 4.33.1.1F) and full chassis (Arista DCS-7280SRA-48C6-F EOS 4.33.1.1F) platform strings. Idempotent — safe to re-run after every SC2 discovery cycle.

Initialize credential vault and collect

# Initialize vault with master password
python -m netlapse vault init

# Add SSH credentials (password auth)
python -m netlapse vault add-ssh lab -u admin -p admin --default

# Add SSH credentials (key auth)
python -m netlapse vault add-ssh prod -u scott -k ~/.ssh/id_ed25519 --default

# Add SSH credentials (key + password for enable)
python -m netlapse vault add-ssh prod -u scott -k ~/.ssh/id_rsa -p 'enable_pass' --default

# Assign credential to all devices
python -m netlapse vault assign prod

# Verify
python -m netlapse vault list

# Delete a credential
python -m netlapse vault delete lab

# Collect configs (one-shot CLI)
export NETLAPSE_VAULT_PASSWORD="your_master_password"
python -m netlapse collect --site den

# Collect against NetEmulate (emulated devices)
python -m netlapse collect --site den --emulate

# Run a named job
python -m netlapse collect --job config-backup

# Start the daemon (scheduler + API + Web UI)
python -m netlapse

The vault stores key file contents encrypted — the original file isn't referenced at runtime, so the daemon doesn't need filesystem access to the key.

Browse the Web UI at http://localhost:8888 or the Swagger API docs at http://localhost:8888/docs.

First scheduled collection (65/65 against NetEmulate)

$ python -m netlapse
Netlapse v0.1.0 starting
Storage backend: FileBackend at /home/user/.netlapse/data
Emulation enabled: 1738 device IPs loaded
Vault unlocked from NETLAPSE_VAULT_PASSWORD env var
Scheduler started (poll=15s, workers=2)
Netlapse ready — http://0.0.0.0:8888

Job 'config-backup': collecting from 65 devices (trigger=scheduled, history=1)
  den-core-01: show running-config — 35361 bytes in 0.1s
  den-core-02: show running-config — 33057 bytes in 0.1s
  den-2-sw-01: show running-config — 47442 bytes in 0.1s
  ...
Job 'config-backup' complete: 65/65 in 42.3s (history=1)

Architecture

┌─────────────────────────────────────────────────────┐
│  FastAPI Application (port 8888)                    │
│  ├── /nodes, /node/* → Oxidized compat API          │
│  ├── /api/v1/*       → Native Netlapse API          │
│  ├── /ws             → WebSocket (live progress)    │
│  └── /               → Web UI (vanilla JS)          │
└────────────────────┬────────────────────────────────┘
                     │
┌────────────────────▼─────────────────────────────────┐
│  Scheduler (asyncio + ThreadPoolExecutor)            │
│  ├── Poll loop: get_due_jobs() every 15s             │
│  ├── Job queue: scheduled + API triggers             │
│  ├── Worker threads: collect_device() per device     │
│  ├── WS broadcast: per-device progress to all clients│
│  └── History: job_history + job_device_results       │
└────────────────────┬─────────────────────────────────┘
                     │
┌────────────────────▼─────────────────────────────────┐
│  Collection Engine                                   │
│  ├── Collector (DCIM → vault → executor → storage)   │
│  ├── SSH Client (Paramiko, legacy device support)    │
│  ├── Emulation shim (NetEmulate mock devices)        │
│  ├── Parse engine (tfsm-fire, 296 templates)         │
│  └── Credential vault (Fernet/PBKDF2 encrypted)      │
└────────────────────┬─────────────────────────────────┘
                     │
┌────────────────────▼─────────────────────────────────┐
│  Storage Layer                                       │
│  ├── Git backend (raw text + parsed JSON per commit) │
│  ├── File backend (directory tree, last-N rotation)  │
│  ├── DCIM SQLite (devices, jobs, history)            │
│  └── Vault SQLite (encrypted credentials, separate)  │
└──────────────────────────────────────────────────────┘

Scheduler Data Flow

app.py lifespan → DB → Storage → Vault unlock → Parser init → Emulation → Scheduler.start()
                                                                │
    ┌───────────────────────────────────────────────────────────┘
    │
    ├── Poll loop (asyncio, main thread)
    │   └── get_due_jobs(now) → queue.put(("scheduled", slug))
    │
    ├── API triggers (async, main thread)
    │   ├── POST /api/v1/jobs/{slug}/run   → scheduler.enqueue_job()
    │   ├── POST /api/v1/collect/{id}      → scheduler.enqueue_device()
    │   └── GET/PUT /node/next/{node}      → scheduler.enqueue_device()
    │
    └── Queue consumer → ThreadPoolExecutor (worker thread)
        ├── NetlapseDB(db_path)             # thread-local DB connection
        ├── db.start_job_run()              # job_history row (status=running)
        ├── for device in targets:
        │   ├── collect_device()            # SSH → Paramiko → raw output
        │   ├── parser.enrich_snapshot()    # tfsm-fire → parsed records + score
        │   ├── _record_result()            # store snapshot + update device status
        │   ├── db.insert_device_result()   # per-device history row
        │   └── _broadcast()                # WS events → collection view
        ├── db.complete_job_run()           # finalize counts + status
        └── db.update_job_schedule()        # next_run = now + interval

End-to-End Data Flow

CLI: netlapse collect --site den --emulate
    │
    ▼
Map importer: SC2 map.json → DCIM (sites, platforms, roles, devices)
    │   Hostname parsing: border01.site1 → site=site1, den-core-01 → site=cal
    │   Platform parsing: "Cisco IOS-XE 17.03.06" → cisco_ios_xe
    │   Role inference: border* → router, tor* → leaf, -core- → core (overridable via roles.yaml)
    │
    ▼
Vault: unlock from NETLAPSE_VAULT_PASSWORD env var
    │   PBKDF2-HMAC-SHA256 (480,000 iterations) → Fernet key derivation
    │   resolve_shared_credentials() → (username, password) tuple
    │
    ▼
Collector: list_collection_targets(site_filter="cal") → 13 devices
    │
    ▼
Executor: build_ssh_config(v_device_detail row + credentials)
    │   Maps dcim_platform fields → SSHClientConfig:
    │     primary_ip4           → host
    │     ssh_port              → port
    │     platform_profile      → (used by tfsm-fire for template matching)
    │     paging_disable_command → single platform-specific command
    │     prompt_regex          → prompt detection override
    │     enable_command        → enter privileged mode
    │     legacy_ssh            → device override > platform default > off
    │                             (DH group1, 3DES, forced ssh-rsa signatures)
    │
    ▼
Emulation shim: 172.16.48.60:22 → 127.0.0.1:10224 (den-core-01)
    │   1738 IPs loaded from ip_lookup.json
    │   DNS intercept patches socket.getaddrinfo
    │
    ▼
SSHClient: connect → find_prompt → disable_pagination → show running-config
    │   Prompt detected: "den-core-01#"
    │   Output captured: 35,361 bytes in 0.1s
    │
    ▼
Parser: ParseEngine.enrich_snapshot(snapshot, platform_profile="arista_eos")
    │   _clean_output: strip preamble, find last hostname echo, take output after
    │   _build_filter: "arista_eos_show_ip_arp" → 1 matching template
    │   find_best_template: score 80.2 → arista_eos_show_ip_arp
    │   Snapshot.parsed_data = { records: [...] }
    │   Snapshot.template_name = "arista_eos_show_ip_arp"
    │   (vendor fallback if specific filter misses)
    │
    ▼
Storage: store_batch(snapshots)
    │   → ~/.netlapse/data/site1/border1-01.site1/arp.txt  (raw CLI output)
    │   → ~/.netlapse/data/site1/border1-01.site1/arp.json (parsed records + template metadata)
    │
    ▼
DCIM: update_device_collection_status(device_id, "success", timestamp)

What's Built — Module Details

Scheduler (scheduler/__init__.py)

Two components in one module:

ConnectionManager — WebSocket broadcast hub. Tracks connected clients, broadcasts JSON events to all. The /ws endpoint in app.py adds/removes connections; the scheduler broadcasts.

Scheduler — asyncio-based job loop backed by a ThreadPoolExecutor for blocking SSH work. Poll loop checks get_due_jobs() every 15 seconds (configurable). API routes push ad-hoc triggers onto an asyncio.Queue. Worker threads create their own SQLite connections (thread-local — SQLite connections can't cross thread boundaries). Each device collection produces a WebSocket event, giving the frontend real-time progress.

WebSocket events emitted during collection:

Event Payload When
collection_start { job, device_count, history_id } Job begins
device_collected { device, status, command, commands, platform, bytes, duration, parsed, template } Each device completes
collection_progress { collected, total } After each device
collection_complete { job, collected, failed, device_count, duration, history_id } Job finishes

The Jobs view auto-refreshes every 10 seconds and instantly on collection_start/collection_complete events. The Collection view shows per-device live progress with a progress bar and scrolling log.

DCIM Schema (dcim/db_schema.py)

Single SQLite database at ~/.netlapse/netlapse.db. Schema version 10 (auto-migrates from v3 through v10). WAL mode for concurrent reads from the API while the scheduler writes.

Tables:

Table Purpose
dcim_site Physical locations. slug = Oxidized group = git directory = one namespace everywhere
dcim_manufacturer Hardware vendors (8 seeded: Cisco, Arista, Juniper, Palo Alto, Fortinet, F5, HP, Dell)
dcim_platform OS/software. Each platform carries SSH behavior fields: platform_profile, paging_disable_command, enable_command, prompt_regex, legacy_ssh
dcim_device_role Functional roles (12 seeded: router through border)
dcim_device Devices. credential_id, collection_enabled, legacy_ssh (device override), last_collection_status, credential_tested_at, credential_test_result
jobs Persistent job definitions: capture type, commands (JSON default), command_map (per-platform JSON, projected from captures.yaml for file-managed jobs), device filters, schedule, and source (file = catalog-owned/managed by sync; seed = shipped default, editable until a matching binding adopts it; NULL/api = created via API/UI)
job_history Per-run execution records: trigger, status, counts, timing
job_device_results Per-device outcome within a job run: status, error category, duration

Views: v_device_detail (full join of device + site + platform + manufacturer + role + credential, includes both device_legacy_ssh and platform legacy_ssh), v_site_summary, v_platform_summary, v_job_summary (27 columns — all job fields including command_map + latest history with computed duration).

Job CRUD methods: create_job(), update_job() (field whitelist protects schedule fields), delete_job() (history preserved via FK SET NULL), set_job_enabled().

Device CRUD methods: update_device() (14-field whitelist, diff-only updates, FK null normalization, legacy_ssh tri-state), list_credentials() (safe — no decrypted material), list_roles(), list_platforms().

Job history methods: start_job_run() → history_id, complete_job_run(), insert_device_result(), list_job_history(), get_job_run() (includes nested device results).

13 platforms with SSH behavior seeded. The slug is what devices reference and what captures.yaml per-platform override keys are matched against; profile is the tfsm-fire template-matching key (the two differ on a few platforms, so use the slug when authoring overrides):

Platform Slug Profile Paging Command Enable Legacy
Cisco IOS cisco_ios cisco_ios terminal length 0 enable
Cisco IOS-XE cisco_ios_xe cisco_xe terminal length 0 enable
Cisco IOS-XR cisco_ios_xr cisco_xr terminal length 0
Cisco NX-OS cisco_nxos cisco_nxos terminal length 0
Cisco ASA cisco_asa cisco_asa terminal pager 0 enable
Arista EOS arista_eos arista_eos terminal length 0
Juniper Junos juniper_junos juniper_junos set cli screen-length 0
Palo Alto PAN-OS paloalto_panos paloalto_panos set cli pager off
Fortinet FortiOS fortinet_fortios fortinet config system console\nset output standard\nend
F5 TMOS f5_tmos f5_tmsh modify cli preference pager disabled
HP ProCurve hp_procurve hp_procurve no page
HP Comware hp_comware hp_comware screen-length disable
Dell OS10 dell_os10 dell_os10 terminal length 0

Reference Data → Platforms tab: the 13 seeded platforms with slug, name, tfsm-fire profile, paging-disable command, legacy-SSH flag, and live device counts

8 default jobs seeded (as source='seed' — editable and collecting out of the box; sync promotes them to file if/when a jobs.yaml binding with the same slug is synced):

Job Capture Type Commands Interval
Config Backup config show running-config 1 hour
ARP Table arp show ip arp 30 min
BGP Summary bgp show ip bgp summary 15 min
Interface Status interfaces show interfaces 30 min
Route Table routes show ip route 30 min
OSPF Neighbors ospf show ip ospf neighbor 30 min
MAC Address Table mac show mac address-table 30 min
LLDP Neighbors lldp show lldp neighbors detail 30 min

Per-platform command overrides (Junos show configuration | display set, Arista show arp, etc.) live in captures.yaml and are projected into each job's command_map at sync time.

Map Importer (dcim/map_importer.py)

Imports device inventory from Secure Cartography topology maps — the output of SC2's BFS discovery. The network is the source of truth. NetBox coexistence is optional.

The importer follows the NetAudit pattern: the patrol/topology map is the seed source. It creates sites and roles as needed, upserts devices (updates IP/platform if changed, inserts if new), and flags devices that may need legacy SSH algorithms.

# CLI
python -m netlapse sync-map /path/to/map.json
python -m netlapse sync-map /path/to/map.json --exclude-prefix oob
python -m netlapse sync-map /path/to/map.json --site override-slug --dry-run
python -m netlapse sync-map /path/to/map.json --roles /path/to/roles.yaml
# Programmatic
from netlapse.dcim.map_importer import sync_from_map
result = sync_from_map(db, "/path/to/map.json")
print(f"Created {result.created}, updated {result.updated}, skipped {result.skipped}")

Role inference is file-overridable (roles.yaml). Out of the box the importer maps hostnames to roles with built-in conventions — prefix rules (border* → router, tor* → leaf) tried first, then segment rules (-core- → core, -fw- → firewall), falling back to switch. Those defaults are common but opinionated, so they're overridable from a roles.yaml definition file (sibling of captures.yaml/jobs.yaml, or --roles PATH) without touching code:

version: 1
role_inference:
  extend_builtins: false      # false = your rules are the whole ruleset;
                              # true  = your rules win, built-ins fill the rest
  fallback: switch
  prefix:                     # hostname STARTS WITH match; first match wins
    - {match: "gw",  role: router}
    - {match: "tor", role: leaf}
  segment:                    # match appears ANYWHERE; tried after prefixes
    - {match: "-fw-", role: firewall}

Same contract as the collection registry: absent file → built-in defaults apply (zero-config behavior unchanged), present-but-invalid → the import stops with every problem reported at once rather than silently mis-roling the fleet, present-and-valid → your rules govern. Unlike captures.yaml/jobs.yaml, roles.yaml is consumed at import time by sync-map, not projected into the jobs table — it shapes inventory, not collection. Role slugs that don't exist yet are auto-created on import (the same _ensure_role path the built-ins use).

Reference Data → Roles tab: seeded device roles with slug, display name, color swatch, and device counts — the leaf/spine/router/core counts here are what hostname inference assigned at import time

Credential Vault (vault/)

Ported from Secure Cartography's credential vault. Separate encrypted SQLite database at ~/.netlapse/vault.db — not in the DCIM DB (correct security boundary).

Encryption: PBKDF2-HMAC-SHA256 (480,000 iterations) for key derivation, Fernet (AES-128-CBC + HMAC-SHA256) for symmetric encryption. Salt randomly generated per vault initialization.

Credential types: SSH (username + password and/or private key), SNMPv2c (community string), SNMPv3 (USM with auth/priv protocols). SSH is the primary path for collection; SNMP support is carried forward for future use.

Headless unlock: Set NETLAPSE_VAULT_PASSWORD env var for daemon/unattended operation. The bridge module (vault/bridge.py) auto-unlocks on first access. The app lifespan unlocks the vault before starting the scheduler.

DCIM integration: The bridge resolves credential_id from dcim_device → vault lookup → (username, password) tuple for the executor. assign_credential_to_all() bulk-assigns a named credential to matching devices.

python -m netlapse vault init                          # Initialize with master password
python -m netlapse vault add-ssh lab -u admin -p admin --default  # Add SSH credential
python -m netlapse vault list                          # List credentials (no secrets shown)
python -m netlapse vault assign lab                    # Assign to all devices
python -m netlapse vault assign lab --site den         # Assign to one site

Collection Pipeline (core/collector.py)

The bridge between "we have devices and credentials" and "config backups land on disk." Two entry points converge on the same pipeline:

  • collect_now() — ad-hoc collection from CLI or API trigger. Specify filters, commands, and credential name.
  • run_job() — job-based collection from the jobs table. Resolves job definition, target devices, and schedule.

Both resolve credentials from the vault, call the executor, store results via the storage backend, and update DCIM collection history. Line endings are normalized (\r\n\n) before storage — devices send Windows-style line endings, Netlapse stores Unix-only. The scheduler calls collect_device() directly for per-device granularity in history and WebSocket broadcast.

# Ad-hoc collection
python -m netlapse collect --site den --emulate
python -m netlapse collect --role router --credential prod-ssh

# Job-based collection
python -m netlapse collect --job config-backup
python -m netlapse collect --job arp-table --site site1

Storage Layer (storage/)

Abstract interface defines Snapshot, StoredVersion, DiffResult data classes and the method contract. Factory function create_backend(config) reads storage.backend: file|git.

File backend stores snapshots as {site}/{device}/{capture_type}.txt with last-N rotation into a history/ subdirectory. Supports regex search across any capture type with line-number tracking and dynamic capture-type discovery. Both the semantic and text diffs are produced in file mode — the versioned parsed JSON in history/ gives the record diff the two states it needs, with no git dependency.

Git backend commits per-device with machine-parseable trailers:

X-Netlapse-Device: border-rtr-01
X-Netlapse-Site: site1
X-Netlapse-Trigger: scheduled
X-Netlapse-Types: config,arp,bgp

Trailers survive git clone, git bundle, and repo migrations — no side-car database required.

Diff engine. A DiffResult carries three views of a version pair, all produced by both backends (the comparison is pure record/text logic — git only ever contributed two blobs to read):

Semantic (record) diff (diff.py) — parsed JSON records matched by a capture-type key field, reported as added / removed / changed with per-field {from, to}:

Capture Type Key Field(s)
arp ADDRESS
mac DESTINATION_ADDRESS, VLAN
bgp NEIGHBOR or BGP_NEIGH
ospf NEIGHBOR_ID
interfaces INTERFACE or INTF
routes NETWORK or PREFIX
vlans VLAN_ID
spanning, cdp, lldp INTERFACE + NEIGHBOR
inventory NAME, PID

Falls back to full-record hash comparison when no key field is recognized (a change then shows as remove + add rather than changed). Volatile fields are excluded from the comparison by default — see Field exclusion below.

Aligned text diff (config_diff.py) — line-aligned, config-aware side-by-side: SequenceMatcher with autojunk=False (it matters on long configs), intra-line word highlighting on changed lines, fold-unchanged with adjustable context, and volatile-line masking against the comparison key, never the displayed text.

Raw unified diff — a plain difflib unified string, retained for Oxidized compatibility.

Field exclusion (volatility). Which parsed fields count as noise is policy, not algorithm, so it lives in volatility.yaml (loaded by registry/volatility.py), not code. Global name heuristics (*_TIME, *_PACKETS, AGE, …) plus per-capture volatile/keep rules resolve against the live schema; key fields are never excluded. So an ARP entry only reports "changed" when its MAC or interface moves, not when its age timer ticks — and a new capture type's timers classify themselves the first time they're collected. The per-diff UI exposes every column as a toggle on top of the defaults. Full reference: Semantic Diff.

Parser (parser/)

Structured CLI output parsing via tfsm-fire — the "output selects template" paradigm. Ported from Secure Cartography v2.5. The parse engine finds the best TextFSM template for raw CLI output automatically based on the output's structure, not manual template selection.

Architecture: Three layers —

  1. tfsm_fire.py (from SC2.5) — TextFSMAutoEngine. Thread-safe (thread-local SQLite connections). Scores each candidate template on four factors — record count (0–90), field richness (0–90), population rate (0–25), and consistency (0–15) — and selects the highest. The Template Lab normalizes the total to a 0–100 match score for display. Template tester: a raw capture run through tfsm-fire, the engine auto-selecting the matching TextFSM template and rendering parsed records with the match score — the "output selects template" inversion in action

  2. engine.pyParseEngine. Wraps the engine with output cleaning, filter string construction, and a two-stage parse strategy:

    • Specific filter: {platform_profile}_{command} (e.g. arista_eos_show_ip_arp) — tries 1-3 templates, fast.
    • Vendor fallback: {vendor} only (e.g. arista) — tries all vendor templates when the command name doesn't align with the template naming convention.
  3. tfsm_templates.db — 1275 TextFSM templates in a SQLite database. 48 Arista, 143 Cisco IOS, 22 Juniper, plus NX-OS, ASA, and others. Template browser: the 296-template tfsm-fire library browsable in-app, filterable by platform/vendor, each entry showing its template name and fields — the parser's matching surface made visible Device detail → Parsed Data tab: CDP neighbors rendered as a sortable table (neighbor, mgmt address, platform, interfaces, capabilities) with the auto-selected template cisco_ios_show_cdp_neighbors_detail and its match score

Output cleaning is critical — SSH session captures include command echo, banners, pagination responses, and trailing prompts that TextFSM can't handle. The cleaner uses a two-strategy approach:

  • Primary: Find the LAST hostname-prefixed command echo (e.g. router#show ip arp, user@switch> show arp) and take everything after it.
  • Fallback: Strip known preamble patterns (bare command lines, JUNOS version banners, terminal length, set cli screen-length, empty lines).

Integration point: The scheduler calls parser.enrich_snapshot(snapshot, platform_profile) after SSH collection and before storage. Both raw text and parsed JSON are written to disk as dual artifacts (arp.txt + arp.json).

Diagnostic tool: parse_test.py runs the full pipeline for a single device with verbose step-by-step output:

# Full pipeline: SSH → clean → parse → store
python -m netlapse.parse_test border1-01 --capture arp --emulate

# Skip SSH, test against a saved raw file
python -m netlapse.parse_test border01 --raw-file /tmp/border01-arp.txt --capture arp

# Lower threshold to see low-confidence matches
python -m netlapse.parse_test border01 --capture bgp --min-score 1

The same scoring drives the Parse Audit view — fleet-wide parse health per capture type. Each device shows the template that matched, its score, and a record count, bucketed OK / LOW / MISS / RAW / NO DATA so a parsing regression (a firmware bump that changes output shape, a missing template) surfaces as a row that dropped below threshold rather than a silent empty table. A reparse can be previewed and applied across the fleet without re-collecting from devices.

Parse Audit view for the arp capture type: 13 devices all OK, each row showing device, site, platform, the auto-matched TextFSM template, parse score, and record count, with Preview/Apply Reparse controls

SSH Module (ssh/)

Ported from Secure Cartography v2, split into three modules.

ssh/emulation.py — NetEmulate integration as a standalone module. Four resolution strategies for IP→mock device mapping: exact IP match → DNS resolution → FQDN-strip → hostname reverse-scan. Includes DNS intercept (monkey-patches socket.getaddrinfo). Enable once at startup (via CLI --emulate or config.yaml emulation.enabled), every SSHClient connection transparently redirects.

ssh/client.py — Paramiko wrapper. Invoke-shell only (required for most network devices). ANSI sequence filtering, prompt detection, RSA/Ed25519/ECDSA key loading from vault (PEM strings, not file paths at runtime), platform-specific pagination disable, enable mode entry. SSHClientConfig dataclass maps 1:1 to dcim_platform fields.

LegacySSHSupport auto-registers all available kex and host key handlers into Paramiko's Transport._kex_info and Transport._key_info dicts at first connection. Paramiko 3.x+ (especially on Python 3.14) ships with incomplete handler dictionaries — algorithm names are offered during negotiation but their handler classes aren't registered, causing KeyError on connect(). The registration discovers every kex/key class Paramiko ships via importlib, registers what's missing, and builds preference lists from only what's actually registered. Runs once per process, idempotent. Handles mixed fleets: modern devices negotiate curve25519/ecdh, legacy Cisco-1.25 devices fall back to DH group1/3DES, OpenSSH 6.x servers with only ssh-rsa host keys all connect without retry or fallback logic. When legacy_mode is active, also disables rsa-sha2-512 and rsa-sha2-256 pubkey signature algorithms to force ssh-rsa (SHA-1) — required for pre-2014 SSH servers that don't support RFC 8332 or advertise server-sig-algs.

ssh/executor.py — The bridge. build_ssh_config() takes a v_device_detail row and credentials, produces an SSHClientConfig. Device-level legacy_ssh overrides platform default via _resolve_legacy_ssh(). collect_device() connects, detects prompt, disables pagination, optionally enters enable mode, runs commands, and returns DeviceResult with Snapshot objects. test_device_auth() connects and disconnects without commands, capturing the full Paramiko negotiation trace via a temporary log handler — returns AuthTestResult with transport metadata (SSH banner, KEX algorithm, cipher, auth method) and debug log. 12 error categories: connection_refused, connection_timeout, auth_failure, host_unreachable, dns_failure, ssh_protocol, shell_timeout, prompt_detection, command_timeout, command_error, emulation_miss, unknown. Consecutive-failure circuit breaker for batch runs.

Web UI (web/)

Single-page application served by FastAPI at /. Vanilla JS with ES modules — no build step, no bundler, no framework. ~3,300 lines across 13 files. Tested live against 82 devices across 3 data centers.

Architecture: One HTML shell loads app.js, which manages a hash router and dynamically imports each view module on demand. Every view implements the same lifecycle contract: render() returns an HTML string, init() fetches data and wires events after the HTML is in the DOM, destroy() cleans up timers and WebSocket subscriptions when navigating away.

Views:

View Route API Endpoints Purpose
Dashboard #/dashboard /health, /sites, /jobs Stat cards, site grid with device counts, recent collections table
Devices #/devices /devices, /sites Filterable inventory table — search, site, status filters. URL param pre-filtering (#/devices?site=site1)
Device Detail #/device/{id} /devices/{id}, PATCH /devices/{id}, /devices/{id}/test-auth, /snapshots, /snapshots/latest, /diff, /search/capture_types, /platforms, /roles, /credentials, /sites Four tabs with capture type pill selector. Raw output per type, parsed data table with click-to-sort columns and status color-coding, snapshot timeline, semantic diff per capture type. Device edit modal (identity, collection, metadata) with credential override, legacy SSH tri-state, SSH auth test with debug log
Jobs #/jobs /jobs, /jobs/{slug}/run, POST /jobs, PUT /jobs/{slug}, DELETE /jobs/{slug}, PUT /jobs/{slug}/enabled, /platforms Full CRUD: create/edit via modal form with per-platform command map editor, enable/disable toggle, delete with confirmation. Auto-refresh (10s), WS-driven instant updates
Config Search #/search POST /api/v1/search, GET /api/v1/search/capture_types, /devices/{id}/snapshots/latest Capture-type selector (config, arp, bgp, routes, interfaces), regex search with line numbers, match highlighting, full-output detail modal with ▲▼ match navigation
Live Collection #/collection /jobs, /jobs/{slug}/run, WS /ws Job selector, trigger button, real-time progress via WebSocket — per-device log, progress bar, summary

Config Search detail modal: a device's running-config with line numbers, the search term highlighted in place, and ▲▼ controls to step through every match

Device detail → Raw Output tab: the verbatim show running-config snapshot with a capture-type pill selector (arp/bgp/cdp/config/interfaces/lldp/mac/ospf/routes), alongside Parsed Data, Snapshots, and Semantic Diff tabs

Key modules:

api.js (86 lines) — Every /api/v1 and Oxidized-compat endpoint in one file. Includes job CRUD methods (createJob, updateJob, deleteJob, setJobEnabled, jobHistory), device edit (updateDevice), auth test (testAuth), and reference data (platforms, roles, credentials). Views import typed convenience methods and never construct URLs.

components.js (125 lines) — Pure functions returning HTML strings. badge(status) maps status strings to colored indicators, statCard() renders dashboard metrics, code() wraps text in monospace tags, and formatters handle uptime, intervals, relative timestamps, and byte counts. Icon set includes play, edit, trash, toggle on/off for the jobs CRUD UI.

ws.js (136 lines) — WebSocket manager wrapping /ws with auto-reconnect (exponential backoff, 2s → 30s cap) and event dispatch. Views subscribe during init() and receive an unsubscribe function to call during destroy(). The collection view receives four event types from the scheduler: collection_start, device_collected (now includes parsed flag and template name), collection_progress, collection_complete. The jobs view subscribes to collection_start and collection_complete for instant status updates.

netlapse-light.css / netlapse-dark.css — two theme files, same component classes, theme-mapped CSS custom properties. IBM Plex Sans/Mono typography. Change --blue and every button, badge, and link updates; the light/dark split is variable values only, not duplicated rules. A full dark theme (not just a dark sidebar) toggled in-app. Component classes for cards, tables, badges, stat cards, the side-by-side text-diff table (cdiff), raw-diff line coloring, semantic-diff field-exclusion chips, snapshot timelines, progress bars, form grids, modal dialogs, filter pills, and the search detail modal with match navigation.

Design decisions:

  • Vanilla JS + ES modules — no webpack, no node, no build tooling. A network engineer opens views/devices.js and sees HTML strings and fetch calls.
  • Dynamic import() — the browser only loads the JS for the view being displayed. Dashboard never loads the collection view's WebSocket code.
  • render → init → destroy lifecycle — same pattern as a PyQt6 widget (setupUi → populate → cleanup). Familiar to anyone who's written desktop apps.
  • IBM Plex Sans/Mono — enterprise typography that renders IPs, hostnames, and config blocks alongside prose without visual conflict.
  • URL param pre-filtering — clicking a site card on the dashboard navigates to #/devices?site=cal and the devices view reads the param on init. Deep-linkable.

API Detail

Auto-generated OpenAPI 3.1 docs at /docs — two endpoint groups side by side: oxidized-compat (the drop-in /nodes, /node/fetch, /node/next, conf_search surface) and netlapse (the native /api/v1 REST API)

Oxidized Compatibility

Netlapse exposes the full Oxidized REST API at the root path. Point LibreNMS at http://netlapse:8888 — no config changes needed.

Oxidized Endpoint Method Status
/nodes GET ✅ Optional ?group= filter by site slug
/node/fetch/{node} GET ✅ Resolves by name or IP
/node/fetch/{group}/{node} GET ✅ Scoped to site slug
/node/next/{node} GET/PUT ✅ Pushes to scheduler queue
/reload GET ✅ Returns device count
/node/version GET ✅ Git commits or file timestamps
/node/version/view POST ✅ View/diff specific version
/nodes/conf_search POST ✅ Regex search across all configs (backward compat — native search is multi-type)

Native API

Swagger UI at /docs.

Endpoint Method Status
/api/v1/health GET ✅ Version, uptime, device/job counts
/api/v1/status/scheduler GET ✅ Running state, active jobs, queue depth, WS clients
/api/v1/devices GET ✅ Filterable by site, platform, role, status
/api/v1/devices/{id} GET ✅ Full detail from v_device_detail
/api/v1/devices/{id} PATCH ✅ Update device (14-field whitelist, diff-only)
/api/v1/devices/{id}/test-auth POST ✅ SSH auth test with debug log capture
/api/v1/sites GET ✅ Site list with device counts
/api/v1/platforms GET ✅ Platform list with device counts
/api/v1/roles GET ✅ Role list with device counts
/api/v1/credentials GET ✅ Credential list (safe — id, name, username only)
/api/v1/devices/{id}/snapshots GET ✅ List collection snapshots
/api/v1/devices/{id}/snapshots/latest GET ✅ Latest raw text + parsed JSON
/api/v1/devices/{id}/snapshots/{sha} GET ✅ Specific version
/api/v1/devices/{id}/diff GET ✅ Structured diff
/api/v1/jobs GET ✅ Job list with last-run summary (v_job_summary)
/api/v1/jobs POST ✅ Create new job
/api/v1/jobs/{slug} GET ✅ Single job definition
/api/v1/jobs/{slug} PUT ✅ Update job (field whitelist)
/api/v1/jobs/{slug} DELETE ✅ Delete job (history preserved)
/api/v1/jobs/{slug}/run POST ✅ Trigger immediate run via scheduler
/api/v1/jobs/{slug}/enabled PUT ✅ Enable/disable job
/api/v1/jobs/{slug}/history GET ✅ Run history for a job
/api/v1/history/{id} GET ✅ Single run with per-device results
/api/v1/collect/{device_id} POST ✅ Trigger single-device collection
/api/v1/search POST ✅ Multi-capture-type regex search with line numbers
/api/v1/search/capture_types GET ✅ List all stored capture types dynamically

Configuration

# ~/.netlapse/config.yaml
listen:
  host: 0.0.0.0
  port: 8888

storage:
  backend: file           # file or git
  path: ~/.netlapse/data
  max_versions: 50        # file backend only

dcim_db: ~/.netlapse/netlapse.db

scheduler:
  poll_interval: 15       # seconds between due-job checks (default: 15)
  max_workers: 2          # concurrent collection threads (default: 2)

emulation:
  enabled: true           # redirect SSH to NetEmulate mock devices
  # lookup_path: ~/netemulate/ip_lookup.json  # auto-searches defaults if omitted
  # bind_host: 127.0.0.1                     # default

parser:
  # db_path: ~/.netlapse/tfsm_templates.db  # auto-detected if omitted
  # min_score: 15.0                          # minimum template match score (0-100)

Config path overridable with NETLAPSE_CONFIG env var. Falls back to sensible defaults if no config file exists.

More files live alongside config.yaml in the same directory (default ~/.netlapse/) and drive collection, inventory, and diffs rather than the daemon itself:

  • captures.yaml — what to collect and the per-platform command syntax
  • jobs.yaml — bindings: which capture runs against which devices
  • roles.yaml — optional hostname → role inference for the map importer (overrides the built-in naming-convention defaults)
  • volatility.yaml — which parsed fields the semantic diff treats as noise (age timers, counters): global name heuristics plus per-capture volatile/keep rules. Absent → no default exclusions; the per-diff UI toggles still work.

All are optional. Absent captures.yaml/jobs.yaml, the daemon runs on whatever job definitions are already in the DB; absent roles.yaml, sync-map uses the built-in role defaults; absent volatility.yaml, the semantic diff compares every field. See Data-Driven Collection, Map Importer, and Semantic Diff. Their directory can be overridden with definitions_dir in config.yaml; otherwise it follows NETLAPSE_CONFIG. Set definitions_dir if you keep definitions outside ~/.netlapse/ — both the registry sync and the volatility loader resolve from it, so captures and diff-noise policy stay read from the same place.

Environment variables:

  • NETLAPSE_VAULT_PASSWORD — master password for headless vault unlock (required for scheduler)
  • NETLAPSE_CONFIG — config file path override

Remaining Work

Phase 1 — Scheduler ✅

Complete. 551 lines. Asyncio poll loop + ThreadPoolExecutor + WebSocket broadcast. Job CRUD API (create, update, delete, enable/disable). Job history with per-device results. Auto-migrating schema (v3 → v4). All API stubs wired. Running in production against 82 devices.

Phase 2 — Structured Parsing ✅

Complete. Parser engine ported from SC2.5 (tfsm_fire.py + engine.py). 296 TextFSM templates (48 Arista, 143 Cisco IOS, 22 Juniper). Integrated into scheduler — parsing runs inline after SSH collection, before storage. Dual artifacts written to disk: raw .txt + parsed .json. Output cleaning handles multi-vendor SSH session transcripts (Cisco #, Junos >, Arista #, with banners, pagination responses, and command echo). Device detail view renders parsed data as sortable tables with capture type selection. File backend handles write-through of parsed JSON even when raw text is unchanged (covers parser-added-after-first-collection scenario).

Phase 3 — Web UI ✅

Complete. ~3,300 lines, 13 files, zero new dependencies. Six views with hash routing, dynamic module loading, and WebSocket integration. Jobs view has full CRUD (create, edit, enable/disable, delete via modal forms with two-column grid layout, capture type datalist, interval picker, collapsible device filters, auto-slug generation). Device detail view has capture type pill selector across Raw Output, Parsed Data, and Semantic Diff tabs — parsed data renders as sortable tables with status color-coding. Config search supports all capture types with regex, line numbers, and full-output detail modal.

Phase 4 — Multi-Vendor & Device Management (partial) ✅

Per-platform command resolution: a job's commands column holds the default command list and command_map holds per-platform overrides (platform_slug → command array). These columns are no longer authored by hand — for file-managed jobs they're projected from captures.yaml at sync time (the capture's default becomes commands, every other platform key becomes a command_map entry), so the override surface is edited in one vendor-neutral place rather than per job. API-created jobs still write the same two columns directly, so both kinds resolve identically.

Resolution itself is a single shared function — registry.resolver.resolve_commands(default_commands, command_map, platform_slug) — that both the scheduler and the CLI collector call. That sharing is the point: before it existed, the scheduler honored command_map inline while the CLI collector sent the default commands to every platform, so netlapse collect --job and the daemon disagreed on Junos boxes. With one resolver, the two paths cannot diverge. Schema is now v10. jobs.source has three states — file (catalog-owned, read-only, tombstoned when removed), seed (a shipped default no file owns yet: editable, never tombstoned, auto-adopted to file when a matching binding is synced), and NULL/api (hand-made, never touched by sync). The defaults are seeded seed so they collect and stay editable with no catalog on disk, yet a catalog edit takes effect the moment its slug is synced. Auto-migrates from v3 through v10 (v10 repairs DBs left at an interim file seed value).

Device edit: full CRUD modal on the device detail view. Three sections — Identity (name, status, IPs, platform, site, role), Collection (credential override, SSH port, legacy SSH tri-state, collection enabled), Metadata (serial, asset tag, description, comments). Diff-only saves — only changed fields are sent in the PATCH payload. Reference data (sites, platforms, roles, credentials) lazy-loaded once on first edit.

Per-device SSH controls: credential_id overrides the shared vault default per device. legacy_ssh column on dcim_device (nullable — NULL inherits platform default, 0 forces off, 1 forces on). The SSH client now passes disabled_algorithms={'pubkeys': ['rsa-sha2-512', 'rsa-sha2-256']} when legacy mode is active, forcing ssh-rsa signatures for pre-2014 OpenSSH servers that don't support RFC 8332. (Introduced in schema v10)

SSH auth test with debug: POST /devices/{id}/test-auth connects, detects prompt, captures transport negotiation details (SSH banner, KEX algorithm, cipher, auth method), and disconnects. A temporary log handler captures DEBUG-level output from Paramiko and the SSH client during the test, returning the full negotiation trace. Updates credential_tested_at and credential_test_result on the device. UI shows inline results with a collapsible debug log panel — the exact output that diagnosed the rsa-sha2-512 signature mismatch on OpenSSH 6.2 Juniper border routers.

Phase 5 — Reporting Layer (next focus)

With collection, parsing, versioning, and diff in place, the next focus is a reporting layer over the state already collected — read logic, not new collection. A report is a query across the parsed, time-versioned store, run on a schedule and stored as an artifact, so the diff engine applies to it like any other capture. Two features lead:

  • Attached Device Reporting — vendor distribution and rogue-device detection from the MAC/ARP captures already collected. OUI→vendor resolution ships as a static IEEE registry (a seed-artifact, like the template DB); distribution is a group-by; "rogue" is a policy question answered the way volatility is — sensible heuristics plus an operator-declared expected-vendor policy in YAML, flagging deviations.
  • CVE Reporting — port of Secure Cartography's NVD/CPE security analysis (platform → CPE → NVD cached lookup, severity rollup). The version-matching, rate-limit, and cache work is already solved in SC2; the daemon port adds age-based re-sync (CVEs are filed against old versions continuously, so a daemon must refresh on a TTL where SC2 relies on a human clicking Force Re-sync), a device-version join for per-device exposure, and report-as-artifact diffing ("new CRITICAL against a running version since last week").

See Reporting — Design & Next Steps for the full design, the SC2 CVE findings, and the reporting-shell contract both plug into.

Design Decisions

Why the network is the source of truth: The resistance of NetBox adoption at the operator level drove an architectural pivot. VelocityCMDB required a populated NetBox; Netlapse doesn't. Each tool in the suite carries enough DCIM to operate autonomously. If NetBox exists, sync to it. If it doesn't, the tool still works. The SC2 topology map is the seed source — clone, point at the network, start collecting.

Why definitions are files-authoritative but DB-synced: A YAML catalog is what an operator should edit — diffable, reviewable, version-controllable, no code change to add a vendor. But the running daemon needs state that doesn't belong in a file: the live enable/disable toggle flipped in the UI, the schedule (last_run/next_run), and job history. So definitions are authoritative in the files and projected into the jobs table on startup, while that runtime state stays authoritative in the DB and is never clobbered by a sync. The DB therefore doubles as the last-good cache: if the catalog fails validation on a restart, the sync is skipped and the daemon runs on the definitions already in the DB. A typo degrades to "ran with last good definitions," never to "stopped collecting."

Why platforms stayed in the DB while captures and jobs moved to files: Devices reference their platform by platform_slug, and dcim_platform already carries the live SSH behavior each platform needs — prompt regex, paging-disable command, enable command, legacy-SSH flag. Putting platforms in a file too would create a second source of platform truth that could drift from the one devices actually resolve against. Instead, a capture's per-platform command keys are validated against the real dcim_platform list at load time, so an override key that no device could ever match is caught against the source that decides matching. One platform table, no drift. (Making the file authoritative for SSH behavior too is a deliberate later step — it means deciding what wins when file and DCIM disagree.)

Why SQLite, not PostgreSQL: Single-file deployment. No database server. The DCIM handles 500+ devices trivially. WAL mode handles concurrent API reads while the scheduler writes. Worker threads create their own connections — SQLite connections can't cross thread boundaries, but concurrent connections with WAL mode are safe.

Why two storage backends: Not everyone has or wants git. The file backend lets someone start collecting in 60 seconds. When they want versioning, they switch one config line.

Why dual artifacts (raw + parsed): The raw text is what you grep at 2 AM. The parsed JSON is what makes Netlapse different — structured diffs, semantic change detection, operational state awareness.

Why a separate vault database: The credential vault lives at ~/.netlapse/vault.db, separate from the DCIM at ~/.netlapse/netlapse.db. Different security boundary — the vault is encrypted, the DCIM is not. The credential_id on dcim_device is a logical reference resolved at runtime through the vault bridge.

Why commit trailers instead of a metadata database: Git trailers survive git clone, git bundle, repo migrations, and backup/restore. A side-car database can get out of sync.

Why site_slug = Oxidized group = git directory: Three systems that need to agree on a namespace. Making them the same string eliminates mapping tables.

Why thread-local DB connections in the scheduler: SQLite connections can't cross thread boundaries (check_same_thread=True by default). The scheduler's worker threads create their own NetlapseDB(db_path) instances and close them after each job. The poll loop runs in the main asyncio thread and uses the shared connection. WAL mode ensures concurrent reads don't block.

Design Philosophy

The stack is deliberately inheritable: FastAPI, SQLite, vanilla JS, Paramiko — mainstream frameworks, no exotic dependencies. Vanilla JS is intentional (the next person maintaining this is a network engineer, not a frontend developer). All projects have architecture docs. Runs from a cloned repo and a virtualenv — no build step. GPL licensed so it stays open.

Portable Components

Netlapse reuses battle-tested modules from the author's network automation stack:

Component Origin Status Purpose
SSH Client Secure Cartography v2 ✅ Ported Paramiko wrapper with auto-registered algorithm handlers, key+password auth, ANSI filtering, prompt detection
Emulation Shim Secure Cartography v2 ✅ Ported NetEmulate mock device redirection for testing
SSH Executor VelocityCollector ✅ Adapted DCIM → SSH → Snapshot pipeline with 12-category error handling
DCIM Schema VelocityCollector ✅ Ported NetBox-aligned SQLite (sites, platforms, roles, devices, jobs, history)
Credential Vault Secure Cartography v2 ✅ Ported Fernet-encrypted SQLite, headless unlock, DCIM bridge
tfsm-fire Secure Cartography v2.5 ✅ Ported TextFSM auto-template selection — output selects template
Parse Engine New for Netlapse ✅ Built Output cleaning, filter cascade, vendor fallback, Snapshot enrichment
Map Importer New for Netlapse ✅ Built SC2 topology maps → DCIM device inventory, hostname.site preservation
Collection Pipeline New for Netlapse ✅ Built End-to-end: DCIM → vault → executor → parser → storage
Scheduler New for Netlapse ✅ Built asyncio + ThreadPoolExecutor, WS broadcast, inline parsing, job history
Web UI New for Netlapse ✅ Built SPA: dashboard, inventory, parsed data tables, job CRUD, live collection

Dependencies

fastapi>=0.110          # Web framework
uvicorn[standard]>=0.27 # ASGI server
python-multipart>=0.0.9 # Form parsing (Oxidized compat endpoints)
paramiko>=3.4           # SSH (SC2 client)
gitpython>=3.1          # Git storage backend
pyyaml>=6.0             # Configuration
cryptography>=42.0      # Vault encryption
textfsm>=1.1            # Template parsing (tfsm-fire structured output)

Python Compatibility

Tested on Python 3.12 and 3.14. The SSH client's LegacySSHSupport handles Paramiko algorithm registration differences across Python versions automatically — no version-specific configuration needed.

License

GPLv3

Author

Scott Peterman — [Full Stack Net Ops Developer](https://scottpeterman.github.io)

About

Oxidized alternative / replacement — captures any CLI output (configs, ARP, BGP, routes), parses to structured JSON, versions both, and answers 'what changed' with semantic diffs.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors