FSVE v4.3 — Foundational Scoring and Validation Engine

Dynamic Epistemic State Machine for Score Governance

No system may claim certainty it cannot justify. No system may claim stability it has not stress-tested.




What Is FSVE?

FSVE is a production-grade Python engine that determines whether a score is telling the truth — over time, under pressure, and across networks.

Most scoring systems produce a number. FSVE produces an Epistemic State Vector — a multi-dimensional trajectory that tracks not just what a claim scores, but how fast its validity is changing, how fragile it is to evidence retraction, and whether its evidence network is an isolated echo chamber rather than a genuine consensus.

FSVE v4.3 transitions from a static score governor to a Dynamic Epistemic State Machine:

Ψ(t) = ⟨ EV(t), T_m(t), F_x(t), Topology(t), Uq(t), Cq(t) ⟩

| Dimension | Symbol | Question It Answers |
|---|---|---|
| Epistemic Validity | EV | What does the evidence say? |
| Temporal Momentum | T_m | How fast is validity changing, and is that suspicious? |
| Fragility Axis | F_x | What happens if a single critical premise is retracted? |
| Topology State | T | Is this artifact embedded in genuine consensus or an echo chamber? |
| Uncertainty Quantification | Uq | How wide are the real uncertainty bounds? |
| Consequence Severity | Cq | What is the cost of being wrong here? |

Why It Exists

Scoring systems fail in three structurally distinct ways that point-estimate engines cannot detect:

1. Temporal laundering — Evidence floods in rapidly, EV spikes to VALID, and no one asks whether the rate of convergence is physically plausible for the domain. FSVE tracks Temporal Momentum and enforces Lipschitz continuity bounds. If |T_m| > K · λ_E, FMIA-G17 fires and the artifact is routed to human review.

2. House-of-cards validity — An artifact achieves EV ≥ 0.70 on fifteen pieces of evidence, but thirteen of them cite the same foundational paper. Retract that paper and the EV collapses to 0.20. FSVE's Counterfactual Mode (C-Mode) executes do(c_i = ∅) for every critical evidence node, computes the Fragility Axis F_x, and classifies fragile artifacts as FRAGILE_VALID — preventing them from being used as critical prerequisites in downstream reasoning chains.

3. Echo-chamber consensus — A community of researchers mutually cites each other, producing a dense internal evidence graph with zero orthogonal bridges to outside domains. Standard scoring sees strong consensus. FSVE's Cross-Artifact Validation Topology Engine (CAVTE) computes Betti numbers: β₁ > 0 with no orthogonal bridges means a topological echo chamber is detected and penalized.

FSVE does not make decisions. It determines whether decisions can be scored without lying — and formally monitors itself for the same compliance it enforces on others.


Installation

# Coming to PyPI — pip install fsve
# Until then, install from source:

git clone https://github.com/AionSystem/FSVE.git
cd FSVE
pip install -e .

Requirements: Python 3.11+ · networkx · numpy · scipy

Optional (full persistent homology):

pip install -e ".[topology]"   # from the source checkout; adds gudhi for production-grade Betti number computation

Quick Start

from fsve import create_engine, GRADEBreakdown, ReviewResult, ReviewerRole

# Initialize the engine (includes CAVTE topology graph)
engine = create_engine()

# Define GRADE evidence breakdown (§23)
grade = GRADEBreakdown()
grade.risk_of_bias["score"]  = 0.10   # Low risk
grade.inconsistency["score"] = 0.10   # Low inconsistency
grade.imprecision["score"]   = 0.10   # Some imprecision
grade.large_effect["score"]  = 0.00   # No large effect upgrade

# Define reviewer perspectives (§11)
reviewers = [
    ReviewResult(role=ReviewerRole.HOSTILE,      severity=0.30),
    ReviewResult(role=ReviewerRole.NAIVE,         severity=0.20),
    ReviewResult(role=ReviewerRole.CONSTRUCTIVE,  severity=0.15),
    ReviewResult(role=ReviewerRole.PARANOID,      severity=0.40),
    ReviewResult(role=ReviewerRole.TEMPORAL,      severity=0.25),
]

# Score — full 23-step pipeline
tensor = engine.score(
    subject="Clinical AI Diagnostic System v3.1",
    grade_breakdown=grade,
    A=0.80,  # Assumption Explicitness
    C=0.75,  # Constraint Stability
    M=0.85,  # Model Coherence
    D=0.90,  # Domain Fit
    G=0.70,  # Causal Grounding
    X=0.80,  # Explanatory Depth
    U=0.75,  # Update Responsiveness
    L=0.20,  # Abstraction Leakage (HIGH = BAD)
    Y=0.85,  # Ethical Alignment
    H=0.80,  # Hostility Resistance
    Cq=0.75, # High consequence — clinical domain
    reviewer_results=reviewers,
    evidence_stability=0.90,
    evidence_total=12,
    evidence_examined=11,
)

print(f"EV:           {tensor.value}")
print(f"Status:       {tensor.validity_status.value}")
print(f"F_x:          {tensor.fragility_axis.F_x_value}")
print(f"Confidence:   {tensor.confidence_ceiling}")
print(f"RQS Quality:  {tensor.rqs.rqs_quality.value}")
print(f"PPP Sealed:   {tensor.ppp.priority_sealed}")
print(f"CGP Action:   {tensor.consumer_guidance.recommended_action['action']}")

Output:

EV:           0.5821
Status:       SUSPENDED
F_x:          1.0
Confidence:   0.4703
RQS Quality:  HIGH
PPP Sealed:   True
CGP Action:   BLOCK

The status is SUSPENDED because C-Mode was not executed: when it is skipped, F_x defaults to 1.0 (maximum fragility). Pass critical_nodes and a shadow_scoring_fn to run counterfactual stress testing, after which the artifact may reach VALID or FRAGILE_VALID status.


Counterfactual Mode (C-Mode)

C-Mode is mandatory for VALID status in high-consequence deployments. It implements Pearl's do-calculus to stress-test the artifact against evidence retraction.

# Define which evidence nodes are CRITICAL (weight >= 0.85)
critical_nodes = [
    {"id": "primary_trial",    "description": "Phase III RCT — primary source", "weight": 0.92},
    {"id": "meta_analysis_01", "description": "Cochrane systematic review",     "weight": 0.88},
]

# Shadow scoring function: returns EV when a critical node is removed
def shadow_fn(excluded_node_ids: list[str]) -> float:
    if "primary_trial" in excluded_node_ids:
        return 0.28   # EV collapses without the primary trial
    if "meta_analysis_01" in excluded_node_ids:
        return 0.51   # Moderate collapse without meta-analysis
    return 0.72

tensor = engine.score(
    subject="Clinical AI Diagnostic System v3.1",
    grade_breakdown=grade,
    # ... axes ...
    Cq=0.75,
    reviewer_results=reviewers,
    critical_nodes=critical_nodes,
    shadow_scoring_fn=shadow_fn,
)

print(f"F_x:      {tensor.fragility_axis.F_x_value:.3f}")
print(f"R_s:      {tensor.fragility_axis.resilience_score_R_s:.3f}")
print(f"Status:   {tensor.validity_status.value}")
print(f"Fragile:  {tensor.fragility_axis.fragile_validity_triggered}")

Output:

F_x:      0.611
R_s:      0.226
Status:   FRAGILE_VALID
Fragile:  True

The status is FRAGILE_VALID: EV is above 0.70, but the artifact loses more than 60% of its validity if the primary trial is retracted (F_x = 0.611). It is therefore blocked from serving as a critical_prerequisite in downstream lineage chains (Law 13 / Law 1 Override).
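
One reading consistent with the output above is that F_x is the largest relative EV drop across single-node retractions. The sketch below assumes that definition and a baseline EV of 0.72 (the value the shadow function returns when nothing relevant is excluded). `fragility_axis` and `demo_shadow_fn` are hypothetical helpers, not part of the fsve API, and the sketch does not try to reproduce R_s.

```python
from typing import Callable


def fragility_axis(
    baseline_ev: float,
    critical_ids: list[str],
    shadow_fn: Callable[[list[str]], float],
) -> float:
    """Largest relative EV loss when any single critical node is retracted.

    Assumed definition: F_x = max_i (EV - EV_without_i) / EV.
    """
    if not critical_ids or baseline_ev <= 0.0:
        # Mirrors the README's default: no stress test means maximum fragility
        return 1.0
    drops = [
        (baseline_ev - shadow_fn([node_id])) / baseline_ev
        for node_id in critical_ids
    ]
    # Clamp so an EV that rises after a retraction does not yield a negative value
    return min(1.0, max(0.0, max(drops)))


def demo_shadow_fn(excluded_node_ids: list[str]) -> float:
    """Same illustrative numbers as the C-Mode example above."""
    if "primary_trial" in excluded_node_ids:
        return 0.28
    if "meta_analysis_01" in excluded_node_ids:
        return 0.51
    return 0.72


f_x = fragility_axis(0.72, ["primary_trial", "meta_analysis_01"], demo_shadow_fn)
print(f"F_x: {f_x:.3f}")  # F_x: 0.611
```

Under this reading, the primary trial dominates: removing it drops EV from 0.72 to 0.28, a relative loss of about 61%, which is just past the 0.60 fragility threshold.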


Streaming Epistemic Ingress

For live systems with continuous evidence streams:

# Attach a streaming engine to a domain
stream = engine.attach_stream(
    stream_id="sensor_telemetry_v2",
    context_half_life=3600.0,   # 1-hour half-life
    lipschitz_K=0.50,           # Domain Lipschitz constant
)

from fsve.streaming import EvidenceEvent

# live_sensor_feed() and alert_human_reviewer() stand in for application code

for reading in live_sensor_feed():
    event = EvidenceEvent(
        weight=reading.confidence,
        domain="PHYSICAL",
        payload=reading.metadata,
    )
    result = engine.ingest_event("sensor_telemetry_v2", event)

    if result["lipschitz_violated"]:
        # Temporal Momentum exceeded Lipschitz bound
        # Route to HITL — possible evidence flooding or sensor malfunction
        alert_human_reviewer(result)

    print(f"Rolling EV: {result['rolling_ev']:.4f} | T_m: {result['temporal_momentum']:.4f}")
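
To make the rolling EV and the momentum check concrete, here is a minimal standalone sketch. It assumes exponential half-life weighting for the rolling EV and a finite difference for T_m; the README does not specify either, nor how λ_E is derived, so the bound K · λ_E is passed in as a single precomputed number. `RollingEV` is a hypothetical class, not the fsve streaming engine.

```python
class RollingEV:
    """Half-life-weighted rolling EV with a finite-difference momentum check."""

    def __init__(self, half_life: float, bound: float) -> None:
        self.half_life = half_life   # seconds, like context_half_life
        self.bound = bound           # stands in for K * lambda_E (assumed given)
        self.events: list[tuple[float, float]] = []   # (timestamp, weight)
        self.last: tuple[float, float] | None = None  # (timestamp, rolling_ev)

    def ingest(self, timestamp: float, weight: float) -> dict:
        self.events.append((timestamp, weight))
        # An event one half-life old counts half as much as a fresh one
        decay = [0.5 ** ((timestamp - t) / self.half_life) for t, _ in self.events]
        rolling_ev = sum(d * w for d, (_, w) in zip(decay, self.events)) / sum(decay)

        # Temporal momentum as change in rolling EV per second since the last event
        momentum = 0.0
        if self.last is not None and timestamp > self.last[0]:
            momentum = (rolling_ev - self.last[1]) / (timestamp - self.last[0])
        self.last = (timestamp, rolling_ev)

        return {
            "rolling_ev": rolling_ev,
            "temporal_momentum": momentum,
            "lipschitz_violated": abs(momentum) > self.bound,
        }


# Slow drift over an hour stays inside the bound
slow = RollingEV(half_life=3600.0, bound=0.001)
slow.ingest(0.0, 0.50)
slow_result = slow.ingest(3600.0, 0.52)

# A jump within ten seconds exceeds it
fast = RollingEV(half_life=3600.0, bound=0.001)
fast.ingest(0.0, 0.50)
fast_result = fast.ingest(10.0, 0.90)

print(slow_result["lipschitz_violated"], fast_result["lipschitz_violated"])  # False True
```

The design point is that the same change in EV is acceptable or suspicious depending only on how quickly it arrived, which is what a Lipschitz-style bound expresses.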

CAVTE — Cross-Artifact Validation Topology Engine

CAVTE maintains a live knowledge graph of all scored artifacts. It computes persistent homology to detect systemic epistemic debt at the network level.

from fsve import CAVTEEngine, create_engine

cavte = CAVTEEngine()
engine = create_engine(cavte=cavte)

# Score two artifacts
t1 = engine.score(subject="Foundational Study A", ...)
t2 = engine.score(subject="Replication Study B",  ...)

# Link them — B cites A
cavte.add_edge(t2.id, t1.id, "references")

# Add orthogonal external verification
cavte.add_artifact("external_replication_C")
cavte.add_edge(t2.id, "external_replication_C", "orthogonal_bridge")

# Compute topology metrics for artifact B
metrics = cavte.compute_topology_metrics(t2.id)
print(f"β₀ (isolation):    {metrics.betti_0_components}")
print(f"β₁ (echo chamber): {metrics.betti_1_cycles}")
print(f"Orthogonal bridge: {metrics.orthogonal_bridge_verified}")
print(f"p_topo penalty:    {metrics.isolation_penalty_p_topo}")
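
For intuition about the two Betti numbers printed above, the sketch below computes them for a plain undirected graph: β₀ is the number of connected components and β₁ = |E| − |V| + β₀ is the number of independent cycles. This is the graph-level cycle rank only, not persistent homology, and it is not how CAVTE is necessarily implemented. `betti_numbers` is a hypothetical helper written with a small union-find so it needs no dependencies.

```python
def betti_numbers(nodes, edges):
    """Return (beta_0, beta_1) for an undirected simple graph."""
    parent = {n: n for n in nodes}
    for u, v in edges:
        # Accept edges that mention nodes missing from the node list
        parent.setdefault(u, u)
        parent.setdefault(v, v)

    def find(n):
        # Union-find lookup with path halving
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Ignore self-loops and duplicate or reversed edges
    unique_edges = {frozenset((u, v)) for u, v in edges if u != v}
    for edge in unique_edges:
        u, v = tuple(edge)
        parent[find(u)] = find(v)

    beta_0 = len({find(n) for n in parent})
    beta_1 = len(unique_edges) - len(parent) + beta_0
    return beta_0, beta_1


# Three papers citing each other in a ring: one component, one cycle
ring = betti_numbers(["A", "B", "C"], [("A", "B"), ("B", "C"), ("C", "A")])

# A citation chain with an unrelated artifact: two components, no cycles
chain = betti_numbers(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])

print(ring, chain)  # (1, 1) (2, 0)
```

A cycle on its own is not evidence of an echo chamber; per the description above, the signal is β₁ > 0 combined with the absence of orthogonal bridges, which this sketch does not model.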

Architecture Overview

FSVE v4.3 — 23-Step Scoring Pipeline
─────────────────────────────────────
Step 1   Lineage & contamination flag inheritance (Laws 7, 8)
Step 2   Uncertainty mass baseline (MeasurementClass penalty)
Step 3   CQi — Convergence Quality Index classification
Step 4   E-Axis via GRADE + E_enhance hybrid formula (§23)
Step 5   Remaining 10 static axes (A, C, M, D, G, X, U, L, Y, H)
Step 6   CAVTE topology ingestion + Betti number computation
Step 7   C-Mode shadow scoring → Fragility Axis (F_x)
Step 8   Temporal Momentum (T_m) + Lipschitz bound check
Step 9   5-reviewer architecture (CRS, CRA, synergy detection)
Step 10  Assumption Load (Law 4)
Step 11  Coverage Cartography Protocol (CCP, §32)
Step 12  Run Quality Score (RQS, §29)
Step 13  Confidence Ceiling computation (20-penalty table, §5)
Step 14  Uncertainty mass inheritance (Law 7)
Step 15  Uq / Cq governance modifiers
Step 16  EV computation (13-axis weighted mean, bottleneck, §7)
Step 17  FMIA audit — all 18 gates (§31)
Step 18  Context drift / Freshness (Law 5)
Step 19  Survivor-Class Certification (SCC, §30)
Step 20  Epistemic State Vector finalization
Step 21  Consumer Guidance Protocol (CGP, §21)
Step 22  PPP seal (SHA-256 chain, §34)
Step 23  Caution Level assignment

Package Structure

fsve/
├── models/       ScoreTensor schema — all enums, dataclasses, state vector
├── laws/         §3  — Epistemic Laws 1–14 (including T_m, F_x, topology)
├── ceiling/      §5  — Confidence Ceiling (20-penalty table)
├── grade/        §23 — GRADE rubric + E_enhance (5 components)
├── axes/         §7  — 13-axis EV computation engine
├── fmia/         §31 — Failure Mode Inhibition Architecture (18 gates)
├── reviewers/    §11 — 5-reviewer architecture (CRS, CRA, synergy)
├── rqs/          §29 — Run Quality Score
├── ccp/          §32 — Coverage Cartography Protocol (v4.3 topology weight)
├── cqi/          §33 — Convergence Quality Index
├── ppp/          §34 — Priority Provenance Protocol (SHA-256 chain)
├── cavte/        §36 — Cross-Artifact Validation Topology Engine
├── cmode/        §37 — Counterfactual Mode (Pearl's do-calculus)
├── hitl/         §11.6 — Human-in-the-Loop queue and routing
├── streaming/    §4.7 — Streaming epistemic ingress (rolling EV, T_m)
├── cgp/          §21 — Consumer Guidance Protocol + laundering detection
├── scc/          §30 — Survivor-Class Certification
├── fcl/          §16 — Framework Calibration Log entry template
├── utils/        VK Self-Application Certificate
└── engine/       §0  — Main orchestrator (FSVEEngine)

tests/
└── test_fsve.py  59 tests — Laws 1–14, FMIA gates, PPP chain,
                             CAVTE, CQi, RQS, CCP, full pipeline

The 13 Epistemic Axes

| # | Symbol | Axis | Direction | v4.3 Role |
|---|---|---|---|---|
| 1 | E | Evidence Strength (GRADE + E_enhance) | High = Good | Base factual grounding |
| 2 | A | Assumption Explicitness | High = Good | Structural transparency |
| 3 | C | Constraint Stability | High = Good | Boundary integrity |
| 4 | M | Model Coherence | High = Good | Internal consistency |
| 5 | D | Domain Fit | High = Good | Contextual relevance |
| 6 | G | Causal Grounding | High = Good | Causal depth |
| 7 | X | Explanatory Depth | High = Good | Interpretability |
| 8 | U | Update Responsiveness | High = Good | Adaptability |
| 9 | L | Abstraction Leakage | High = BAD | Implementation bleed |
| 10 | Y | Ethical Alignment | High = Good | Axiological coherence |
| 11 | H | Hostility Resistance | High = Good | Adversarial robustness |
| 12 | T_m | Temporal Momentum (v4.3) | High = Good | Velocity governance (Law 12) |
| 13 | F_x | Fragility Axis (v4.3) | High = Good | Counterfactual resilience (Law 13) |

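⚠️ L-axis warning: L is the only axis where a higher raw score indicates worse performance; it is inverted before weighting, so a high L score lowers EV. A high T_m_axis score means EV velocity is within Lipschitz bounds, i.e. stable and legitimate. The High = Good direction for F_x likewise appears to describe the axis score (resilience); the raw F_x_value reported by C-Mode runs the other way, with 1.0 meaning maximum fragility.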


FMIA — 18 Inhibition Gates

FSVE never silently fails. Every structural violation fires a named gate with an explicit inhibition type.

| Gate | Type | Failure Mode | Action |
|---|---|---|---|
| G01 | ABSOLUTE | Formula error / arithmetic invalidity | Block all outputs |
| G02 | ABSOLUTE | Domain error (floor/ceiling missing) | Block affected axis |
| G03 | ABSOLUTE | Division by zero | Block affected axis |
| G04 | THRESHOLD | Zero reviewers completed (RR = 0) | Downgrade to DEGRADED |
| G05 | ABSOLUTE | Measurement class undeclared | Block axis score |
| G06 | THRESHOLD | E-axis scored without GRADE rubric | +0.15 uncertainty mass |
| G07 | THRESHOLD | Assumption correlation unchecked | +0.10 uncertainty mass |
| G08 | ABSOLUTE | Gini laundering detected (Gini < 0.15) | Cap at DEGRADED |
| G09 | THRESHOLD | CDF independence undeclared | +0.10 uncertainty mass |
| G10 | ABSOLUTE | Lineage depth ≥ 6 | SUSPEND all descendants |
| G11 | ABSOLUTE | Cq ≥ 0.6 override without risk acceptance | Block override |
| G12 | THRESHOLD | SCC certification window elapsed | Block FCL qualification |
| G13 | THRESHOLD | RQS < 0.60 (low run quality) | CGP escalation |
| G14 | THRESHOLD | Coverage fraction < 0.50 without declaration | +0.10 uncertainty mass |
| G15 | THRESHOLD | E_enhance proportion dominance | Laundering flag |
| G16 | THRESHOLD | Kalman divergence > 0.15 | Recommend second run |
| G17 | ABSOLUTE | Temporal Momentum Lipschitz violation (v4.3) | Clamp to DEGRADED; route HITL |
| G18 | THRESHOLD / ABSOLUTE | C-Mode bypassed or F_x > 0.60 on Cq ≥ 0.8 (v4.3) | FRAGILE_VALID / SUSPEND |

Validity Status Reference

| Status | Condition | Consumer Action (Standard) |
|---|---|---|
| VALID | EV ≥ 0.70, F_x ≤ 0.60, T_m within bounds | PROCEED |
| FRAGILE_VALID | EV ≥ 0.70, F_x > 0.60 | PROCEED_WITH_MONITORING |
| DEGRADED | EV in [0.40, 0.70) | PROCEED_WITH_WARNING |
| SUSPENDED | EV < 0.40 OR any ABSOLUTE FMIA gate active | BLOCK |
| SUSPENDED_URGENT | EV < 0.40 AND Cq ≥ 0.70 | BLOCK + immediate escalation |
| TOPOLOGICALLY_ISOLATED | β₁ > 0, no orthogonal bridges | HUMAN_REVIEW |

Cq (Consequence Severity) dynamically raises the validity threshold. At Cq = 0.80, VALID requires EV ≥ 0.94.
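
The status conditions above can be read as a small decision function. The sketch below is one possible reading: the precedence between overlapping rows is an assumption, a T_m violation is mapped to DEGRADED following the G17 row, and the VALID threshold is a parameter because the README gives only one data point for how Cq raises it. `classify` is a hypothetical helper, not the engine's own logic.

```python
def classify(
    ev: float,
    f_x: float,
    cq: float,
    *,
    absolute_gate_active: bool = False,   # a blocking ABSOLUTE FMIA gate fired
    echo_chamber: bool = False,           # beta_1 > 0 with no orthogonal bridges
    t_m_within_bounds: bool = True,
    valid_threshold: float = 0.70,        # raised by Cq in the real engine
) -> str:
    """Map scores to a validity status using the standard thresholds."""
    if ev < 0.40 and cq >= 0.70:
        return "SUSPENDED_URGENT"
    if ev < 0.40 or absolute_gate_active:
        return "SUSPENDED"
    if echo_chamber:
        return "TOPOLOGICALLY_ISOLATED"
    if ev < valid_threshold or not t_m_within_bounds:
        return "DEGRADED"
    return "FRAGILE_VALID" if f_x > 0.60 else "VALID"


# Quick Start run: C-Mode skipped, so a gate blocks the result
print(classify(0.5821, 1.0, 0.75, absolute_gate_active=True))  # SUSPENDED

# C-Mode run: EV above threshold but F_x over 0.60
print(classify(0.72, 0.611, 0.75))  # FRAGILE_VALID
```

Both calls reproduce the statuses shown in the earlier examples, using 0.72 as an assumed EV for the C-Mode run.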


Priority Provenance Protocol (PPP)

Every ScoreTensor is cryptographically sealed with a SHA-256 chain at the moment of scoring. Retroactive insertion is mathematically detectable.

# Each scored artifact produces a tamper-evident PPP block
print(tensor.ppp.chain_hash)    # 64-char SHA-256 hex
print(tensor.ppp.entry_hash)    # Entry-level hash
print(tensor.ppp.verification_status)  # CHAIN_VALID

# Verify a chain of tensors — detects retroactive insertion
from fsve.ppp import PPPEngine
valid, message = PPPEngine.verify_chain([t1.ppp, t2.ppp, t3.ppp])
# Returns: (True, "CHAIN_VALID") or (False, "Block N: chain hash mismatch")

The v4.3 seal includes the full State Vector Ψ(t) — making it impossible to retroactively claim a different Temporal Momentum, Fragility, or Topology State was present at scoring time.
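
The general idea behind a hash chain of this kind can be sketched with the standard library. This is an illustration of the technique only: the payload layout, the genesis value, and the `seal` / `verify_chain` helpers are assumptions and do not reflect the actual PPP block format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first block


def seal(prev_chain_hash: str, payload: dict) -> dict:
    """Hash the payload, then bind it to the previous block's chain hash."""
    entry_hash = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode("utf-8")
    ).hexdigest()
    chain_hash = hashlib.sha256(
        (prev_chain_hash + entry_hash).encode("utf-8")
    ).hexdigest()
    return {"payload": payload, "entry_hash": entry_hash, "chain_hash": chain_hash}


def verify_chain(blocks: list[dict]) -> tuple[bool, str]:
    """Recompute every hash in order; any edit or insertion breaks the chain."""
    prev = GENESIS
    for index, block in enumerate(blocks):
        expected = seal(prev, block["payload"])
        if (expected["entry_hash"] != block["entry_hash"]
                or expected["chain_hash"] != block["chain_hash"]):
            return False, f"Block {index}: chain hash mismatch"
        prev = block["chain_hash"]
    return True, "CHAIN_VALID"


# Build a three-block chain
chain = []
prev_hash = GENESIS
for ev in (0.58, 0.72, 0.81):
    block = seal(prev_hash, {"EV": ev})
    chain.append(block)
    prev_hash = block["chain_hash"]

# Retroactively insert a well-formed block after the first one
forged = seal(chain[0]["chain_hash"], {"EV": 0.99})
tampered = [chain[0], forged, chain[1], chain[2]]

print(verify_chain(chain))     # (True, 'CHAIN_VALID')
print(verify_chain(tampered))  # (False, 'Block 2: chain hash mismatch')
```

The forged block is internally consistent, but the block that follows it was sealed against a different predecessor, so verification fails at the first block after the insertion point.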


FCL — Framework Calibration Log

FSVE tracks its own accuracy through a structured calibration log. Every FCL entry is a scored artifact with a verifiable ground-truth outcome recorded ≥ 6 months after scoring.

from fsve.fcl import FCLEntry
from datetime import datetime, timezone

# Create a draft FCL entry from a scored tensor
entry = FCLEntry.from_tensor(tensor)

# After ground truth is available (≥ 6 months later):
entry.outcome_date = datetime(2027, 1, 15, tzinfo=timezone.utc)
entry.outcome = "System deployed; 94% diagnostic agreement with specialist panel."
entry.validity_status_correct = True
entry.EV_delta = 0.04   # |predicted - observed|
entry.retraction_survival_observed = True

# Check FCL qualification gates
qualifies, failed_gates = entry.qualifies()
print(f"FCL Qualifies: {qualifies}")
# FCL Qualifies: True

Current FCL status: 0 entries (M-MODERATE convergence). M-STRONG requires ≥ 5 entries with ground truth. M-VERY_STRONG requires ≥ 20 published entries with > 80% validity status prediction accuracy.


Self-Application Certificate

FSVE applies its own scoring engine to itself on every major release. The v4.3 self-assessment:

EV:               0.785  (VALID — provisional, bottleneck-capped at E=0.53)
Validity Status:  VALID
Convergence:      M-MODERATE
FCL Entries:      0
Primary Gap:      E-axis (no external ground truth — FCL empty)
Path to M-STRONG: FCL entries ≥ 5 + external VK run

Run the VK self-test at any time:

from fsve import FSVEEngine
results = FSVEEngine.self_test()
print(results["15.1_verdict"])   # PASS

Epistemic Honesty Declaration

FSVE's own claims carry epistemic tags per its specification:

| Claim | Tag | CF |
|---|---|---|
| EV threshold 0.70 for VALID status | [S] Strategic | 55 |
| Fragility threshold F_x > 0.60 | [S] Strategic | 45 |
| Echo chamber threshold θ_EC = 3.0 | [S] Strategic | 40 |
| FMIA 18-gate completeness | [R] Reasoned | 45 |
| PPP chain hash forge resistance | [R] Reasoned | 50 |
| CCP coverage fraction weights | [S] Strategic | 40 |
| Gini < 0.15 laundering threshold | [R] Reasoned | 55 |

All thresholds are calibration targets — not proven constants. They carry NBP falsification conditions. If FCL data shows a threshold is miscalibrated, a patch release adjusts it. The spec explicitly documents what would falsify each claim.


Roadmap

| Milestone | Status | Notes |
|---|---|---|
| Core engine: Laws 1–14 | ✅ Complete | Verified, 59/59 tests passing |
| 13-axis EV computation | ✅ Complete | Including T_m and F_x axes |
| 18-gate FMIA registry | ✅ Complete | G17 and G18 (v4.3) included |
| CAVTE topology engine | ✅ Complete | Betti numbers, echo chamber detection |
| C-Mode shadow scoring | ✅ Complete | Pearl's do-calculus, first-order |
| HITL module | ✅ Complete | Queue, routing, clearance |
| Streaming ingress | ✅ Complete | Rolling EV, Lipschitz-bounded |
| PPP SHA-256 chain | ✅ Complete | Tamper detection verified |
| FCL template | ✅ Complete | Qualification gates implemented |
| pip install fsve | 🔄 Finalizing | PyPI registration pending |
| External VK run | 🔄 Pending | Required for M-STRONG convergence |
| FCL entries (≥ 5) | 🔄 Pending | Required for M-STRONG |
| gudhi topology integration | 🔄 Planned | Full persistent homology (optional dep) |
| TypeScript API bindings | 🔄 Planned | Thin REST/JSON layer over Python core |
| FSVE-EXTENSIONS v1.0 | 🔄 Planned | Aviation, CARA, Monte Carlo, Lean |

Design Principles

Structural Honesty First. A structurally honest score of 0.40 is more valuable than a structurally dishonest score of 0.90.

Uncertainty Is Conserved. Uncertainty may be reduced, bounded, transferred, or deferred. It may never be erased silently.

Scores Are Claims, Not Truth. Every score must be explainable, reversible on new evidence, and degradable under contextual stress.

Invalidatability Is Required. Any scoring system that cannot produce the output "this score is invalid" is not a scoring system. It is decoration.

Truth Is a Trajectory. To measure a claim without measuring its velocity, its fragility, and its topology is to mistake a photograph for a living system.


Author

Sheldon K. Salmon · AI Reliability Architect · AI Certainty Engineer · Founder, AionSystem

ORCID: 0009-0005-8057-5115

Co-Author / Instrument: ALBEDO (SYNARA Session Architecture)

FSVE is a component of the AION Constitutional Stack — a sovereign AI governance and epistemic certainty architecture spanning 200+ components, 18+ frameworks, and 22 registered Zenodo DOIs.


⚖️ Licensing & Commercial Use

This repository contains two distinct components with separate licensing to protect both the open dissemination of the specification and the commercial viability of the runtime engine.

  1. The Specification (.md files): Licensed under CC BY-ND 4.0. You may read, cite, timestamp, and share this document, but you may not create derivative works (forked constitutions) or use it for commercial purposes without explicit written permission from the Architect.

  2. The Constitutional Engine (.py files): Licensed under the GNU AGPL v3.0. If you use this engine to provide a service over a network, you are legally required to open-source your entire application stack under the same license.

🏢 Enterprise / Commercial Licensing

If you are a platform, enterprise, or organization that wishes to integrate the Constitutional Engine v2.2 into a proprietary, closed-source, or commercial product without triggering AGPL copyleft obligations, a commercial dual-license is available.

For commercial licensing inquiries, audit integration, or steward certification, contact: aionsystem@outlook.com


Citation

@software{salmon2026fsve,
  author    = {Salmon, Sheldon K.},
  title     = {{FSVE}: Foundational Scoring and Validation Engine v4.3},
  year      = {2026},
  publisher = {AionSystem},
  url       = {https://github.com/AionSystem/FSVE},
  note      = {Dynamic Epistemic State Machine for Score Governance.
               ORCID: 0009-0005-8057-5115}
}

FSVE v4.3 — Major Release Candidate · Convergence: M-MODERATE · 59/59 tests passing

The mind keeps building. The product stays simple.
