Fall Risk AI — Prove Which Model Is Running

why-it-matters/ — The four-question gap every governance stack inherits

Every AI deployment runs four identities at once: an artifact (what shipped), an agent (who is making the request), a model (what is actually computing), and a lineage (whose training is present). Industry has built robust authentication for the first two. The last two go unverified at runtime.

COVERED

1. What artifact was shipped? — registries, version control, BOMs

2. What agent or workload is authenticated? — OAuth, SPIFFE, Okta, Entra, mTLS

UNVERIFIED AT RUNTIME

3. What model is actually computing?

4. Whose training lineage is present?

Software identity can be valid while model identity is entirely absent. We have measured this directly: three substitution scenarios executed against a live gateway with real HTTP requests, signed attestation JWTs, and OPA policy enforcement. In every scenario, the workload identity, artifact integrity, and API authentication remained valid. In every scenario, the structural identity measurement detected the substitution and the policy gate denied the request.

Technical notes: Agent Identity Is Not Model Identity · Measured Model Substitution · Zenodo, 2026

trust-modes/ — Trustfall Lite · Trustfall Deep Lab · Trustfall Deep Enterprise

Two products. One escalation path.

Lite verifies what model artifact you have on disk. Deep verifies which model is actually computing at runtime.

Trustfall Lite

Free, local, open source. Scan default Hugging Face and Ollama caches; verify SHA-256 hashes against the public signed registry. Apache-2.0. Self-serve. Trustfall Lite v0.4 adds local CSV/JSONL inventory export and tokenizer-surface coverage signals — still local-first, still no model-byte upload.

Install Trustfall Lite →

Trustfall Lite · quickstart

Free local verifier for Hugging Face and Ollama artifacts.

$ pipx install fallrisk-trustfall
$ trustfall scan

Status states

✓ verified

⚠ unknown_variant

? not_enrolled

→ pilot_available
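
The hash check behind a scan can be pictured with a minimal Python sketch. It is not the Trustfall Lite implementation: `sha256_file`, `classify`, the registry shape (model ID mapped to a set of known digests), and the mapping onto the status names are all assumptions made for illustration, and verification of the registry's own signature is left out.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large weight files are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify(path: Path, model_id: str, registry: dict) -> str:
    """Map a local artifact to a status string.

    `registry` is a hypothetical dict of model ID -> set of enrolled hex digests.
    The status meanings below are an assumed reading of the names on this page.
    """
    known = registry.get(model_id)
    if known is None:
        return "not_enrolled"        # the registry has no record for this model ID
    if sha256_file(path) in known:
        return "verified"            # bytes on disk match an enrolled digest
    return "unknown_variant"         # model ID is enrolled, but these bytes are not
```

Streaming in fixed-size chunks keeps memory flat for multi-gigabyte files, and nothing leaves the machine: only the locally computed digest is compared.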

Trustfall Deep Lab

Self-service runtime identity for solo researchers, indie ML founders, and small teams. Hosted measurement on Fall Risk ephemeral compute, or Local Standard with the signed engine on your hardware. Continuous attestation, audit logs, signed certificates that policy engines can enforce. Public registry placement on Free; private namespace on paid tiers. Self-serve portal in private build.

Notify me when Lab opens →

Trustfall Deep Enterprise

Sovereign deployment for organizations whose model weights, fingerprint vectors, or distance values cannot leave the environment. Customer-deployable signed engine artifact with three trust modes: Local Standard, TEE-backed (TDX + H100 confidential computing), and ZK private-match. Customer-controlled signing keys with proof-of-possession. Tenant-private registry namespace, audit retention, compliance exports. Design-partner pilots in scoping; mutual NDA on request.

Start an Enterprise conversation →

how-it-works/ — IT-PUF mechanism · δ-gene · zero-knowledge · comparison

Every neural network carries a unique structural fingerprint — not from what it says, but from the geometry of how it decides what to say. The fingerprint is a mathematical consequence of the architecture, not something anyone inserted.

Weights

Direct measurement

Send challenge prompts. Measure the internal response at two sites. Compare the resulting 64-dimensional fingerprint against an enrolled anchor. Accept or reject. No retraining. No model modification.
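
The accept-or-reject step can be sketched in a few lines of Python. The page does not specify the distance metric or the value of ε, so the Euclidean distance, the `verify` helper, and the threshold passed in are assumptions; only the 64-dimensional comparison against an enrolled anchor comes from the description above.

```python
import math


def distance(a, b):
    """Euclidean distance between two fingerprints (the metric is an assumption)."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same dimensionality")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(fingerprint, anchor, epsilon):
    """Accept when the measured fingerprint lies within epsilon of the enrolled anchor.

    The distance is also reported in multiples of epsilon, the unit used for
    separations elsewhere on this page (for example 9.82 x epsilon).
    """
    d = distance(fingerprint, anchor)
    return {"accept": d < epsilon, "distance_in_eps": d / epsilon}
```

For example, `verify([0.0] * 64, [0.0] * 64, epsilon=0.5)` accepts at a distance of zero, while a fingerprint of all ones against the same anchor sits at 16×ε and is rejected.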

API

No weight access needed

Standard logprob endpoints expose enough structural geometry to verify identity across independent sessions. No weights required, no operator cooperation needed.

Zero-Knowledge

Prove without revealing

A zero-knowledge proof attests the model matches its enrolled identity without revealing the fingerprint, the anchor, or the methodology. Hardware attestation binds to a cryptographic root.

Attestation

The signed claim

The result travels inside a standard JWT or SPIFFE SVID, composing with OAuth 2.0, OPA, and EAT-based authorization flows (RFC 9711). No replacement identity stack. One new signed claim inside the systems you already use.

This end-to-end path — from forward pass through signed claim to policy decision — has been validated at 70 billion parameters in approximately 30 seconds.

Not a replacement. The layer none of them provide.

Watermarks require insertion at training time and are removed by fine-tuning. Model cards describe artifacts, not running systems. Behavioral tests measure what a model says, not what it is. Output monitoring watches downstream effects without verifying upstream identity. Each solves a real problem. None answer the structural question.

Property	Watermarks	Model Cards	Behavioral Tests	Output Monitoring	Structural Fingerprint
Works without training-time insertion	✗	—	✓	✓	✓
Survives fine-tuning	✗	—	Partial	—	✓
Survives distillation	✗	✗	✗	✗	✓
Works without weight access	✗	✓	✓	✓	✓
Verifies the running model, not a document	Partial	✗	Partial	✗	✓
Cryptographically verifiable	✗	✗	✗	✗	✓
Formally proved unforgeable	✗	✗	✗	✗	✓
Composable with existing auth stack	✗	—	✗	Partial	✓

The best deployment uses several of these together. Watermarks where you control training. Behavioral tests for capability monitoring. Output monitoring for safety. And structural fingerprinting for the one question the others can't answer: is this still the model you approved?

validation/ — 0/54,120 structural · 46/46 artifact · 350+ theorems · 0 Admitted

v0.2.2 structural audit · May 2026
v0.2.3 artifact audit · May 2026

Structural audits apply to the Hugging Face IT-PUF lane. Artifact audits apply to the Ollama byte-identity lane. They are different evidence classes and should not be merged into one claim.

0 / 54,120

Observed false accepts. Same-seed cross-model pairs across the 165-record Hugging Face structural registry. No pair fell below ε. Cross-seed audit: 0/216,480.
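
The denominators can be sanity-checked with a little arithmetic. The Python sketch below counts the unordered pairs among 165 records and divides the reported totals by that count. The per-pair multipliers it derives (4 and 16) are an inference from the numbers, not something the audit summary states.

```python
from math import comb

records = 165
pairs = comb(records, 2)                      # unordered cross-model pairs: 13,530

same_seed_total = 54_120                      # reported same-seed comparisons
cross_seed_total = 216_480                    # reported cross-seed comparisons

same_seed_per_pair = same_seed_total / pairs     # 4.0
cross_seed_per_pair = cross_seed_total / pairs   # 16.0

print(pairs, same_seed_per_pair, cross_seed_per_pair)
```

Both totals divide evenly by the pair count, which is consistent with four comparisons per pair in the same-seed audit and sixteen (4 × 4) in the cross-seed audit.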

9.82×ε

Closest cross-model separation. Starling-LM-7B-beta ↔ OpenChat-3.5-0106 — both Mistral-7B / OpenChat-lineage fine-tunes. Training siblings, structurally distinct.

46 / 46

Ollama artifact records unique. Model IDs, artifact manifest digests, and evidence digests were unique across the artifact-identity lane. Unauthorized artifact-hash collisions: zero.

350+

Coq theorems · 0 Admitted. Formal verification artifacts spanning the published findings: identity stability, gauge transport, no-spoofing impossibility, evidence sufficiency.

6,475×

Dynamic range across the 165-record audit. From closest sibling fine-tune pair to farthest architectural pair (gemma-4-31B-it ↔ gpt2-large). All measured pairs remain above threshold.

Patents assigned to Fall Risk AI, LLC. Weights identity + API verification + zero-knowledge attestation + identity-conditioned inference.

Earlier published validation

Earlier published validation included 0 / 1,012 identity tests in the weights regime (an earlier 23-model validation set spanning 16 vendor families and 3 architecture types), 0 / 14 API-regime tests across 3 providers, 1,536 measurements inside an H100 confidential-computing enclave, and reasoning-distillation separation studies showing structural scars up to 8,518 × ε across five Llama / Qwen / Mistral pairs (1.5B–70B). Those remain part of the published evidence base; the current public registry audit is the 165-record structural audit above.

Three categories of claims. Proven: machine-verified in Coq — compiles or does not. Validated: empirically demonstrated with stated evidence. Proposed: framework exists, falsification criteria pre-registered.

Claim	Evidence	Status
Fingerprint spoofing impossibility (weights)	T4_no_spoofing_interval_split (NoSpoofing.v) — KL budget exhaustion at all tested scales. 0 Admitted.	Proven
API spoofing impossibility (conditional)	API_NoSpoofing_3 (APINoSpoofing.v) — shift-equivariance + per-component cost floors + multi-prompt amplification. Under threat model TA1–TA4. 41 theorems, 0 Admitted.	Proven
Zero identity errors — weights regime	23 models, 16 families, 3 architecture types. 1,012 pairwise comparisons.	Validated
Zero identity errors — API regime	14 models, 3 providers (OpenAI, Google Vertex, xAI), 3 sessions per model. CRP protocol with per-model adaptive thresholds.	Validated
Adversarial erasure does not improve on passive fine-tuning	54 adversarial checkpoints. Best white-box erasure (Conv_T = 0.113) loses to passive SFT (Conv_T = −0.113) at equal capability cost. Pareto frontier has no favorable region.	Validated
Provenance transfer generalizes across families	7 experimental arms. 3 teacher families × 4 student architectures × 2 training protocols. 14/14 mature-epoch checkpoints directionally aligned (cos θ > 0.8). Includes MoE (Mixtral-8x7B).	Validated
Frontier structural identity (8B–72B)	5 models, 3 families. 0.0×ε self-verify. 10 pairwise rejections (67–2,512×ε). δ_norm within 2% of EVT prediction at 72B. Measurement pipeline unchanged from sub-7B.	Validated
Frontier structural separability under distillation	Five reasoning-distillation pairs across three base families. Structural scar magnitude is family-dependent: Mistral 7,701–8,518×ε, Llama 2,858–4,583×ε, Qwen 141–516×ε. All pairs above detection floor.	Validated
Distillation consequences are family-dependent	Five reasoning-distillation pairs, three families. Sixty-fold structural scar range. Cross-layer decoupling observed. Stiffness inversely ordered. Fisher curvature does not predict scar magnitude at production scale.	Validated
δ_norm scale invariance	25 models, 0.41B–72.7B. No significant scale correction (OLS p = 0.69). EVT prediction holds across two orders of magnitude.	Validated
Software attestation at frontier scale	JWT issuance + OPA policy validated at 70B. 30-second end-to-end. EAT profile (RFC 9711).	Validated
Speculative decoding transparent	d(SD, verifier) / d(verifier, draft) = 10.6%. Draft model invisible to protocol.	Validated
ZK Tier 1 — committed distance proof	Groth16 on BN254. 7,656 constraints. 128-byte proofs. 23 tests, 14 adversarial attacks rejected.	Validated
ZK Tier 2 — hardware-attested measurement	H100 + Intel TDX. 6 models, 1,536 measurements, 0 failures. Weight hashing byte-identical in TEE.	Validated
ZK Tier 3 — full ZK extraction	~296K constraints. 124 adversarial tests, 0 failures. ~85KB proof, <1s verification.	Validated
Non-separability of model-identity claims	ComposableIdentity.v A3 — claims cannot migrate without re-attestation. 0 Admitted.	Proven
Temporal binding necessity	ComposableIdentity.v B4 — stale-evidence vulnerability inherited by permissive verifiers. 0 Admitted.	Proven
Issuer authenticity	IssuerAuthenticity.v T1 — accepted token traces to authorized issuer. 0 Admitted.	Proven
Reference integrity	ReferenceIntegrity.v T1+T3 — bundle modification detectable via digest. 0 Admitted.	Proven
PPP gap measurement domain-invariance	GapInvariance.v — gaps invariant to log-softmax, temperature, constant shift. 5 theorems, 0 Admitted.	Proven

use-cases/ — Supply chain · Distillation · Compliance · Insurance · Agentic

Model Supply Chain Verification

A vendor ships a model built on an open-weight base. The marketing says proprietary. The API response says otherwise — if anyone thinks to look. No security team checks which model is actually serving production. No compliance process verifies origin. IT-PUF verifies the structural fingerprint of what is running against what was enrolled at deployment — no leaked model ID required.

Distillation Forensics

16 million exchanges. 24,000 fraudulent accounts. The resulting models carry the teacher’s fingerprint. IT-PUF detects provenance transfer across families, architectures, and training protocols. The adversary cannot erase the trace without degrading the capabilities the distillation was meant to acquire. The signal fades with continued training — making continuous monitoring, not periodic audits, the operational requirement.

Regulatory Compliance — EU AI Act

Current monitoring checks outputs. Nothing checks whether the model itself has been swapped. Article 15 requires high-risk AI systems to perform consistently on accuracy, robustness, and cybersecurity throughout their lifecycle. Deadline: August 2026. IT-PUF provides model-level identity attestation: the system in production right now is the system that passed your approval process. With the zero-knowledge tier, that is provable without disclosing proprietary model internals to the regulator.

Insurance and Audit

Not a checkbox. Not a vendor assertion. Cryptographic proof. IT-PUF’s hardware-attested measurement runs inside an NVIDIA H100 / Intel TDX enclave and produces a signed certificate binding model identity to specific weights. The insurer verifies without seeing the model. The policyholder proves compliance without disclosing trade secrets.

Internal Model Governance

Which deployment is running which version? Did the hotfix propagate? Is staging accidentally serving production? IT-PUF: enroll once at deployment, verify on demand or on schedule, detect identity drift. Non-invasive. Runs during normal inference. Enrollment is one-time and scales with model size — 8 seconds at 8B, 30 seconds at 72B on 3× A100. Verification reuses the enrolled anchor and runs in seconds. Architecture-agnostic: Transformer, Mamba, MoE, hybrid.

Agentic AI Authorization

Every agent identity framework asks three questions. None of them ask which model is inside the agent — because none of them have a way to answer it. Directory entries describe agents. Workload identities authenticate deployments. Signed tokens authorize actions. No layer of the current stack establishes which neural network is reasoning. IT-PUF answers the fourth question: bind the model’s structural fingerprint to the authorization token. No protocol changes — the claim travels inside a standard JWT or SPIFFE SVID. Four security properties proved in Coq. Zero silent assumptions.

Detecting Published Safety-Alignment Removal

Open-source tools now automate the removal of safety alignment from language models. The modified checkpoints preserve the API contract and are optimized for low KL divergence from the original. The internal activation geometry still changes. Published abliterated checkpoints across two model families and three toolchains were measured against aligned bases under the hardened instrument configuration. Gemma-3-12B: Heretic 317.5–367.6×ε, mlabonne 1,556.8–2,319.4×ε. Llama-3.1-8B: Heretic 7.6–12.0×ε, OBLITERATUS 45.1–53.1×ε. Sentinel panel: 5/5 PASS across four model families (Gemma, Llama, Qwen, Mistral). Zero degradation of any prior positive. Family-dependent sensitivity reverses between distillation and abliteration — Gemma quiet under distillation but loud under abliteration; Llama the opposite. In the tested cases, published safety-alignment removal left a measurable structural scar — even when the tool explicitly optimized for output preservation.

integration/ — JWT · SPIFFE · OAuth 2.0 · Trust modes · Certificate

Model-identity attestations compose with existing enterprise authorization infrastructure — OAuth 2.0, SPIFFE, SCIM — without protocol modifications. The fingerprint travels as a compact claim inside a standard JWT or SPIFFE SVID.

Sample Verification Certificate ✓ PASS

Report ID:           FR-2026-5B127509
Date of Measurement: 2026-03-17T08:43:15Z
Verification Result: PASS

MODEL
  Identifier:        Mistral-7B
  Architecture:      transformer
  Weight File Hash:  [redacted]
  Evidence Class:    Structural (individual model identity)
  Trust Mode:        TEE-backed (hardware-attested measurement)

MEASUREMENT
  Fingerprint Dims:  64
  Valid Measurements: 64/64 (0% failure)

FINGERPRINT VERIFICATION
  Fingerprint Digest: [redacted]
  Bundle Digest:      [redacted]
  Match:              UNIQUE (0 collisions across 6-model zoo)

ATTESTATION CHAIN
  CPU (Intel TDX):   CC State ON, Ready state ready
  GPU (NVIDIA CC):   H100 80GB HBM3, CC mode active
  Binding:           gpu_nonce = SHA256(bind_root) [verified]

TRUST BOUNDARY DISCLOSURE
  This certificate verifies STRUCTURAL IDENTITY only.
  It does NOT verify: performance, safety, fitness for
  purpose, training data, or regulatory compliance.

ISSUED BY: Fall Risk AI, LLC | integrations@fallrisk.ai | fallrisk.ai

Sensitive fields redacted for public display. Full certificate issued to authorized parties only.
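
The binding line in the attestation chain, gpu_nonce = SHA256(bind_root), is the kind of check a relying party can reproduce. A minimal Python sketch follows; what goes into `bind_root` is not described on this page, so it is treated as opaque bytes, and both function names are hypothetical.

```python
import hashlib
import hmac


def expected_gpu_nonce(bind_root: bytes) -> str:
    """Hex SHA-256 of the binding root, i.e. the nonce the GPU report should carry."""
    return hashlib.sha256(bind_root).hexdigest()


def binding_verified(bind_root: bytes, reported_nonce_hex: str) -> bool:
    """Compare the reported nonce with the recomputed one in constant time.

    `bind_root` is treated as opaque bytes; its actual contents are an
    assumption outside the scope of this sketch.
    """
    return hmac.compare_digest(
        expected_gpu_nonce(bind_root), reported_nonce_hex.lower()
    )
```

Recomputing the nonce from the binding root is what ties the GPU attestation to the rest of the chain: a GPU report collected for a different root fails the comparison.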

Trust mode	Engine runs	Signing	Customer trust posture
Hosted	Fall Risk ephemeral compute	Fall Risk runtime certificate signer	Customer trusts Fall Risk operational posture; weights deleted after measurement
Local Standard	Customer environment, customer hardware	Customer-generated key with proof-of-possession	Customer-cooperative claim integrity under live challenge and signed engine
TEE-backed	Customer’s confidential computing enclave (TDX · H100 CC)	Customer-generated key	Hardware-rooted runtime acquisition; attestation chain anchors to vendor roots
ZK private-match	Customer environment with ZK prover	Customer-generated key	Cryptographic privacy: τ vector and distance never leave the customer environment

Hosted is available in Trustfall Deep Lab (default) and rare Enterprise scenarios. Local Standard is available in Lab paid tiers and is the Enterprise default. TEE-backed and ZK private-match are Enterprise; ZK is design-partner-gated during MVP.

Ten security properties of the composition are formally classified: four proved in Coq, three traced to existing standards, one implemented, two design-constrained. Zero silent assumptions. Download technical brief →

articles/ — Threat briefs for security leaders and architects

Editorial companions to the research program. Threat briefs, founder scans, and operational notes — the implications made concrete for security leaders, governance teams, and anyone trying to understand what runtime identity actually means in production.

Open articles index →

Threat Brief

The Disappearing Window

The receipt is being withdrawn. The question is whether enterprise procurement lets it go quietly, in exchange for nothing — or requires its return while the contract is still being written.

Read the brief →

Threat Brief

The Continuity Gap

Your AI gateway authenticates the agent. It does not verify the model. We measured three substitution scenarios against a live gateway with signed credentials — all detected in under 7 seconds.

Read the brief →

Founder Scan

The Founder Scan

I ran Trustfall Lite on my own machine to answer the simplest question: what models are actually here?

Read the scan →

advisories/ — Operational findings on model identity. Cited by stable identifier.

A Fall Risk Advisory is a structured operational record. It documents a measured threat to model identity continuity, names the affected models, describes the detection method, and recommends actions for relying parties. Where the papers establish what is provable, advisories establish what has been observed in the wild.

Each advisory carries a stable identifier of the form FRA-YYYY-NNN. The canonical home is attest.fallrisk.ai/advisories/ — the same authority surface that issues the signed registry.

Advisory · FRA-2026-001 · Warning · Active

Public toolchains for runtime safety-alignment removal

Three actively maintained open-source toolchains automate the removal of safety alignment from open-weights instruction-tuned language models. More than 1,300 derivative checkpoints are publicly distributed across nearly every major model family. Workload-layer authentication does not detect this class of modification. Fall Risk structural identity measurement does, across all four model families measured in the published findings.

Read advisory →

threat-model/ — What agent-only identity cannot see

Every scenario below succeeds while the agent identity stack reports green. The credentials are valid. The attestation passes. The audit log looks normal. The model changed.

Scenario A — Model Substitution Behind a Stable Endpoint

An operator replaces the model checkpoint behind a SPIFFE-authenticated endpoint. The service mesh identity does not rotate because the process did not restart.
Remains green: SPIFFE identifier, X.509-SVID credentials, workload attestation, mTLS authentication, OAuth authorization, audit log.
Changed: the neural network computing the responses. The replacement is architecturally identical — same parameter count, same API contract, different weights. No component in the identity stack detects the substitution.

Scenario B — Supply Chain Poisoning with Valid Attestation

Model weights are substituted inside a container before the image is built. The container hash matches the registry. SPIRE attestation passes. The artifact is intact. The computation is compromised.
Remains green: container image hash, SPIRE attestation (correct platform signals, correct service account), SVID issued normally.
Changed: the model weights. The hash verified the file, not the computation. This is the pattern seen in the LiteLLM/TeamPCP incident (March 2026) — legitimate credentials carried compromised content — transposed to the model layer.

Scenario C — Silent Model Rotation by an API Provider

A provider silently rotates the model behind a versioned endpoint to a cheaper variant. The endpoint URL does not change. The API contract does not change. The model changes.
Remains green: OAuth token, API authentication, transaction tokens, authorization scopes, audit records (same endpoint, same grants).
Changed: the model. This is the pattern observed in the Cursor/Kimi K2.5 incident (March 2026), where a flagship product was identified as running an undisclosed model foundation — discovered by a developer who intercepted an API response, not by any identity mechanism.

Scenario D — Internal Fine-Tuning Drift

Nobody changed anything maliciously. An authorized team fine-tunes the enrolled model. The fine-tuned variant inherits the same workload identity. The model drifted.
Remains green: every identity and authorization control — this is a legitimate operational change by authorized personnel.
Changed: the model's behavioral properties. The fine-tuning may have shifted the model past the boundary of what was originally authorized. No adversary involved. No credential compromise. Just operational drift that the governance stack cannot see, because it was designed to measure the wrapper, not the model.

Observation — Even Model Existence Is Established Post-Hoc

In March 2026, a frontier AI company’s most capable model was revealed to the public through a misconfigured content management system — nearly 3,000 unpublished assets left in an unsecured, publicly searchable data store. Cybersecurity stocks lost billions in market value within hours. The model’s existence was not disclosed through any identity or attestation mechanism — it was disclosed by accident. While the scenarios above describe model substitutions going undetected at runtime, this incident demonstrates a deeper void: even the baseline question of which model exists is currently answered through leaks and public disclosures rather than measurement. The identity gap extends from deployment all the way back to development.

Scenarios A, B, and C have been measured against a live gateway with real HTTP requests, signed attestation JWTs, and OPA policy enforcement. Three substitutions tested, three detected, zero false accepts.

Abliterated checkpoints across two model families and three toolchains are structurally detectable at hardened measurement depth: Gemma 317.5–2,319.4×ε, Llama 7.6–53.1×ε. Sentinel panel 5/5 PASS across four families.

In April 2026, the LiteLLM supply-chain compromise escalated: Mercor, a $10B AI recruiting startup working with OpenAI and Anthropic, confirmed a breach via the poisoned LiteLLM package. Over 1,000 SaaS environments were affected. LiteLLM sits at the model-routing layer that the Agent Identity Is Not Model Identity technical note named as an incident class two days after the initial compromise was disclosed, weeks before the Mercor escalation confirmed the pattern.

These scenarios are grounded in public incident patterns and the architecture described in draft-klrc-aiagent-auth-01 (IETF, March 2026). They do not resolve by strengthening agent authentication. They resolve when structural model identity is composed into the existing agent identity infrastructure.

regulatory/ — EU AI Act (August 2026) · NIST GenAI Profile mapping

The EU AI Act and NIST Generative AI Profile are increasing pressure for verifiable model traceability — not just documentation, but evidence of what is actually running. The four-level framework maps directly to those obligations. The admissibility framework in What Counts as Proof? extends the mapping into a formal standard: each compliance question has an evidence class that can answer it, and evidence from the wrong class incurs inferential debt. Documentation identifies artifacts. Evidence identifies models.

Framework level	What it establishes	EU AI Act	NIST GenAI Profile (AI 600-1)
Structural fingerprinting IT-PUF · weights regime	Unambiguous, unforgeable model identity — independent of operator claims	Art. 11 (technical documentation), Art. 49 + Annex VIII (unambiguous identification and traceability)	GV-6.2: contracts specifying provenance expectations; MS-2.5: monitoring adherence to provenance standards
Hardware-attested binding TEE · enclave measurement	Cryptographic binding of fingerprint to specific weight artifact — tamper-evident deployment record	Art. 12 (automatic logging, audit trail integrity), Annex IV §2 (system description with sufficient detail to assess conformity)	MS-2.6: detection of unauthorized changes; GV-1.7: organizational risk policies covering third-party model supply chain
Verified computation path ZK circuit · hybrid verifier	Proof that the identified model computed honestly — not just that some weights were used	Art. 13 (transparency, output traceability for downstream providers), Art. 17 (quality management: verification that deployed system matches documented system)	MS-2.5: provenance of model outputs; MP-2.3: documenting AI system decisions in regulated contexts
Output binding Token logit · evidence bundle	Traceable link from verified identity through verified computation to a specific output — the audit record closes	Art. 12 §1(d): logs must enable identification of input data and attribution of outputs; Art. 26 (deployer obligations: monitor, log, maintain records)	GV-6.2: content provenance at output level; MS-4.2: real-time monitoring of deployed model behavior against documented baseline

This mapping is descriptive. It identifies where the framework's technical capabilities are relevant to stated regulatory requirements — it does not constitute a compliance certification. The EU AI Act high-risk provisions take full effect August 2026.

EU AI Act — Regulation (EU) 2024/1689, Official Journal of the European Union. eur-lex.europa.eu
NIST AI 600-1 — Generative Artificial Intelligence Profile, National Institute of Standards and Technology. doi.org/10.6028/NIST.AI.600-1

research/ — Thirteen papers + three technical notes · Each opened a question the last couldn't answer

Each paper opened a question the previous one could not answer. Thirteen papers. Three technical notes. Zero retracted.

Open research index →

Research Paper, 2026

Safety-Alignment Removal as a Model-Identity Failure

Open-source toolkits strip a model's safety constraints while leaving its outputs looking normal. The structural fingerprint changes anyway — and we can detect it.

Publicly available toolchains remove safety constraints from AI model weights while preserving observable behavior. The modification is invisible to every deployed trust layer — but structurally measurable. Fourth deformation class identified.

Two model families (Gemma-3-12B, Llama-3.1-8B), three toolchains (Heretic, mlabonne, OBLITERATUS), four abliterated checkpoints. Structural scars range from 7.6×ε to 2,319.4×ε. Family-dependent sensitivity reverses between distillation and abliteration. Sentinel panel across four families: 5/5 PASS, zero degradation. OBLITERATUS blind spot discovered at initial measurement depth → hardened configuration → all prior positives preserved. The admissibility doctrine — formally verified before this threat class existed — predicted exactly this outcome.

DOI: 10.5281/zenodo.19383019 · 2 families · 3 toolchains · 4 checkpoints · Sentinel 5/5 PASS · 4th deformation class

Technical Note, 2026

Measured Model Substitution Under Valid Agent Credentials

Three model substitutions run against a live gateway with valid agent credentials. Three detected. Zero false accepts.

Three substitution scenarios executed against a live gateway with real HTTP requests, signed attestation JWTs, and OPA policy enforcement. Three detected. Zero false accepts. HTTP 200 before. HTTP 403 after.

Scenario A: same-family substitution behind a stable endpoint — workload JWT, health checks, gateway PID, and policy hash all unchanged; model identity was the sole differentiating evidence layer (2,858×ε). Scenario B: cross-family substitution with both artifact manifests passing hash verification (3,416×ε). Scenario C: silent API rotation between gpt-4.1-mini and gpt-4.1-nano using the same API key and endpoint — per-model thresholds reject. Warm-path verification: 5.7–6.7 seconds with the model already loaded. Not inline per-request — runs at model load, on schedule, or as an out-of-band health check.

DOI: 10.5281/zenodo.19342848 · 3 scenarios · 3 detected · 0 false accepts · HTTP 200 → 403 · OPA enforcement
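
The HTTP 200 to 403 transition can be illustrated with a Python stand-in for the policy decision. The technical note uses OPA; this sketch is not Rego and does not reproduce that policy. The `Evidence` fields, the one-hour freshness window, and the rule that anything at or beyond 1×ε is denied are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    """Inputs to the gate; field names are hypothetical."""
    workload_authenticated: bool          # result of the existing agent/workload checks
    attested_model_id: Optional[str]      # model named in the identity attestation, if any
    distance_in_eps: Optional[float]      # measured distance to the enrolled anchor, in epsilon
    attestation_age_s: float              # seconds since the attestation was issued


def decide(ev: Evidence, approved_model_id: str, max_age_s: float = 3600.0) -> int:
    """Return an HTTP status: 200 to allow, 403 to deny."""
    if not ev.workload_authenticated:
        return 403                        # agent-layer failure
    if ev.attested_model_id is None or ev.distance_in_eps is None:
        return 403                        # no model-identity evidence presented
    if ev.attestation_age_s > max_age_s:
        return 403                        # stale evidence is treated as no evidence
    if ev.attested_model_id != approved_model_id:
        return 403                        # attestation names a different model
    if ev.distance_in_eps >= 1.0:
        return 403                        # measured model is outside the acceptance radius
    return 200
```

With valid workload credentials and a fresh attestation, a measured distance of 0×ε yields 200, while the same request with a distance of 2,858×ε (the Scenario A figure) yields 403. The workload checks pass in both cases; only the model-identity evidence differs.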

Research Paper, 2026

Family-Dependent Response to Reasoning Distillation Across Structural and Functional Identity Layers

The same distillation event leaves different traces in different architecture families. The structural and functional identity layers can decouple.

The same distillation event leaves different traces in different architectural families — not just in magnitude, but in mode and cross-layer coupling. The structural and functional identity layers can decouple.

Five reasoning-distillation pairs across three base families (Llama, Qwen, Mistral) at five scales. Structural scars span a sixty-fold range: Mistral loudest (7,701–8,518×ε), Llama intermediate (2,858–4,583×ε), Qwen quietest (141–516×ε). Functional hierarchy breaks in Llama, absent in Qwen, marginal in Mistral — despite Mistral carrying the loudest structural scar. Cross-layer decoupling observed empirically for the first time. Stiffness at the measurement site inversely orders with scar magnitude across all three families. Fisher curvature, previously proposed as a candidate mechanism, does not correctly order scars at production scale.

DOI: 10.5281/zenodo.19298857 · 5 pairs · 3 families · 60× range · Cross-layer decoupling · Fisher falsified

Technical Note, 2026

Gap Invariance: Why PPP Measurements Are Domain-Independent by Construction

The order-statistic measurement used in API verification is provably invariant to log-softmax, temperature, and constant shifts.

The API wall is narrower than previously understood. The log-softmax transformation does not change the measurement — by mathematical identity, not by empirical robustness.

Order-statistic gaps are exactly invariant to log-softmax, temperature scaling, and any position-independent constant shift. Five theorems, formally verified in Coq (GapInvariance.v, 0 Admitted). Any API measurement deviation must come from truncation or quantization, never from the probability-domain transformation itself.

DOI: 10.5281/zenodo.19275524 · 5 theorems · 0 Admitted · API invariance proved
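
The shift and log-softmax parts of this claim are easy to check numerically. The Python sketch below is an illustration, not the PPP measurement itself: the example logits, the `sorted_gaps` helper, and the choice of the top four gaps are assumptions. Temperature scaling is left out because the normalization the measurement uses is not described here.

```python
import math


def sorted_gaps(values, k=4):
    """Differences between consecutive order statistics among the top k+1 values."""
    top = sorted(values, reverse=True)[: k + 1]
    return [top[i] - top[i + 1] for i in range(k)]


def log_softmax(logits):
    """Numerically stable log-softmax: subtracts one shared constant from every logit."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


logits = [3.2, 1.5, 0.3, -0.7, -2.1, -4.0]     # made-up example values

raw = sorted_gaps(logits)
after_log_softmax = sorted_gaps(log_softmax(logits))
after_shift = sorted_gaps([z + 7.5 for z in logits])

# Each transformation moves every value by the same amount, so the gaps agree
# up to floating-point rounding.
same_under_log_softmax = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(raw, after_log_softmax)
)
same_under_shift = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(raw, after_shift)
)
```

Log-softmax subtracts the same log-sum-exp term from every logit, so it is a constant shift in disguise, and differences between sorted values cancel it exactly.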

Technical Note, 2026

Agent Identity Is Not Model Identity

Existing agent identity systems authenticate the agent. They cannot tell you which neural network is computing the response.

Why authenticating the software is not the same as proving which model is actually computing. The category distinction, two incident classes, and a four-question taxonomy.

Current agent identity frameworks (OAuth, SPIFFE, Okta, Entra) authenticate the software harness. They do not verify the neural network inside it. A four-question taxonomy separates artifact identity, workload identity, model identity, and training lineage into distinct evidence classes. Two incident classes — undisclosed model substitution and supply-chain poisoning with valid credentials — demonstrate the operational consequences.

DOI: 10.5281/zenodo.19240883 · 4-question taxonomy · 2 incident classes

Research Paper, 2026

Post-Hoc Disclosure Is Not Runtime Proof: Model Identity at Frontier Scale

Disclosing a model's lineage after the fact is not the same as proving it at runtime. Validated at frontier scale (8B–72B).

Every incident in March 2026 was discovered after the fact. Post-hoc disclosure is not runtime proof. This paper demonstrates that runtime model identity is technically feasible at the model sizes where those incidents occurred.

Five frontier models enrolled (8B–72B), zero identity errors. Three declared-lineage distillation pairs — sharing identical architecture with their bases — produced structural separations of 2,858×ε (8B), 3,616×ε (14B), and 4,583×ε (70B) across two base-model families. These observations were flagged as exploratory; the family-dependent distillation study subsequently confirmed the pattern is family-dependent rather than scale-dependent. Software attestation path (signed JWT → OPA policy decision) validated at 70B in 30 seconds. Thermodynamic invariant δ_norm confirmed scale-free across 25 models spanning two orders of magnitude.

DOI: 10.5281/zenodo.19216634 · Frontier-validated · 8B–72B · 3 distillation pairs (expanded to 5 in the family-dependent distillation study) · JWT+OPA

Research Paper, 2026

Where Identity Comes From: Path Sensitivity and Endpoint Underdetermination in Neural Network Training

How structural identity forms during training, and why two models with identical architectures and recipes are not interchangeable.

Structural identity is not merely something a model has when measurement begins. It is something training builds, compresses, and locks — a record of the path by which the model became itself.

154 checkpoints. Ten seed-controlled runs. Three results: a three-phase emergence profile (identity locks at step 92,000 — the final 36% of training doesn't move it), path sensitivity (same recipe, different seed, fingerprints 391× to 11,737× apart), and endpoint underdetermination (tested weight statistics do not predict which identity formed). Formally proved in HistoricalIdentity.v: trajectory non-recovery and lock boundary source exclusion. Zero Admitted.

DOI: 10.5281/zenodo.19118807 · HistoricalIdentity.v · 0 Admitted · 154 checkpoints · 10 seeds · 3-phase emergence

Research Paper, 2026

Composable Model Identity: Formal Hardening of Structural Attestations in the Enterprise Identity Stack

How structural attestations compose with existing enterprise identity systems (JWT, SPIFFE, OPA), formally hardened against forgery.

Enterprise identity stacks authenticate workloads and credentials. They do not verify which neural network is computing inside them. This paper closes that layer — formally.

Live integration architecture for model-identity attestations in JWT and SPIFFE token flows, grounded in H100 Confidential Computing enclave measurements. Four composition properties proved in Coq: non-separability, temporal binding necessity, issuer authenticity, reference integrity. Every remaining trust dependency named, traced, and paired with a falsification witness. Zero OPEN rows. Zero silent assumptions.

DOI: 10.5281/zenodo.19099911 · 13 theorems · 0 Admitted · 3 proof files · JWT · SPIFFE · OAuth 2.0 · SCIM
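
As a rough illustration of how a model-identity attestation might travel in a signed token and be gated by policy, the sketch below signs a compact HS256 JWT with the Python standard library and applies a deny-by-default check for issuer, freshness, and reference fingerprint. Everything specific is an assumption made for the example: the `model_fp` claim name, the issuer string, the shared demo key, and the five-minute freshness window are hypothetical, and the `allow` function is a Python stand-in for a policy engine, not OPA or Rego. It is not the paper's architecture and does not cover the enclave measurements.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def sign_token(claims: dict, key: bytes) -> str:
    """Build a compact HS256 JWT: header.payload.signature."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    tag = hmac.new(key, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(tag)}"

def verify_token(token: str, key: bytes):
    """Return the claims if the signature checks out, otherwise None."""
    try:
        header, payload, signature = token.split(".")
        expected = hmac.new(key, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, b64url_decode(signature)):
            return None
        return json.loads(b64url_decode(payload))
    except ValueError:  # malformed token, bad base64, or bad JSON
        return None

def allow(claims, *, issuer: str, enrolled_fp: str, now: int, max_age_s: int = 300) -> bool:
    """Hypothetical policy gate: deny unless every check passes."""
    if not isinstance(claims, dict):
        return False
    return (
        claims.get("iss") == issuer                              # issuer check
        and 0 <= now - claims.get("iat", now + 1) <= max_age_s   # freshness
        and now < claims.get("exp", 0)                           # not expired
        and claims.get("model_fp") == enrolled_fp                # reference match
    )

KEY = b"demo-key-not-for-production"   # assumption: shared demo secret
ISSUER = "attestor.example"            # assumption: hypothetical issuer
ENROLLED_FP = hashlib.sha256(b"enrolled model measurement").hexdigest()
OTHER_FP = hashlib.sha256(b"substituted model measurement").hexdigest()
NOW = 1_700_000_000                    # fixed clock for reproducibility

def claims_for(fingerprint: str) -> dict:
    return {"iss": ISSUER, "iat": NOW, "exp": NOW + 60, "model_fp": fingerprint}

good = sign_token(claims_for(ENROLLED_FP), KEY)
substituted = sign_token(claims_for(OTHER_FP), KEY)
head, _, sig = substituted.split(".")
forged = ".".join([head, good.split(".")[1], sig])  # enrolled payload, wrong signature

def decide(token: str, now: int = NOW) -> bool:
    return allow(verify_token(token, KEY), issuer=ISSUER, enrolled_fp=ENROLLED_FP, now=now)

decisions = {
    "good": decide(good),
    "substituted": decide(substituted),
    "forged": decide(forged),
    "stale": decide(good, now=NOW + 3600),
}
print(decisions)
```

Only the fresh token carrying the enrolled fingerprint is allowed. A production flow would more plausibly use asymmetric signatures and an external policy engine; the symmetric key here only keeps the example self-contained.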

Research Paper, 2026

What Counts as Proof? Admissible Evidence for Neural Network Identity Claims

A formal admissibility framework for identity claims — what evidence is sufficient under a given threat model, and what is not.

Documentation identifies artifacts. Evidence identifies models. Current governance practice conflates the two.

Three evidence classes, each answering a different question: structural (which specific model?), thermodynamic (genuinely a neural network?), functional (distilled from an unauthorized source?). Formally proved in Coq that the classes cannot substitute for one another — three inadmissibility directions, zero gaps. Mapped to EU AI Act, NIST AI 600-1, and IETF provenance standards.

DOI: 10.5281/zenodo.19058540 · Formally proved · 0 gaps · EU AI Act · NIST AI 600-1 · IETF

Research Paper, 2026

The Deformation Laws of Neural Identity

Three layers of model identity (structural, thermodynamic, functional) and the distinct laws that govern how each one changes.

Three layers. Three deformation laws. None shared. Any identity claim that doesn't declare which layer it addresses is borrowing evidence it hasn't earned.

Structural layer: training-determined, load-bearing under attack (the model collapses before the fingerprint moves). Thermodynamic layer: approximately universal across 22 Transformer runs (CV 3.5%). Functional layer: transferred by distillation, erased by routine fine-tuning within two epochs. Two falsifications: the fingerprint does not reduce to a gauge projection (1.3% of the observable), and it is not predictable from architecture features (LOO R² = −3.93).

DOI: 10.5281/zenodo.19055966 · 22 models · 106 checkpoints · Three layers · two channels · two falsifications

Research Paper, 2026

Which Model Is Running? Structural Identity as a Prerequisite for Trustworthy Zero-Knowledge Machine Learning

Verifying which specific model produced a specific inference, with cryptographic proof bound to the request.

zkML proves computation. We prove identity first.

A weight commitment proves which bytes were used. It does not prove which model those bytes belong to. Four-level framework: structural fingerprinting, hardware-attested binding, hybrid verifier-checkable decoder layer (124 negative tests, 0 failures), and output binding to a claimed token logit. When a rescaling error compressed the fingerprint to ~1.5 bits of dynamic range, structural identity retained 0.98 rank correlation. Identity may live in relational geometry, not activation magnitude.

DOI: 10.5281/zenodo.19008116 · ~296K constraints · 124 tests · 0 failures · Identity-first zkML

Research Paper, 2026

Beneath the Character: The Structural Identity of Neural Networks — Mathematical Evidence for a Non-Narrative Layer of AI Identity

Mathematical evidence that AI identity has a structural layer beneath the conversational character.

Is there a there there? There is. And the proof compiles.

Gideon Lewis-Kraus asked in The New Yorker: "What is Claude? Anthropic doesn't know, either." This paper answers the prior question. Two separable layers: structural identity (weight geometry — invariant, unforgeable, not a watermark) and functional identity (behavior, tone, the performed self). Neither reduces to the other. The structural layer is a consequence of the softmax bottleneck — demanded by the mathematics, not inserted by design.

DOI: 10.5281/zenodo.18907292 · Philosophy of AI identity · Dennett · Parfit · Schechtman

Research Paper, 2026

Provenance Generalization and Verification Scaling

The teacher-student forensic signal generalizes across architectures and tokenizers. The verification protocol scales to large model zoos.

All three zero-knowledge tiers validated. Provenance transfer generalizes across families. API verification scales with zero breaches.

14 models, 0 / 14 API breaches. Provenance transfer across 3 teacher families, 4 student architectures, 2 training protocols. ZK Tier 1: committed distance proof, 7,656 constraints, 128-byte proofs. Tier 2: 1,536 H100 enclave measurements, 0 failures. Tier 3: full zero-knowledge extraction, ~296K constraints, 124 adversarial tests, 0 failures.

DOI: 10.5281/zenodo.18872071 · 3 teacher families · 0 breaches · ZK all 3 tiers validated

Research Paper, 2026

The Geometry of Model Theft: Distillation Forensics and Adversarial Erasure Resilience

A trained model carries the geometric trace of its teacher. Adversarial attempts to erase that trace are outperformed by passive fine-tuning.

The adversary's full white-box knowledge buys nothing. Passive fine-tuning outperforms adversarial erasure. The structural fingerprint doesn't move.

54 adversarial checkpoints. Structural identity invariant under distillation. Functional trace partially transfers, degrades under continued training. Apparent cross-family spoofing is geometric coincidence (R² = 0.995). Pareto frontier: no configuration achieves both trace erasure and capability preservation.

DOI: 10.5281/zenodo.18818608 · 54 checkpoints · δ_norm CV 1.9%

Research Paper, 2026

Template-Based Endpoint Verification via Logprob Order-Statistic Geometry

Verifying which model is behind a commercial API endpoint when all you have access to is its logprobs.

The fingerprint survives through commercial API interfaces. No weight access required.

PPP-residualized gap templates enable cross-session model identity verification through standard logprob endpoints. Zero breaches across 6 models, 3 providers, 3 independent sessions. Conditional API spoofing impossibility: 41 Coq theorems, zero Admitted.

DOI: 10.5281/zenodo.18776711 · 41 theorems · 0 Admitted · 6 models · 0 / 120 (per-model τ)

Research Paper, 2026

The δ-Gene: Inference-Time Physical Unclonable Functions from Architecture-Invariant Output Geometry

The structural fingerprint that makes neural network identity measurable at inference time. The foundation paper.

No model was ever mistaken for another. Formally impossible to forge.

The δ-gene — the third pre-softmax logit gap — is a temperature-invariant structural fingerprint determined by training-induced weight geometry, not by what the model is saying. The IT-PUF protocol: 23 models, 16 families, 3 architecture types, 0 false acceptances. Spoofing impossibility: 311 Coq theorems, zero Admitted.

DOI: 10.5281/zenodo.18704275 · 311 theorems · 0 Admitted · 3 architecture types · 0 errors
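
To make the δ-gene description above concrete, here is a minimal sketch of a gap-based fingerprint check. It assumes "third pre-softmax logit gap" means the difference between the third- and fourth-largest logits at a position; that reading, the toy logit rows, the tolerance value, and the function names (`third_gap`, `fingerprint`, `matches`) are all hypothetical. It is not the IT-PUF protocol, only an illustration of measuring one gap per challenge and comparing it with an enrolled reference.

```python
def third_gap(logits):
    """Difference between the 3rd and 4th largest logits (assumed definition)."""
    if len(logits) < 4:
        raise ValueError("need at least four logits")
    top = sorted(logits, reverse=True)
    return top[2] - top[3]

def fingerprint(logit_rows):
    """One gap per challenge position."""
    return [third_gap(row) for row in logit_rows]

def matches(observed, enrolled, tol=0.05):
    """Accept only if every gap is within tol of the enrolled value."""
    return len(observed) == len(enrolled) and all(
        abs(o - e) <= tol for o, e in zip(observed, enrolled)
    )

# Hypothetical logits for two challenge positions.
enrolled_rows = [[5.0, 3.0, 2.0, 0.5, -1.0], [2.5, 4.0, 2.0, 0.0, 3.2]]
# Same rows with a constant offset, standing in for a re-measurement.
remeasured_rows = [[z + 3.0 for z in row] for row in enrolled_rows]
# A different model producing different gaps at the same positions.
other_rows = [[5.0, 3.0, 2.9, 0.5, -1.0], [2.5, 4.0, 2.4, 0.0, 3.2]]

enrolled = fingerprint(enrolled_rows)
same_model = matches(fingerprint(remeasured_rows), enrolled)
other_model = matches(fingerprint(other_rows), enrolled)
print(enrolled, same_model, other_model)
```

The gap depends only on the sorted logit values, not on which token holds each rank, which is one way to read the claim that the fingerprint is not determined by what the model is saying.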

▸

contact/ — Fall Risk AI, LLC · New Orleans

On the name

The name Fall Risk comes from the medical wristband. A hospital labels a patient as a Fall Risk to acknowledge the vulnerability and act on it — extra rails, closer monitoring, faster response when something slips. Neural networks must be considered Fall Risks too. They can shift, drift, or be substituted in ways their software interfaces never expose. By labeling the AI as a “Fall Risk,” we are acknowledging that vulnerability and building the structural measurement tools necessary to ensure its safety — and the safety of the enterprises built around it.

On the researcher

Fall Risk AI, LLC · New Orleans, Louisiana. Anthony Coslett is an independent researcher studying the structural identity of neural networks. He is the sole principal investigator of the Fall Risk AI research program. Evidence from the research has been placed into proceedings at the EU AI Office, NIST, and the IETF.

integrations@fallrisk.ai: Integration and pilot inquiries
security@fallrisk.ai: Security, disclosure, compliance
legal@fallrisk.ai: NDA, contract, licensing
anthony@fallrisk.ai: Research correspondence
