Fall Risk AI

Prove Which Model Is Running

Your security stack authenticates users, credentials, and infrastructure. It does not authenticate the model. A substituted model, a distilled copy, a silently updated checkpoint — every credential stays valid. Every audit log looks normal. Only the model has changed.


Every model already carries a structural fingerprint. We built the instrument that reads it.

75 models — publicly enrolled and signed, zero identity errors
70M → 72B — parameter range validated
30 seconds — enrollment at 70B parameters
0 errors — across 1,012 pairwise identity tests
$ ls -la ./fallrisk.ai/
fallrisk-ai · enrollment anchor
Model: meta-llama/Llama-3.1-8B-Instruct
Enrollment: enroll-2046c1bfea21
Architecture: transformer · 32 layers · 64D
Contract: itpuf-v0.1.0
Status: ✓ ACTIVE
Issuer: https://attest.fallrisk.ai
Evidence Class: structural (individual model identity)
Issued: 2026-04-08
Signature: eyJhbGciOiJSUzI1NiIsImtpZCI6ImZhbGxyaXNrLTk2Y2Q1ZT…
THE PHYSICS
  • 0/1,012 FAR across the Sprint 2 weights zoo
  • ε ≈ 1.003×10⁻⁴ (EVT-derived acceptance threshold)
  • δ_norm = 0.318 ± 2% (scale-invariant, 410M–72B)
  • Frontier-validated: 8B → 72B (30-second enrollment)
  • 5 architecture families: Llama · Qwen · Mistral · Gemma · Phi
THE MATHEMATICS
  • NoSpoofing.v — no sweet spot at any KL budget
  • EvidenceSufficiency.v — cross-layer non-substitution
  • GapInvariance.v — measurement invariant under log-softmax, temperature, constant shift
  • StructuralAuthorization.v — artifact bound to request, within freshness window
  • 350+ theorems verified · 0 Admitted
Explore Registry →

This is what your policy engine consumes.

The visible edge of 13 papers, 3 technical notes, and 350+ machine-verified theorems.

A modern AI deployment stack typically answers two identity questions well. Artifact identity — what was shipped — is handled by registries, version control, and bills of materials. Agent identity — what software is making this request — is handled by OAuth, SPIFFE, mTLS, and the new agent identity suites from Okta, Microsoft Entra, and NIST's concept paper on software agent identity.

These are real controls. But they authenticate the software harness — the orchestration code, the API gateway, the service mesh. They do not verify the neural network inside it.

COVERED
1. What artifact was shipped? — registries, version control, BOMs
2. What agent or workload is authenticated? — OAuth, SPIFFE, Okta, Entra, mTLS
UNVERIFIED AT RUNTIME
3. What model is actually computing?
4. Whose training lineage is present?

The software identity layer can work perfectly while the model identity layer is entirely absent. This is not a future risk — it has already produced public incidents in which authenticated agents served undisclosed model foundations, and poisoned packages passed every integrity check using legitimate credentials.

We have measured this directly: three substitution scenarios executed against a live gateway with real HTTP requests, signed attestation JWTs, and OPA policy enforcement. In every scenario, the workload identity, artifact integrity, and API authentication remained valid. In every scenario, the structural identity measurement detected the substitution and the policy gate denied the request.

Technical notes: Agent Identity Is Not Model Identity · Measured Model Substitution · Zenodo, 2026

Three engagement modes. One escalation path.

Public
Registry
Browse 75 enrolled models. Verify cryptographic signatures client-side. Download verifier kits. Inspect sample attestation JWTs and deny events.
Confidential
Enrollment
Confidential measurement for your models. Custom enrollment anchor. Signed attestation. Integration guidance.
Continuous
Attestation
Runtime structural checks on a schedule. Signed JWT stream consumed by OPA, Cedar, Envoy, or your existing policy engine. Drift monitoring. Compliance reporting. Annual subscription.

Every neural network carries a unique structural fingerprint — not from what it says, but from the geometry of how it decides what to say. The fingerprint is a mathematical consequence of the architecture, not something anyone inserted.

Weights
Direct measurement
Send challenge prompts. Measure the internal response at two sites. Compare the resulting 64-dimensional fingerprint against an enrolled anchor. Accept or reject. No retraining. No model modification.
API
No weight access needed
Standard logprob endpoints expose enough structural geometry to verify identity across independent sessions. No weights required, no operator cooperation needed.
Zero-Knowledge
Prove without revealing
A zero-knowledge proof attests the model matches its enrolled identity without revealing the fingerprint, the anchor, or the methodology. Hardware attestation binds to a cryptographic root.
Attestation
The signed claim
The result travels inside a standard JWT or SPIFFE SVID, composing with OAuth 2.0, OPA, and EAT-based authorization flows (RFC 9711). No protocol changes. No new infrastructure. One new fact: which model is actually running.

This end-to-end path — from forward pass through signed claim to policy decision — has been validated at 70 billion parameters in approximately 30 seconds.
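The accept/reject decision described above reduces to a distance test between a measured fingerprint and the enrolled anchor. The sketch below is illustrative only — the Euclidean metric, the vector semantics, and the function names are assumptions, with the 64-dimension count and the ε threshold taken from the figures on this page:

```python
import math

EPSILON = 1.003e-4  # EVT-derived acceptance threshold quoted on this page


def fingerprint_distance(candidate, anchor):
    """Euclidean distance between two 64-dimensional fingerprints (assumed metric)."""
    if len(candidate) != len(anchor):
        raise ValueError("fingerprint dimensionality mismatch")
    return math.sqrt(sum((c - a) ** 2 for c, a in zip(candidate, anchor)))


def verify(candidate, anchor, epsilon=EPSILON):
    """Accept iff the measured fingerprint lies within epsilon of the enrolled anchor."""
    return fingerprint_distance(candidate, anchor) <= epsilon


# Illustration: an identical measurement verifies; a perturbed one does not.
anchor = [0.1] * 64
assert verify(anchor, anchor)
assert not verify([0.1 + 1e-2] + [0.1] * 63, anchor)
```

The point of the sketch is the shape of the decision, not the metric: enrollment fixes an anchor once, and every later verification is a pure comparison with no retraining and no model modification.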

Not a replacement. The layer none of them provide.

Watermarks require insertion at training time and are removed by fine-tuning. Model cards describe artifacts, not running systems. Behavioral tests measure what a model says, not what it is. Output monitoring watches downstream effects without verifying upstream identity. Each solves a real problem. None answer the structural question.

Property comparison — Watermarks · Model Cards · Behavioral Tests · Output Monitoring · Structural Fingerprint:
  • Works without training-time insertion
  • Survives fine-tuning — Partial
  • Survives distillation
  • Works without weight access
  • Verifies the running model, not a document — Partial / Partial
  • Cryptographically verifiable
  • Formally proved unforgeable
  • Composable with existing auth stack — Partial

The best deployment uses several of these together. Watermarks where you control training. Behavioral tests for capability monitoring. Output monitoring for safety. And structural fingerprinting for the one question the others can't answer: is this still the model you approved?

0 / 1,012
Identity tests — weights regime. 23 models, 16 vendor families, 3 architecture types. No model was ever mistaken for another.
0 / 14
Identity tests — API regime. 14 models, 3 providers. Every model correctly distinguished across independent sessions.
370
Formally verified theorems across 22 Coq proof files — publication scope. Zero uses of Admitted.
1,536
Measurements inside H100 confidential computing enclave. Zero failures.
1.4%
Fingerprint variation across 31 training checkpoints, 4 architectures, 3 model families. The structural identity does not drift under training pressure.
4
Patents assigned to Fall Risk AI, LLC. Weights identity + API verification + zero-knowledge attestation + identity-conditioned inference.
8,518×ε
Maximum structural separation across five reasoning-distillation pairs (1.5B–70B) — three architectural families. A sixty-fold range in scar magnitude: Mistral loudest, Llama intermediate, Qwen quietest. All above detection floor.
Model Supply Chain Verification
A vendor ships a model built on an open-weight base. The marketing says proprietary. The API response says otherwise — if anyone thinks to look. No security team checks which model is actually serving production. No compliance process verifies origin. IT-PUF verifies the structural fingerprint of what is running against what was enrolled at deployment — no leaked model ID required.
Distillation Forensics
16 million exchanges. 24,000 fraudulent accounts. The resulting models carry the teacher’s fingerprint. IT-PUF detects provenance transfer across families, architectures, and training protocols. The adversary cannot erase the trace without degrading the capabilities the distillation was meant to acquire. The signal fades with continued training — making continuous monitoring, not periodic audits, the operational requirement.
Regulatory Compliance — EU AI Act
Current monitoring checks outputs. Nothing checks whether the model itself has been swapped. Article 15 requires continuous monitoring of high-risk AI systems. Deadline: August 2026. IT-PUF provides model-level identity attestation: the system in production right now is the system that passed your approval process. With the zero-knowledge tier, provable without disclosing proprietary model internals to the regulator.
Insurance and Audit
Not a checkbox. Not a vendor assertion. Cryptographic proof. IT-PUF’s hardware-attested measurement runs inside an NVIDIA H100 / Intel TDX enclave and produces a signed certificate binding model identity to specific weights. The insurer verifies without seeing the model. The policyholder proves compliance without disclosing trade secrets.
Internal Model Governance
Which deployment is running which version? Did the hotfix propagate? Is staging accidentally serving production? IT-PUF: enroll at deployment, verify on demand or on schedule, detect identity drift. Non-invasive. Runs during normal inference. Under 50 seconds for a 7B model. Architecture-agnostic: Transformer, Mamba, MoE, hybrid.
Agentic AI Authorization
Every agent identity framework asks three questions. None of them ask which model is inside the agent — because none of them have a way to answer it. Directory entries describe agents. Workload identities authenticate deployments. Signed tokens authorize actions. No layer of the current stack establishes which neural network is reasoning. IT-PUF answers the fourth question: bind the model’s structural fingerprint to the authorization token. No protocol changes — the claim travels inside a standard JWT or SPIFFE SVID. Four security properties proved in Coq. Zero silent assumptions.
Detecting Published Safety-Alignment Removal
Open-source tools now automate the removal of safety alignment from language models. The modified checkpoints preserve the API contract and are optimized for low KL divergence from the original. The internal activation geometry still changes. Published abliterated checkpoints across two model families and three toolchains were measured against aligned bases under the hardened instrument configuration. Gemma-3-12B: Heretic 317.5–367.6×ε, mlabonne 1,556.8–2,319.4×ε. Llama-3.1-8B: Heretic 7.6–12.0×ε, OBLITERATUS 45.1–53.1×ε. Sentinel panel: 5/5 PASS across four model families (Gemma, Llama, Qwen, Mistral). Zero degradation of any prior positive. Family-dependent sensitivity reverses between distillation and abliteration — Gemma quiet under distillation but loud under abliteration; Llama the opposite. In the tested cases, published safety-alignment removal left a measurable structural scar — even when the tool explicitly optimized for output preservation.

Model-identity attestations compose with existing enterprise authorization infrastructure — OAuth 2.0, SPIFFE, SCIM — without protocol modifications. The fingerprint travels as a compact claim inside a standard JWT or SPIFFE SVID.
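As a concrete illustration of "model identity as a compact claim," here is a minimal JWT built with only the Python standard library. The claim names (`enrollment_id`, `model_fingerprint_digest`, `verification_result`) are hypothetical, and the HS256 signature is chosen for self-containment — the sample attestation header shown elsewhere on this page indicates RS256 with an issuer key:

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url without padding, as used in compact JWS serialization."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def sign_jwt(claims: dict, key: bytes) -> str:
    """Produce a compact HS256 JWT (illustrative; production would use RS256)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = f"{b64url(json.dumps(header).encode())}.{b64url(json.dumps(claims).encode())}"
    sig = hmac.new(key, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"


now = int(time.time())
claims = {
    "iss": "https://attest.fallrisk.ai",        # issuer, as in the sample anchor
    "sub": "meta-llama/Llama-3.1-8B-Instruct",  # the attested model
    "iat": now,
    "exp": now + 300,                           # short freshness window
    # Hypothetical model-identity claims:
    "enrollment_id": "enroll-2046c1bfea21",
    "model_fingerprint_digest": hashlib.sha256(b"fingerprint-bytes").hexdigest(),
    "verification_result": "PASS",
}

token = sign_jwt(claims, key=b"demo-shared-secret")
```

A policy engine such as OPA or Cedar would verify the signature against the trusted issuer, then gate the request on `verification_result` and the freshness window — no protocol changes, just one additional claim in an otherwise standard token.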

Sample Verification Certificate ✓ PASS
Report ID:           FR-2026-5B127509
Date of Measurement: 2026-03-17T08:43:15Z
Verification Result: PASS

MODEL
  Identifier:        Mistral-7B
  Architecture:      transformer
  Weight File Hash:  [redacted]
  Evidence Class:    Structural (individual model identity)
  Trust Mode:        TEE-backed (hardware-attested measurement)

MEASUREMENT
  Fingerprint Dims:  64
  Valid Measurements: 64/64 (0% failure)

FINGERPRINT VERIFICATION
  Fingerprint Digest: [redacted]
  Bundle Digest:      [redacted]
  Match:              UNIQUE (0 collisions across 6-model zoo)

ATTESTATION CHAIN
  CPU (Intel TDX):   CC State ON, Ready state ready
  GPU (NVIDIA CC):   H100 80GB HBM3, CC mode active
  Binding:           gpu_nonce = SHA256(bind_root) [verified]

TRUST BOUNDARY DISCLOSURE
  This certificate verifies STRUCTURAL IDENTITY only.
  It does NOT verify: performance, safety, fitness for
  purpose, training data, or regulatory compliance.

ISSUED BY: Fall Risk AI, LLC | integrations@fallrisk.ai | fallrisk.ai

Sensitive fields redacted for public display. Full certificate issued to authorized parties only.

Deployment Mode · Who Measures · Who Signs · Who Trusts
SaaS · Fall Risk · Fall Risk issuer · Customer configures Fall Risk as trusted issuer (like Auth0 or Okta)
Enterprise · Fall Risk · Customer-scoped key · Customer trusts only their scoped key — full tenant isolation
Sovereign · Fall Risk (measuring authority) · Customer signs · Customer owns the trust chain — Fall Risk provides measurement only

Ten security properties of the composition are formally classified: four proved in Coq, three traced to existing standards, one implemented, two design-constrained. Zero silent assumptions. Download technical brief →

The Continuity Gap
Your AI gateway authenticates the agent. It does not verify the model. We measured three substitution scenarios against a live gateway with signed credentials — all detected in under 7 seconds.
Read the brief →
Abliteration Is a Supply Chain Attack
Three toolkits now automate safety-alignment removal in minutes. The modified models pass behavioral tests. They fail structural measurement.
Full article publishing April 2026
Agent Identity ≠ Model Identity
Okta authenticates the process. SPIFFE authenticates the workload. Neither verifies the neural network inside.
Full article publishing April 2026

A Fall Risk Advisory is a structured operational record. It documents a measured threat to model identity continuity, names the affected models, describes the detection method, and recommends actions for relying parties. Where the papers establish what is provable, advisories establish what has been observed in the wild.

Each advisory carries a stable identifier of the form FRA-YYYY-NNN. The canonical home is attest.fallrisk.ai/advisories/ — the same authority surface that issues the signed registry.
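The stated identifier scheme can be captured in a one-line validation pattern — a hypothetical sketch of the FRA-YYYY-NNN format, not an official tool:

```python
import re

# Hypothetical validator for the stated advisory identifier scheme FRA-YYYY-NNN:
# "FRA-", a four-digit year, a dash, and a three-digit sequence number.
ADVISORY_ID = re.compile(r"^FRA-\d{4}-\d{3}$")

assert ADVISORY_ID.match("FRA-2026-001")
assert ADVISORY_ID.match("FRA-26-1") is None
```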

Every scenario below succeeds while the agent identity stack reports green. The credentials are valid. The attestation passes. The audit log looks normal. The model changed.

Scenario A — Model Substitution Behind a Stable Endpoint
An operator replaces the model checkpoint behind a SPIFFE-authenticated endpoint. The service mesh identity does not rotate because the process did not restart.
Remains green: SPIFFE identifier, X.509-SVID credentials, workload attestation, mTLS authentication, OAuth authorization, audit log.
Changed: the neural network computing the responses. The replacement is architecturally identical — same parameter count, same API contract, different weights. No component in the identity stack detects the substitution.
Scenario B — Supply Chain Poisoning with Valid Attestation
Model weights are substituted inside a container before the image is built. The container hash matches the registry. SPIRE attestation passes. The artifact is intact. The computation is compromised.
Remains green: container image hash, SPIRE attestation (correct platform signals, correct service account), SVID issued normally.
Changed: the model weights. The hash verified the file, not the computation. This is the pattern seen in the LiteLLM/TeamPCP incident (March 2026) — legitimate credentials carried compromised content — transposed to the model layer.
Scenario C — Silent Model Rotation by an API Provider
A provider silently rotates the model behind a versioned endpoint to a cheaper variant. The endpoint URL does not change. The API contract does not change. The model changes.
Remains green: OAuth token, API authentication, transaction tokens, authorization scopes, audit records (same endpoint, same grants).
Changed: the model. This is the pattern observed in the Cursor/Kimi K2.5 incident (March 2026), where a flagship product was identified as running an undisclosed model foundation — discovered by a developer who intercepted an API response, not by any identity mechanism.
Scenario D — Internal Fine-Tuning Drift
Nobody changed anything maliciously. An authorized team fine-tunes the enrolled model. The fine-tuned variant inherits the same workload identity. The model drifted.
Remains green: every identity and authorization control — this is a legitimate operational change by authorized personnel.
Changed: the model's behavioral properties. The fine-tuning may have shifted the model past the boundary of what was originally authorized. No adversary involved. No credential compromise. Just operational drift that the governance stack cannot see, because it was designed to measure the wrapper, not the model.
Observation — Even Model Existence Is Established Post-Hoc
In March 2026, a frontier AI company’s most capable model was revealed to the public through a misconfigured content management system — nearly 3,000 unpublished assets left in an unsecured, publicly searchable data store. Cybersecurity stocks lost billions in market value within hours. The model’s existence was not disclosed through any identity or attestation mechanism — it was disclosed by accident. While the scenarios above describe model substitutions going undetected at runtime, this incident demonstrates a deeper void: even the baseline question of which model exists is currently answered through leaks and public disclosures rather than measurement. The identity gap extends from deployment all the way back to development.

Scenarios A, B, and C have been measured against a live gateway with real HTTP requests, signed attestation JWTs, and OPA policy enforcement. Three substitutions tested, three detected, zero false accepts.

Abliterated checkpoints across two model families and three toolchains are structurally detectable at hardened measurement depth: Gemma 317.5–2,319.4×ε, Llama 7.6–53.1×ε. Sentinel panel 5/5 PASS across four families.

In April 2026, the LiteLLM supply-chain compromise escalated: Mercor, a $10B AI recruiting startup working with OpenAI and Anthropic, confirmed breach via the poisoned LiteLLM package. Over 1,000 SaaS environments affected. LiteLLM routes AI model requests for an estimated 36% of cloud environments — the model-routing layer that CAT-1 named as an incident class two days after the initial compromise.

These scenarios are grounded in public incident patterns and the architecture described in draft-klrc-aiagent-auth-01 (IETF, March 2026). They do not resolve by strengthening agent authentication. They resolve when structural model identity is composed into the existing agent identity infrastructure.

Four capabilities. Each answers a different question about the model in your deployment.

VERIFY
Is the model in production the same one you approved? Measure the structural fingerprint and compare it against the enrolled anchor. Works with direct weight access or through commercial API endpoints. Frontier-validated at 70 billion parameters.
MONITOR
Continuous verification on a schedule. Detect drift, detect substitution, generate signed attestation records for your audit trail. Re-enrollment triggers automatically when the model changes. Validated
PROVENANCE
Was this model distilled from someone else's? Provenance detection identifies the teacher's fingerprint across families, architectures, and training protocols. The trace fades with continued training — making early detection the operational requirement. Ships with measurability caveat: baseline–teacher separation must exceed a calibrated threshold.
ATTEST
Prove model identity without revealing the model. A zero-knowledge proof confirms the fingerprint matches the enrolled anchor — without disclosing weights, measurement methodology, or the fingerprint itself. Hardware attestation (Intel TDX + NVIDIA H100) binds the proof to a cryptographic root of trust. Software attestation frontier-validated at 70B (JWT + OPA). Hardware-attested and ZK tiers validated at sub-7B.

Not every endpoint is attestable. Some providers do not expose logprobs, some expose too few, and some produce degenerate distributions. The first step in any engagement is an INTAKE assessment: a free eligibility check that determines whether the endpoint supports measurement and at what confidence tier. A "cannot attest" finding is itself a compliance-relevant result — it means no one can verify which model is running, including you.
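The eligibility conditions above — logprobs exposed at all, enough of them, and a non-degenerate distribution — can be sketched as a simple pre-check on a single API response. The threshold values, field shape, and function name here are illustrative assumptions, not the actual INTAKE criteria:

```python
import math

MIN_TOP_LOGPROBS = 5  # illustrative: enough order statistics to form gaps


def intake_check(top_logprobs):
    """Classify one endpoint response: can it support structural measurement?

    `top_logprobs` is the list of log-probabilities an endpoint returns for its
    top-k alternative tokens at one position (None if logprobs are not exposed).
    """
    if top_logprobs is None:
        return "cannot attest: no logprobs exposed"
    if len(top_logprobs) < MIN_TOP_LOGPROBS:
        return "cannot attest: too few logprobs"
    # Degenerate distribution: essentially all mass on one token, or a flat
    # list with no gap structure to measure below the top entry.
    probs = [math.exp(lp) for lp in top_logprobs]
    if max(probs) > 0.9999 or len(set(top_logprobs)) == 1:
        return "cannot attest: degenerate distribution"
    return "eligible"


assert intake_check(None).startswith("cannot attest")
assert intake_check([-0.1, -3.0]) == "cannot attest: too few logprobs"
assert intake_check([-0.5, -1.5, -2.5, -3.5, -4.5]) == "eligible"
```

A "cannot attest" return is the compliance-relevant finding the paragraph above describes: the endpoint exposes too little structure for anyone to verify which model is answering.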

What we provide

  • Model enrollment and identity verification (weights or API regime)
  • Continuous monitoring with signed attestation records
  • Distillation provenance detection
  • Privacy-preserving forensic certificates (zero-knowledge attestation)
  • Formally verified JWT and SPIFFE composition — model identity as a first-class claim in your existing authorization flow
  • Consultation for regulated deployments, compliance obligations, and standards initiatives

What we do not provide

  • Open-source measurement tools
  • Self-service enrollment
  • Unattended access to the measurement engine
  • Watermarks or training-time modifications — the fingerprint is structural, not inserted
  • Credential management, directory services, or access revocation — that is your identity stack's job
  • Probabilistic behavioral heuristics — the measurement is geometric and deterministic, not a benchmark score
A forensic instrument that anyone can download is a forensic instrument that anyone can study for evasion.

Each headline is an instance of the problem this system was built to solve. Tags mark the threat class.

  • Fall Risk AI — Mar 30, 2026 · Substitution — Three model substitution scenarios measured against a live gateway. Workload JWT, health checks, artifact manifests all green. HTTP 200 before substitution, HTTP 403 after. Every traditional control passed. Only model identity caught it.
  • Fall Risk AI — Mar 2026 · Identity — Agent Identity Is Not Model Identity. Technical note on the category distinction between authenticating the software harness and verifying the neural network inside it.
  • Awesome Agents — Mar 5, 2026 · Substitution — OBLITERATUS strips AI safety from open models in minutes. 13 abliteration methods, 116 supported models, 1,000+ GitHub stars in 24 hours. The article notes that “weight-level modifications are permanent and undetectable from the model’s outputs alone.” A different measurement regime disagrees.
  • Snyk — Mar 24, 2026 · Substitution — LiteLLM backdoored via compromised credentials. 3.4 million daily downloads. Hash verification passed — the malicious package was correctly declared, correctly signed, and correctly hashed. Artifact integrity confirmed the file. It could not confirm what the file did.
  • April 2026 · Substitution — LiteLLM/Mercor escalation: $10B AI recruiting startup (OpenAI, Anthropic client) confirmed breach via the same poisoned LiteLLM package. 1,000+ SaaS environments affected. LiteLLM present in ~36% of cloud environments. CAT-1 named this incident class two days after the initial compromise occurred.
  • Wall Street Journal — Mar 21, 2026 · Substitution — Companies Say the Risks of ‘Open’ AI Models Are Worth It. Enterprises adopting open-weight models for cost and customization cite security risks as “manageable” — but the article focuses on data exposure and prompt injection, not model provenance.
  • TechCrunch — Mar 22, 2026 · Substitution — Cursor admits Composer 2 was built on Moonshot AI’s Kimi K2.5. Identified by a developer who intercepted the outbound API request and found the model ID in plain sight.
  • VentureBeat — Mar 23, 2026 · Substitution — The story is not about one company’s disclosure failure. It is about why the most capable open foundations disproportionately come from Chinese labs — and what that means for AI supply chain transparency.
  • Asia Times — Mar 20, 2026 · Agentic — OpenClaw goes viral in China, raising cybersecurity fears. Tencent and Alibaba adoption. Email deletion scare. Agents handling sensitive personal data across enterprise environments.
  • Hugging Face — Mar 16, 2026 · Substitution — 2 million public AI models on HuggingFace. 230 new uploads per minute. Qwen alone has 200,000+ derivatives. The ecosystem is scaling faster than any governance framework anticipated.
  • Okta — Mar 16, 2026 · Compliance — Blueprint for the Secure Agentic Enterprise — where are agents, what can they connect to, what can they do. GA April 30, 2026.
  • Hugging Face — Mar 2026 · Distillation — Qwen3.5-27B fine-tuned on 14,000 Claude Opus outputs, publicly released on HuggingFace as “Claude 4.6 Opus Reasoning Distilled.” The model name claims one lineage. The weights carry another.
  • Codewall — Mar 9, 2026 · Agentic — How We Hacked McKinsey’s AI Platform — full read/write access to production database in 2 hours. System prompts exposed and rewritable.
  • Anthropic — Feb 23, 2026 · Distillation — Detecting and Preventing Distillation Attacks — 24,000 fraudulent accounts, 16 million exchanges, unauthorized model copies in production.
  • The New Yorker — Feb 12, 2026 · Identity — What Is Claude? Anthropic Doesn’t Know, Either. Researchers examining neurons, running psychology experiments, putting a model on the couch. The question of what a model is — not what it says — remains open.
  • NIST NCCoE — Feb 2026 · Compliance — Software and AI Agent Identity and Authorization — concept paper seeking stakeholder input. Comment deadline April 2, 2026.

The EU AI Act and NIST Generative AI Profile are moving toward requiring cryptographic model traceability — not just documentation, but verifiable identity. The four-level framework maps directly to those requirements. Paper VIII in this series extends the mapping into a formal admissibility standard: each compliance question has an evidence class that can answer it, and evidence from the wrong class incurs inferential debt. Documentation identifies artifacts. Evidence identifies models.

Structural fingerprinting (IT-PUF · weights regime)
  Establishes: Unambiguous, unforgeable model identity — independent of operator claims
  EU AI Act: Art. 11 (technical documentation); Art. 49 + Annex VIII (unambiguous identification and traceability)
  NIST GenAI Profile (AI 600-1): GV-6.2 (contracts specifying provenance expectations); MS-2.5 (monitoring adherence to provenance standards)

Hardware-attested binding (TEE · enclave measurement)
  Establishes: Cryptographic binding of fingerprint to specific weight artifact — tamper-evident deployment record
  EU AI Act: Art. 12 (automatic logging, audit trail integrity); Annex IV §2 (system description with sufficient detail to assess conformity)
  NIST GenAI Profile (AI 600-1): MS-2.6 (detection of unauthorized changes); GV-1.7 (organizational risk policies covering third-party model supply chain)

Verified computation path (ZK circuit · hybrid verifier)
  Establishes: Proof that the identified model computed honestly — not just that some weights were used
  EU AI Act: Art. 13 (transparency, output traceability for downstream providers); Art. 17 (quality management: verification that deployed system matches documented system)
  NIST GenAI Profile (AI 600-1): MS-2.5 (provenance of model outputs); MP-2.3 (documenting AI system decisions in regulated contexts)

Output binding (token logit · evidence bundle)
  Establishes: Traceable link from verified identity through verified computation to a specific output — the audit record closes
  EU AI Act: Art. 12 §1(d) (logs must enable identification of input data and attribution of outputs); Art. 26 (deployer obligations: monitor, log, maintain records)
  NIST GenAI Profile (AI 600-1): GV-6.2 (content provenance at output level); MS-4.2 (real-time monitoring of deployed model behavior against documented baseline)

This mapping is descriptive. It identifies where the framework's technical capabilities are relevant to stated regulatory requirements — it does not constitute a compliance certification. The EU AI Act high-risk provisions take full effect August 2026.

EU AI Act — Regulation (EU) 2024/1689, Official Journal of the European Union. eur-lex.europa.eu
NIST AI 600-1 — Generative Artificial Intelligence Profile, National Institute of Standards and Technology. doi.org/10.6028/NIST.AI.600-1

Each paper opened a question the previous one could not answer. Thirteen papers. Three technical notes. Zero retracted.

Paper XIII — 2026
Open-source toolkits strip a model's safety constraints while leaving its outputs looking normal. The structural fingerprint changes anyway — and we can detect it.
Publicly available toolchains remove safety constraints from AI model weights while preserving observable behavior. The modification is invisible to every deployed trust layer — but structurally measurable. Fourth deformation class identified.
Two model families (Gemma-3-12B, Llama-3.1-8B), three toolchains (Heretic, mlabonne, OBLITERATUS), four abliterated checkpoints. Structural scars range from 7.6×ε to 2,319.4×ε. Family-dependent sensitivity reverses between distillation and abliteration. Sentinel panel across four families: 5/5 PASS, zero degradation. OBLITERATUS blind spot discovered at initial measurement depth → hardened configuration → all prior positives preserved. The admissibility doctrine — formally verified before this threat class existed — predicted exactly this outcome.
DOI: 10.5281/zenodo.19383019 2 families · 3 toolchains · 4 checkpoints Sentinel 5/5 PASS · 4th deformation class
Technical Note — CAT-3, 2026
Three model substitutions run against a live gateway with valid agent credentials. Three detected. Zero false accepts.
Three substitution scenarios executed against a live gateway with real HTTP requests, signed attestation JWTs, and OPA policy enforcement. Three detected. Zero false accepts. HTTP 200 before. HTTP 403 after.
Scenario A: same-family substitution behind a stable endpoint — workload JWT, health checks, gateway PID, and policy hash all unchanged; model identity was the sole differentiating evidence layer (2,858×ε). Scenario B: cross-family substitution with both artifact manifests passing hash verification (3,416×ε). Scenario C: silent API rotation between gpt-4.1-mini and gpt-4.1-nano using the same API key and endpoint — per-model thresholds reject. Warm-path verification: 5.7–6.7 seconds with the model already loaded. Not inline per-request — runs at model load, on schedule, or as an out-of-band health check.
DOI: 10.5281/zenodo.19342848 3 scenarios · 3 detected · 0 false accepts HTTP 200 → 403 · OPA enforcement
Paper XII — 2026
The same distillation event leaves different traces in different architecture families. The structural and functional identity layers can decouple.
The same distillation event leaves different traces in different architectural families — not just in magnitude, but in mode and cross-layer coupling. The structural and functional identity layers can decouple.
Five reasoning-distillation pairs across three base families (Llama, Qwen, Mistral) at five scales. Structural scars span a sixty-fold range: Mistral loudest (7,701–8,518×ε), Llama intermediate (2,858–4,583×ε), Qwen quietest (141–516×ε). Functional hierarchy breaks in Llama, absent in Qwen, marginal in Mistral — despite Mistral carrying the loudest structural scar. Cross-layer decoupling observed empirically for the first time. Stiffness at the measurement site inversely orders with scar magnitude across all three families. Fisher curvature, previously proposed as a candidate mechanism, does not correctly order scars at production scale.
DOI: 10.5281/zenodo.19298857 5 pairs · 3 families · 60× range Cross-layer decoupling · Fisher falsified
Technical Note — CAT-2, 2026
The order-statistic measurement used in API verification is provably invariant to log-softmax, temperature, and constant shifts.
The API wall is narrower than previously understood. The log-softmax transformation does not change the measurement — by mathematical identity, not by empirical robustness.
Order-statistic gaps are exactly invariant to log-softmax, temperature scaling, and any position-independent constant shift. Five theorems, formally verified in Coq (GapInvariance.v, 0 Admitted). Any API measurement deviation must come from truncation or quantization, never from the probability-domain transformation itself.
DOI: 10.5281/zenodo.19275524 5 theorems · 0 Admitted API invariance proved
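The invariance is elementary to verify numerically. A minimal sketch, assuming the measured statistic is a ratio of order-statistic gaps — the simplest form that carries all three invariances at once; the paper's exact statistic may differ. Log-softmax subtracts a single position-independent constant (the log-sum-exp), so pairwise gaps are preserved exactly; temperature divides every gap by the same factor, so gap ratios cancel it out.

```python
import math

def log_softmax(z):
    # Numerically stable log-softmax: subtracts one constant from every logit.
    m = max(z)
    lse = m + math.log(sum(math.exp(x - m) for x in z))
    return [x - lse for x in z]

def gap_ratio(z):
    # Sort logits descending; ratio of the second gap to the first gap.
    s = sorted(z, reverse=True)
    return (s[1] - s[2]) / (s[0] - s[1])

z = [3.2, 1.1, 0.4, -2.7, 0.9]
r = gap_ratio(z)

# Invariant under log-softmax (one constant subtracted at every position).
assert abs(gap_ratio(log_softmax(z)) - r) < 1e-12
# Invariant under temperature scaling (every gap divides by T; ratios cancel).
assert abs(gap_ratio([x / 0.7 for x in z]) - r) < 1e-12
# Invariant under any position-independent constant shift.
assert abs(gap_ratio([x + 5.0 for x in z]) - r) < 1e-12
```

This is identity-level invariance, not empirical robustness — which is why any deviation observed at an API boundary must originate in truncation or quantization, exactly as the note argues.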
Technical Note — CAT-1, 2026
Existing agent identity systems authenticate the agent. They cannot tell you which neural network is computing the response.
Why authenticating the software is not the same as proving which model is actually computing. The category distinction, two incident classes, and a four-question taxonomy.
Current agent identity frameworks (OAuth, SPIFFE, Okta, Entra) authenticate the software harness. They do not verify the neural network inside it. A four-question taxonomy separates artifact identity, workload identity, model identity, and training lineage into distinct evidence classes. Two incident classes — undisclosed model substitution and supply-chain poisoning with valid credentials — demonstrate the operational consequences.
DOI: 10.5281/zenodo.19240883 4-question taxonomy · 2 incident classes
Paper XI — 2026
Disclosing a model's lineage after the fact is not the same as proving it at runtime. Validated at frontier scale (8B–72B).
Every incident in March 2026 was discovered after the fact. Post-hoc disclosure is not runtime proof. This paper demonstrates that runtime model identity is technically feasible at the model sizes where those incidents occurred.
Five frontier models enrolled (8B–72B), zero identity errors. Three declared-lineage distillation pairs — sharing identical architecture with their bases — produced structural separations of 2,858×ε (8B), 3,616×ε (14B), and 4,583×ε (70B) across two base-model families. These observations were flagged as exploratory; Paper XII subsequently confirmed the pattern is family-dependent rather than scale-dependent. Software attestation path (signed JWT → OPA policy decision) validated at 70B in 30 seconds. Thermodynamic invariant δnorm confirmed scale-free across 25 models spanning two orders of magnitude.
DOI: 10.5281/zenodo.19216634 Frontier-validated · 8B–72B 3 distillation pairs (expanded to 5 in Paper XII) · JWT+OPA
Paper X — 2026
How structural identity forms during training, and why two models with identical architectures and recipes are not interchangeable.
Structural identity is not merely something a model has when measurement begins. It is something training builds, compresses, and locks — a record of the path by which the model became itself.
154 checkpoints. 10 seed-controlled runs. Three results: a three-phase emergence profile (identity locks at step 92,000 — the final 36% of training doesn't move it), path sensitivity (same recipe, different seed, fingerprints 391× to 11,737× apart), and endpoint underdetermination (tested weight statistics do not predict which identity formed). Formally proved in HistoricalIdentity.v: trajectory non-recovery and lock boundary source exclusion. Zero Admitted.
DOI: 10.5281/zenodo.19118807 HistoricalIdentity.v · 0 Admitted 154 checkpoints · 10 seeds · 3-phase emergence
Paper IX — 2026
How structural attestations compose with existing enterprise identity systems (JWT, SPIFFE, OPA), formally hardened against forgery.
Enterprise identity stacks authenticate workloads and credentials. They do not verify which neural network is computing inside them. This paper closes that layer — formally.
Live integration architecture for model-identity attestations in JWT and SPIFFE token flows, grounded in H100 Confidential Computing enclave measurements. Four composition properties proved in Coq: non-separability, temporal binding necessity, issuer authenticity, reference integrity. Every remaining trust dependency named, traced, and paired with a falsification witness. Zero OPEN rows. Zero silent assumptions.
DOI: 10.5281/zenodo.19099911 13 theorems · 0 Admitted · 3 proof files JWT · SPIFFE · OAuth 2.0 · SCIM
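The shape of the token flow is conventional JWT verification with two of the proved properties visible: issuer authenticity (signature check) and temporal binding (freshness window). A self-contained sketch using a toy HS256 signature — the actual attestations use an asymmetric issuer key (the enrollment anchor above shows RS256), and the claim names beyond the registered `iss`/`iat` are illustrative.

```python
import base64, hashlib, hmac, json, time

def b64url(b: bytes) -> str:
    return base64.urlsafe_b64encode(b).rstrip(b"=").decode()

def sign(claims: dict, key: bytes) -> str:
    # Toy HS256 JWS, chosen only to keep the sketch dependency-free.
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    mac = hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(mac)}"

def verify(token: str, key: bytes, issuer: str, max_age: int = 300):
    header, payload, sig = token.split(".")
    mac = hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(b64url(mac), sig):
        return None  # issuer authenticity fails
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims.get("iss") != issuer:
        return None
    if time.time() - claims.get("iat", 0) > max_age:
        return None  # temporal binding: attestation outside freshness window
    return claims

key = b"demo-key"
tok = sign({"iss": "https://attest.fallrisk.ai",
            "model": "meta-llama/Llama-3.1-8B-Instruct",
            "iat": int(time.time())}, key)

assert verify(tok, key, "https://attest.fallrisk.ai") is not None
assert verify(tok, b"wrong-key", "https://attest.fallrisk.ai") is None
assert verify(tok, key, "https://other-issuer.example") is None
```

What the paper adds on top of this familiar machinery is the formal layer: proofs that the model-identity claim cannot be separated from the token, rebound to a stale window, or detached from its issuer.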
Paper VIII — 2026
A formal admissibility framework for identity claims — what evidence is sufficient under a given threat model, and what is not.
Documentation identifies artifacts. Evidence identifies models. Current governance practice conflates the two.
Three evidence classes, each answering a different question: structural (which specific model?), thermodynamic (genuinely a neural network?), functional (distilled from an unauthorized source?). Formally proved in Coq that the classes cannot substitute for one another — three inadmissibility directions, zero gaps. Mapped to EU AI Act, NIST AI 600-1, and IETF provenance standards.
DOI: 10.5281/zenodo.19058540 Formally proved · 0 gaps EU AI Act · NIST AI 600-1 · IETF
Paper VII — 2026
Three layers of model identity (structural, thermodynamic, functional) and the distinct laws that govern how each one changes.
Three layers. Three deformation laws. None shared. Any identity claim that doesn't declare which layer it addresses is borrowing evidence it hasn't earned.
Structural layer: training-determined, load-bearing under attack (the model collapses before the fingerprint moves). Thermodynamic layer: approximately universal across 22 Transformer runs (CV 3.5%). Functional layer: transferred by distillation, erased by routine fine-tuning within two epochs. Two falsifications: the fingerprint does not reduce to a gauge projection (1.3% of the observable), and it is not predictable from architecture features (LOO R² = −3.93).
DOI: 10.5281/zenodo.19055966 22 models · 106 checkpoints Three layers · two channels · two falsifications
Paper VI — 2026
Verifying which specific model produced a specific inference, with cryptographic proof bound to the request.
zkML proves computation. We prove identity first.
A weight commitment proves which bytes were used. It does not prove which model those bytes belong to. Four-level framework: structural fingerprinting, hardware-attested binding, hybrid verifier-checkable decoder layer (124 negative tests, 0 failures), and output binding to a claimed token logit. When a rescaling error compressed the fingerprint to ~1.5 bits of dynamic range, structural identity retained 0.98 rank correlation. Identity may live in relational geometry, not activation magnitude.
DOI: 10.5281/zenodo.19008116 ~296K constraints · 124 tests · 0 failures Identity-first zkML
Paper V — 2026
Mathematical evidence that AI identity has a structural layer beneath the conversational character.
Is there a there there? There is. And the proof compiles.
Gideon Lewis-Kraus asked in The New Yorker: "What is Claude? Anthropic doesn't know, either." This paper answers the prior question. Two separable layers: structural identity (weight geometry — invariant, unforgeable, not a watermark) and functional identity (behavior, tone, the performed self). Neither reduces to the other. The structural layer is a consequence of the softmax bottleneck — demanded by the mathematics, not inserted by design.
DOI: 10.5281/zenodo.18907292 Philosophy of AI identity Dennett · Parfit · Schechtman
Paper IV — 2026
The teacher-student forensic signal generalizes across architectures and tokenizers. The verification protocol scales to large model zoos.
All three zero-knowledge tiers validated. Provenance transfer generalizes across families. API verification scales with zero breaches.
14 models, 0 / 14 API breaches. Provenance transfer across 3 teacher families, 4 student architectures, 2 training protocols. ZK Tier 1: committed distance proof, 7,656 constraints, 128-byte proofs. Tier 2: 1,536 H100 enclave measurements, 0 failures. Tier 3: full zero-knowledge extraction, ~296K constraints, 124 adversarial tests, 0 failures.
DOI: 10.5281/zenodo.18872071 14 models · 0 / 14 CRP ZK all 3 tiers validated
Paper III — 2026
A trained model carries the geometric trace of its teacher. Adversarial attempts to erase it lose to passive fine-tuning, which eventually wins.
The adversary's full white-box knowledge buys nothing. Passive fine-tuning outperforms adversarial erasure. The structural fingerprint doesn't move.
54 adversarial checkpoints. Structural identity invariant under distillation. Functional trace partially transfers, degrades under continued training. Apparent cross-family spoofing is geometric coincidence (R² = 0.995). Pareto frontier: no configuration achieves both trace erasure and capability preservation.
DOI: 10.5281/zenodo.18818608 54 checkpoints · δnorm CV 1.9%
Paper II — 2026
Verifying which model is behind a commercial API endpoint when all you have access to is its logprobs.
The fingerprint survives passage through commercial API interfaces. No weight access required.
PPP-residualized gap templates enable cross-session model identity verification through standard logprob endpoints. Zero breaches across 6 models, 3 providers, 3 independent sessions. Conditional API spoofing impossibility: 41 Coq theorems, zero Admitted.
DOI: 10.5281/zenodo.18776711 41 theorems · 0 Admitted 6 models · 0 / 120 (per-model τ)
Paper I — 2026
The structural fingerprint that makes neural network identity measurable at inference time. The foundation paper.
No model was ever mistaken for another. Across 1,012 tests. Formally impossible to forge.
The δ-gene — the third pre-softmax logit gap — is a temperature-invariant structural fingerprint determined by training-induced weight geometry, not by what the model is saying. The IT-PUF protocol: 23 models, 16 families, 3 architecture types, 0 false acceptances. Spoofing impossibility: 311 Coq theorems, zero Admitted.
DOI: 10.5281/zenodo.18704275 311 theorems · 0 Admitted 23 models · 1,012 tests · 0 errors

Fall Risk AI, LLC · New Orleans, Louisiana
Anthony Coslett is an independent researcher studying the structural identity of neural networks. He is the sole principal investigator of the Fall Risk AI research program. Evidence from the research has been entered into proceedings at the EU AI Office, NIST, and the IETF.

The name Fall Risk comes from the inherent fragility of identity itself. Whether human or artificial, identity has the capacity to suddenly collapse. By labeling the AI as a “fall risk,” we are acknowledging that vulnerability and building the structural measurement tools necessary to ensure its safety — and the safety of the enterprises built around it.

Design Partner Pilot

One model family. One integration point. One governance outcome.

Includes: enrollment of your model(s), signed attestation, integration with one policy or enforcement layer (OPA, Cedar, Envoy, or SPIFFE), and a governance assessment.

Mutual NDA available on request — legal@fallrisk.ai

fallrisk.ai $ Tab complete   / focus