Fall Risk AI

Prove Which Model Is Running

Your security stack authenticates users, credentials, and infrastructure. It does not authenticate the model. A substituted model, a distilled copy, a silently updated checkpoint — every credential stays valid. Every audit log looks normal. Only the model has changed.


Every model already carries a structural fingerprint. We built the instrument that reads it.

75 models — publicly enrolled and signed, zero identity errors
70M → 72B — parameter range validated
30 seconds — enrollment at 70B parameters
0 errors — across 1,012 pairwise identity tests
$ ls -la ./fallrisk.ai/
fallrisk-ai · enrollment anchor
Model: meta-llama/Llama-3.1-8B-Instruct
Enrollment: enroll-2046c1bfea21
Architecture: transformer · 32 layers · 64D
Contract: itpuf-v0.1.0
Status: ✓ ACTIVE
Issuer: https://attest.fallrisk.ai
Evidence Class: structural (individual model identity)
Issued: 2026-04-08
Signature: eyJhbGciOiJSUzI1NiIsImtpZCI6ImZhbGxyaXNrLTk2Y2Q1ZT…
THE PHYSICS
  • 0/1,012 FAR across the Sprint 2 weights zoo
  • ε ≈ 1.003×10⁻⁴ (EVT-derived acceptance threshold)
  • δ_norm = 0.318 ± 2% (scale-invariant, 410M–72B)
  • Frontier-validated: 8B → 72B (30-second enrollment)
  • 5 architecture families: Llama · Qwen · Mistral · Gemma · Phi
THE MATHEMATICS
  • NoSpoofing.v — no sweet spot at any KL budget
  • EvidenceSufficiency.v — cross-layer non-substitution
  • GapInvariance.v — measurement invariant under log-softmax, temperature, constant shift
  • StructuralAuthorization.v — artifact bound to request, within freshness window
  • 350+ theorems verified · 0 Admitted
Explore Registry →

This is what your policy engine consumes.

The visible edge of 13 papers, 3 technical notes, and 350+ machine-verified theorems.

A modern AI deployment stack typically answers two identity questions well. Artifact identity — what was shipped — is handled by registries, version control, and bills of materials. Agent identity — what software is making this request — is handled by OAuth, SPIFFE, mTLS, and the new agent identity suites from Okta, Microsoft Entra, and NIST's concept paper on software agent identity.

These are real controls. But they authenticate the software harness — the orchestration code, the API gateway, the service mesh. They do not verify the neural network inside it.

COVERED
1. What artifact was shipped? — registries, version control, BOMs
2. What agent or workload is authenticated? — OAuth, SPIFFE, Okta, Entra, mTLS
UNVERIFIED AT RUNTIME
3. What model is actually computing?
4. Whose training lineage is present?

The software identity layer can work perfectly while the model identity layer is entirely absent. This is not a future risk — it has already produced public incidents in which authenticated agents served undisclosed model foundations, and poisoned packages passed every integrity check using legitimate credentials.

We have measured this directly: three substitution scenarios executed against a live gateway with real HTTP requests, signed attestation JWTs, and OPA policy enforcement. In every scenario, the workload identity, artifact integrity, and API authentication remained valid. In every scenario, the structural identity measurement detected the substitution and the policy gate denied the request.

Technical notes: Agent Identity Is Not Model Identity · Measured Model Substitution · Zenodo, 2026

Three engagement modes. One escalation path.

Public
Registry
Browse 75 enrolled models. Verify cryptographic signatures client-side. Download verifier kits. Inspect sample attestation JWTs and deny events.
Confidential
Enrollment
Confidential measurement for your models. Custom enrollment anchor. Signed attestation. Integration guidance.
Continuous
Attestation
Runtime structural checks on a schedule. Signed JWT stream consumed by OPA, Cedar, Envoy, or your existing policy engine. Drift monitoring. Compliance reporting. Annual subscription.

Every neural network carries a unique structural fingerprint — not from what it says, but from the geometry of how it decides what to say. The fingerprint is a mathematical consequence of the architecture, not something anyone inserted.

Weights
Direct measurement
Send challenge prompts. Measure the internal response at two sites. Compare the resulting 64-dimensional fingerprint against an enrolled anchor. Accept or reject. No retraining. No model modification.
API
No weight access needed
Standard logprob endpoints expose enough structural geometry to verify identity across independent sessions. No weights required, no operator cooperation needed.
Zero-Knowledge
Prove without revealing
A zero-knowledge proof attests the model matches its enrolled identity without revealing the fingerprint, the anchor, or the methodology. Hardware attestation binds to a cryptographic root.
Attestation
The signed claim
The result travels inside a standard JWT or SPIFFE SVID, composing with OAuth 2.0, OPA, and EAT-based authorization flows (RFC 9711). No protocol changes. No new infrastructure. One new fact: which model is actually running.

This end-to-end path — from forward pass through signed claim to policy decision — has been validated at 70 billion parameters in approximately 30 seconds.
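The accept/reject decision described above reduces to a distance test between a measured fingerprint and the enrolled anchor. The sketch below is illustrative only — the Euclidean metric, the vector semantics, and the function names are assumptions, with the 64-dimension count and the ε threshold taken from the figures on this page:

```python
import math

EPSILON = 1.003e-4  # EVT-derived acceptance threshold quoted on this page


def fingerprint_distance(candidate, anchor):
    """Euclidean distance between two 64-dimensional fingerprints (assumed metric)."""
    if len(candidate) != len(anchor):
        raise ValueError("fingerprint dimensionality mismatch")
    return math.sqrt(sum((c - a) ** 2 for c, a in zip(candidate, anchor)))


def verify(candidate, anchor, epsilon=EPSILON):
    """Accept iff the measured fingerprint lies within epsilon of the enrolled anchor."""
    return fingerprint_distance(candidate, anchor) <= epsilon


# Illustration: an identical measurement verifies; a perturbed one does not.
anchor = [0.1] * 64
assert verify(anchor, anchor)
assert not verify([0.1 + 1e-2] + [0.1] * 63, anchor)
```

The point of the sketch is the shape of the decision, not the metric: enrollment fixes an anchor once, and every later verification is a pure comparison with no retraining and no model modification.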

Not a replacement. The layer none of them provide.

Watermarks require insertion at training time and are removed by fine-tuning. Model cards describe artifacts, not running systems. Behavioral tests measure what a model says, not what it is. Output monitoring watches downstream effects without verifying upstream identity. Each solves a real problem. None answer the structural question.

Property comparison — Watermarks · Model Cards · Behavioral Tests · Output Monitoring · Structural Fingerprint:
  • Works without training-time insertion
  • Survives fine-tuning — Partial
  • Survives distillation
  • Works without weight access
  • Verifies the running model, not a document — Partial / Partial
  • Cryptographically verifiable
  • Formally proved unforgeable
  • Composable with existing auth stack — Partial

The best deployment uses several of these together. Watermarks where you control training. Behavioral tests for capability monitoring. Output monitoring for safety. And structural fingerprinting for the one question the others can't answer: is this still the model you approved?

0 / 1,012
Identity tests — weights regime. 23 models, 16 vendor families, 3 architecture types. No model was ever mistaken for another.
0 / 14
Identity tests — API regime. 14 models, 3 providers. Every model correctly distinguished across independent sessions.
370
Formally verified theorems across 22 Coq proof files — publication scope. Zero uses of Admitted.
1,536
Measurements inside H100 confidential computing enclave. Zero failures.
1.4%
Fingerprint variation across 31 training checkpoints, 4 architectures, 3 model families. The structural identity does not drift under training pressure.
4
Patents assigned to Fall Risk AI, LLC. Weights identity + API verification + zero-knowledge attestation + identity-conditioned inference.
8,518×ε
Maximum structural separation across five reasoning-distillation pairs (1.5B–70B) — three architectural families. A sixty-fold range in scar magnitude: Mistral loudest, Llama intermediate, Qwen quietest. All above detection floor.
Model Supply Chain Verification
A vendor ships a model built on an open-weight base. The marketing says proprietary. The API response says otherwise — if anyone thinks to look. No security team checks which model is actually serving production. No compliance process verifies origin. IT-PUF verifies the structural fingerprint of what is running against what was enrolled at deployment — no leaked model ID required.
Distillation Forensics
16 million exchanges. 24,000 fraudulent accounts. The resulting models carry the teacher’s fingerprint. IT-PUF detects provenance transfer across families, architectures, and training protocols. The adversary cannot erase the trace without degrading the capabilities the distillation was meant to acquire. The signal fades with continued training — making continuous monitoring, not periodic audits, the operational requirement.
Regulatory Compliance — EU AI Act
Current monitoring checks outputs. Nothing checks whether the model itself has been swapped. Article 15 requires continuous monitoring of high-risk AI systems. Deadline: August 2026. IT-PUF provides model-level identity attestation: the system in production right now is the system that passed your approval process. With the zero-knowledge tier, provable without disclosing proprietary model internals to the regulator.
Insurance and Audit
Not a checkbox. Not a vendor assertion. Cryptographic proof. IT-PUF’s hardware-attested measurement runs inside an NVIDIA H100 / Intel TDX enclave and produces a signed certificate binding model identity to specific weights. The insurer verifies without seeing the model. The policyholder proves compliance without disclosing trade secrets.
Internal Model Governance
Which deployment is running which version? Did the hotfix propagate? Is staging accidentally serving production? IT-PUF: enroll at deployment, verify on demand or on schedule, detect identity drift. Non-invasive. Runs during normal inference. Under 50 seconds for a 7B model. Architecture-agnostic: Transformer, Mamba, MoE, hybrid.
Agentic AI Authorization
Every agent identity framework asks three questions. None of them ask which model is inside the agent — because none of them have a way to answer it. Directory entries describe agents. Workload identities authenticate deployments. Signed tokens authorize actions. No layer of the current stack establishes which neural network is reasoning. IT-PUF answers the fourth question: bind the model’s structural fingerprint to the authorization token. No protocol changes — the claim travels inside a standard JWT or SPIFFE SVID. Four security properties proved in Coq. Zero silent assumptions.
Detecting Published Safety-Alignment Removal
Open-source tools now automate the removal of safety alignment from language models. The modified checkpoints preserve the API contract and are optimized for low KL divergence from the original. The internal activation geometry still changes. Published abliterated checkpoints across two model families and three toolchains were measured against aligned bases under the hardened instrument configuration. Gemma-3-12B: Heretic 317.5–367.6×ε, mlabonne 1,556.8–2,319.4×ε. Llama-3.1-8B: Heretic 7.6–12.0×ε, OBLITERATUS 45.1–53.1×ε. Sentinel panel: 5/5 PASS across four model families (Gemma, Llama, Qwen, Mistral). Zero degradation of any prior positive. Family-dependent sensitivity reverses between distillation and abliteration — Gemma quiet under distillation but loud under abliteration; Llama the opposite. In the tested cases, published safety-alignment removal left a measurable structural scar — even when the tool explicitly optimized for output preservation.

Model-identity attestations compose with existing enterprise authorization infrastructure — OAuth 2.0, SPIFFE, SCIM — without protocol modifications. The fingerprint travels as a compact claim inside a standard JWT or SPIFFE SVID.
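As a concrete illustration of "model identity as a compact claim," here is a minimal JWT built with only the Python standard library. The claim names (`enrollment_id`, `model_fingerprint_digest`, `verification_result`) are hypothetical, and the HS256 signature is chosen for self-containment — the sample attestation header shown elsewhere on this page indicates RS256 with an issuer key:

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url without padding, as used in compact JWS serialization."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def sign_jwt(claims: dict, key: bytes) -> str:
    """Produce a compact HS256 JWT (illustrative; production would use RS256)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = f"{b64url(json.dumps(header).encode())}.{b64url(json.dumps(claims).encode())}"
    sig = hmac.new(key, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"


now = int(time.time())
claims = {
    "iss": "https://attest.fallrisk.ai",        # issuer, as in the sample anchor
    "sub": "meta-llama/Llama-3.1-8B-Instruct",  # the attested model
    "iat": now,
    "exp": now + 300,                           # short freshness window
    # Hypothetical model-identity claims:
    "enrollment_id": "enroll-2046c1bfea21",
    "model_fingerprint_digest": hashlib.sha256(b"fingerprint-bytes").hexdigest(),
    "verification_result": "PASS",
}

token = sign_jwt(claims, key=b"demo-shared-secret")
```

A policy engine such as OPA or Cedar would verify the signature against the trusted issuer, then gate the request on `verification_result` and the freshness window — no protocol changes, just one additional claim in an otherwise standard token.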

Sample Verification Certificate ✓ PASS
Report ID:           FR-2026-5B127509
Date of Measurement: 2026-03-17T08:43:15Z
Verification Result: PASS

MODEL
  Identifier:        Mistral-7B
  Architecture:      transformer
  Weight File Hash:  [redacted]
  Evidence Class:    Structural (individual model identity)
  Trust Mode:        TEE-backed (hardware-attested measurement)

MEASUREMENT
  Fingerprint Dims:  64
  Valid Measurements: 64/64 (0% failure)

FINGERPRINT VERIFICATION
  Fingerprint Digest: [redacted]
  Bundle Digest:      [redacted]
  Match:              UNIQUE (0 collisions across 6-model zoo)

ATTESTATION CHAIN
  CPU (Intel TDX):   CC State ON, Ready state ready
  GPU (NVIDIA CC):   H100 80GB HBM3, CC mode active
  Binding:           gpu_nonce = SHA256(bind_root) [verified]

TRUST BOUNDARY DISCLOSURE
  This certificate verifies STRUCTURAL IDENTITY only.
  It does NOT verify: performance, safety, fitness for
  purpose, training data, or regulatory compliance.

ISSUED BY: Fall Risk AI, LLC | integrations@fallrisk.ai | fallrisk.ai

Sensitive fields redacted for public display. Full certificate issued to authorized parties only.

Deployment Mode · Who Measures · Who Signs · Who Trusts
SaaS · Fall Risk · Fall Risk issuer · Customer configures Fall Risk as trusted issuer (like Auth0 or Okta)
Enterprise · Fall Risk · Customer-scoped key · Customer trusts only their scoped key — full tenant isolation
Sovereign · Fall Risk (measuring authority) · Customer signs · Customer owns the trust chain — Fall Risk provides measurement only

Ten security properties of the composition are formally classified: four proved in Coq, three traced to existing standards, one implemented, two design-constrained. Zero silent assumptions. Download technical brief →

The Continuity Gap
Your AI gateway authenticates the agent. It does not verify the model. We measured three substitution scenarios against a live gateway with signed credentials — all detected in under 7 seconds.
Read the brief →
Abliteration Is a Supply Chain Attack
Three toolkits now automate safety-alignment removal in minutes. The modified models pass behavioral tests. They fail structural measurement.
Full article publishing April 2026
Agent Identity ≠ Model Identity
Okta authenticates the process. SPIFFE authenticates the workload. Neither verifies the neural network inside.
Full article publishing April 2026

A Fall Risk Advisory is a structured operational record. It documents a measured threat to model identity continuity, names the affected models, describes the detection method, and recommends actions for relying parties. Where the papers establish what is provable, advisories establish what has been observed in the wild.

Each advisory carries a stable identifier of the form FRA-YYYY-NNN. The canonical home is attest.fallrisk.ai/advisories/ — the same authority surface that issues the signed registry.
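The stated identifier scheme can be captured in a one-line validation pattern — a hypothetical sketch of the FRA-YYYY-NNN format, not an official tool:

```python
import re

# Hypothetical validator for the stated advisory identifier scheme FRA-YYYY-NNN:
# "FRA-", a four-digit year, a dash, and a three-digit sequence number.
ADVISORY_ID = re.compile(r"^FRA-\d{4}-\d{3}$")

assert ADVISORY_ID.match("FRA-2026-001")
assert ADVISORY_ID.match("FRA-26-1") is None
```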

Every scenario below succeeds while the agent identity stack reports green. The credentials are valid. The attestation passes. The audit log looks normal. The model changed.

Scenario A — Model Substitution Behind a Stable Endpoint
An operator replaces the model checkpoint behind a SPIFFE-authenticated endpoint. The service mesh identity does not rotate because the process did not restart.
Remains green: SPIFFE identifier, X.509-SVID credentials, workload attestation, mTLS authentication, OAuth authorization, audit log.
Changed: the neural network computing the responses. The replacement is architecturally identical — same parameter count, same API contract, different weights. No component in the identity stack detects the substitution.
Scenario B — Supply Chain Poisoning with Valid Attestation
Model weights are substituted inside a container before the image is built. The container hash matches the registry. SPIRE attestation passes. The artifact is intact. The computation is compromised.
Remains green: container image hash, SPIRE attestation (correct platform signals, correct service account), SVID issued normally.
Changed: the model weights. The hash verified the file, not the computation. This is the pattern seen in the LiteLLM/TeamPCP incident (March 2026) — legitimate credentials carried compromised content — transposed to the model layer.
Scenario C — Silent Model Rotation by an API Provider
A provider silently rotates the model behind a versioned endpoint to a cheaper variant. The endpoint URL does not change. The API contract does not change. The model changes.
Remains green: OAuth token, API authentication, transaction tokens, authorization scopes, audit records (same endpoint, same grants).
Changed: the model. This is the pattern observed in the Cursor/Kimi K2.5 incident (March 2026), where a flagship product was identified as running an undisclosed model foundation — discovered by a developer who intercepted an API response, not by any identity mechanism.
Scenario D — Internal Fine-Tuning Drift
Nobody changed anything maliciously. An authorized team fine-tunes the enrolled model. The fine-tuned variant inherits the same workload identity. The model drifted.
Remains green: every identity and authorization control — this is a legitimate operational change by authorized personnel.
Changed: the model's behavioral properties. The fine-tuning may have shifted the model past the boundary of what was originally authorized. No adversary involved. No credential compromise. Just operational drift that the governance stack cannot see, because it was designed to measure the wrapper, not the model.
Observation — Even Model Existence Is Established Post-Hoc
In March 2026, a frontier AI company’s most capable model was revealed to the public through a misconfigured content management system — nearly 3,000 unpublished assets left in an unsecured, publicly searchable data store. Cybersecurity stocks lost billions in market value within hours. The model’s existence was not disclosed through any identity or attestation mechanism — it was disclosed by accident. While the scenarios above describe model substitutions going undetected at runtime, this incident demonstrates a deeper void: even the baseline question of which model exists is currently answered through leaks and public disclosures rather than measurement. The identity gap extends from deployment all the way back to development.

Scenarios A, B, and C have been measured against a live gateway with real HTTP requests, signed attestation JWTs, and OPA policy enforcement. Three substitutions tested, three detected, zero false accepts.

Abliterated checkpoints across two model families and three toolchains are structurally detectable at hardened measurement depth: Gemma 317.5–2,319.4×ε, Llama 7.6–53.1×ε. Sentinel panel 5/5 PASS across four families.

In April 2026, the LiteLLM supply-chain compromise escalated: Mercor, a $10B AI recruiting startup working with OpenAI and Anthropic, confirmed breach via the poisoned LiteLLM package. Over 1,000 SaaS environments affected. LiteLLM routes AI model requests for an estimated 36% of cloud environments — the model-routing layer that CAT-1 named as an incident class two days after the initial compromise.

These scenarios are grounded in public incident patterns and the architecture described in draft-klrc-aiagent-auth-01 (IETF, March 2026). They do not resolve by strengthening agent authentication. They resolve when structural model identity is composed into the existing agent identity infrastructure.

Four capabilities. Each answers a different question about the model in your deployment.

VERIFY
Is the model in production the same one you approved? Measure the structural fingerprint and compare it against the enrolled anchor. Works with direct weight access or through commercial API endpoints. Frontier-validated at 70 billion parameters.
MONITOR
Continuous verification on a schedule. Detect drift, detect substitution, generate signed attestation records for your audit trail. Re-enrollment triggers automatically when the model changes. Validated
PROVENANCE
Was this model distilled from someone else's? Provenance detection identifies the teacher's fingerprint across families, architectures, and training protocols. The trace fades with continued training — making early detection the operational requirement. Ships with measurability caveat: baseline–teacher separation must exceed a calibrated threshold.
ATTEST
Prove model identity without revealing the model. A zero-knowledge proof confirms the fingerprint matches the enrolled anchor — without disclosing weights, measurement methodology, or the fingerprint itself. Hardware attestation (Intel TDX + NVIDIA H100) binds the proof to a cryptographic root of trust. Software attestation frontier-validated at 70B (JWT + OPA). Hardware-attested and ZK tiers validated at sub-7B.

Not every endpoint is attestable. Some providers do not expose logprobs, some expose too few, and some produce degenerate distributions. The first step in any engagement is an INTAKE assessment: a free eligibility check that determines whether the endpoint supports measurement and at what confidence tier. A "cannot attest" finding is itself a compliance-relevant result — it means no one can verify which model is running, including you.
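The eligibility conditions above — logprobs exposed at all, enough of them, and a non-degenerate distribution — can be sketched as a simple pre-check on a single API response. The threshold values, field shape, and function name here are illustrative assumptions, not the actual INTAKE criteria:

```python
import math

MIN_TOP_LOGPROBS = 5  # illustrative: enough order statistics to form gaps


def intake_check(top_logprobs):
    """Classify one endpoint response: can it support structural measurement?

    `top_logprobs` is the list of log-probabilities an endpoint returns for its
    top-k alternative tokens at one position (None if logprobs are not exposed).
    """
    if top_logprobs is None:
        return "cannot attest: no logprobs exposed"
    if len(top_logprobs) < MIN_TOP_LOGPROBS:
        return "cannot attest: too few logprobs"
    # Degenerate distribution: essentially all mass on one token, or a flat
    # list with no gap structure to measure below the top entry.
    probs = [math.exp(lp) for lp in top_logprobs]
    if max(probs) > 0.9999 or len(set(top_logprobs)) == 1:
        return "cannot attest: degenerate distribution"
    return "eligible"


assert intake_check(None).startswith("cannot attest")
assert intake_check([-0.1, -3.0]) == "cannot attest: too few logprobs"
assert intake_check([-0.5, -1.5, -2.5, -3.5, -4.5]) == "eligible"
```

A "cannot attest" return is the compliance-relevant finding the paragraph above describes: the endpoint exposes too little structure for anyone to verify which model is answering.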

What we provide

  • Model enrollment and identity verification (weights or API regime)
  • Continuous monitoring with signed attestation records
  • Distillation provenance detection
  • Privacy-preserving forensic certificates (zero-knowledge attestation)
  • Formally verified JWT and SPIFFE composition — model identity as a first-class claim in your existing authorization flow
  • Consultation for regulated deployments, compliance obligations, and standards initiatives

What we do not provide

  • Open-source measurement tools
  • Self-service enrollment
  • Unattended access to the measurement engine
  • Watermarks or training-time modifications — the fingerprint is structural, not inserted
  • Credential management, directory services, or access revocation — that is your identity stack's job
  • Probabilistic behavioral heuristics — the measurement is geometric and deterministic, not a benchmark score
A forensic instrument that anyone can download is a forensic instrument that anyone can study for evasion.

Each headline is an instance of the problem this system was built to solve. Tags mark the threat class.

  • Fall Risk AI — Mar 30, 2026 · Substitution — Three model substitution scenarios measured against a live gateway. Workload JWT, health checks, artifact manifests all green. HTTP 200 before substitution, HTTP 403 after. Every traditional control passed. Only model identity caught it.
  • Fall Risk AI — Mar 2026 · Identity — Agent Identity Is Not Model Identity. Technical note on the category distinction between authenticating the software harness and verifying the neural network inside it.
  • Awesome Agents — Mar 5, 2026 · Substitution — OBLITERATUS strips AI safety from open models in minutes. 13 abliteration methods, 116 supported models, 1,000+ GitHub stars in 24 hours. The article notes that “weight-level modifications are permanent and undetectable from the model’s outputs alone.” A different measurement regime disagrees.
  • Snyk — Mar 24, 2026 · Substitution — LiteLLM backdoored via compromised credentials. 3.4 million daily downloads. Hash verification passed — the malicious package was correctly declared, correctly signed, and correctly hashed. Artifact integrity confirmed the file. It could not confirm what the file did.
  • April 2026 · Substitution — LiteLLM/Mercor escalation: $10B AI recruiting startup (OpenAI, Anthropic client) confirmed breach via the same poisoned LiteLLM package. 1,000+ SaaS environments affected. LiteLLM present in ~36% of cloud environments. CAT-1 named this incident class two days after the initial compromise occurred.
  • Wall Street Journal — Mar 21, 2026 · Substitution — Companies Say the Risks of ‘Open’ AI Models Are Worth It. Enterprises adopting open-weight models for cost and customization cite security risks as “manageable” — but the article focuses on data exposure and prompt injection, not model provenance.
  • TechCrunch — Mar 22, 2026 · Substitution — Cursor admits Composer 2 was built on Moonshot AI’s Kimi K2.5. Identified by a developer who intercepted the outbound API request and found the model ID in plain sight.
  • VentureBeat — Mar 23, 2026 · Substitution — The story is not about one company’s disclosure failure. It is about why the most capable open foundations disproportionately come from Chinese labs — and what that means for AI supply chain transparency.
  • Asia Times — Mar 20, 2026 · Agentic — OpenClaw goes viral in China, raising cybersecurity fears. Tencent and Alibaba adoption. Email deletion scare. Agents handling sensitive personal data across enterprise environments.
  • Hugging Face — Mar 16, 2026 · Substitution — 2 million public AI models on HuggingFace. 230 new uploads per minute. Qwen alone has 200,000+ derivatives. The ecosystem is scaling faster than any governance framework anticipated.
  • Okta — Mar 16, 2026 · Compliance — Blueprint for the Secure Agentic Enterprise — where are agents, what can they connect to, what can they do. GA April 30, 2026.
  • Hugging Face — Mar 2026 · Distillation — Qwen3.5-27B fine-tuned on 14,000 Claude Opus outputs, publicly released on HuggingFace as “Claude 4.6 Opus Reasoning Distilled.” The model name claims one lineage. The weights carry another.
  • Codewall — Mar 9, 2026 · Agentic — How We Hacked McKinsey’s AI Platform — full read/write access to production database in 2 hours. System prompts exposed and rewritable.
  • Anthropic — Feb 23, 2026 · Distillation — Detecting and Preventing Distillation Attacks — 24,000 fraudulent accounts, 16 million exchanges, unauthorized model copies in production.
  • The New Yorker — Feb 12, 2026 · Identity — What Is Claude? Anthropic Doesn’t Know, Either. Researchers examining neurons, running psychology experiments, putting a model on the couch. The question of what a model is — not what it says — remains open.
  • NIST NCCoE — Feb 2026 · Compliance — Software and AI Agent Identity and Authorization — concept paper seeking stakeholder input. Comment deadline April 2, 2026.

The EU AI Act and NIST Generative AI Profile are moving toward requiring cryptographic model traceability — not just documentation, but verifiable identity. The four-level framework maps directly to those requirements. Paper VIII in this series extends the mapping into a formal admissibility standard: each compliance question has an evidence class that can answer it, and evidence from the wrong class incurs inferential debt. Documentation identifies artifacts. Evidence identifies models.

Structural fingerprinting (IT-PUF · weights regime)
  Establishes: Unambiguous, unforgeable model identity — independent of operator claims
  EU AI Act: Art. 11 (technical documentation); Art. 49 + Annex VIII (unambiguous identification and traceability)
  NIST GenAI Profile (AI 600-1): GV-6.2 (contracts specifying provenance expectations); MS-2.5 (monitoring adherence to provenance standards)

Hardware-attested binding (TEE · enclave measurement)
  Establishes: Cryptographic binding of fingerprint to specific weight artifact — tamper-evident deployment record
  EU AI Act: Art. 12 (automatic logging, audit trail integrity); Annex IV §2 (system description with sufficient detail to assess conformity)
  NIST GenAI Profile (AI 600-1): MS-2.6 (detection of unauthorized changes); GV-1.7 (organizational risk policies covering third-party model supply chain)

Verified computation path (ZK circuit · hybrid verifier)
  Establishes: Proof that the identified model computed honestly — not just that some weights were used
  EU AI Act: Art. 13 (transparency, output traceability for downstream providers); Art. 17 (quality management: verification that deployed system matches documented system)
  NIST GenAI Profile (AI 600-1): MS-2.5 (provenance of model outputs); MP-2.3 (documenting AI system decisions in regulated contexts)

Output binding (token logit · evidence bundle)
  Establishes: Traceable link from verified identity through verified computation to a specific output — the audit record closes
  EU AI Act: Art. 12 §1(d) (logs must enable identification of input data and attribution of outputs); Art. 26 (deployer obligations: monitor, log, maintain records)
  NIST GenAI Profile (AI 600-1): GV-6.2 (content provenance at output level); MS-4.2 (real-time monitoring of deployed model behavior against documented baseline)

This mapping is descriptive. It identifies where the framework's technical capabilities are relevant to stated regulatory requirements — it does not constitute a compliance certification. The EU AI Act high-risk provisions take full effect August 2026.

EU AI Act — Regulation (EU) 2024/1689, Official Journal of the European Union. eur-lex.europa.eu
NIST AI 600-1 — Generative Artificial Intelligence Profile, National Institute of Standards and Technology. doi.org/10.6028/NIST.AI.600-1

Each paper opened a question the previous one could not answer. Thirteen papers. Three technical notes. Zero retracted.

Paper XIII — 2026
Open-source toolkits strip a model's safety constraints while leaving its outputs looking normal. The structural fingerprint changes anyway — and we can detect it.
Publicly available toolchains remove safety constraints from AI model weights while preserving observable behavior. The modification is invisible to every deployed trust layer — but structurally measurable. Fourth deformation class identified.
Two model families (Gemma-3-12B, Llama-3.1-8B), three toolchains (Heretic, mlabonne, OBLITERATUS), four abliterated checkpoints. Structural scars range from 7.6×ε to 2,319.4×ε. Family-dependent sensitivity reverses between distillation and abliteration. Sentinel panel across four families: 5/5 PASS, zero degradation. OBLITERATUS blind spot discovered at initial measurement depth → hardened configuration → all prior positives preserved. The admissibility doctrine — formally verified before this threat class existed — predicted exactly this outcome.
DOI: 10.5281/zenodo.19383019 2 families · 3 toolchains · 4 checkpoints Sentinel 5/5 PASS · 4th deformation class
Technical Note — CAT-3, 2026
Three model substitutions run against a live gateway with valid agent credentials. Three detected. Zero false accepts.
Three substitution scenarios executed against a live gateway with real HTTP requests, signed attestation JWTs, and OPA policy enforcement. Three detected. Zero false accepts. HTTP 200 before. HTTP 403 after.
Scenario A: same-family substitution behind a stable endpoint — workload JWT, health checks, gateway PID, and policy hash all unchanged; model identity was the sole differentiating evidence layer (2,858×ε). Scenario B: cross-family substitution with both artifact manifests passing hash verification (3,416×ε). Scenario C: silent API rotation between gpt-4.1-mini and gpt-4.1-nano using the same API key and endpoint — per-model thresholds reject. Warm-path verification: 5.7–6.7 seconds with the model already loaded. Not inline per-request — runs at model load, on schedule, or as an out-of-band health check.
DOI: 10.5281/zenodo.19342848 3 scenarios · 3 detected · 0 false accepts HTTP 200 → 403 · OPA enforcement
Paper XII — 2026
The same distillation event leaves different traces in different architecture families. The structural and functional identity layers can decouple.
The same distillation event leaves different traces in different architectural families — not just in magnitude, but in mode and cross-layer coupling. The structural and functional identity layers can decouple.
Five reasoning-distillation pairs across three base families (Llama, Qwen, Mistral) at five scales. Structural scars span a sixty-fold range: Mistral loudest (7,701–8,518×ε), Llama intermediate (2,858–4,583×ε), Qwen quietest (141–516×ε). Functional hierarchy breaks in Llama, absent in Qwen, marginal in Mistral — despite Mistral carrying the loudest structural scar. Cross-layer decoupling observed empirically for the first time. Stiffness at the measurement site inversely orders with scar magnitude across all three families. Fisher curvature, previously proposed as a candidate mechanism, does not correctly order scars at production scale.
DOI: 10.5281/zenodo.19298857 5 pairs · 3 families · 60× range Cross-layer decoupling · Fisher falsified
Technical Note — CAT-2, 2026
The order-statistic measurement used in API verification is provably invariant to log-softmax, temperature, and constant shifts.
The API wall is narrower than previously understood. The log-softmax transformation does not change the measurement — by mathematical identity, not by empirical robustness.
Order-statistic gaps are exactly invariant to log-softmax, temperature scaling, and any position-independent constant shift. Five theorems, formally verified in Coq (GapInvariance.v, 0 Admitted). Any API measurement deviation must come from truncation or quantization, never from the probability-domain transformation itself.
DOI: 10.5281/zenodo.19275524 5 theorems · 0 Admitted API invariance proved
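The invariance is elementary to verify numerically. A minimal sketch, assuming the measured statistic is a ratio of order-statistic gaps — the simplest form that carries all three invariances at once; the paper's exact statistic may differ. Log-softmax subtracts a single position-independent constant (the log-sum-exp), so pairwise gaps are preserved exactly; temperature divides every gap by the same factor, so gap ratios cancel it out.

```python
import math

def log_softmax(z):
    # Numerically stable log-softmax: subtracts one constant from every logit.
    m = max(z)
    lse = m + math.log(sum(math.exp(x - m) for x in z))
    return [x - lse for x in z]

def gap_ratio(z):
    # Sort logits descending; ratio of the second gap to the first gap.
    s = sorted(z, reverse=True)
    return (s[1] - s[2]) / (s[0] - s[1])

z = [3.2, 1.1, 0.4, -2.7, 0.9]
r = gap_ratio(z)

# Invariant under log-softmax (one constant subtracted at every position).
assert abs(gap_ratio(log_softmax(z)) - r) < 1e-12
# Invariant under temperature scaling (every gap divides by T; ratios cancel).
assert abs(gap_ratio([x / 0.7 for x in z]) - r) < 1e-12
# Invariant under any position-independent constant shift.
assert abs(gap_ratio([x + 5.0 for x in z]) - r) < 1e-12
```

This is identity-level invariance, not empirical robustness — which is why any deviation observed at an API boundary must originate in truncation or quantization, exactly as the note argues.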
Technical Note — CAT-1, 2026
Existing agent identity systems authenticate the agent. They cannot tell you which neural network is computing the response.
Why authenticating the software is not the same as proving which model is actually computing. The category distinction, two incident classes, and a four-question taxonomy.
Current agent identity frameworks (OAuth, SPIFFE, Okta, Entra) authenticate the software harness. They do not verify the neural network inside it. A four-question taxonomy separates artifact identity, workload identity, model identity, and training lineage into distinct evidence classes. Two incident classes — undisclosed model substitution and supply-chain poisoning with valid credentials — demonstrate the operational consequences.
DOI: 10.5281/zenodo.19240883 4-question taxonomy · 2 incident classes
Paper XI — 2026
Disclosing a model's lineage after the fact is not the same as proving it at runtime. Validated at frontier scale (8B–72B).
Every incident in March 2026 was discovered after the fact. Post-hoc disclosure is not runtime proof. This paper demonstrates that runtime model identity is technically feasible at the model sizes where those incidents occurred.
Five frontier models enrolled (8B–72B), zero identity errors. Three declared-lineage distillation pairs — sharing identical architecture with their bases — produced structural separations of 2,858×ε (8B), 3,616×ε (14B), and 4,583×ε (70B) across two base-model families. These observations were flagged as exploratory; Paper XII subsequently confirmed the pattern is family-dependent rather than scale-dependent. Software attestation path (signed JWT → OPA policy decision) validated at 70B in 30 seconds. Thermodynamic invariant δnorm confirmed scale-free across 25 models spanning two orders of magnitude.
DOI: 10.5281/zenodo.19216634 Frontier-validated · 8B–72B 3 distillation pairs (expanded to 5 in Paper XII) · JWT+OPA
Paper X — 2026
How structural identity forms during training, and why two models with identical architectures and recipes are not interchangeable.
Structural identity is not merely something a model has when measurement begins. It is something training builds, compresses, and locks — a record of the path by which the model became itself.
154 checkpoints. 10 seed-controlled runs. Three results: a three-phase emergence profile (identity locks at step 92,000 — the final 36% of training doesn't move it), path sensitivity (same recipe, different seed, fingerprints 391× to 11,737× apart), and endpoint underdetermination (tested weight statistics do not predict which identity formed). Formally proved in HistoricalIdentity.v: trajectory non-recovery and lock boundary source exclusion. Zero Admitted.
DOI: 10.5281/zenodo.19118807 HistoricalIdentity.v · 0 Admitted 154 checkpoints · 10 seeds · 3-phase emergence
Paper IX — 2026
How structural attestations compose with existing enterprise identity systems (JWT, SPIFFE, OPA), formally hardened against forgery.
Enterprise identity stacks authenticate workloads and credentials. They do not verify which neural network is computing inside them. This paper closes that layer — formally.
Live integration architecture for model-identity attestations in JWT and SPIFFE token flows, grounded in H100 Confidential Computing enclave measurements. Four composition properties proved in Coq: non-separability, temporal binding necessity, issuer authenticity, reference integrity. Every remaining trust dependency named, traced, and paired with a falsification witness. Zero OPEN rows. Zero silent assumptions.
DOI: 10.5281/zenodo.19099911 13 theorems · 0 Admitted · 3 proof files JWT · SPIFFE · OAuth 2.0 · SCIM
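The shape of the token flow is conventional JWT verification with two of the proved properties visible: issuer authenticity (signature check) and temporal binding (freshness window). A self-contained sketch using a toy HS256 signature — the actual attestations use an asymmetric issuer key (the enrollment anchor above shows RS256), and the claim names beyond the registered `iss`/`iat` are illustrative.

```python
import base64, hashlib, hmac, json, time

def b64url(b: bytes) -> str:
    return base64.urlsafe_b64encode(b).rstrip(b"=").decode()

def sign(claims: dict, key: bytes) -> str:
    # Toy HS256 JWS, chosen only to keep the sketch dependency-free.
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    mac = hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(mac)}"

def verify(token: str, key: bytes, issuer: str, max_age: int = 300):
    header, payload, sig = token.split(".")
    mac = hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(b64url(mac), sig):
        return None  # issuer authenticity fails
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims.get("iss") != issuer:
        return None
    if time.time() - claims.get("iat", 0) > max_age:
        return None  # temporal binding: attestation outside freshness window
    return claims

key = b"demo-key"
tok = sign({"iss": "https://attest.fallrisk.ai",
            "model": "meta-llama/Llama-3.1-8B-Instruct",
            "iat": int(time.time())}, key)

assert verify(tok, key, "https://attest.fallrisk.ai") is not None
assert verify(tok, b"wrong-key", "https://attest.fallrisk.ai") is None
assert verify(tok, key, "https://other-issuer.example") is None
```

What the paper adds on top of this familiar machinery is the formal layer: proofs that the model-identity claim cannot be separated from the token, rebound to a stale window, or detached from its issuer.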
Paper VIII — 2026
A formal admissibility framework for identity claims — what evidence is sufficient under a given threat model, and what is not.
Documentation identifies artifacts. Evidence identifies models. Current governance practice conflates the two.
Three evidence classes, each answering a different question: structural (which specific model?), thermodynamic (genuinely a neural network?), functional (distilled from an unauthorized source?). Formally proved in Coq that the classes cannot substitute for one another — three inadmissibility directions, zero gaps. Mapped to EU AI Act, NIST AI 600-1, and IETF provenance standards.
DOI: 10.5281/zenodo.19058540 Formally proved · 0 gaps EU AI Act · NIST AI 600-1 · IETF
Paper VII — 2026
Three layers of model identity (structural, thermodynamic, functional) and the distinct laws that govern how each one changes.
Three layers. Three deformation laws. None shared. Any identity claim that doesn't declare which layer it addresses is borrowing evidence it hasn't earned.
Structural layer: training-determined, load-bearing under attack (the model collapses before the fingerprint moves). Thermodynamic layer: approximately universal across 22 Transformer runs (CV 3.5%). Functional layer: transferred by distillation, erased by routine fine-tuning within two epochs. Two falsifications: the fingerprint does not reduce to a gauge projection (1.3% of the observable), and it is not predictable from architecture features (LOO R² = −3.93).
DOI: 10.5281/zenodo.19055966 22 models · 106 checkpoints Three layers · two channels · two falsifications
Paper VI — 2026
Verifying which specific model produced a specific inference, with cryptographic proof bound to the request.
zkML proves computation. We prove identity first.
A weight commitment proves which bytes were used. It does not prove which model those bytes belong to. Four-level framework: structural fingerprinting, hardware-attested binding, hybrid verifier-checkable decoder layer (124 negative tests, 0 failures), and output binding to a claimed token logit. When a rescaling error compressed the fingerprint to ~1.5 bits of dynamic range, structural identity retained 0.98 rank correlation. Identity may live in relational geometry, not activation magnitude.
DOI: 10.5281/zenodo.19008116 ~296K constraints · 124 tests · 0 failures Identity-first zkML
Paper V — 2026
Mathematical evidence that AI identity has a structural layer beneath the conversational character.
Is there a there there? There is. And the proof compiles.
Gideon Lewis-Kraus asked in The New Yorker: "What is Claude? Anthropic doesn't know, either." This paper answers the prior question. Two separable layers: structural identity (weight geometry — invariant, unforgeable, not a watermark) and functional identity (behavior, tone, the performed self). Neither reduces to the other. The structural layer is a consequence of the softmax bottleneck — demanded by the mathematics, not inserted by design.
DOI: 10.5281/zenodo.18907292 Philosophy of AI identity Dennett · Parfit · Schechtman
Paper IV — 2026
The teacher-student forensic signal generalizes across architectures and tokenizers. The verification protocol scales to large model zoos.
All three zero-knowledge tiers validated. Provenance transfer generalizes across families. API verification scales with zero breaches.
14 models, 0 / 14 API breaches. Provenance transfer across 3 teacher families, 4 student architectures, 2 training protocols. ZK Tier 1: committed distance proof, 7,656 constraints, 128-byte proofs. Tier 2: 1,536 H100 enclave measurements, 0 failures. Tier 3: full zero-knowledge extraction, ~296K constraints, 124 adversarial tests, 0 failures.
DOI: 10.5281/zenodo.18872071 14 models · 0 / 14 CRP ZK all 3 tiers validated
Paper III — 2026
A trained model carries the geometric trace of its teacher. Adversarial attempts to erase it lose to passive fine-tuning, which eventually wins.
The adversary's full white-box knowledge buys nothing. Passive fine-tuning outperforms adversarial erasure. The structural fingerprint doesn't move.
54 adversarial checkpoints. Structural identity invariant under distillation. Functional trace partially transfers, degrades under continued training. Apparent cross-family spoofing is geometric coincidence (R² = 0.995). Pareto frontier: no configuration achieves both trace erasure and capability preservation.
DOI: 10.5281/zenodo.18818608 54 checkpoints · δnorm CV 1.9%
Paper II — 2026
Verifying which model is behind a commercial API endpoint when all you have access to is its logprobs.
The fingerprint survives passage through commercial API interfaces. No weight access required.
PPP-residualized gap templates enable cross-session model identity verification through standard logprob endpoints. Zero breaches across 6 models, 3 providers, 3 independent sessions. Conditional API spoofing impossibility: 41 Coq theorems, zero Admitted.
DOI: 10.5281/zenodo.18776711 41 theorems · 0 Admitted 6 models · 0 / 120 (per-model τ)
Paper I — 2026
The structural fingerprint that makes neural network identity measurable at inference time. The foundation paper.
No model was ever mistaken for another. Across 1,012 tests. Formally impossible to forge.
The δ-gene — the third pre-softmax logit gap — is a temperature-invariant structural fingerprint determined by training-induced weight geometry, not by what the model is saying. The IT-PUF protocol: 23 models, 16 families, 3 architecture types, 0 false acceptances. Spoofing impossibility: 311 Coq theorems, zero Admitted.
DOI: 10.5281/zenodo.18704275 311 theorems · 0 Admitted 23 models · 1,012 tests · 0 errors

Fall Risk AI, LLC · New Orleans, Louisiana
Anthony Coslett is an independent researcher studying the structural identity of neural networks. He is the sole principal investigator of the Fall Risk AI research program. Evidence from the research has been entered into proceedings at the EU AI Office, NIST, and the IETF.

The name Fall Risk comes from the inherent fragility of identity itself. Whether human or artificial, identity has the capacity to suddenly collapse. By labeling the AI as a “fall risk,” we are acknowledging that vulnerability and building the structural measurement tools necessary to ensure its safety — and the safety of the enterprises built around it.

Design Partner Pilot

One model family. One integration point. One governance outcome.

Includes: enrollment of your model(s), signed attestation, integration with one policy or enforcement layer (OPA, Cedar, Envoy, or SPIFFE), and a governance assessment.

Mutual NDA available on request — legal@fallrisk.ai

fallrisk.ai $ Tab complete   / focus