$ ls -la ./fallrisk.ai/
Fall Risk AI

Prove Which Model Is Running

Your stack authenticates users, agents, and infrastructure. It does not authenticate the model. We built the instrument that does.

fallrisk-ai · enrollment anchor
✓ ACTIVE
Model: meta-llama/Llama-3.1-8B-Instruct
Enrollment: enroll-2046c1bfea21
Architecture: transformer · 32 layers · 64D
Contract: itpuf-v0.1.0
Status: ✓ ACTIVE
Issuer: https://attest.fallrisk.ai
Evidence Class: structural (individual model identity)
Issued: 2026-04-08
Signature: eyJhbGciOiJSUzI1NiIsImtpZCI6ImZhbGxyaXNrLTk2Y2Q1ZT…
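
The Signature value above begins with `eyJ`, which is what a base64url-encoded JSON object looks like at the start of a JWS compact token. A minimal sketch, assuming that reading, decodes only the characters visible before the ellipsis. It verifies nothing and does not reconstruct the truncated remainder; the function name is illustrative.

```python
import base64

# Visible prefix of the Signature field, copied from the card above.
# The value is truncated on the page, so only whole 4-character base64
# groups are decoded and nothing after the ellipsis is guessed.
VISIBLE_PREFIX = "eyJhbGciOiJSUzI1NiIsImtpZCI6ImZhbGxyaXNrLTk2Y2Q1ZT"

def decode_visible_header(prefix: str) -> bytes:
    """Decode the largest whole-group portion of a base64url prefix."""
    usable = len(prefix) - (len(prefix) % 4)
    return base64.urlsafe_b64decode(prefix[:usable])

header_fragment = decode_visible_header(VISIBLE_PREFIX)
print(header_fragment)
```

The decoded fragment opens a JSON header that names the `RS256` algorithm and a key ID starting with `fallrisk-`. Checking the signature itself would need the full token and the issuer's public key.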
THE PHYSICS
0/54,120 observed false accepts in the 165-record structural audit
ε ≈ 1.003×10⁻⁴ (EVT-derived acceptance threshold)
δ_norm = 0.318 ± 2% (scale-invariant, 410M–72B)
Frontier-validated: 8B → 72B (30-second enrollment)
5 architecture families: Llama · Qwen · Mistral · Gemma · Phi
THE MATHEMATICS
NoSpoofing.v: no sweet spot at any KL budget
EvidenceSufficiency.v: cross-layer non-substitution
GapInvariance.v: measurement invariant under log-softmax, temperature, constant shift
StructuralAuthorization.v: artifact bound to request, within freshness window
Rocq 9.1.1 · 58 source files · 350+ theorems · all PROVEN, none Admitted
Explore Registry →

This is what your policy engine consumes to protect your stack from an unauthorized model.

Backed by 13 papers, 4 technical notes, and 350+ machine-verified Coq theorems. Zero Admitted.

Structural and artifact identity, validated against the public signed registry
Live registry: publicly enrolled and signed
70M → 72B: parameter range validated
30 seconds: enrollment at 70B parameters
0 errors: zero observed identity errors in published validation

Every AI deployment runs four identities at once: an artifact (what shipped), an agent (who is making the request), a model (what is actually computing), and a lineage (whose training is present). Industry has built robust authentication for the first two. The last two go unverified at runtime.

COVERED
1. What artifact was shipped? — registries, version control, BOMs
2. What agent or workload is authenticated? — OAuth, SPIFFE, Okta, Entra, mTLS
UNVERIFIED AT RUNTIME
3. What model is actually computing?
4. Whose training lineage is present?

Software identity can be valid while model identity is entirely absent. We have measured this directly: three substitution scenarios executed against a live gateway with real HTTP requests, signed attestation JWTs, and OPA policy enforcement. In every scenario, the workload identity, artifact integrity, and API authentication remained valid. In every scenario, the structural identity measurement detected the substitution and the policy gate denied the request.

Technical notes: Agent Identity Is Not Model Identity · Measured Model Substitution · Zenodo, 2026

Two products. One escalation path.

Lite verifies what model artifact you have on disk. Deep verifies which model is actually computing at runtime.

Trustfall Lite
Free, local, open source. Scan default Hugging Face and Ollama caches; verify SHA-256 hashes against the public signed registry. Apache-2.0. Self-serve. Trustfall Lite v0.4 adds local CSV/JSONL inventory export and tokenizer-surface coverage signals — still local-first, still no model-byte upload.
Trustfall Lite · quickstart

Free local verifier for Hugging Face and Ollama artifacts.

$ pipx install fallrisk-trustfall
$ trustfall scan
Status states
verified
unknown_variant
not_enrolled
pilot_available
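
The hash-against-registry idea behind the scan can be sketched as follows. This is not Trustfall's implementation: the registry shape (a mapping from SHA-256 hex digest to model ID) and the function names are assumptions made for illustration. Only two of the status names above are used, because this page does not spell out what the others mean.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large weight files are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def classify(path: Path, registry: dict[str, str]) -> str:
    """Return 'verified' when the file digest appears in the (hypothetical) registry map."""
    return "verified" if sha256_file(path) in registry else "not_enrolled"

# Demonstration with a throwaway file standing in for a cached model artifact.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "model.bin"
    sample.write_bytes(b"example weights")
    known = {sha256_file(sample): "example/model"}   # made-up registry entry
    status_known = classify(sample, known)
    status_unknown = classify(sample, {})
```

Hashing happens locally and only digests are compared, which matches the "no model-byte upload" description above. A real verifier would also need to check the registry's signature before trusting its entries.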
Trustfall Deep Lab
Self-service runtime identity for solo researchers, indie ML founders, and small teams. Hosted measurement on Fall Risk ephemeral compute, or Local Standard with the signed engine on your hardware. Continuous attestation, audit logs, signed certificates that policy engines can enforce. Public registry placement on Free; private namespace on paid tiers. Self-serve portal in private build.
Trustfall Deep Enterprise
Sovereign deployment for organizations whose model weights, fingerprint vectors, or distance values cannot leave the environment. Customer-deployable signed engine artifact with three trust modes: Local Standard, TEE-backed (TDX + H100 confidential computing), and ZK private-match. Customer-controlled signing keys with proof-of-possession. Tenant-private registry namespace, audit retention, compliance exports. Design-partner pilots in scoping; mutual NDA on request.

Every neural network carries a unique structural fingerprint — not from what it says, but from the geometry of how it decides what to say. The fingerprint is a mathematical consequence of the architecture, not something anyone inserted.

Weights
Direct measurement
Send challenge prompts. Measure the internal response at two sites. Compare the resulting 64-dimensional fingerprint against an enrolled anchor. Accept or reject. No retraining. No model modification.
API
No weight access needed
Standard logprob endpoints expose enough structural geometry to verify identity across independent sessions. No weights required, no operator cooperation needed.
Zero-Knowledge
Prove without revealing
A zero-knowledge proof attests the model matches its enrolled identity without revealing the fingerprint, the anchor, or the methodology. Hardware attestation binds to a cryptographic root.
Attestation
The signed claim
The result travels inside a standard JWT or SPIFFE SVID, composing with OAuth 2.0, OPA, and EAT-based authorization flows (RFC 9711). No replacement identity stack. One new signed claim inside the systems you already use.

This end-to-end path — from forward pass through signed claim to policy decision — has been validated at 70 billion parameters in approximately 30 seconds.
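
The last step of that path, from signed claim to policy decision, might look like the sketch below. OPA policies are normally written in Rego; Python is used here only as a stand-in. The claim names (`issuer`, `status`, `enrollment`, `issued_at`) and the five-minute freshness window are assumptions, not the published claim schema. The claims are treated as already signature-verified.

```python
from datetime import datetime, timedelta, timezone

TRUSTED_ISSUER = "https://attest.fallrisk.ai"   # issuer shown on the enrollment card
FRESHNESS = timedelta(minutes=5)                 # assumed window, not from the source

def allow(claims: dict, expected_enrollment: str, now: datetime) -> bool:
    """Decide on already-verified claims; signature checking is out of scope here."""
    issued = datetime.fromisoformat(claims["issued_at"])
    return (
        claims.get("issuer") == TRUSTED_ISSUER
        and claims.get("status") == "ACTIVE"
        and claims.get("enrollment") == expected_enrollment
        and timedelta(0) <= now - issued <= FRESHNESS
    )

now = datetime(2026, 4, 8, 12, 0, tzinfo=timezone.utc)
claims = {
    "issuer": TRUSTED_ISSUER,
    "status": "ACTIVE",
    "enrollment": "enroll-2046c1bfea21",
    "issued_at": "2026-04-08T11:58:00+00:00",
}
fresh_decision = allow(claims, "enroll-2046c1bfea21", now)
stale_decision = allow(claims, "enroll-2046c1bfea21", now + timedelta(hours=1))
wrong_model = allow(claims, "enroll-000000000000", now)
```

The freshness check is what keeps an old attestation from being replayed against a later request. The enrollment check is what ties the decision to one specific approved model.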

Not a replacement. The layer none of them provide.

Watermarks require insertion at training time and are removed by fine-tuning. Model cards describe artifacts, not running systems. Behavioral tests measure what a model says, not what it is. Output monitoring watches downstream effects without verifying upstream identity. Each solves a real problem. None answer the structural question.

Property | Watermarks | Model Cards | Behavioral Tests | Output Monitoring | Structural Fingerprint
Works without training-time insertion
Survives fine-tuning Partial
Survives distillation
Works without weight access
Verifies the running model, not a document Partial Partial
Cryptographically verifiable
Formally proved unforgeable
Composable with existing auth stack Partial

The best deployment uses several of these together. Watermarks where you control training. Behavioral tests for capability monitoring. Output monitoring for safety. And structural fingerprinting for the one question the others can't answer: is this still the model you approved?

v0.2.2 structural audit · May 2026
v0.2.3 artifact audit · May 2026

Structural audits apply to the Hugging Face IT-PUF lane. Artifact audits apply to the Ollama byte-identity lane. They are different evidence classes and should not be merged into one claim.

0 / 54,120
Observed false accepts. Same-seed cross-model pairs across the 165-record Hugging Face structural registry. No pair fell below ε. Cross-seed audit: 0/216,480.
9.82×ε
Closest cross-model separation. Starling-LM-7B-beta ↔ OpenChat-3.5-0106 — both Mistral-7B / OpenChat-lineage fine-tunes. Training siblings, structurally distinct.
46 / 46
Ollama artifact records unique. Model IDs, artifact manifest digests, and evidence digests were unique across the artifact-identity lane. Unauthorized artifact-hash collisions: zero.
350+
Coq theorems · 0 Admitted. Formal verification artifacts spanning the published findings: identity stability, gauge transport, no-spoofing impossibility, evidence sufficiency.
6,475×
Dynamic range across the 165-record audit. From closest sibling fine-tune pair to farthest architectural pair (gemma-4-31B-it ↔ gpt2-large). All measured pairs remain above threshold.
4
Patents assigned to Fall Risk AI, LLC. Weights identity + API verification + zero-knowledge attestation + identity-conditioned inference.
Earlier published validation
Earlier published validation included 0 / 1,012 identity tests in the weights regime (an earlier 23-model validation set spanning 16 vendor families and 3 architecture types), 0 / 14 API-regime tests across 3 providers, 1,536 measurements inside an H100 confidential-computing enclave, and reasoning-distillation separation studies showing structural scars up to 8,518 × ε across five Llama / Qwen / Mistral pairs (1.5B–70B). Those remain part of the published evidence base; the current public registry audit is the 165-record structural audit above.
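
A zero-error count says more when paired with the number of trials. As a rough illustration, the sketch below computes the exact one-sided 95% upper bound on an error rate after zero events in n trials. It assumes independent trials, which pairwise comparisons over a shared registry are not, so the figures are indicative only and are not part of the published audits.

```python
def upper_bound_zero_events(trials: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on the event rate after 0 events in `trials`
    independent trials: solve (1 - p) ** trials = 1 - confidence for p."""
    return 1.0 - (1.0 - confidence) ** (1.0 / trials)

# Trial counts quoted in the audit summaries above.
counts = {
    "same-seed structural audit": 54_120,
    "cross-seed structural audit": 216_480,
    "earlier weights-regime tests": 1_012,
    "earlier API-regime tests": 14,
}
bounds = {name: upper_bound_zero_events(n) for name, n in counts.items()}
for name, bound in bounds.items():
    print(f"{name}: rate <= {bound:.2e} at 95% confidence")
```

Under that independence assumption, 0 / 54,120 bounds the rate near 5.5×10⁻⁵, while 0 / 14 only bounds it near 0.19. The larger audits therefore carry most of the statistical weight.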
Model Supply Chain Verification
A vendor ships a model built on an open-weight base. The marketing says proprietary. The API response says otherwise — if anyone thinks to look. No security team checks which model is actually serving production. No compliance process verifies origin. IT-PUF verifies the structural fingerprint of what is running against what was enrolled at deployment — no leaked model ID required.
Distillation Forensics
16 million exchanges. 24,000 fraudulent accounts. The resulting models carry the teacher’s fingerprint. IT-PUF detects provenance transfer across families, architectures, and training protocols. The adversary cannot erase the trace without degrading the capabilities the distillation was meant to acquire. The signal fades with continued training — making continuous monitoring, not periodic audits, the operational requirement.
Regulatory Compliance — EU AI Act
Current monitoring checks outputs. Nothing checks whether the model itself has been swapped. Article 15 requires continuous monitoring of high-risk AI systems. Deadline: August 2026. IT-PUF provides model-level identity attestation: the system in production right now is the system that passed your approval process. With the zero-knowledge tier, provable without disclosing proprietary model internals to the regulator.
Insurance and Audit
Not a checkbox. Not a vendor assertion. Cryptographic proof. IT-PUF’s hardware-attested measurement runs inside an NVIDIA H100 / Intel TDX enclave and produces a signed certificate binding model identity to specific weights. The insurer verifies without seeing the model. The policyholder proves compliance without disclosing trade secrets.
Internal Model Governance
Which deployment is running which version? Did the hotfix propagate? Is staging accidentally serving production? IT-PUF: enroll once at deployment, verify on demand or on schedule, detect identity drift. Non-invasive. Runs during normal inference. Enrollment is one-time and scales with model size — 8 seconds at 8B, 30 seconds at 72B on 3× A100. Verification reuses the enrolled anchor and runs in seconds. Architecture-agnostic: Transformer, Mamba, MoE, hybrid.
Agentic AI Authorization
Every agent identity framework asks three questions. None of them ask which model is inside the agent — because none of them have a way to answer it. Directory entries describe agents. Workload identities authenticate deployments. Signed tokens authorize actions. No layer of the current stack establishes which neural network is reasoning. IT-PUF answers the fourth question: bind the model’s structural fingerprint to the authorization token. No protocol changes — the claim travels inside a standard JWT or SPIFFE SVID. Four security properties proved in Coq. Zero silent assumptions.
Detecting Published Safety-Alignment Removal
Open-source tools now automate the removal of safety alignment from language models. The modified checkpoints preserve the API contract and are optimized for low KL divergence from the original. The internal activation geometry still changes. Published abliterated checkpoints across two model families and three toolchains were measured against aligned bases under the hardened instrument configuration. Gemma-3-12B: Heretic 317.5–367.6×ε, mlabonne 1,556.8–2,319.4×ε. Llama-3.1-8B: Heretic 7.6–12.0×ε, OBLITERATUS 45.1–53.1×ε. Sentinel panel: 5/5 PASS across four model families (Gemma, Llama, Qwen, Mistral). Zero degradation of any prior positive. Family-dependent sensitivity reverses between distillation and abliteration — Gemma quiet under distillation but loud under abliteration; Llama the opposite. In the tested cases, published safety-alignment removal left a measurable structural scar — even when the tool explicitly optimized for output preservation.

Model-identity attestations compose with existing enterprise authorization infrastructure — OAuth 2.0, SPIFFE, SCIM — without protocol modifications. The fingerprint travels as a compact claim inside a standard JWT or SPIFFE SVID.

Sample Verification Certificate ✓ PASS
Report ID:           FR-2026-5B127509
Date of Measurement: 2026-03-17T08:43:15Z
Verification Result: PASS

MODEL
  Identifier:        Mistral-7B
  Architecture:      transformer
  Weight File Hash:  [redacted]
  Evidence Class:    Structural (individual model identity)
  Trust Mode:        TEE-backed (hardware-attested measurement)

MEASUREMENT
  Fingerprint Dims:  64
  Valid Measurements: 64/64 (0% failure)

FINGERPRINT VERIFICATION
  Fingerprint Digest: [redacted]
  Bundle Digest:      [redacted]
  Match:              UNIQUE (0 collisions across 6-model zoo)

ATTESTATION CHAIN
  CPU (Intel TDX):   CC State ON, Ready state ready
  GPU (NVIDIA CC):   H100 80GB HBM3, CC mode active
  Binding:           gpu_nonce = SHA256(bind_root) [verified]

TRUST BOUNDARY DISCLOSURE
  This certificate verifies STRUCTURAL IDENTITY only.
  It does NOT verify: performance, safety, fitness for
  purpose, training data, or regulatory compliance.

ISSUED BY: Fall Risk AI, LLC | integrations@fallrisk.ai | fallrisk.ai

Sensitive fields redacted for public display. Full certificate issued to authorized parties only.
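
The Binding line in the certificate states `gpu_nonce = SHA256(bind_root)`. A relying party could recheck that equality along the lines below. How `bind_root` is assembled and how the nonce is encoded are not given here, so the byte string and the hex encoding are assumptions.

```python
import hashlib
import hmac

def binding_holds(bind_root: bytes, gpu_nonce_hex: str) -> bool:
    """Check gpu_nonce == SHA256(bind_root) using a constant-time comparison."""
    expected = hashlib.sha256(bind_root).hexdigest()
    return hmac.compare_digest(expected, gpu_nonce_hex.lower())

bind_root = b"hypothetical-bind-root"               # placeholder bytes, not a real value
good_nonce = hashlib.sha256(bind_root).hexdigest()  # what a matching nonce would be

matches = binding_holds(bind_root, good_nonce)
mismatch = binding_holds(b"some-other-root", good_nonce)
```

Deriving the GPU nonce from the bind root means the GPU attestation cannot be lifted from one measurement and attached to another without the hash check failing.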

Trust mode | Engine runs | Signing | Customer trust posture
Hosted | Fall Risk ephemeral compute | Fall Risk runtime certificate signer | Customer trusts Fall Risk operational posture; weights deleted after measurement
Local Standard | Customer environment, customer hardware | Customer-generated key with proof-of-possession | Customer-cooperative claim integrity under live challenge and signed engine
TEE-backed | Customer’s confidential computing enclave (TDX · H100 CC) | Customer-generated key | Hardware-rooted runtime acquisition; attestation chain anchors to vendor roots
ZK private-match | Customer environment with ZK prover | Customer-generated key | Cryptographic privacy: τ vector and distance never leave the customer environment

Hosted is available in Trustfall Deep Lab (default) and rare Enterprise scenarios. Local Standard is available in Lab paid tiers and is the Enterprise default. TEE-backed and ZK private-match are Enterprise; ZK is design-partner-gated during MVP.

Ten security properties of the composition are formally classified: four proved in Coq, three traced to existing standards, one implemented, two design-constrained. Zero silent assumptions. Download technical brief →

A Fall Risk Advisory is a structured operational record. It documents a measured threat to model identity continuity, names the affected models, describes the detection method, and recommends actions for relying parties. Where the papers establish what is provable, advisories establish what has been observed in the wild.

Each advisory carries a stable identifier of the form FRA-YYYY-NNN. The canonical home is attest.fallrisk.ai/advisories/ — the same authority surface that issues the signed registry.
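
Tooling that ingests advisories may want to validate identifiers before using them. The sketch below reads the FRA-YYYY-NNN pattern literally, assuming a four-digit year and a three-digit sequence number. The example IDs are made up and do not refer to published advisories.

```python
import re

# Assumes a four-digit year and a three-digit sequence number, read
# literally from the FRA-YYYY-NNN pattern.
ADVISORY_ID = re.compile(r"FRA-([0-9]{4})-([0-9]{3})")

def parse_advisory_id(text: str):
    """Return (year, sequence) for a well-formed ID, or None otherwise."""
    match = ADVISORY_ID.fullmatch(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

well_formed = parse_advisory_id("FRA-2026-001")   # made-up example ID
malformed = parse_advisory_id("FRA-26-1")
```

Using `fullmatch` rejects strings with extra leading or trailing characters, so an ID embedded in a longer string has to be extracted before it is parsed.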

Every scenario below succeeds while the agent identity stack reports green. The credentials are valid. The attestation passes. The audit log looks normal. The model changed.

Scenario A — Model Substitution Behind a Stable Endpoint
An operator replaces the model checkpoint behind a SPIFFE-authenticated endpoint. The service mesh identity does not rotate because the process did not restart.
Remains green: SPIFFE identifier, X.509-SVID credentials, workload attestation, mTLS authentication, OAuth authorization, audit log.
Changed: the neural network computing the responses. The replacement is architecturally identical — same parameter count, same API contract, different weights. No component in the identity stack detects the substitution.
Scenario B — Supply Chain Poisoning with Valid Attestation
Model weights are substituted inside a container before the image is built. The container hash matches the registry. SPIRE attestation passes. The artifact is intact. The computation is compromised.
Remains green: container image hash, SPIRE attestation (correct platform signals, correct service account), SVID issued normally.
Changed: the model weights. The hash verified the file, not the computation. This is the pattern seen in the LiteLLM/TeamPCP incident (March 2026) — legitimate credentials carried compromised content — transposed to the model layer.
Scenario C — Silent Model Rotation by an API Provider
A provider silently rotates the model behind a versioned endpoint to a cheaper variant. The endpoint URL does not change. The API contract does not change. The model changes.
Remains green: OAuth token, API authentication, transaction tokens, authorization scopes, audit records (same endpoint, same grants).
Changed: the model. This is the pattern observed in the Cursor/Kimi K2.5 incident (March 2026), where a flagship product was identified as running an undisclosed model foundation — discovered by a developer who intercepted an API response, not by any identity mechanism.
Scenario D — Internal Fine-Tuning Drift
Nobody changed anything maliciously. An authorized team fine-tunes the enrolled model. The fine-tuned variant inherits the same workload identity. The model drifted.
Remains green: every identity and authorization control — this is a legitimate operational change by authorized personnel.
Changed: the model's behavioral properties. The fine-tuning may have shifted the model past the boundary of what was originally authorized. No adversary involved. No credential compromise. Just operational drift that the governance stack cannot see, because it was designed to measure the wrapper, not the model.
Observation — Even Model Existence Is Established Post-Hoc
In March 2026, a frontier AI company’s most capable model was revealed to the public through a misconfigured content management system — nearly 3,000 unpublished assets left in an unsecured, publicly searchable data store. Cybersecurity stocks lost billions in market value within hours. The model’s existence was not disclosed through any identity or attestation mechanism — it was disclosed by accident. While the scenarios above describe model substitutions going undetected at runtime, this incident demonstrates a deeper void: even the baseline question of which model exists is currently answered through leaks and public disclosures rather than measurement. The identity gap extends from deployment all the way back to development.

Scenarios A, B, and C have been measured against a live gateway with real HTTP requests, signed attestation JWTs, and OPA policy enforcement. Three substitutions tested, three detected, zero false accepts.

Abliterated checkpoints across two model families and three toolchains are structurally detectable at hardened measurement depth: Gemma 317.5–2,319.4×ε, Llama 7.6–53.1×ε. Sentinel panel 5/5 PASS across four families.

In April 2026, the LiteLLM supply-chain compromise escalated: Mercor, a $10B AI recruiting startup working with OpenAI and Anthropic, confirmed a breach via the poisoned LiteLLM package. Over 1,000 SaaS environments were affected. LiteLLM sits at the model-routing layer, which the Agent Identity Is Not Model Identity technical note named as an incident class two days after the initial compromise was disclosed, weeks before the Mercor escalation confirmed the pattern.

These scenarios are grounded in public incident patterns and the architecture described in draft-klrc-aiagent-auth-01 (IETF, March 2026). They do not resolve by strengthening agent authentication. They resolve when structural model identity is composed into the existing agent identity infrastructure.

The EU AI Act and NIST Generative AI Profile are increasing pressure for verifiable model traceability — not just documentation, but evidence of what is actually running. The four-level framework maps directly to those obligations. The admissibility framework in What Counts as Proof? extends the mapping into a formal standard: each compliance question has an evidence class that can answer it, and evidence from the wrong class incurs inferential debt. Documentation identifies artifacts. Evidence identifies models.

Framework level | What it establishes | EU AI Act | NIST GenAI Profile (AI 600-1)
Structural fingerprinting (IT-PUF · weights regime) | Unambiguous, unforgeable model identity, independent of operator claims | Art. 11 (technical documentation), Art. 49 + Annex VIII (unambiguous identification and traceability) | GV-6.2: contracts specifying provenance expectations; MS-2.5: monitoring adherence to provenance standards
Hardware-attested binding (TEE · enclave measurement) | Cryptographic binding of fingerprint to specific weight artifact; tamper-evident deployment record | Art. 12 (automatic logging, audit trail integrity), Annex IV §2 (system description with sufficient detail to assess conformity) | MS-2.6: detection of unauthorized changes; GV-1.7: organizational risk policies covering third-party model supply chain
Verified computation path (ZK circuit · hybrid verifier) | Proof that the identified model computed honestly, not just that some weights were used | Art. 13 (transparency, output traceability for downstream providers), Art. 17 (quality management: verification that deployed system matches documented system) | MS-2.5: provenance of model outputs; MP-2.3: documenting AI system decisions in regulated contexts
Output binding (Token logit · evidence bundle) | Traceable link from verified identity through verified computation to a specific output; the audit record closes | Art. 12 §1(d): logs must enable identification of input data and attribution of outputs; Art. 26 (deployer obligations: monitor, log, maintain records) | GV-6.2: content provenance at output level; MS-4.2: real-time monitoring of deployed model behavior against documented baseline

This mapping is descriptive. It identifies where the framework's technical capabilities are relevant to stated regulatory requirements — it does not constitute a compliance certification. The EU AI Act high-risk provisions take full effect August 2026.

EU AI Act — Regulation (EU) 2024/1689, Official Journal of the European Union. eur-lex.europa.eu
NIST AI 600-1 — Generative Artificial Intelligence Profile, National Institute of Standards and Technology. doi.org/10.6028/NIST.AI.600-1

Each paper opened a question the previous one could not answer. Thirteen papers. Three technical notes. Zero retracted.

Research Paper, 2026
Open-source toolkits strip a model's safety constraints while leaving its outputs looking normal. The structural fingerprint changes anyway — and we can detect it.
Publicly available toolchains remove safety constraints from AI model weights while preserving observable behavior. The modification is invisible to every deployed trust layer — but structurally measurable. Fourth deformation class identified.
Two model families (Gemma-3-12B, Llama-3.1-8B), three toolchains (Heretic, mlabonne, OBLITERATUS), four abliterated checkpoints. Structural scars range from 7.6×ε to 2,319.4×ε. Family-dependent sensitivity reverses between distillation and abliteration. Sentinel panel across four families: 5/5 PASS, zero degradation. OBLITERATUS blind spot discovered at initial measurement depth → hardened configuration → all prior positives preserved. The admissibility doctrine — formally verified before this threat class existed — predicted exactly this outcome.
DOI: 10.5281/zenodo.19383019 | 2 families · 3 toolchains · 4 checkpoints | Sentinel 5/5 PASS · 4th deformation class
Technical Note, 2026
Three model substitutions run against a live gateway with valid agent credentials. Three detected. Zero false accepts.
Three substitution scenarios executed against a live gateway with real HTTP requests, signed attestation JWTs, and OPA policy enforcement. Three detected. Zero false accepts. HTTP 200 before. HTTP 403 after.
Scenario A: same-family substitution behind a stable endpoint — workload JWT, health checks, gateway PID, and policy hash all unchanged; model identity was the sole differentiating evidence layer (2,858×ε). Scenario B: cross-family substitution with both artifact manifests passing hash verification (3,416×ε). Scenario C: silent API rotation between gpt-4.1-mini and gpt-4.1-nano using the same API key and endpoint — per-model thresholds reject. Warm-path verification: 5.7–6.7 seconds with the model already loaded. Not inline per-request — runs at model load, on schedule, or as an out-of-band health check.
DOI: 10.5281/zenodo.19342848 | 3 scenarios · 3 detected · 0 false accepts | HTTP 200 → 403 · OPA enforcement
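
The HTTP 200 → 403 transition can be pictured as a threshold check on the measured separation. The sketch below is a stand-in for the policy decision, not the gateway's code. It assumes a single acceptance threshold equal to the ε quoted earlier on this page, whereas the note mentions per-model thresholds for the API case. The same-model reading is hypothetical.

```python
EPSILON = 1.003e-4  # acceptance threshold quoted earlier on the page

def gateway_status(distance: float, epsilon: float = EPSILON) -> int:
    """Map a measured distance to the enrolled anchor onto an HTTP status."""
    return 200 if distance < epsilon else 403

enrolled = gateway_status(0.2 * EPSILON)        # hypothetical same-model reading
scenario_a = gateway_status(2_858 * EPSILON)    # separation reported for Scenario A
scenario_b = gateway_status(3_416 * EPSILON)    # separation reported for Scenario B
```

Because the reported separations sit thousands of times above ε, the decision in these scenarios is not sensitive to small changes in the threshold.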
Research Paper, 2026
The same distillation event leaves different traces in different architectural families — not just in magnitude, but in mode and cross-layer coupling. The structural and functional identity layers can decouple.
Five reasoning-distillation pairs across three base families (Llama, Qwen, Mistral) at five scales. Structural scars span a sixty-fold range: Mistral loudest (7,701–8,518×ε), Llama intermediate (2,858–4,583×ε), Qwen quietest (141–516×ε). Functional hierarchy breaks in Llama, absent in Qwen, marginal in Mistral — despite Mistral carrying the loudest structural scar. Cross-layer decoupling observed empirically for the first time. Stiffness at the measurement site inversely orders with scar magnitude across all three families. Fisher curvature, previously proposed as a candidate mechanism, does not correctly order scars at production scale.
DOI: 10.5281/zenodo.19298857 | 5 pairs · 3 families · 60× range | Cross-layer decoupling · Fisher falsified
Technical Note, 2026
The order-statistic measurement used in API verification is provably invariant to log-softmax, temperature, and constant shifts.
The API wall is narrower than previously understood. The log-softmax transformation does not change the measurement — by mathematical identity, not by empirical robustness.
Order-statistic gaps are exactly invariant to log-softmax, temperature scaling, and any position-independent constant shift. Five theorems, formally verified in Coq (GapInvariance.v, 0 Admitted). Any API measurement deviation must come from truncation or quantization, never from the probability-domain transformation itself.
DOI: 10.5281/zenodo.19275524 | 5 theorems · 0 Admitted | API invariance proved
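
The invariance can be checked numerically on a toy example. The exact statistic used in the note is not given on this page, so the sketch assumes gaps between consecutive sorted logits. Raw gaps survive log-softmax and constant shifts unchanged. Under temperature they scale by 1/T, so the temperature check uses gaps normalised by the first gap, which is one possible reading and not necessarily the note's.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax: subtract the log-sum-exp from every entry."""
    peak = max(logits)
    lse = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - lse for x in logits]

def sorted_gaps(values):
    """Differences between consecutive values after sorting in descending order."""
    ranked = sorted(values, reverse=True)
    return [a - b for a, b in zip(ranked, ranked[1:])]

def gap_ratios(values):
    """Gaps divided by the first gap, which cancels any positive rescaling."""
    gaps = sorted_gaps(values)
    return [g / gaps[0] for g in gaps]

def close(xs, ys):
    """Element-wise comparison with a small absolute tolerance."""
    return all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(xs, ys))

logits = [2.0, 0.5, -1.0, 3.5, 0.0]   # toy values with distinct top entries
temperature = 0.7
scaled = [x / temperature for x in logits]

same_under_log_softmax = close(sorted_gaps(logits), sorted_gaps(log_softmax(logits)))
same_under_shift = close(sorted_gaps(logits), sorted_gaps([x + 4.2 for x in logits]))
ratios_same_under_temperature = close(gap_ratios(logits), gap_ratios(log_softmax(scaled)))
```

Log-softmax subtracts the same log-sum-exp term from every logit, so it is itself a constant shift and cancels in every difference. The normalised form would need a guard when the top two logits tie, since the first gap is then zero.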
Technical Note, 2026
Existing agent identity systems authenticate the agent. They cannot tell you which neural network is computing the response.
Why authenticating the software is not the same as proving which model is actually computing. The category distinction, two incident classes, and a four-question taxonomy.
Current agent identity frameworks (OAuth, SPIFFE, Okta, Entra) authenticate the software harness. They do not verify the neural network inside it. A four-question taxonomy separates artifact identity, workload identity, model identity, and training lineage into distinct evidence classes. Two incident classes — undisclosed model substitution and supply-chain poisoning with valid credentials — demonstrate the operational consequences.
DOI: 10.5281/zenodo.19240883 | 4-question taxonomy · 2 incident classes
Research Paper, 2026
Disclosing a model's lineage after the fact is not the same as proving it at runtime. Validated at frontier scale (8B–72B).
Every incident in March 2026 was discovered after the fact. Post-hoc disclosure is not runtime proof. This paper demonstrates that runtime model identity is technically feasible at the model sizes where those incidents occurred.
Five frontier models enrolled (8B–72B), zero identity errors. Three declared-lineage distillation pairs — sharing identical architecture with their bases — produced structural separations of 2,858×ε (8B), 3,616×ε (14B), and 4,583×ε (70B) across two base-model families. These observations were flagged as exploratory; the family-dependent distillation study subsequently confirmed the pattern is family-dependent rather than scale-dependent. Software attestation path (signed JWT → OPA policy decision) validated at 70B in 30 seconds. Thermodynamic invariant δ_norm confirmed scale-free across 25 models spanning two orders of magnitude.
DOI: 10.5281/zenodo.19216634 | Frontier-validated · 8B–72B | 3 distillation pairs (expanded to 5 in the family-dependent distillation study) · JWT+OPA
Research Paper, 2026
How structural identity forms during training, and why two models with identical architectures and recipes are not interchangeable.
Structural identity is not merely something a model has when measurement begins. It is something training builds, compresses, and locks — a record of the path by which the model became itself.
154 checkpoints. Ten seed-controlled runs. Three results: a three-phase emergence profile (identity locks at step 92,000 — the final 36% of training doesn't move it), path sensitivity (same recipe, different seed, fingerprints 391× to 11,737× apart), and endpoint underdetermination (tested weight statistics do not predict which identity formed). Formally proved in HistoricalIdentity.v: trajectory non-recovery and lock boundary source exclusion. Zero Admitted.
DOI: 10.5281/zenodo.19118807 | HistoricalIdentity.v · 0 Admitted | 154 checkpoints · 10 seeds · 3-phase emergence
Research Paper, 2026
How structural attestations compose with existing enterprise identity systems (JWT, SPIFFE, OPA), formally hardened against forgery.
Enterprise identity stacks authenticate workloads and credentials. They do not verify which neural network is computing inside them. This paper closes that layer — formally.
Live integration architecture for model-identity attestations in JWT and SPIFFE token flows, grounded in H100 Confidential Computing enclave measurements. Four composition properties proved in Coq: non-separability, temporal binding necessity, issuer authenticity, reference integrity. Every remaining trust dependency named, traced, and paired with a falsification witness. Zero OPEN rows. Zero silent assumptions.
DOI: 10.5281/zenodo.19099911 | 13 theorems · 0 Admitted · 3 proof files | JWT · SPIFFE · OAuth 2.0 · SCIM
Research Paper, 2026
A formal admissibility framework for identity claims — what evidence is sufficient under a given threat model, and what is not.
Documentation identifies artifacts. Evidence identifies models. Current governance practice conflates the two.
Three evidence classes, each answering a different question: structural (which specific model?), thermodynamic (genuinely a neural network?), functional (distilled from an unauthorized source?). Formally proved in Coq that the classes cannot substitute for one another — three inadmissibility directions, zero gaps. Mapped to EU AI Act, NIST AI 600-1, and IETF provenance standards.
DOI: 10.5281/zenodo.19058540 | Formally proved · 0 gaps | EU AI Act · NIST AI 600-1 · IETF
Research Paper, 2026
Three layers of model identity (structural, thermodynamic, functional) and the distinct laws that govern how each one changes.
Three layers. Three deformation laws. None shared. Any identity claim that doesn't declare which layer it addresses is borrowing evidence it hasn't earned.
Structural layer: training-determined, load-bearing under attack (the model collapses before the fingerprint moves). Thermodynamic layer: approximately universal across 22 Transformer runs (CV 3.5%). Functional layer: transferred by distillation, erased by routine fine-tuning within two epochs. Two falsifications: the fingerprint does not reduce to a gauge projection (1.3% of the observable), and it is not predictable from architecture features (LOO R² = −3.93).
DOI: 10.5281/zenodo.19055966
22 models · 106 checkpoints
Three layers · two channels · two falsifications
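For readers less familiar with the two statistics quoted in this summary, here is a minimal sketch of how a coefficient of variation and an R² score are conventionally computed, and why R² can go negative. The function names and toy numbers are illustrative assumptions; this is not the paper's analysis code or data.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population form)."""
    return statistics.pstdev(values) / statistics.fmean(values)


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot. Negative when predictions do worse than the mean."""
    mean = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def residual_ratio(r2):
    """SS_res / SS_tot implied by a given R² value."""
    return 1.0 - r2


# Toy data: predictions that invert the true ordering give a negative score.
toy_r2 = r_squared([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])

# Under the standard definition, an R² of -3.93 corresponds to a residual
# sum of squares about 4.93 times the total sum of squares.
implied_ratio = residual_ratio(-3.93)
```

Read this way, a leave-one-out R² well below zero means the architecture features predict held-out fingerprints worse than simply guessing the mean would.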
Research Paper, 2026
Verifying which specific model produced a specific inference, with cryptographic proof bound to the request.
zkML proves computation. We prove identity first.
A weight commitment proves which bytes were used. It does not prove which model those bytes belong to. Four-level framework: structural fingerprinting, hardware-attested binding, hybrid verifier-checkable decoder layer (124 negative tests, 0 failures), and output binding to a claimed token logit. When a rescaling error compressed the fingerprint to ~1.5 bits of dynamic range, structural identity retained 0.98 rank correlation. Identity may live in relational geometry, not activation magnitude.
DOI: 10.5281/zenodo.19008116
~296K constraints · 124 tests · 0 failures
Identity-first zkML
Research Paper, 2026
Mathematical evidence that AI identity has a structural layer beneath the conversational character.
Is there a there there? There is. And the proof compiles.
Gideon Lewis-Kraus asked in The New Yorker: "What is Claude? Anthropic doesn't know, either." This paper answers the prior question. Two separable layers: structural identity (weight geometry — invariant, unforgeable, not a watermark) and functional identity (behavior, tone, the performed self). Neither reduces to the other. The structural layer is a consequence of the softmax bottleneck — demanded by the mathematics, not inserted by design.
DOI: 10.5281/zenodo.18907292
Philosophy of AI identity
Dennett · Parfit · Schechtman
Research Paper, 2026
The teacher-student forensic signal generalizes across architectures and tokenizers. The verification protocol scales to large model zoos.
All three zero-knowledge tiers validated. Provenance transfer generalizes across families. API verification scales with zero breaches.
14 models, 0 / 14 API breaches. Provenance transfer across 3 teacher families, 4 student architectures, 2 training protocols. ZK Tier 1: committed distance proof, 7,656 constraints, 128-byte proofs. Tier 2: 1,536 H100 enclave measurements, 0 failures. Tier 3: full zero-knowledge extraction, ~296K constraints, 124 adversarial tests, 0 failures.
DOI: 10.5281/zenodo.18872071
3 teacher families · 0 breaches
ZK all 3 tiers validated
Research Paper, 2026
A trained model carries the geometric trace of its teacher. Adversarial attempts to erase that trace do worse than passive fine-tuning, which eventually erases it.
The adversary's full white-box knowledge buys nothing. Passive fine-tuning outperforms adversarial erasure. The structural fingerprint doesn't move.
54 adversarial checkpoints. Structural identity invariant under distillation. Functional trace partially transfers, degrades under continued training. Apparent cross-family spoofing is geometric coincidence (R² = 0.995). Pareto frontier: no configuration achieves both trace erasure and capability preservation.
DOI: 10.5281/zenodo.18818608
54 checkpoints · δnorm CV 1.9%
Research Paper, 2026
Verifying which model is behind a commercial API endpoint when all you have access to is its logprobs.
The fingerprint survives through commercial API interfaces. No weight access required.
PPP-residualized gap templates enable cross-session model identity verification through standard logprob endpoints. Zero breaches across 6 models, 3 providers, 3 independent sessions. Conditional API spoofing impossibility: 41 Coq theorems, zero Admitted.
DOI: 10.5281/zenodo.18776711
41 theorems · 0 Admitted
6 models · 0 / 120 (per-model τ)
Research Paper, 2026
The structural fingerprint that makes neural network identity measurable at inference time. The foundation paper.
No model was ever mistaken for another. Formally impossible to forge.
The δ-gene — the third pre-softmax logit gap — is a temperature-invariant structural fingerprint determined by training-induced weight geometry, not by what the model is saying. The IT-PUF protocol: 23 models, 16 families, 3 architecture types, 0 false acceptances. Spoofing impossibility: 311 Coq theorems, zero Admitted.
DOI: 10.5281/zenodo.18704275
311 theorems · 0 Admitted
3 architecture types · 0 errors
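The invariance claims behind a logit-gap measurement can be checked numerically. The sketch below assumes "third logit gap" means the difference between the third and fourth largest pre-softmax logits, which is one reading of the description above rather than the paper's stated definition. It shows that such a gap is unchanged by a constant shift and by log-softmax, and that temperature rescales every gap by the same factor, so a ratio of gaps is temperature-invariant. The range-normalized ratio is a hypothetical stand-in, not the published δnorm.

```python
import math


def sorted_gaps(scores):
    """Adjacent differences of the scores sorted in descending order."""
    s = sorted(scores, reverse=True)
    return [s[i] - s[i + 1] for i in range(len(s) - 1)]


def third_gap(scores):
    # ASSUMPTION: third gap = 3rd largest score minus 4th largest score.
    return sorted_gaps(scores)[2]


def normalized_third_gap(scores):
    # Hypothetical normalization: divide by the full score range so that a
    # global rescaling (such as temperature) cancels out.
    s = sorted(scores, reverse=True)
    return (s[2] - s[3]) / (s[0] - s[-1])


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - lse for z in scaled]


logits = [5.0, 3.5, 2.0, 0.5, -1.0, -2.0]  # toy values, not measured data

raw = third_gap(logits)
shifted = third_gap([z + 7.0 for z in logits])        # constant shift
via_logprobs = third_gap(log_softmax(logits))         # log-softmax subtracts one constant
ratio_t1 = normalized_third_gap(log_softmax(logits, temperature=1.0))
ratio_t2 = normalized_third_gap(log_softmax(logits, temperature=2.0))
```

Log-softmax subtracts the same log-normalizer from every logit, so the gaps it reports at temperature 1 match the underlying logit gaps, which is consistent with the claim that the measurement can be taken through a logprob interface.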
On the name

The name Fall Risk comes from the medical wristband. A hospital labels a patient as a Fall Risk to acknowledge the vulnerability and act on it — extra rails, closer monitoring, faster response when something slips. Neural networks must be considered Fall Risks too. They can shift, drift, or be substituted in ways their software interfaces never expose. By labeling the AI as a “Fall Risk,” we are acknowledging that vulnerability and building the structural measurement tools necessary to ensure its safety — and the safety of the enterprises built around it.

On the researcher

Fall Risk AI, LLC · New Orleans, Louisiana. Anthony Coslett is an independent researcher studying the structural identity of neural networks. He is the sole principal investigator of the Fall Risk AI research program. Evidence from the research has been placed into proceedings at the EU AI Office, NIST, and the IETF.
