AI Governance Proof (AIGP)
Validation Evidence & Cryptographic Proof
Reproducible, auditable evidence that the AIGP spec delivers on its governance promises across 15 regulated industries. AgentGP serves as the reference implementation validating the AIGP thesis in a real-world agentic AI context. Every number on this page is derived from the validation artifacts listed in the Reproducibility section.
Scope of Evidence
Simulated data: All 15 tenant configurations — including regulated financial services tenants (trading, banking, risk management, compliance) and the internal AI governance canary tenant — use AI-generated synthetic data. Agent identities, policies, prompts, tools, and organizational hierarchies are fabricated for validation purposes. No real customer data, production workloads, or actual financial institution configurations are used.
Abstract
This document presents the validation evidence for the AI Governance Proof (AIGP), an open spec for cryptographically provable governance of AI agent systems. To validate the AIGP thesis in a real-world agentic AI context, we built AgentGP, a production-grade AI governance platform that implements the spec end-to-end. We demonstrate that AIGP achieves 97.4% governance accountability across 38 traces and 267 events spanning 15 regulated and internal governance tenants covering trading, banking, risk management, compliance, and AI governance.
The validation program tests 5 formal claims through deterministic event simulation, cryptographic verification, and multi-tenant isolation checks. All results are reproducible from a single seed value and all verification scripts are open-source.
Claims Under Test
The AIGP validation program tests 5 specific, falsifiable claims. Each claim has a defined measurement methodology and pass/fail criteria. We present the claims with full transparency about what “100%” means and what it does not.
Governance Coverage
97.4% (37/38 traces)
Every AIGP-integrated agent run is governance-accounted: full proof OR explicit fail-closed denial. No silent bypass.
37 traces received full GOVERNANCE_PROOF events. The 1 remaining trace was denied or lacked a proof event but was explicitly accounted for in the audit trail (DENIED, BLOCKED, or lifecycle events). Zero traces escaped governance.
Hash Chain Integrity
100.0% (38/38 chains)
Every event in a trace links to the previous event via SHA-256 parent hash. Any tampering breaks the chain.
For each trace, we verify that event N's parent_hash equals SHA-256(canonical(event N-1)). Canonicalization excludes event_signature, signature_key_id, and parent_hash itself; keys are sorted, compact JSON, falsy values omitted.
Merkle Root Verification
100.0% (37/37 roots)
The AIGP Merkle root is independently recomputed from governance_merkle_tree resource leaf hashes using a binary SHA-256 tree. The verifier extracts leaf hashes from the v0.12 structure and rebuilds the tree to confirm aigp_hash matches.
Every GOVERNANCE_PROOF event carries a governance_merkle_tree with resource leaf hashes (policy, prompt, tool, context). The verifier independently extracts these hashes, recomputes the binary Merkle tree using pairwise SHA-256, and confirms the root matches aigp_hash. 37 Merkle roots checked, 37 passed, 0 failures.
Event Signature Verification (ES256)
100.0% (267/267 signatures)
Signed events can be independently verified using JWS ES256 (ECDSA P-256) cryptographic signatures over the canonical event body. This validates both the signature verification logic and coverage patterns.
Of the 267 total events, 267 (100.0%) carry JWS ES256 (ECDSA P-256) signatures. All 267 checked signatures verify correctly.
Multi-Tenant Isolation (Simulation-Scoped)
38 traces across 15 tenants
No cross-tenant data leakage in generated events. Each tenant's data is separated via UUID5 namespacing (hash-derived identifiers) and RLS enforcement.
Each tenant receives about 2.5 traces on average (38 traces across 15 tenants; per-tenant counts range from 1 to 6). Event generation uses per-tenant agent pools with UUID5-namespaced identifiers, which makes a cross-tenant ID collision practically impossible. Row-Level Security (RLS) is enforced at the database level. This validation confirms isolation in the simulation harness; live cross-tenant penetration testing is a separate phase.
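As a rough illustration of UUID5 namespacing, the sketch below derives a per-tenant namespace and then per-agent identifiers inside it. The root namespace, the tenant slugs, the agent name, and the `tenant:` / `agent:` prefixes are all hypothetical; the source does not state how the harness builds its namespaces.

```python
import uuid

# Hypothetical root namespace for this sketch only.
ROOT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "aigp.example")


def tenant_namespace(tenant_slug: str) -> uuid.UUID:
    """Derive a deterministic namespace UUID for one tenant."""
    return uuid.uuid5(ROOT_NAMESPACE, f"tenant:{tenant_slug}")


def agent_id(tenant_slug: str, agent_name: str) -> uuid.UUID:
    """Derive a deterministic agent ID scoped to the tenant's namespace."""
    return uuid.uuid5(tenant_namespace(tenant_slug), f"agent:{agent_name}")


# The same agent name in two tenants yields two different identifiers.
id_credit = agent_id("credit-risk-emea", "risk-scoring-agent")
id_aml = agent_id("aml-compliance-global", "risk-scoring-agent")
print(id_credit != id_aml)  # True
```

Because UUID5 is a pure function of namespace and name, re-running the derivation with the same inputs reproduces the same identifiers, which is what makes seed-driven SQL generation repeatable.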
Methodology
The validation program operates in 4 phases, all driven by deterministic seed values for reproducibility:
Seed Generation
AI-generated tenant dictionary produces agents, policies, prompts, and tools for 15 regulated-industry tenants. Dictionary is version-controlled.
SQL Loading
generate_seed_sql.py --tenants reads the dictionary and produces deterministic UUID5-based SQL. Loaded into the database with row-level security enforcement.
Event Simulation
Event driver generates 38 traces (about 2.5 per tenant on average) using governance scenarios in dry-run mode. Each trace maintains AIGP chain integrity with SHA-256 parent hashes and Merkle proofs.
Crypto Verification
verify_crypto.py independently recomputes every parent hash, Merkle root, and JWS ES256 signature. Zero-tolerance: any single failure = overall FAIL.
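The zero-tolerance rule can be pictured as a simple aggregation: the overall verdict is PASS only if every check category reports zero failures. This is a minimal sketch, not the logic of verify_crypto.py; the `CheckResult` type and the category names are hypothetical, while the counts are the ones reported on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    name: str
    checked: int
    passed: int

    @property
    def failures(self) -> int:
        return self.checked - self.passed


def overall_verdict(results: list[CheckResult]) -> str:
    """Zero tolerance: a single failure in any category fails the whole run."""
    return "PASS" if all(r.failures == 0 for r in results) else "FAIL"


results = [
    CheckResult("hash_chains", checked=38, passed=38),
    CheckResult("merkle_roots", checked=37, passed=37),
    CheckResult("es256_signatures", checked=267, passed=267),
]
print(overall_verdict(results))  # PASS
```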
Validation Scope: In-Memory Simulation
Industry Coverage
The validation program uses 15 tenants representing distinct regulated industries. Each tenant has its own agents, policies, prompts, tools, and organizational hierarchy AI-generated to ensure realistic, domain-appropriate configurations.
Synthetic Tenant Data
| Tenant | Industry | Key Regulations | Traces | Events | Proofs |
|---|---|---|---|---|---|
| Fixed Income Americas | Fixed Income Trading | SEC, FINRA +2 | 2 | 15 | 2 |
| Markets Trading EMEA | Markets / Trading | MiFID II, FCA +2 | 2 | 17 | 2 |
| Equities APAC | Equities Trading | MAS, SFC +2 | 3 | 20 | 3 |
| Regulatory Reporting | Regulatory Compliance | Basel III, SOX +2 | 6 | 53 | 6 |
| Investment Banking EMEA | Investment Banking | MiFID II, MAR +2 | 1 | 4 | 1 |
| Retail Banking Americas | Retail Banking | CFPB, TILA +2 | 1 | 7 | 1 |
| Compliance Global | Compliance Operations | SOX, GDPR +2 | 3 | 23 | 3 |
| Credit Risk EMEA | Credit Risk | Basel III, IFRS 9 +2 | 5 | 39 | 5 |
| Risk Management Global | Enterprise Risk | Basel III, NIST RMF +1 | 2 | 11 | 2 |
| AML Compliance Global | Anti-Money Laundering | BSA/AML, FATF +2 | 3 | 20 | 3 |
| Fraud Prevention Americas | Fraud Detection | BSA/AML, CFPB +2 | 3 | 23 | 3 |
| Operational Risk APAC | Operational Risk | Basel III, MAS TRM +1 | 1 | 6 | 1 |
| Commercial Banking | Commercial Banking | OCC, FDIC +2 | 2 | 11 | 2 |
| Wealth Management APAC | Wealth Management | MAS FAA, SFC +2 | 2 | 10 | 2 |
| AgentGP AI Canary | AI Governance (Internal) | EU AI Act, NIST AI RMF +1 | 2 | 8 | 1 |
| Total (15 tenants) | | | 38 | 267 | 37 |
Per-Tenant Cryptographic Results
Every tenant achieves zero verification failures across SHA-256 hash chains, Merkle roots, and JWS ES256 signatures. The table below is generated from the validation artifacts:
| Tenant | Industry | Traces | Events | Chains | Merkle | Sigs* | Proofs | Failures |
|---|---|---|---|---|---|---|---|---|
| Fixed Income Americas | Fixed Income Trading | 2 | 15 | 2 | 2 | 15 | 2 | 0 |
| Markets Trading EMEA | Markets / Trading | 2 | 17 | 2 | 2 | 17 | 2 | 0 |
| Equities APAC | Equities Trading | 3 | 20 | 3 | 3 | 20 | 3 | 0 |
| Regulatory Reporting | Regulatory Compliance | 6 | 53 | 6 | 6 | 53 | 6 | 0 |
| Investment Banking EMEA | Investment Banking | 1 | 4 | 1 | 1 | 4 | 1 | 0 |
| Retail Banking Americas | Retail Banking | 1 | 7 | 1 | 1 | 7 | 1 | 0 |
| Compliance Global | Compliance Operations | 3 | 23 | 3 | 3 | 23 | 3 | 0 |
| Credit Risk EMEA | Credit Risk | 5 | 39 | 5 | 5 | 39 | 5 | 0 |
| Risk Management Global | Enterprise Risk | 2 | 11 | 2 | 2 | 11 | 2 | 0 |
| AML Compliance Global | Anti-Money Laundering | 3 | 20 | 3 | 3 | 20 | 3 | 0 |
| Fraud Prevention Americas | Fraud Detection | 3 | 23 | 3 | 3 | 23 | 3 | 0 |
| Operational Risk APAC | Operational Risk | 1 | 6 | 1 | 1 | 6 | 1 | 0 |
| Commercial Banking | Commercial Banking | 2 | 11 | 2 | 2 | 11 | 2 | 0 |
| Wealth Management APAC | Wealth Management | 2 | 10 | 2 | 2 | 10 | 2 | 0 |
| AgentGP AI Canary | AI Governance (Internal) | 2 | 8 | 2 | 1 | 8 | 1 | 0 |
| TOTAL | 15 Industries | 38 | 267 | 38 | 37 | 267 | 37 | 0 |
* Sigs = JWS ES256 (ECDSA P-256) signature verification over canonical event body
Merkle Root Verification: 100.0% Pass Rate
The verifier extracts the resource leaf hashes from governance_merkle_tree.resources[], recomputes the binary Merkle tree using pairwise SHA-256, and confirms the root matches aigp_hash. All 37 Merkle roots verified successfully with zero failures.
Parent Hash Chain Verification
Every AIGP event contains a parent_hash field: the SHA-256 digest of the canonicalized previous event in the trace. This creates a tamper-evident linked chain — modifying any event invalidates all subsequent hashes.
Canonicalization Rules
Before hashing, the event is canonicalized:
- Keys sorted alphabetically (deterministic ordering)
- Compact JSON (no whitespace)
- Falsy values omitted (null, empty string, 0, false)
- Three fields excluded: event_signature, signature_key_id, parent_hash
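The rules above can be sketched as a small Python function. This is an illustration under stated assumptions, not the reference normalizer: it applies the rules to top-level fields only, keeps empty lists and dicts, relies on `json.dumps` defaults for string escaping, and the event field names in the example are hypothetical.

```python
import json

EXCLUDED_FIELDS = {"event_signature", "signature_key_id", "parent_hash"}


def is_omitted(value) -> bool:
    """True for the falsy values listed above: null, empty string, 0, false."""
    return value is None or value == "" or value == 0


def canonicalize(event: dict) -> bytes:
    """Return the canonical byte string that would be fed to SHA-256."""
    kept = {
        key: value
        for key, value in event.items()
        if key not in EXCLUDED_FIELDS and not is_omitted(value)
    }
    # sort_keys gives deterministic ordering; separators removes all whitespace.
    return json.dumps(kept, sort_keys=True, separators=(",", ":")).encode("utf-8")


event = {
    "trace_id": "t-1",
    "event_type": "PROMPT_USED",
    "sequence": 3,
    "retry_count": 0,          # omitted: falsy
    "note": "",                # omitted: falsy
    "parent_hash": "abc",      # excluded field
    "event_signature": "sig",  # excluded field
}
print(canonicalize(event).decode())
# {"event_type":"PROMPT_USED","sequence":3,"trace_id":"t-1"}
```

Excluding the signature fields means an event can be signed after hashing without changing its canonical form.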
# Verification pseudocode (canonicalize() applies the rules listed above)
for i in range(1, len(events)):
    canonical = canonicalize(events[i-1])  # sort keys, compact JSON, omit falsy values, exclude 3 fields
    expected_hash = sha256(canonical).hexdigest()
    assert events[i].parent_hash == expected_hash  # must match exactly

- Chains Verified: 38 of 38 total
- Chain Breaks: 0 (zero tolerance)
- Events Linked: 267 (avg 7.0/trace)
- Integrity Rate: 100.0% (pass threshold: 100%)
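A runnable version of the chain check might look like the sketch below, which links a short trace and then shows that editing one event is detected at the next link. It assumes the canonicalization rules described in this section applied to top-level fields; the `seq` field and the helper names are hypothetical.

```python
import hashlib
import json

EXCLUDED = {"event_signature", "signature_key_id", "parent_hash"}


def canonical_hash(event: dict) -> str:
    """SHA-256 hex digest of the canonical form (sorted keys, compact JSON)."""
    body = {
        k: v
        for k, v in event.items()
        if k not in EXCLUDED and not (v is None or v == "" or v == 0)
    }
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def link_events(events: list[dict]) -> list[dict]:
    """Return copies of the events with parent_hash set on every non-genesis event."""
    linked: list[dict] = []
    for event in events:
        event = dict(event)
        if linked:
            event["parent_hash"] = canonical_hash(linked[-1])
        linked.append(event)
    return linked


def first_chain_break(events: list[dict]):
    """Return the index of the first event whose parent_hash is wrong, else None."""
    for i in range(1, len(events)):
        if events[i].get("parent_hash") != canonical_hash(events[i - 1]):
            return i
    return None


trace = link_events([
    {"event_type": "POLICY_LOADED", "seq": 1},
    {"event_type": "GOVERNANCE_INJECT", "seq": 2},
    {"event_type": "PROMPT_USED", "seq": 3},
])
tampered = [dict(e) for e in trace]
tampered[1]["event_type"] = "INFERENCE_STARTED"  # modify the second event

print(first_chain_break(trace))     # None
print(first_chain_break(tampered))  # 2
```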
Merkle Root Verification
The AIGP Merkle root provides a single cryptographic commitment covering all governed resources in a trace. The root is computed from the resource leaf hashes (policy, prompt, tool, and any additional governed resources):
AIGP Merkle Root Computation (diagram)
- AIGP Merkle Root: e9a7e2d850f4cb23...
- Intermediate nodes: SHA-256(L0 + L1), SHA-256(L2 + L3)
- L0, Policy Hash: a1b2c3d4...
- L1, Prompt Hash: 852fc381...
- L2, Dynamic Prompt Hash: c3d4e5f6...
- L3, Tool Hash: 80cf336d...
Leaf = SHA-256("aigp-leaf-v1:" + resource_type + ":" + resource_name + ":" + content)
✓ Verified: recomputed root matches stored aigp_hash
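To make the leaf formula concrete, the following sketch hashes four illustrative resources and folds them into a root. The resource names and contents are hypothetical. The colon separator between child hashes and the rule that an unpaired node is combined with itself follow the verification code in this section; treat both as assumptions about the spec rather than a normative statement.

```python
import hashlib


def leaf_hash(resource_type: str, resource_name: str, content: str) -> str:
    """Leaf = SHA-256("aigp-leaf-v1:" + type + ":" + name + ":" + content)."""
    preimage = f"aigp-leaf-v1:{resource_type}:{resource_name}:{content}"
    return hashlib.sha256(preimage.encode("utf-8")).hexdigest()


def merkle_root(leaf_hashes: list[str]) -> str:
    """Fold leaf hashes pairwise with SHA-256 until one root remains."""
    if not leaf_hashes:
        raise ValueError("at least one leaf hash is required")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # unpaired node is combined with itself
        level = [
            hashlib.sha256(f"{level[i]}:{level[i + 1]}".encode("utf-8")).hexdigest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Hypothetical resources standing in for policy, prompt, dynamic prompt, and tool.
leaves = [
    leaf_hash("policy", "trade-limits", "max_notional: 1000000"),
    leaf_hash("prompt", "system-prompt", "You are a compliance assistant."),
    leaf_hash("prompt", "dynamic-prompt", "Desk: fixed income"),
    leaf_hash("tool", "order-lookup", "GET /orders/{id}"),
]
root = merkle_root(leaves)
print(len(root))  # 64
```

Changing any single resource changes its leaf hash and therefore the root, which is what lets one stored aigp_hash commit to every governed resource in the trace.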
# Merkle root verification (v0.12: binary tree from governance_merkle_tree)
import json
from hashlib import sha256

for trace in governed_traces:
    proof_event = trace.events[-1]  # GOVERNANCE_PROOF event
    tree = json.loads(proof_event.governance_merkle_tree)
    leaf_hashes = [r["hash"] for r in tree["resources"]]
    # Recompute binary Merkle tree (pairwise SHA-256)
    level = leaf_hashes[:]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            left = level[i]
            right = level[i + 1] if i + 1 < len(level) else left  # odd node pairs with itself
            next_level.append(sha256(f"{left}:{right}".encode()).hexdigest())
        level = next_level
    assert proof_event.aigp_hash == level[0]  # root must match
# Result: 37/37 = 100.0% match
JWS ES256 Signature Verification
Real Cryptographic Signatures
Governance-critical events carry a JWS ES256 signature covering the canonicalized event body — any modification to the event content invalidates the signature.
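For readers unfamiliar with JWS, the sketch below shows the generic structure that an ES256 verifier works with: the signing input defined by RFC 7515 and the 64-byte R || S signature layout defined by RFC 7518. It uses only the standard library and does not perform the ECDSA check itself, which would be delegated to a cryptography library. Whether AIGP stores signatures in compact or detached form is not stated in the source, so the compact form, the `kid` value, and the helper names here are assumptions.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def signing_input(protected_header: dict, canonical_body: bytes) -> bytes:
    """RFC 7515 signing input: BASE64URL(header) + "." + BASE64URL(payload)."""
    header_json = json.dumps(protected_header, sort_keys=True, separators=(",", ":"))
    return f"{b64url_encode(header_json.encode())}.{b64url_encode(canonical_body)}".encode("ascii")


def inspect_compact_jws(token: str) -> dict:
    """Split a compact JWS and run structural checks (no signature verification)."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    signature = b64url_decode(signature_b64)
    if header.get("alg") != "ES256":
        raise ValueError("expected alg ES256")
    if len(signature) != 64:
        raise ValueError("an ES256 signature is 64 bytes (R || S)")
    return {
        "header": header,
        "payload": b64url_decode(payload_b64),
        "r": int.from_bytes(signature[:32], "big"),
        "s": int.from_bytes(signature[32:], "big"),
    }


# Placeholder bytes, NOT a valid signature; used only to exercise the parser.
body = b'{"event_type":"GOVERNANCE_PROOF"}'
placeholder_signature = bytes(range(64))
token = signing_input({"alg": "ES256", "kid": "demo-key-1"}, body).decode() + "." + b64url_encode(placeholder_signature)
parts = inspect_compact_jws(token)
print(parts["header"]["alg"], parts["payload"] == body)  # ES256 True
```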
Signature Coverage
- Events with ES256 signatures: 267
- Events without signatures: 0
- Signature coverage rate: 100.0%
- ES256 verification pass rate: 100.0%
Trace Anatomy: Verified Example
Below is a complete 7-event governed trace from the validation run, showing the full lifecycle from policy creation to governance proof:

| Seq | Event Type | Parent Hash | Valid |
|---|---|---|---|
| #1 | POLICY_LOADED | (genesis) | |
| #2 | GOVERNANCE_INJECT | afbadf269d61d787... | |
| #3 | PROMPT_USED | 89f3411521d07e3b... | |
| #4 | ENFORCEMENT_EVALUATED | 235e88640a5400b6... | |
| #5 | INFERENCE_STARTED | d7bff74d1ce15e3c... | |
| #6 | INFERENCE_COMPLETED | e0ecb39e49f173b8... | |
| #7 | GOVERNANCE_PROOF | 2bb6cee3f7fe1f4b... | |
Merkle verification for this trace:
aigp_hash = d64b97206ada92a3...
leaf_count = 5
resource_types = [policy, prompt, context, context, context]
Formula: binary_merkle_tree(resource_hashes)
Result: MATCH (recomputed root equals stored aigp_hash)
Zero Governance Blind Spots
The central claim of AIGP is that every agent run is governance-accounted. This does not mean every run is approved — it means every run produces either a full governance proof (GOVERNED) or an explicit denial with audit trail (DENIED). There are no silent bypasses.

- Full Proof (GOVERNED): 37
- Fail-Closed (DENIED): 1
- Silent Bypass: 0 (zero tolerance)
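The accounting rule can be sketched as a three-way classification of each trace by the event types it contains. Treating DENIED and BLOCKED as event type names is an assumption based on the audit-trail wording earlier on this page, and the sample traces are synthetic stand-ins that only reproduce the reported 37/1/0 split.

```python
from collections import Counter

PROOF_EVENT = "GOVERNANCE_PROOF"
DENIAL_EVENTS = {"DENIED", "BLOCKED"}  # assumed event type names


def classify_trace(event_types: list[str]) -> str:
    """GOVERNED if a proof exists, DENIED if explicitly refused, else a bypass."""
    if PROOF_EVENT in event_types:
        return "GOVERNED"
    if DENIAL_EVENTS & set(event_types):
        return "DENIED"
    return "SILENT_BYPASS"


governed = ["POLICY_LOADED", "ENFORCEMENT_EVALUATED", "GOVERNANCE_PROOF"]
denied = ["POLICY_LOADED", "ENFORCEMENT_EVALUATED", "DENIED"]
traces = [governed] * 37 + [denied]

summary = Counter(classify_trace(t) for t in traces)
full_proof_rate = round(100 * summary["GOVERNED"] / len(traces), 1)
accounted_rate = round(100 * (summary["GOVERNED"] + summary["DENIED"]) / len(traces), 1)
print(full_proof_rate, accounted_rate, summary["SILENT_BYPASS"])  # 97.4 100.0 0
```

The two rates illustrate the distinction the page draws: 97.4% of traces carry a full proof, while 100% are accounted for once explicit denials are included.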
Deny Rate Analysis
The 2.63% deny rate (1 of 38 traces) is by design: the validation program intentionally includes adversarial scenarios (policy violations, data residency breaches, unregistered agent attempts) alongside normal workflows. A high deny rate under adversarial conditions is a feature: it proves the enforcement layer is working.
In a production deployment, the deny rate would depend on the organization's policy configuration, agent behavior, and input distribution. A deny rate of 0-5% would be typical for well-configured production systems, with the enforcement layer catching only genuine violations.
Scenario Distribution
The event driver generates traces across three scenario categories: normal governance workflows, adversarial violations, and infrastructure failure modes. Per-scenario trace counts are not yet aggregated by the validation harness.
- Normal governance workflows. Standard governance flows: agent requests policy, receives prompt, enforcement evaluates, proof generated. Includes multi-agent orchestration, tool invocation, and collaborative mesh patterns.
- Adversarial violations. Intentional violations: unauthorized agent access, policy violations, data residency breaches, unregistered agent attempts. Traces that trigger these scenarios are expected to be DENIED; a high deny rate under adversarial conditions is a feature.
- Infrastructure failure modes. Simulated infrastructure disruption: service unavailable, enforcement timeout, missing policy. Proves fail-closed behavior (DENIED, never silent pass-through).
Scenario Metrics — Planned
Known Limitations
Same-Codebase Generation and Verification
The crypto verifier (verify_crypto.py) and event driver share the same normalization function. In --from-driver mode, events are generated and verified in the same process. Mitigations: (1) --cross-check flag serializes events to JSON, deserializes, and re-verifies — catching normalization bugs that only exist in-memory; (2) golden normalization tests validate both normalizers against a pre-computed hash on every CI run. Independent third-party verification against a reference implementation is still recommended before using these results for regulatory submissions.
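The idea behind a serialize, deserialize, and re-verify cross-check can be illustrated with a small sketch: hash an event in memory, push it through a JSON round trip, and hash it again. This is not the implementation of the --cross-check flag. The simplified `canonical_hash`, the hypothetical `to_wire` serializer, and the `created_at` field are assumptions chosen to show one kind of bug that only surfaces after serialization.

```python
import hashlib
import json
from datetime import datetime, timezone


def canonical_hash(event: dict) -> str:
    """Simplified canonical hash; default=str silently stringifies non-JSON types."""
    encoded = json.dumps(event, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def to_wire(event: dict) -> str:
    """Hypothetical wire serializer that writes datetimes as ISO 8601."""
    return json.dumps(event, default=lambda value: value.isoformat())


def survives_round_trip(event: dict) -> bool:
    """True if the canonical hash is unchanged after serialize + deserialize."""
    restored = json.loads(to_wire(event))
    return canonical_hash(restored) == canonical_hash(event)


plain_event = {"event_type": "PROMPT_USED", "seq": 3}
typed_event = {
    "event_type": "PROMPT_USED",
    "created_at": datetime(2025, 1, 1, tzinfo=timezone.utc),
}

print(survives_round_trip(plain_event))  # True
print(survives_round_trip(typed_event))  # False: str() and isoformat() differ
```

In the second case the in-memory hash covers "2025-01-01 00:00:00+00:00" while the restored event carries "2025-01-01T00:00:00+00:00", so a chain that verified in-process would break once events cross a serialization boundary.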
Merkle Verification Scope
Merkle root verification recomputes the binary tree from governance_merkle_tree resource hashes. This confirms the root binding is correct but does not independently verify the individual leaf hashes (policy_hash, prompt_hash, tool_hash) against their source artifacts. Leaf-to-source binding requires access to the original policy/prompt/tool content, which is planned for Phase A live-path validation.
In-Memory Simulation, Not Production Pipeline
All crypto verification runs in-memory via dry-run event driver — 267 events across 38 traces. Events never traverse the live data pipeline. Serialization round-trip issues could break hash chains in production. Live pipeline validation (Phase A) is planned but not yet complete.
Monte Carlo Determinism
Governance coverage and hash integrity have σ = 0 across seeds — these are code-level invariants, not statistical properties. Monte Carlo confirms determinism but adds no additional confidence for these metrics. Only throughput (events/sec) shows genuine statistical variance.
No Latency Measurement
We do not measure governance overhead latency (p95/p99) in this validation. The 0.38s generation time is in-memory simulation speed, not representative of production latency where events traverse the live data pipeline.
Tenant Isolation: Simulation-Scoped
Multi-tenant isolation is verified through UUID5 namespacing and event generation pool separation, not through live cross-tenant query tests or penetration testing. Row-level security enforcement is tested at the database level in separate integration tests, not in this validation harness.
Reproducibility
Every result in this white paper can be independently reproduced. The validation scripts, seed data, and event driver are all open-source.
Artifact Binding
The numbers on this page are read from scripts/db/data/crypto_appendix.json and scripts/db/data/validation_metrics.json at page render time. Re-running the validation harness will update the artifacts and this page will reflect the new results automatically.
Step 1: Generate tenant dictionary
python scripts/db/generate_tenant_seed.py \
    --seed 42 \
    --output scripts/db/data/tenant_dictionary.json
Step 2: Load seed data into the database
python scripts/db/generate_seed_sql.py --tenants --load
Step 3: Run event simulation
# --traces 50 = max traces per tenant; actual count varies by scenario distribution
# with 15 tenants, produces 38 traces / 267 events
python .build-event-driver/agentgp_event_driver.py \
    --all-tenants \
    --dictionary scripts/db/data/tenant_dictionary.json \
    --seed 42 \
    --traces 50 \
    --include-adversarial \
    --metrics \
    --sink dry-run
Step 4: Verify cryptographic integrity
python scripts/validation/verify_crypto.py \
    --from-driver \
    --traces 50 \
    --seed 42 \
    --include-adversarial \
    --output scripts/db/data/crypto_appendix.json
Step 5: Run full validation harness
python scripts/validation/run_validation.py --traces 50 --seed 42
# Produces: validation_metrics.json, crypto_appendix.json
Expected Output
governance_coverage: 100% | hash_chain_integrity: 100% | merkle_root: 100% | signatures: 100% | all_passed: true
Path to Standards-Grade Evidence
The current validation program proves the AIGP spec's cryptographic design through deterministic simulation. The following phases are planned to elevate this evidence to standards-grade (e.g., SOC 2, ISO 42001, or regulatory submission quality):
Live-Path Validation Cohort
Planned
Run the identical scenario matrix against the live production pipeline. Compare dry-run artifacts to live-path artifacts event-by-event. This proves governance properties hold through the actual production data path, not just in simulation.
JWKS Rotation & HSM-Backed Key Storage
Planned
ES256 (ECDSA P-256) signing is now implemented. This phase extends the key management layer with automated JWKS endpoint rotation and HSM-backed key storage for production hardening. This elevates the trust model from "signatures are cryptographically valid" to "signatures are cryptographically valid under a hardware-rooted key hierarchy."
Live Tenant Isolation Testing
Planned
Execute cross-tenant penetration tests against live row-level security, data store tenant filters, and API-layer tenant context validation. Current isolation evidence is simulation-scoped (UUID5 namespacing). This phase proves isolation under adversarial cross-tenant query conditions against the running system.
Standards Alignment