AIGP v0.12 | Validation Program | All Claims Verified | 2026-03-02T07:44:15.281928+00:00

AI Governance Proof (AIGP)
Validation Evidence & Cryptographic Proof

Reproducible, auditable evidence that the AIGP spec delivers on its governance promises across 15 regulated industries. AgentGP serves as the reference implementation validating the AIGP thesis in a real-world agentic AI context. Every number on this page is derived from the validation artifacts listed in the Reproducibility section.

38 traces 267 events 0.38s generation AIGP v0.12

Scope of Evidence

This is simulation-level evidence for AIGP spec properties. Events are generated by a deterministic event driver in dry-run mode, not by live AI agents against production infrastructure. Hash chains and Merkle roots use SHA-256; signatures use JWS ES256 with ECDSA P-256 (see § Signatures). Production validation against live infrastructure is a separate phase. Whitepaper O extends this evidence with Monte Carlo replay across multiple seeds.

Simulated data: All 15 tenant configurations — including regulated financial services tenants (trading, banking, risk management, compliance) and the internal AI governance canary tenant — use AI-generated synthetic data. Agent identities, policies, prompts, tools, and organizational hierarchies are fabricated for validation purposes. No real customer data, production workloads, or actual financial institution configurations are used.

Abstract

This document presents the validation evidence for the AI Governance Proof (AIGP), an open spec for cryptographically provable governance of AI agent systems. To validate the AIGP thesis in a real-world agentic AI context, we built AgentGP, a production-grade AI governance platform that implements the spec end-to-end. We demonstrate that AIGP achieves 97.4% governance proof coverage (37 of 38 traces), with the remaining trace explicitly accounted for in the audit trail, across 38 traces and 267 events spanning 15 regulated and internal governance tenants covering trading, banking, risk management, compliance, and AI governance.

The validation program tests 5 formal claims through deterministic event simulation, cryptographic verification, and multi-tenant isolation checks. All results are reproducible from a single seed value and all verification scripts are open-source.

Claims Under Test

The AIGP validation program tests 5 specific, falsifiable claims. Each claim has a defined measurement methodology and pass/fail criteria. We present the claims with full transparency about what “100%” means and what it does not.

Governance Coverage

97.4% (37/38 traces)

Every AIGP-integrated agent run is governance-accounted — full proof OR explicit fail-closed denial. No silent bypass.

37 traces received full GOVERNANCE_PROOF events. 1 trace was denied or lacked a proof event but was explicitly accounted for in the audit trail (DENIED, BLOCKED, or lifecycle events). Zero traces escaped governance.

Hash Chain Integrity

100.0% (38/38 chains)

Every event in a trace links to the previous event via SHA-256 parent hash. Any tampering breaks the chain.

For each trace, we verify that event N's parent_hash equals SHA-256(canonical(event N-1)). Canonicalization excludes event_signature, signature_key_id, and parent_hash itself, sorts keys, emits compact JSON, and omits falsy values.

Merkle Root Verification

100.0% (37/37 roots)

The AIGP Merkle root is independently recomputed from governance_merkle_tree resource leaf hashes using a binary SHA-256 tree. The verifier extracts leaf hashes from the v0.12 structure and rebuilds the tree to confirm aigp_hash matches.

Every GOVERNANCE_PROOF event carries a governance_merkle_tree with resource leaf hashes (policy, prompt, tool, context). The verifier independently extracts these hashes, recomputes the binary Merkle tree using pairwise SHA-256, and confirms the root matches aigp_hash. 37 Merkle roots checked, 37 passed, 0 failures.

Event Signature Verification (ES256)

100.0% (267/267 signatures)

Signed events can be independently verified using JWS ES256 (ECDSA P-256) cryptographic signatures over the canonical event body. This validates both the signature verification logic and coverage patterns.

Of the 267 total events, 267 (100.0%) carry JWS ES256 (ECDSA P-256) signatures. All 267 checked signatures verify correctly.

Multi-Tenant Isolation (Simulation-Scoped)

38 traces across 15 tenants

No cross-tenant data leakage in generated events. Each tenant's data is cryptographically separated via UUID5 namespacing and RLS enforcement.

Each tenant receives about 2.5 traces on average (38 traces across 15 tenants; per-tenant counts range from 1 to 6). Event generation uses per-tenant agent pools with UUID5-namespaced identifiers, which makes cross-tenant ID collisions practically impossible. Row-Level Security (RLS) is enforced at the database level. This validation confirms isolation in the simulation harness; live cross-tenant penetration testing is a separate phase.
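As an illustration of UUID5 namespacing, the sketch below derives a per-tenant namespace and then agent identifiers inside it. The root namespace string, the `tenant:` and `agent:` prefixes, the agent name, and both helper functions are assumptions made for this example; the harness's actual derivation scheme is not shown in this document.

```python
import uuid

# Hypothetical root namespace; the real harness may use a different one.
ROOT_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "aigp.example")

def tenant_namespace(tenant: str) -> uuid.UUID:
    """Derive a stable namespace UUID for one tenant."""
    return uuid.uuid5(ROOT_NS, f"tenant:{tenant}")

def agent_id(tenant: str, agent_name: str) -> uuid.UUID:
    """Derive a deterministic agent ID scoped to the tenant's namespace."""
    return uuid.uuid5(tenant_namespace(tenant), f"agent:{agent_name}")

a = agent_id("Credit Risk EMEA", "risk-scoring-agent")
b = agent_id("Equities APAC", "risk-scoring-agent")

print(a != b)  # same agent name, different tenants: different IDs
print(a == agent_id("Credit Risk EMEA", "risk-scoring-agent"))  # re-derivation is stable
```

Because the identifier is a pure function of the namespace and the name, re-running the seed step yields the same IDs, which is what makes the generated SQL deterministic.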

Methodology

The validation program operates in 4 phases, all driven by deterministic seed values for reproducibility:

Seed Generation

AI-generated tenant dictionary produces agents, policies, prompts, and tools for 15 regulated-industry tenants. Dictionary is version-controlled.

SQL Loading

generate_seed_sql.py --tenants reads the dictionary and produces deterministic UUID5-based SQL. Loaded into the database with row-level security enforcement.

Event Simulation

Event driver generates 38 traces (about 2.5 per tenant on average, ranging from 1 to 6) using governance scenarios in dry-run mode. Each trace maintains AIGP chain integrity with SHA-256 parent hashes and Merkle proofs.

Crypto Verification

verify_crypto.py independently recomputes every parent hash, Merkle root, and JWS ES256 signature. Zero-tolerance: any single failure = overall FAIL.

Validation Scope: In-Memory Simulation

Cryptographic verification uses a deterministic event driver in dry-run mode (in-memory) covering 267 events across 38 traces — this validates the AIGP spec's hash chain and signature design with a 100.0% pass rate. Merkle root verification independently recomputes the binary SHA-256 tree from governance_merkle_tree resource hashes — 100.0% pass rate (see Merkle Root Verification). Live pipeline validation (events traversing the production data pipeline) is planned as Phase A — see Path to Standards-Grade Evidence.

Industry Coverage

The validation program uses 15 tenants representing distinct regulated industries. Each tenant has its own agents, policies, prompts, tools, and organizational hierarchy, all AI-generated to ensure realistic, domain-appropriate configurations.

Synthetic Tenant Data

All tenant configurations in the table below are AI-generated and synthetic. They model realistic regulated-industry scenarios (financial services, compliance, risk management) but do not represent real organizations, customer data, or production workloads. The “AgentGP AI Canary” tenant is an internal governance test fixture.

Tenant	Industry	Key Regulations	Traces	Events	Proofs
Fixed Income Americas	Fixed Income Trading	SEC, FINRA +2	2	15	2
Markets Trading EMEA	Markets / Trading	MiFID II, FCA +2	2	17	2
Equities APAC	Equities Trading	MAS, SFC +2	3	20	3
Regulatory Reporting	Regulatory Compliance	Basel III, SOX +2	6	53	6
Investment Banking EMEA	Investment Banking	MiFID II, MAR +2	1	4	1
Retail Banking Americas	Retail Banking	CFPB, TILA +2	1	7	1
Compliance Global	Compliance Operations	SOX, GDPR +2	3	23	3
Credit Risk EMEA	Credit Risk	Basel III, IFRS 9 +2	5	39	5
Risk Management Global	Enterprise Risk	Basel III, NIST RMF +1	2	11	2
AML Compliance Global	Anti-Money Laundering	BSA/AML, FATF +2	3	20	3
Fraud Prevention Americas	Fraud Detection	BSA/AML, CFPB +2	3	23	3
Operational Risk APAC	Operational Risk	Basel III, MAS TRM +1	1	6	1
Commercial Banking	Commercial Banking	OCC, FDIC +2	2	11	2
Wealth Management APAC	Wealth Management	MAS FAA, SFC +2	2	10	2
AgentGP AI Canary	AI Governance (Internal)	EU AI Act, NIST AI RMF +1	2	8	1
Total (15 tenants)			38	267	37
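The totals row can be re-derived from the per-tenant rows. The short check below copies the (traces, events, proofs) columns from the table and recomputes the sums and two derived averages; the variable names are illustrative only.

```python
# (traces, events, proofs) per tenant, copied from the table above
rows = {
    "Fixed Income Americas": (2, 15, 2),
    "Markets Trading EMEA": (2, 17, 2),
    "Equities APAC": (3, 20, 3),
    "Regulatory Reporting": (6, 53, 6),
    "Investment Banking EMEA": (1, 4, 1),
    "Retail Banking Americas": (1, 7, 1),
    "Compliance Global": (3, 23, 3),
    "Credit Risk EMEA": (5, 39, 5),
    "Risk Management Global": (2, 11, 2),
    "AML Compliance Global": (3, 20, 3),
    "Fraud Prevention Americas": (3, 23, 3),
    "Operational Risk APAC": (1, 6, 1),
    "Commercial Banking": (2, 11, 2),
    "Wealth Management APAC": (2, 10, 2),
    "AgentGP AI Canary": (2, 8, 1),
}

traces = sum(r[0] for r in rows.values())
events = sum(r[1] for r in rows.values())
proofs = sum(r[2] for r in rows.values())

print(traces, events, proofs)          # 38 267 37
print(f"{events / traces:.1f}")        # 7.0 events per trace on average
print(f"{traces / len(rows):.1f}")     # 2.5 traces per tenant on average
```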

Per-Tenant Cryptographic Results

Every tenant achieves zero verification failures across SHA-256 hash chains, Merkle roots, and JWS ES256 signatures. The table below is generated from the validation artifacts:

Tenant	Industry	Traces	Events	Chains	Merkle	Sigs*	Proofs
Fixed Income Americas	Fixed Income Trading	2	15	2	2	15	2
Markets Trading EMEA	Markets / Trading	2	17	2	2	17	2
Equities APAC	Equities Trading	3	20	3	3	20	3
Regulatory Reporting	Regulatory Compliance	6	53	6	6	53	6
Investment Banking EMEA	Investment Banking	1	4	1	1	4	1
Retail Banking Americas	Retail Banking	1	7	1	1	7	1
Compliance Global	Compliance Operations	3	23	3	3	23	3
Credit Risk EMEA	Credit Risk	5	39	5	5	39	5
Risk Management Global	Enterprise Risk	2	11	2	2	11	2
AML Compliance Global	Anti-Money Laundering	3	20	3	3	20	3
Fraud Prevention Americas	Fraud Detection	3	23	3	3	23	3
Operational Risk APAC	Operational Risk	1	6	1	1	6	1
Commercial Banking	Commercial Banking	2	11	2	2	11	2
Wealth Management APAC	Wealth Management	2	10	2	2	10	2
AgentGP AI Canary	AI Governance (Internal)	2	8	2	1	8	1
TOTAL	15 Industries	38	267	38	37	267	37

* Sigs = JWS ES256 (ECDSA P-256) signature verification over canonical event body

Merkle Root Verification — 100.0% Pass Rate

The v0.12 verifier independently extracts resource leaf hashes from governance_merkle_tree.resources[], recomputes the binary Merkle tree using pairwise SHA-256, and confirms the root matches aigp_hash. All 37 Merkle roots verified successfully with zero failures.

Parent Hash Chain Verification

Every AIGP event contains a parent_hash field: the SHA-256 digest of the canonicalized previous event in the trace. This creates a tamper-evident linked chain — modifying any event invalidates all subsequent hashes.

Canonicalization Rules

Before hashing, the event is canonicalized:

Keys sorted alphabetically (deterministic ordering)
Compact JSON (no whitespace)
Falsy values omitted (null, empty string, 0, false)
Three fields excluded: event_signature, signature_key_id, parent_hash

```python
# Verification pseudocode: canonicalize() applies the four rules listed above
from hashlib import sha256

for i in range(1, len(events)):
    canonical = canonicalize(events[i-1])  # sort keys, compact JSON, omit falsy values, exclude 3 fields
    expected_hash = sha256(canonical).hexdigest()
    assert events[i].parent_hash == expected_hash  # must match exactly
```
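A self-contained sketch of the same check is shown below, including a tamper test. It is an illustration under stated assumptions, not the reference verifier: events are plain dicts, falsy values are dropped at the top level only, the canonical form is UTF-8 encoded, and the `event_type` and `seq` fields and all helper names are hypothetical.

```python
import hashlib
import json

EXCLUDED = {"event_signature", "signature_key_id", "parent_hash"}

def canonicalize(event: dict) -> bytes:
    """Drop excluded fields and falsy values, sort keys, emit compact JSON."""
    body = {k: v for k, v in event.items() if k not in EXCLUDED and v}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")

def event_hash(event: dict) -> str:
    return hashlib.sha256(canonicalize(event)).hexdigest()

def build_chain(events: list) -> list:
    """Return copies of the events with parent_hash filled in (the genesis event has none)."""
    chained, prev = [], None
    for ev in events:
        ev = dict(ev)
        if prev is not None:
            ev["parent_hash"] = event_hash(prev)
        chained.append(ev)
        prev = ev
    return chained

def first_break(events: list):
    """Return the index of the first event whose parent_hash is wrong, or None."""
    for i in range(1, len(events)):
        if events[i].get("parent_hash") != event_hash(events[i - 1]):
            return i
    return None

trace = build_chain([
    {"event_type": "POLICY_LOADED", "seq": 1},
    {"event_type": "GOVERNANCE_INJECT", "seq": 2},
    {"event_type": "GOVERNANCE_PROOF", "seq": 3},
])
print(first_break(trace))     # None: chain intact

tampered = [dict(e) for e in trace]
tampered[0]["event_type"] = "POLICY_EDITED"
print(first_break(tampered))  # 1: the second event no longer matches its parent
```

Excluding parent_hash from the canonical form means each link commits to the previous event's content rather than to the whole history; an attacker who edits an event must therefore also rewrite the parent_hash of its successor, which in turn invalidates that successor's signature.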

Chains Verified: 38 of 38 total
Chain Breaks: 0 (zero tolerance)
Events Linked: 267 (avg 7.0/trace)
Integrity Rate: 100.0% (pass threshold: 100%)

Merkle Root Verification

The AIGP Merkle root provides a single cryptographic commitment covering all governed resources in a trace. The root is computed from the resource leaf hashes (policy, prompt, tool, and any additional governed resources):

AIGP Merkle Root Computation

AIGP Merkle Root: e9a7e2d850f4cb23...
Internal nodes: SHA-256(L0 + L1), SHA-256(L2 + L3)
Leaves (L0 to L3): Policy Hash a1b2c3d4..., Prompt Hash 852fc381..., Dynamic Prompt Hash c3d4e5f6..., Tool Hash 80cf336d...

Leaf = SHA-256("aigp-leaf-v1:" + resource_type + ":" + resource_name + ":" + content)

✓ Verified: recomputed root matches stored aigp_hash

Fig. 2. Merkle root construction from governance resource hashes (policy, prompt, dynamic prompt, tool). The binary tree uses pairwise SHA-256 with domain-separated leaf hashes and is independently verifiable from the governance_merkle_tree structure.

```python
# Merkle root verification (v0.12: binary tree from governance_merkle_tree)
# governed_traces is supplied by the harness: traces ending in a GOVERNANCE_PROOF event
import json
from hashlib import sha256

for trace in governed_traces:
    proof_event = trace.events[-1]  # GOVERNANCE_PROOF event
    tree = json.loads(proof_event.governance_merkle_tree)
    leaf_hashes = [r["hash"] for r in tree["resources"]]

    # Recompute binary Merkle tree (pairwise SHA-256)
    level = leaf_hashes[:]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            left = level[i]
            right = level[i + 1] if i + 1 < len(level) else left  # odd node pairs with itself
            next_level.append(sha256(f"{left}:{right}".encode()).hexdigest())
        level = next_level

    assert proof_event.aigp_hash == level[0]  # root must match
    # Result: 37/37 = 100.0% match
```
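For readers who want to experiment without the harness, the following standalone sketch combines the leaf formula with the pairwise tree. It assumes hex-string hashes, a ":" separator between child hashes (taken from the verification snippet; the figure writes the pairing as "L0 + L1"), and duplication of an unpaired last node. The resource names and contents are invented for the example.

```python
import hashlib

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def leaf_hash(resource_type: str, resource_name: str, content: str) -> str:
    """Domain-separated leaf hash, following the Leaf formula."""
    return sha256_hex(f"aigp-leaf-v1:{resource_type}:{resource_name}:{content}")

def merkle_root(leaves: list) -> str:
    """Pairwise SHA-256 over hex digests; an unpaired last node is paired with itself."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256_hex(f"{level[i]}:{level[i + 1]}") for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical resources, for illustration only
leaves = [
    leaf_hash("policy", "trading-limits", "max_notional: 1000000"),
    leaf_hash("prompt", "system", "You are a governed trading assistant."),
    leaf_hash("tool", "order-router", "v1"),
]
root = merkle_root(leaves)
changed = merkle_root(leaves[:2] + [leaf_hash("tool", "order-router", "v2")])

print(root == merkle_root(leaves))  # True: recomputation is deterministic
print(root == changed)              # False: changing one resource changes the root
```

The "aigp-leaf-v1:" prefix is a form of domain separation: a leaf hash can never be confused with an internal node hash, because internal nodes are computed over a different input shape.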

JWS ES256 Signature Verification

Real Cryptographic Signatures

This validation uses JWS ES256 (ECDSA P-256) signatures over the canonical event body. Each governance-critical event is signed with a real ECDSA P-256 key pair, producing a standard JWS compact serialization that can be independently verified against the published JWKS endpoint.

Governance-critical events carry a JWS ES256 signature covering the canonicalized event body — any modification to the event content invalidates the signature.
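Full ES256 verification requires an ECDSA library and the signer's public key, neither of which is specified in this document. The sketch below therefore covers only the structural pre-checks a verifier can run on a JWS compact serialization using the standard library: three base64url segments, an `alg` header of ES256, and a 64-byte signature (R and S concatenated, 32 bytes each). The function names, the `kid` value, and the dummy token are assumptions for illustration.

```python
import base64
import json

def b64url_decode(segment: str) -> bytes:
    """Decode unpadded base64url, as used by JWS compact serialization."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def inspect_es256_jws(token: str) -> dict:
    """Split a compact JWS and run structural checks only (no signature verification)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("compact JWS must have three dot-separated segments")
    header = json.loads(b64url_decode(parts[0]))
    signature = b64url_decode(parts[2])
    if header.get("alg") != "ES256":
        raise ValueError(f"unexpected alg: {header.get('alg')!r}")
    if len(signature) != 64:  # ES256 signatures are R || S, 32 bytes each
        raise ValueError("ES256 signature must be 64 bytes")
    return {
        "header": header,
        "payload": b64url_decode(parts[1]),
        # These are the exact bytes the ECDSA signature is computed over
        "signing_input": f"{parts[0]}.{parts[1]}".encode("ascii"),
    }

# Dummy token with a placeholder signature (64 zero bytes), for shape only
header_seg = b64url_encode(json.dumps({"alg": "ES256", "kid": "demo-key"}).encode())
payload_seg = b64url_encode(b'{"event_type":"GOVERNANCE_PROOF"}')
token = f"{header_seg}.{payload_seg}.{b64url_encode(bytes(64))}"

info = inspect_es256_jws(token)
print(info["header"]["alg"], info["payload"])
```

A token that passes these checks is well-formed, not authentic; the cryptographic check over `signing_input` against the published key is still required.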

Signature Coverage

Events with ES256 signatures: 267
Events without signatures: 0
Signature coverage rate: 100.0%
ES256 verification pass rate: 100.0%

Why signature coverage can be below 100%

Not all event types require signatures. Lightweight telemetry events (INFERENCE_STARTED, INFERENCE_COMPLETED) may be unsigned to reduce overhead. Governance-critical events (GOVERNANCE_INJECT, ENFORCEMENT_EVALUATED, GOVERNANCE_PROOF, POLICY_LOADED) are always signed. In this run all 267 events were signed, so coverage is 100.0%; a lower coverage rate in another run would reflect this intentional design choice rather than a failure.
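The signing policy can be expressed as a small check: compute overall coverage, and separately list any governance-critical event that lacks a signature. This is a sketch with assumed dict-shaped events and a hypothetical `signature_report` helper; the signature values are placeholders.

```python
# Event types that must always carry a signature, per the policy described above
ALWAYS_SIGNED = {"GOVERNANCE_INJECT", "ENFORCEMENT_EVALUATED", "GOVERNANCE_PROOF", "POLICY_LOADED"}

def signature_report(events: list) -> dict:
    """Return overall signature coverage and any unsigned governance-critical events."""
    signed = sum(1 for e in events if e.get("event_signature"))
    unsigned_critical = [
        e["event_type"]
        for e in events
        if e["event_type"] in ALWAYS_SIGNED and not e.get("event_signature")
    ]
    coverage = signed / len(events) if events else 0.0
    return {"coverage": coverage, "unsigned_critical": unsigned_critical}

events = [
    {"event_type": "POLICY_LOADED", "event_signature": "<jws>"},
    {"event_type": "INFERENCE_STARTED"},  # telemetry, allowed to be unsigned
    {"event_type": "GOVERNANCE_PROOF", "event_signature": "<jws>"},
]
report = signature_report(events)
print(report["unsigned_critical"])  # []: policy satisfied even though coverage is below 100%
```

Separating the two numbers keeps the pass criterion on the policy (no unsigned critical events) rather than on raw coverage.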

Trace Anatomy: Verified Example

Below is a complete 7-event governed trace from the validation run, showing the full lifecycle from policy loading to governance proof:

Fig. 3. Trace 4cceadfebe934040…: 7 events from POLICY_LOADED to GOVERNANCE_PROOF. Every parent hash matches. Verdict: GOVERNED.

Seq	Event Type	Parent Hash
#1	POLICY_LOADED	(genesis)
#2	GOVERNANCE_INJECT	afbadf269d61d787...
#3	PROMPT_USED	89f3411521d07e3b...
#4	ENFORCEMENT_EVALUATED	235e88640a5400b6...
#5	INFERENCE_STARTED	d7bff74d1ce15e3c...
#6	INFERENCE_COMPLETED	e0ecb39e49f173b8...
#7	GOVERNANCE_PROOF	2bb6cee3f7fe1f4b...

Merkle verification for this trace:

```text
aigp_hash       = d64b97206ada92a3...
leaf_count      = 5
resource_types  = [policy, prompt, context, context, context]

Formula: binary_merkle_tree(resource_hashes)
Result:  MATCH (recomputed root equals stored aigp_hash)
```
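With an odd leaf count such as the five leaves here, the tree is not perfectly balanced. Assuming an unpaired node is paired with itself, as in the verification snippet earlier, the number of nodes per level can be worked out as follows (`level_sizes` is a hypothetical helper):

```python
def level_sizes(leaf_count: int) -> list:
    """Node counts per level, from the leaves up to the root."""
    if leaf_count < 1:
        raise ValueError("leaf_count must be at least 1")
    sizes = [leaf_count]
    while sizes[-1] > 1:
        sizes.append((sizes[-1] + 1) // 2)  # an unpaired node is paired with itself
    return sizes

print(level_sizes(5))  # [5, 3, 2, 1]
print(level_sizes(4))  # [4, 2, 1]
```

So this trace's root sits three hashing levels above its five leaves, and the last context leaf is hashed with itself at the first level.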

Zero Governance Blind Spots

The central claim of AIGP is that every agent run is governance-accounted. This does not mean every run is approved — it means every run produces either a full governance proof (GOVERNED) or an explicit denial with audit trail (DENIED). There are no silent bypasses.

Full Proof (GOVERNED): 37 traces
Fail-Closed (DENIED): 1 trace
Silent Bypass: 0 traces (zero tolerance)
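The three-way accounting can be sketched as a classifier over a trace's event types. This is a simplified illustration: it treats DENIED and BLOCKED as event type names and ignores the lifecycle-event case, both of which are assumptions; the two sample traces are invented.

```python
from collections import Counter

def classify_trace(event_types: list) -> str:
    """Map a trace to GOVERNED, DENIED, or SILENT_BYPASS based on its event types."""
    if "GOVERNANCE_PROOF" in event_types:
        return "GOVERNED"
    if any(t in ("DENIED", "BLOCKED") for t in event_types):
        return "DENIED"
    return "SILENT_BYPASS"  # neither proof nor explicit denial: must never happen

# Hypothetical traces, for illustration only
traces = {
    "t1": ["POLICY_LOADED", "ENFORCEMENT_EVALUATED", "GOVERNANCE_PROOF"],
    "t2": ["POLICY_LOADED", "ENFORCEMENT_EVALUATED", "DENIED"],
}
verdicts = Counter(classify_trace(types) for types in traces.values())

print(dict(verdicts))                  # {'GOVERNED': 1, 'DENIED': 1}
print(verdicts["SILENT_BYPASS"] == 0)  # True: every trace is accounted for
```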

Deny Rate Analysis

The 2.63% deny rate (1 of 38 traces) is by design: the validation program intentionally includes adversarial scenarios (policy violations, data residency breaches, unregistered agent attempts) alongside normal workflows. Denials under adversarial conditions are a feature: they show that the enforcement layer is working.

In a production deployment, the deny rate would depend on the organization's policy configuration, agent behavior, and input distribution. We would expect a deny rate of roughly 0-5% for well-configured production systems, with the enforcement layer catching only genuine violations; this range is an expectation, not a measurement from this validation.
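The deny rate and the proof coverage figure are complements of each other, since every trace is either proven or explicitly denied. A quick arithmetic check using the counts reported above:

```python
from fractions import Fraction

total_traces = 38
governed = 37
denied = total_traces - governed

deny_rate = Fraction(denied, total_traces)
proof_coverage = Fraction(governed, total_traces)

print(f"{float(deny_rate):.2%}")       # 2.63%
print(f"{float(proof_coverage):.1%}")  # 97.4%
print(deny_rate + proof_coverage == 1) # True: no trace is unaccounted for
```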

Scenario Distribution

The event driver generates traces across three scenario categories — normal governance workflows, adversarial violations, and infrastructure failure modes. Per-scenario trace counts are not yet aggregated by the validation harness.

Normal Workflows

Standard governance flows: agent requests policy, receives prompt, enforcement evaluates, proof generated. Includes multi-agent orchestration, tool invocation, and collaborative mesh patterns.

Adversarial Scenarios

Intentional violations: unauthorized agent access, policy violations, data residency breaches, unregistered agent attempts. Traces that trigger these scenarios are expected to be DENIED — a high deny rate under adversarial conditions is a feature.

Infrastructure Failure Modes

Simulated infrastructure disruption: service unavailable, enforcement timeout, missing policy. Proves fail-closed behavior (DENIED, never silent pass-through).

Scenario Metrics — Planned

Per-scenario trace counts (e.g., how many traces exercise “policy violation” vs “multi-agent orchestration”) are not yet collected by the validation harness. The event driver generates traces across all categories listed above, but the metrics output does not yet disaggregate by individual scenario type. This is planned for a future validation harness update. The aggregate deny rate (2.6%) confirms that adversarial and failure scenarios are present in the trace population.

Known Limitations

Limitations

The following limitations apply to this validation. Readers evaluating these claims for regulatory or compliance purposes should review each item carefully.

Same-Codebase Generation and Verification

The crypto verifier (verify_crypto.py) and event driver share the same normalization function. In --from-driver mode, events are generated and verified in the same process. Mitigations: (1) --cross-check flag serializes events to JSON, deserializes, and re-verifies — catching normalization bugs that only exist in-memory; (2) golden normalization tests validate both normalizers against a pre-computed hash on every CI run. Independent third-party verification against a reference implementation is still recommended before using these results for regulatory submissions.
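To make the value of a serialize-and-reverify cross-check concrete, the sketch below shows one way an in-memory hash can differ from the hash of the same event after a JSON round trip. The `default=str` fallback and the Decimal-valued `cost` field are hypothetical, chosen only to make the failure visible; they are not taken from the AgentGP code.

```python
import hashlib
import json
from decimal import Decimal

def canonical_hash(event: dict) -> str:
    """Hash after omitting falsy values, sorting keys, and emitting compact JSON."""
    body = {k: v for k, v in event.items() if v}
    text = json.dumps(body, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def cross_check(event: dict) -> bool:
    """True if the hash is unchanged after a JSON serialize/deserialize round trip."""
    restored = json.loads(json.dumps(event, default=str))
    return canonical_hash(restored) == canonical_hash(event)

print(cross_check({"event_type": "PROMPT_USED", "cost": 0}))             # True
print(cross_check({"event_type": "PROMPT_USED", "cost": Decimal("0")}))  # False
```

In the second case Decimal("0") is falsy in memory and is omitted, but after serialization it comes back as the non-empty string "0" and is kept, so the two hashes diverge. A verifier that only ever sees in-memory objects would not notice this.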

Merkle Verification Scope

Merkle root verification recomputes the binary tree from governance_merkle_tree resource hashes. This confirms the root binding is correct but does not independently verify the individual leaf hashes (policy_hash, prompt_hash, tool_hash) against their source artifacts. Leaf-to-source binding requires access to the original policy/prompt/tool content, which is planned for Phase A live-path validation.

In-Memory Simulation, Not Production Pipeline

All crypto verification runs in-memory via dry-run event driver — 267 events across 38 traces. Events never traverse the live data pipeline. Serialization round-trip issues could break hash chains in production. Live pipeline validation (Phase A) is planned but not yet complete.

Monte Carlo Determinism

Governance coverage and hash integrity have σ = 0 across seeds — these are code-level invariants, not statistical properties. Monte Carlo confirms determinism but adds no additional confidence for these metrics. Only throughput (events/sec) shows genuine statistical variance.

No Latency Measurement

We do not measure governance overhead latency (p95/p99) in this validation. The 0.38s generation time is in-memory simulation speed, not representative of production latency where events traverse the live data pipeline.

Tenant Isolation: Simulation-Scoped

Multi-tenant isolation is verified through UUID5 namespacing and event generation pool separation, not through live cross-tenant query tests or penetration testing. Row-level security enforcement is tested at the database level in separate integration tests, not in this validation harness.

Reproducibility

Every result in this white paper can be independently reproduced. The validation scripts, seed data, and event driver are all open-source.

Artifact Binding

All metrics on this page are read from scripts/db/data/crypto_appendix.json and scripts/db/data/validation_metrics.json at page render time. Re-running the validation harness will update the artifacts and this page will reflect the new results automatically.

Step 1: Generate tenant dictionary

Seed generation

```bash
python scripts/db/generate_tenant_seed.py \
  --seed 42 \
  --output scripts/db/data/tenant_dictionary.json
```

Step 2: Load seed data into the database

SQL loading

```bash
python scripts/db/generate_seed_sql.py --tenants --load
```

Step 3: Run event simulation

Event simulation (dry-run)

```bash
# --traces 50 = max traces per tenant; actual count varies by scenario distribution
# with 15 tenants, produces 38 traces / 267 events
python .build-event-driver/agentgp_event_driver.py \
  --all-tenants \
  --dictionary scripts/db/data/tenant_dictionary.json \
  --seed 42 \
  --traces 50 \
  --include-adversarial \
  --metrics \
  --sink dry-run
```

Step 4: Verify cryptographic integrity

Crypto verification

```bash
python scripts/validation/verify_crypto.py \
  --from-driver \
  --traces 50 \
  --seed 42 \
  --include-adversarial \
  --output scripts/db/data/crypto_appendix.json
```

Step 5: Run full validation harness

Full harness (combines steps 2-4)

```bash
python scripts/validation/run_validation.py --traces 50 --seed 42
# Produces: validation_metrics.json, crypto_appendix.json
```

Expected Output

If all claims hold, you should see:
governance_coverage: 100% | hash_chain_integrity: 100% | merkle_root: 100% | signatures: 100% | all_passed: true

Path to Standards-Grade Evidence

The current validation program proves the AIGP spec's cryptographic design through deterministic simulation. The following phases are planned to elevate this evidence to standards-grade (e.g., SOC 2, ISO 42001, or regulatory submission quality):

Phase A: Live-Path Validation Cohort

Status: Planned

Run the identical scenario matrix against the live production pipeline. Compare dry-run artifacts to live-path artifacts event-by-event. This proves governance properties hold through the actual production data path, not just in simulation.
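An event-by-event comparison of this kind could look like the sketch below, which reports the first position where the two runs diverge. The `fingerprint` function is a stand-in for the spec's canonical hash, and the sample events and the extra `region` field are invented for illustration.

```python
import hashlib
import json
from typing import Optional

def fingerprint(event: dict) -> str:
    """Hash of a sorted, compact JSON rendering (stand-in for the canonical form)."""
    text = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def first_divergence(dry_run: list, live: list) -> Optional[int]:
    """Index of the first event that differs between the two runs, or None if identical."""
    for i, (a, b) in enumerate(zip(dry_run, live)):
        if fingerprint(a) != fingerprint(b):
            return i
    if len(dry_run) != len(live):
        return min(len(dry_run), len(live))  # one run has extra trailing events
    return None

dry = [
    {"seq": 1, "event_type": "POLICY_LOADED"},
    {"seq": 2, "event_type": "GOVERNANCE_PROOF"},
]
live = [
    {"seq": 1, "event_type": "POLICY_LOADED"},
    {"seq": 2, "event_type": "GOVERNANCE_PROOF", "region": "eu"},  # field added in transit
]

print(first_divergence(dry, dry))   # None
print(first_divergence(dry, live))  # 1
```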

Phase B: JWKS Rotation & HSM-Backed Key Storage

Status: Planned

ES256 (ECDSA P-256) signing is now implemented. This phase extends the key management layer with automated JWKS endpoint rotation and HSM-backed key storage for production hardening. This elevates the trust model from "signatures are cryptographically valid" to "signatures are cryptographically valid under a hardware-rooted key hierarchy."

Phase C: Live Tenant Isolation Testing

Status: Planned

Execute cross-tenant penetration tests against live row-level security, data store tenant filters, and API-layer tenant context validation. Current isolation evidence is simulation-scoped (UUID5 namespacing). This phase proves isolation under adversarial cross-tenant query conditions against the running system.

Standards Alignment

Completing Phases A–C would provide evidence suitable for SOC 2 Type II (governance control effectiveness), ISO/IEC 42001 (AI management system), and sector-specific regulatory submissions (HIPAA, PSD2, 21 CFR Part 11). Each phase produces independently verifiable artifacts that extend the current evidence pack.
