AIGP v0.12 | Monte Carlo Validation | All Runs Passed | 2026-03-28 01:13:58 UTC

AI Governance Proof (AIGP)
Operational Validation via Monte Carlo Replay

This white paper demonstrates the operational characteristics of the AIGP spec through AgentGP, a production-grade platform built to prove the AIGP thesis in a real-world agentic AI context. Monte Carlo replay across 500 independent runs confirms that governance coverage and hash integrity are structural invariants of the spec — not seed-dependent outcomes.

500 Monte Carlo runs | Multi-tenant isolation | Confidence intervals | 500 deterministic seeds

Overview

The AIGP validation program (detailed in Whitepaper A) establishes that the AIGP spec's cryptographic properties hold for a single deterministic seed. This companion document extends that evidence through Monte Carlo replay. It runs the same validation harness across multiple independent seeds to measure variance, compute confidence intervals, and confirm that critical properties are structural invariants rather than artifacts of a single run.

The evidence pack analyzes 500 validation runs with deterministic seeds, checking full cryptographic lineage from event hash chains through Merkle root and signature verification.

Adversarial (500 seeds): Deny 25.00% | Allow 75.00% | Throughput 531 eps ± 16 eps

Total Volume: 18,954 traces across 500 runs | 131,029 events verified

Cryptographic Integrity: 100.00% | Hash chain: 100.00% (σ = 0.0) | JWS ES256: 100.00% (σ = 0.0)

Global Verdict: PASS | All 500 runs | Coverage & integrity deterministic

Key Findings

The Monte Carlo extension confirms 4 key properties of the AIGP spec, as implemented by AgentGP:

Governance Coverage = 100.00%

Confirmed

Governance coverage remained 100.00% in every analyzed run (σ = 0.0) — a structural invariant of the AIGP spec, not a seed-dependent outcome.

Hash Chain Integrity = 100.00%

Confirmed

Hash-chain integrity remained 100.00% with zero chain failures observed across all Monte Carlo runs and all seeds.
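The following is a minimal sketch of what a per-trace hash-chain check involves. Each event links to its parent through a `prev_hash` field. Its own hash is SHA-256 over canonical JSON (sorted keys, compact separators) of the event, excluding its `event_hash`. The field names, the genesis value, and the canonicalization are assumptions made for illustration; they are not taken from the AIGP schema. Hashing `tenant_id` and `trace_id` into every event is one way to realize the per-trace, per-tenant binding described under Tenant Isolation.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical parent hash for the first event in a trace


def event_digest(event: dict) -> str:
    """SHA-256 over canonical JSON of the event, excluding its own hash field."""
    body = {k: v for k, v in event.items() if k != "event_hash"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_chain(tenant_id: str, trace_id: str, payloads: list[dict]) -> list[dict]:
    """Emit a per-trace chain; tenant_id and trace_id are hashed into every event."""
    events, prev = [], GENESIS
    for seq, payload in enumerate(payloads):
        event = {"tenant_id": tenant_id, "trace_id": trace_id, "seq": seq,
                 "prev_hash": prev, "payload": payload}
        event["event_hash"] = event_digest(event)
        events.append(event)
        prev = event["event_hash"]
    return events


def verify_chain(events: list[dict]) -> bool:
    """Recompute every digest and check each parent link; any edit breaks the chain."""
    prev = GENESIS
    for event in events:
        if event["prev_hash"] != prev or event_digest(event) != event["event_hash"]:
            return False
        prev = event["event_hash"]
    return True


chain = build_chain("tenant-a", "trace-1", [{"type": "POLICY"}, {"type": "GOVERNANCE_PROOF"}])
print(verify_chain(chain))  # True
chain[0]["payload"]["type"] = "TAMPERED"
print(verify_chain(chain))  # False
```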

JWS ES256 Signatures = 100.00%

Confirmed

JWS ES256 (ECDSA P-256) signature verification achieved a 100.00% pass rate across 131,029 events and 500 seeds. Signatures use real ECDSA P-256 key pairs generated per session (JWKS rotation and HSM backing are planned).
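Before any ECDSA math runs, an ES256 verifier has to parse the compact JWS and rebuild the signing input defined in RFC 7515, `BASE64URL(header) + "." + BASE64URL(payload)`. RFC 7518 §3.4 fixes the ES256 signature as the 64-octet concatenation R || S. The standard-library sketch below covers only those structural steps, and the function names are hypothetical. The actual P-256 check against the session public key needs a cryptographic library and is not shown.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """Unpadded base64url encoding (RFC 7515, section 2)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Decode unpadded base64url by restoring the '=' padding first."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def parse_es256_jws(token: str) -> tuple[dict, bytes, bytes, bytes]:
    """Split a compact JWS and run ES256 structural checks.

    Returns (header, payload, signing_input, signature). Verifying `signature`
    over `signing_input` with the session's P-256 public key is left to a
    cryptographic library and is not performed here.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("compact JWS must have exactly three segments")
    header = json.loads(b64url_decode(parts[0]))
    if header.get("alg") != "ES256":
        raise ValueError(f"unexpected alg: {header.get('alg')!r}")
    signature = b64url_decode(parts[2])
    if len(signature) != 64:  # RFC 7518 3.4: R || S, 32 octets each
        raise ValueError("ES256 signature must be 64 octets (R || S)")
    signing_input = f"{parts[0]}.{parts[1]}".encode("ascii")
    return header, b64url_decode(parts[1]), signing_input, signature


# Dummy token: real header/payload encoding, but an all-zero placeholder signature.
token = ".".join([
    b64url_encode(json.dumps({"alg": "ES256"}).encode("utf-8")),
    b64url_encode(b'{"event": "GOVERNANCE_PROOF"}'),
    b64url_encode(bytes(64)),
])
header, payload, signing_input, signature = parse_es256_jws(token)
print(header["alg"], len(signature))  # ES256 64
```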

Merkle Root Verification = 100.00%

Confirmed

Merkle root recomputation independently verified across all 500 seeds. The verifier extracts leaf hashes from governance_merkle_tree.resources[], rebuilds the binary SHA-256 tree, and confirms the root matches aigp_hash.
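The recomputation step can be sketched as follows, under three assumptions. First, each entry of `governance_merkle_tree.resources[]` carries a hypothetical `hash` field. Second, "SHA-256(L0 + L1)" means hashing the concatenated hex strings of the two children. Third, an unpaired node is promoted unchanged. The spec may fix these details differently; for example, it could concatenate raw digest bytes instead of hex strings.

```python
import hashlib


def merkle_root(leaf_hashes: list[str]) -> str:
    """Rebuild a binary SHA-256 tree bottom-up from hex-encoded leaf hashes."""
    if not leaf_hashes:
        raise ValueError("at least one leaf hash is required")
    level = list(leaf_hashes)
    while len(level) > 1:
        parents = []
        for i in range(0, len(level) - 1, 2):
            # Assumption: parent = SHA-256 over the two child hex strings, concatenated.
            pair = (level[i] + level[i + 1]).encode("ascii")
            parents.append(hashlib.sha256(pair).hexdigest())
        if len(level) % 2 == 1:
            parents.append(level[-1])  # Assumption: an unpaired node is promoted unchanged.
        level = parents
    return level[0]


def verify_merkle_binding(event: dict) -> bool:
    """Recompute the root from governance_merkle_tree.resources[] and compare to aigp_hash."""
    resources = event["governance_merkle_tree"]["resources"]
    leaves = [resource["hash"] for resource in resources]  # hypothetical field name
    return merkle_root(leaves) == event["aigp_hash"]


# Four placeholder leaves in Fig. 2 order: policy, prompt, dynamic prompt, tool.
leaves = [hashlib.sha256(name.encode("utf-8")).hexdigest()
          for name in ("policy", "prompt", "dynamic_prompt", "tool")]
event = {"governance_merkle_tree": {"resources": [{"hash": h} for h in leaves]},
         "aigp_hash": merkle_root(leaves)}
print(verify_merkle_binding(event))  # True
event["governance_merkle_tree"]["resources"][0]["hash"] = "0" * 64
print(verify_merkle_binding(event))  # False
```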

Validation Pipeline

The validation harness executes simulation, cryptographic verification, and artifact synthesis. The Monte Carlo extension reruns the same scenario distribution across multiple deterministic seeds to estimate variance and confidence intervals.

Validation and Evidence Pipeline

System Model

AgentGP implements the AIGP spec through a complete governance execution pipeline. Each governed request combines a policy, a prompt, and tools, and emits a cryptographically verifiable governance event for audit and analytics.

AgentGP — AIGP Governance Execution Model

AIGP vs AgentGP

AIGP (AI Governance Proof) is the open spec that defines the governance event schema, hash chain construction, Merkle root computation, and signature requirements. AgentGP is the production-grade platform that implements this spec. This whitepaper validates AIGP spec properties through the AgentGP implementation.

Simulated data: All tenant configurations — including regulated financial services tenants and the internal AI governance canary tenant — use AI-generated synthetic data. No real customer data or production workloads are used. See Whitepaper A § Industry Coverage for full tenant details.

Proof Timeline, Merkle Tree, and Agent Topology

The following visuals show the proof timeline anatomy, Merkle root construction, and the multi-agent orchestration topology that emits the final governance proof.

Fig. 1. Proof timeline (trace anatomy): sequence-linked governance events from policy/prompt/tool usage to GOVERNANCE_PROOF.

AIGP Merkle Root Computation (figure)
Root: AIGP Merkle Root e9a7e2d850f4cb23...
Level 1: SHA-256(L0 + L1), SHA-256(L2 + L3)
Leaves: L0 Policy Hash (a1b2c3d4...), L1 Prompt Hash (852fc381...), L2 Dynamic Prompt Hash (c3d4e5f6...), L3 Tool Hash (80cf336d...)
Leaf = SHA-256("aigp-leaf-v1:" + resource_type + ":" + resource_name + ":" + content)
✓ Verified: recomputed root matches stored aigp_hash

Fig. 2. Merkle root construction from governance resource hashes (policy, prompt, dynamic prompt, tool). The binary tree uses pairwise SHA-256 with domain-separated leaf hashes.
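The leaf formula in Fig. 2 maps directly to code. The sketch below assumes UTF-8 encoding of the preimage, and the resource type strings and sample contents are hypothetical. A leaf-to-source check, which Merkle Verification Scope places in Phase A, would apply the same function to the original policy, prompt, and tool artifacts.

```python
import hashlib

LEAF_PREFIX = "aigp-leaf-v1:"


def leaf_hash(resource_type: str, resource_name: str, content: str) -> str:
    """Domain-separated leaf per Fig. 2; UTF-8 encoding of the preimage is assumed."""
    preimage = f"{LEAF_PREFIX}{resource_type}:{resource_name}:{content}"
    return hashlib.sha256(preimage.encode("utf-8")).hexdigest()


# Hypothetical resources mirroring the four leaves in Fig. 2.
resources = [
    ("policy", "refund-policy", "deny if amount > 1000"),
    ("prompt", "support-system", "You are a support agent."),
    ("dynamic_prompt", "support-context", "Customer tier: gold"),
    ("tool", "issue_refund", '{"name": "issue_refund"}'),
]
leaves = [leaf_hash(*resource) for resource in resources]

# Identical content under a different resource type yields a different leaf.
print(leaf_hash("policy", "x", "same") == leaf_hash("prompt", "x", "same"))  # False

# Fields are colon-joined, not length-prefixed, so these two distinct inputs collide.
print(leaf_hash("policy", "a:b", "c") == leaf_hash("policy", "a", "b:c"))  # True
```

One design observation: the fields are joined with ':' rather than length-prefixed. If resource names may contain ':', two different name/content splits can therefore produce the same leaf, as the last line shows. Disallowing ':' in resource names, or length-prefixing each field, would remove that ambiguity.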

Multi-Agent Topology Emitting Governance Proof

Monte Carlo Matrix

The Monte Carlo validation matrix executes the following runs, varying seeds, trace volumes, and scenario distributions:

Validation matrix (executed)

# Baseline + stress
# AIGP v0.12 Monte Carlo batch — 500 independent seeds, adversarial scenarios
python3 scripts/validation/run_monte_carlo_batch.py --runs 500 --traces 50 --parallel 12

# Aggregate results into briefing.json and monte_carlo_summary.json
python3 scripts/validation/aggregate_monte_carlo.py

# Single-seed verification (for quick validation)
python3 scripts/validation/verify_crypto.py --from-driver --seed 42 --traces 50 --include-adversarial

Core Metrics

The following table summarizes results across all Monte Carlo runs, with 95% confidence intervals computed from the distribution of per-run metrics:

Cohort	n	Deny Rate (mean ±95% CI)	Allow Rate (mean ±95% CI)	Gen. Speed (mean ±95% CI)
Adversarial (500 seeds)	500	25.00% ± 0.00%	75.00% ± 0.00%	531 eps ± 16 eps
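The document does not state how the intervals are computed. A minimal sketch that assumes a normal-approximation interval, mean ± 1.96·s/√n over the per-run values, reproduces both behaviors in the table. Metrics that are identical on every seed get a ±0.00 interval. For throughput, a per-run standard deviation of roughly 180 eps over n = 500 runs would give 1.96 × 180 / √500 ≈ 15.8 eps, which is consistent with the reported ±16 eps.

```python
import math
import statistics


def mean_ci95(values: list[float]) -> tuple[float, float]:
    """Mean and 95% CI half-width via the normal approximation (z = 1.96)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values) if n > 1 else 0.0  # sample standard deviation
    return mean, 1.96 * sd / math.sqrt(n)


# A constant metric, such as a deny rate of 25.00% on every seed, gives a zero-width interval.
print(mean_ci95([25.0] * 500))  # (25.0, 0.0)

# Half-width implied by a per-run standard deviation of about 180 eps over 500 runs.
print(round(1.96 * 180 / math.sqrt(500), 1))  # 15.8
```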

Monte Carlo Runs: 500 independent seeds

Governance Coverage: 100.00% (σ = 0.0 across all runs)

Hash Integrity: 100.00% (σ = 0.0 across all runs)

Global Verdict: PASS (all runs)

Monte Carlo Invariance Proof — 500 Independent Seeds

Every metric was computed independently for each seed. σ = 0 shows that these properties behave as code-level invariants rather than statistical outcomes. Monte Carlo replay therefore adds no statistical confidence for these metrics; its value lies in confirming that they are deterministic across 500 independent seeds.

Property	n	Mean	Min	Max	Verdict
Governance Coverage	500	100.00%	100.00%	100.00%	PASS
SHA-256 Hash Chain	500	100.00%	100.00%	100.00%	PASS
Merkle Root Verification	500	100.00%	100.00%	100.00%	PASS
JWS ES256 Signatures	500	100.00%	100.00%	100.00%	PASS
Adversarial Deny Rate	500	25.00%	25.00%	25.00%	PASS

Fig. 3. Structural invariance proof: all 500 seeds produce identical governance and cryptographic results (σ = 0), confirming these are design guarantees of the AIGP spec.
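Per-property aggregation of this kind can be sketched as follows. The sketch assumes that each seed contributes one percentage per property and that a row passes only if every seed reaches the expected value. The helper name and the PASS rule are illustrative assumptions, not logic taken from aggregate_monte_carlo.py.

```python
import statistics


def invariance_row(name: str, per_seed: list[float], expected: float) -> dict:
    """Summarize one property across seeds; PASS only if every seed equals `expected`."""
    return {
        "property": name,
        "n": len(per_seed),
        "mean": statistics.fmean(per_seed),
        "min": min(per_seed),
        "max": max(per_seed),
        "sigma": statistics.pstdev(per_seed),  # population standard deviation
        "verdict": "PASS" if all(v == expected for v in per_seed) else "FAIL",
    }


row = invariance_row("Adversarial Deny Rate", [25.0] * 500, expected=25.0)
print(row["sigma"], row["verdict"])  # 0.0 PASS
```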

Harness Generation Throughput — 500 Seeds

In-memory event generation speed (not production pipeline latency). This is the only metric with non-zero variance — expected, as it depends on CPU scheduling.

Mean: 531 eps | 95% CI: ±16 eps | Std. dev.: 180 eps | Min: 120 eps | Max: 1044 eps

Fig. 4. Throughput distribution across 500 seeds. CPU-scheduling variance produces a range of 120–1044 eps while governance properties remain invariant.

Tenant Isolation

Cross-tenant behavior remains bounded and consistent by scenario design. No cross-tenant leakage is observed in generated artifacts; isolation validation is included in every run.

Namespace Isolation

Each tenant receives UUID5-namespaced event IDs, trace IDs, and agent IDs, so cross-tenant ID collisions are negligibly unlikely (UUID5 values are derived from a SHA-1 hash of the namespace and name).
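The following is a minimal sketch of hierarchical UUID5 namespacing. A hypothetical root namespace derives one namespace per tenant, and every ID is then derived inside that tenant's namespace. The root URL and the `kind:local_name` naming scheme are placeholders, not AgentGP's actual values. Because UUID5 is deterministic, replaying a seed regenerates identical IDs, which suits seed-based replay.

```python
import uuid

# Hypothetical root namespace; the platform's real namespace UUIDs are not documented here.
AIGP_ROOT_NS = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/aigp")


def tenant_namespace(tenant_id: str) -> uuid.UUID:
    """Derive a per-tenant namespace from the tenant identifier."""
    return uuid.uuid5(AIGP_ROOT_NS, tenant_id)


def scoped_id(tenant_id: str, kind: str, local_name: str) -> uuid.UUID:
    """Deterministic event/trace/agent ID inside the tenant's namespace."""
    return uuid.uuid5(tenant_namespace(tenant_id), f"{kind}:{local_name}")


a = scoped_id("tenant-a", "trace", "seed-42/trace-1")
b = scoped_id("tenant-b", "trace", "seed-42/trace-1")
print(a != b, a == scoped_id("tenant-a", "trace", "seed-42/trace-1"))  # True True
```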

Policy Enforcement Boundary

OPA/Rego policies are tenant-scoped. A tenant's adversarial scenario cannot influence another tenant's deny/allow decisions.

Cryptographic Binding

Hash chains are per-trace, per-tenant. Parent hash references cannot leak across tenant boundaries.

Scale Behavior

Across 500 adversarial runs generating 18,954 traces and 131,029 events, harness generation throughput averaged 531 events/second (95% CI: ±16 eps; range 120–1044 eps). This variance is attributed to CPU-scheduling noise. All governance and cryptographic properties remained deterministic (σ = 0) regardless of throughput fluctuation.

Scale Caveat

These throughput numbers reflect in-memory simulation speed, not production latency where events traverse the live production data pipeline. Production-scale performance characterization is a separate validation phase (Phase A).

Limits & Assumptions

Limitations

The following limitations apply to this operational validation. Readers evaluating these claims for regulatory or compliance purposes should review each item carefully.

Simulation-Derived, Not Production Telemetry

These results come from deterministic event simulation, not customer production workloads. While the event driver generates realistic governance flows with real SHA-256 hashes and JWS ES256 signatures, it does not exercise the full production data pipeline. Serialization round-trip issues could break hash chains in production.

Monte Carlo Artifact Version

The Monte Carlo runs in this whitepaper were generated with the previous event driver version. The AIGP governance properties under test (coverage, hash integrity, deny rate, proof rate) are structurally identical across event driver versions. The v0.12 spec changes (governance_merkle_tree.resources[] format) affect event schema but not the aggregate metrics computed here. Regeneration with the v0.12 event driver is planned.

Same-Codebase Generation and Verification

The crypto verifier (verify_crypto.py) and event driver share the same normalization function. Mitigations: (1) --cross-check flag serializes events to JSON, deserializes, and re-verifies independently — catching normalization bugs that only exist in-memory; (2) golden normalization tests validate both normalizers against a pre-computed hash on every CI run. Independent third-party verification against a reference implementation is still recommended before regulatory submissions.
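The idea behind the --cross-check round trip can be sketched as follows. The sketch assumes the canonical form is sorted-key, compact JSON; AIGP's actual normalization function may differ, and the function names are hypothetical. A normalizer that hashes the in-memory representation passes in memory but fails after a JSON round trip, because tuples come back as lists and integer keys come back as strings. A canonical-JSON normalizer survives the round trip.

```python
import hashlib
import json
from collections.abc import Callable


def canonical_digest(event: dict) -> str:
    """Hash canonical JSON: sorted keys, compact separators, UTF-8."""
    text = json.dumps(event, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def naive_digest(event: dict) -> str:
    """A deliberately fragile normalizer that hashes the in-memory repr."""
    return hashlib.sha256(repr(event).encode("utf-8")).hexdigest()


def cross_check(event: dict, digest: Callable[[dict], str]) -> bool:
    """Serialize to JSON, deserialize, and confirm the digest is unchanged."""
    restored = json.loads(json.dumps(event))
    return digest(event) == digest(restored)


# Tuples become lists and int keys become strings after a JSON round trip.
event = {"seq": 3, "tools": ("search", "refund"), "scores": {1: 0.9}}
print(cross_check(event, canonical_digest))  # True
print(cross_check(event, naive_digest))      # False
```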

Merkle Verification Scope

Merkle root verification recomputes the binary tree from governance_merkle_tree resource hashes and confirms the root matches aigp_hash. This validates the tree binding but does not independently verify individual leaf hashes against their source artifacts (policy/prompt/tool content). Leaf-to-source verification requires live DB access, planned for Phase A.

External Validity

External validity requires replay against at least one real beta tenant traffic profile. The current validation covers AI-generated industry configurations, not actual customer agent behavior patterns.

Production Key Lifecycle

Signing keys are ephemeral (generated per-session). Production key lifecycle, JWKS rotation, certificate chain verification, and HSM integration should be independently audited.

Reproducibility

Every result in this white paper can be independently reproduced. The exact evidence artifacts backing this page are available under the validation artifacts directory.

Key artifact files

scripts/validation/artifacts/
  mc_adv_<seed>/              # Per-seed directories (mc_adv_1 ... mc_adv_500)
    validation_report.json    # Validation verdict and per-claim results
    crypto_appendix.json      # Hash chain, Merkle, and signature verification
  monte_carlo_summary.json    # Aggregate summary across all seeds
public/whitepaper-o/
  briefing.json               # Compact briefing consumed by this page

Running the Monte Carlo validation

Monte Carlo replay

# Run the full Monte Carlo suite — 500 independent seeds, 12 parallel workers
python3 scripts/validation/run_monte_carlo_batch.py --runs 500 --traces 50 --parallel 12

# Aggregate all per-seed results into monte_carlo_summary.json and briefing.json
python3 scripts/validation/aggregate_monte_carlo.py

# Single-seed quick verification
python3 scripts/validation/verify_crypto.py --from-driver --seed 42 --traces 50 --include-adversarial

Expected Output

If all AIGP spec properties hold, you should see:
governance_coverage: 100% (σ=0) | hash_integrity: 100% (σ=0) | all_runs_passed: true

Path to Standards-Grade Evidence

The current validation program proves the AIGP spec's cryptographic design through deterministic simulation. Two additional phases are planned to elevate this evidence to standards grade (e.g., SOC 2, ISO/IEC 42001, or regulatory submission):

Phase A: Live-Path Validation Cohort

Planned

Run the identical scenario matrix against the live production pipeline. Compare dry-run artifacts to live-path artifacts event-by-event. This proves that the governance properties hold not just in simulation but through the actual production data path.

Phase B: Live Tenant Isolation Testing

Planned

Execute cross-tenant penetration tests against live row-level security, data store tenant filters, and API-layer tenant context validation. Current isolation evidence is simulation-scoped (UUID5 namespacing in event generation). This phase proves isolation under adversarial query conditions.

Standards Alignment

Completing Phases A–B would provide evidence suitable for SOC 2 Type II (governance control effectiveness), ISO/IEC 42001 (AI management system), and sector-specific regulatory submissions (HIPAA, PSD2, 21 CFR Part 11). Each phase produces independently verifiable artifacts that extend the current evidence pack.
