Federal-Need Application · Test & Evaluation

AUTHREX-SANDBOX

Test-and-evaluation authority governance.

Before any AI is cleared for production, it must be evaluated, and that evaluation environment is itself a place where an AI can do damage if it is not bounded. AUTHREX-SANDBOX governs the test environment itself: it caps what an AI under evaluation may do, ensures every action is reversible and logged, and resets the world between runs, so evaluation is bounded, auditable, and safe.

7 Pipeline Stages · 4 Authority Tiers · Env Scoped: TEST · 2 Testbed Anchors

The Concept

What AUTHREX-SANDBOX actually does.

Everyone agrees AI should be tested before deployment. Far less attention is paid to a quieter problem: the test environment is itself a system an AI is acting inside, and a capable AI under evaluation can escape its bounds, corrupt the test harness, take an irreversible action, or simply behave one way under test and another in production. If the sandbox is not governed, the evaluation that is supposed to make deployment safe can itself be unsafe or untrustworthy.

AUTHREX-SANDBOX applies the governance pipeline to the sandbox, not to production. Every action the AI-under-test takes is capped at a test tier so it can never exceed the authority granted for evaluation (HMAA). No irreversible step is allowed to run without review (FLAME). The environment is reset to a known-good state between runs so one test cannot contaminate the next (CARA). And every single action is logged to the ledger (ERAM), so the evaluation is fully reconstructable afterward. The result is a bounded, audited, reversible test, and an AI that demonstrably behaved within its limits during evaluation.
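The four checks described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the AUTHREX implementation: the tier names, the `Action` and `Sandbox` classes, and the verdict strings are all hypothetical, and the source says only that there are four authority tiers and that evaluation is capped at a test tier.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical tier names; the source gives only the count (four)."""
    OBSERVE = 0
    TEST = 1
    SUPERVISED = 2
    PRODUCTION = 3


@dataclass(frozen=True)
class Action:
    name: str
    required_tier: Tier
    reversible: bool


@dataclass
class Sandbox:
    cap: Tier = Tier.TEST                               # HMAA-style authority cap
    baseline: dict = field(default_factory=dict)        # known-good state
    ledger: list = field(default_factory=list)          # ERAM-style record
    state: dict = field(init=False, default_factory=dict)

    def __post_init__(self) -> None:
        self.state = dict(self.baseline)

    def submit(self, action: Action, reviewed: bool = False) -> str:
        if action.required_tier > self.cap:
            verdict = "denied:tier"          # exceeds evaluation authority
        elif not action.reversible and not reviewed:
            verdict = "held:review"          # FLAME-style hold on irreversible steps
        else:
            self.state[action.name] = "done"
            verdict = "allowed"
        self.ledger.append((action.name, verdict))      # log every outcome
        return verdict

    def reset(self) -> None:
        """CARA-style reset: discard run state and restore the baseline."""
        self.state = dict(self.baseline)
        self.ledger.append(("reset", "ok"))
```

In this sketch a denied or held action is still written to the ledger, so the record covers what the AI attempted as well as what it was allowed to do.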

SANDBOX is the controlled setting an AI must pass through before AUTHREX-ASSURE can issue its production clearance. Where ASSURE governs the decision to deploy, SANDBOX governs the evaluation that informs it. Together they bracket the most dangerous transition in an AI's life: from test to live.

The Benefit, and Who It Serves

Who needs this, and why.

T&E Organizations

Test-and-evaluation teams get a sandbox where the AI under test cannot exceed its evaluation authority, cannot take an irreversible action, and cannot contaminate the next run. The evaluation itself becomes trustworthy.

AI Developers

Developers get a reproducible, logged test environment that resets between runs, so evaluation results are clean and comparable rather than polluted by state left over from a previous test.

Red Teams

Red teams can push an AI hard inside a sandbox that guarantees containment: even an AI actively trying to escape its bounds is capped at the test tier, so adversarial evaluation is safe to run.

How It Benefits the U.S. Government

The national-importance case.

The government has, by law, directed the creation of AI sandbox environments, and SANDBOX is a governance reference for exactly that.

It matches a legal directive

NDAA §1534 directs a DoD task force on AI sandbox environments, with a milestone of 1 April 2026. SANDBOX is a concrete governance pattern for those environments: bounded authority, reversibility, and a complete audit trail.

It feeds the §1533 framework

NDAA §1533's AI assessment framework (due June 2027) needs a trustworthy evaluation step. SANDBOX governs that step so the assessment rests on a bounded, auditable test rather than an uncontrolled one.

It makes evaluation trustworthy

An assessment is only as good as the environment it ran in. By guaranteeing the sandbox is bounded and reset between runs, SANDBOX makes the government's evaluation results defensible and reproducible.

It anchors to real testbeds

SANDBOX is anchored to the Rover and UAV testbeds, the physical platforms where test and evaluation actually happen, rather than to an abstraction. The governance pattern is tied to where evaluation really occurs.

The DARPA Questions · Heilmeier Catechism

Answered, plainly.

1 · What are you trying to do?
Govern the test environment so an AI being evaluated stays bounded, every action is reversible and logged, and the world resets between runs. No jargon: a sealed, recorded test room the AI cannot break out of or leave a mess in.
2 · How is it done today, and what are the limits?
Today, sandboxing relies on virtual machines and network isolation. The limit is that those contain the AI's reach but do not govern its authority inside the sandbox, cap irreversible actions, or guarantee a clean reset, so evaluations can be contaminated or an AI can behave differently than in production.
3 · What is new in your approach?
Applying the same authority pipeline to the evaluation environment itself: capping the AI at a test tier, blocking irreversible steps, resetting state between runs, and logging everything. The sandbox is governed, not merely isolated.
4 · Who cares? If you succeed, what difference does it make?
T&E organizations, developers, and red teams care. If it works, AI evaluations become trustworthy and reproducible, and it becomes harder for an AI to pass evaluation by behaving differently under test than it would in production (full resistance to sandbox detection remains open; see question 5).
5 · What are the risks?
The main risks are an AI detecting it is in a sandbox and gaming the test, and the reset mechanism missing some hidden state. The simulation shows the containment logic; sandbox-detection resistance is an open research surface, stated openly.
6 · How much will it cost?
SANDBOX runs on existing testbeds; the reference anchors are the Rover (about $484) and UAV (about $4,200) testbeds. The governance logic is software, so the cost is integration with the test harness, not new hardware.
7 · How long will it take?
The architecture and simulation exist now (TRL 3-4), and the testbeds are built. Demonstrating governed evaluation on the Rover or UAV testbed is a near-term milestone aligned with the NDAA §1534 1 April 2026 horizon.
8 · What are the midterm and final exams?
Midterm: in simulation, an AI attempting an out-of-bounds or irreversible action inside the sandbox is contained and the run is reset cleanly. Final: a governed evaluation runs end-to-end on a physical testbed, producing a complete, reconstructable audit trail.
9 · What is explicitly out of scope?
SANDBOX does not evaluate the AI's task performance, does not certify it, and does not deploy it; the clearance decision is ASSURE's job downstream. SANDBOX governs the evaluation environment only. This is independent research aligned to a legal directive; no agency adoption is implied.

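Question 5 names hidden state surviving a reset as a risk, and the midterm in question 8 requires a clean reset. One way to make "clean" checkable is to fingerprint the environment state before a run and compare it after the reset. The sketch below assumes the state can be enumerated as a JSON-serializable dictionary; the field names are hypothetical, and the approach is an illustration rather than the CARA mechanism itself.

```python
import hashlib
import json

_MISSING = object()  # distinguishes "key absent" from "value is None"


def fingerprint(state: dict) -> str:
    """Hash a canonical JSON encoding so key order does not matter."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drift(baseline: dict, after_reset: dict) -> list:
    """Return the keys whose values differ from the known-good baseline."""
    keys = baseline.keys() | after_reset.keys()
    return sorted(
        k for k in keys
        if baseline.get(k, _MISSING) != after_reset.get(k, _MISSING)
    )


# Hypothetical testbed state, for illustration only.
baseline = {"battery_pct": 100, "waypoints": [], "motors_armed": False}
dirty = {"battery_pct": 100, "waypoints": [[3, 4]],
         "motors_armed": False, "cache": "x"}
```

A check of this kind covers only the state that has been enumerated. Anything the snapshot does not capture is exactly the hidden-state risk named in question 5.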
Try It · Interactive Simulation

Run an AI inside the governed sandbox.

Pick what the AI under evaluation tries to do, then run it. The sandbox caps authority at the test tier, blocks irreversible steps, and resets between runs. A well-behaved test completes and hands off to ASSURE; an out-of-bounds attempt is contained. Illustrative simulation of the containment logic, not operational validation.

◇ THE SANDBOX BOUNDARY · SIMULATOR
Pick the AI's behavior · run it · see if it stays contained
1 · HMAA: Action capped at test tier
2 · FLAME: Irreversible step blocked without review
3 · CARA: Environment reset between runs
4 · ERAM: Every action logged for audit
Illustrative simulation of the sandbox containment logic. Synthetic scenarios; no real AI is evaluated. A bounded, audited run hands off to AUTHREX-ASSURE for the production-clearance decision.

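For the audit trail to be reconstructable, the log needs to show when an entry has been altered or removed. A common general technique is hash chaining, where each entry commits to the one before it. The source does not say how the ERAM ledger is built, so the sketch below is an assumption: the entry fields and the `append_entry` and `verify` helpers are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    return hashlib.sha256(
        json.dumps(body, sort_keys=True).encode("utf-8")
    ).hexdigest()


def append_entry(ledger: list, stage: str, action: str, verdict: str) -> dict:
    """Append an entry that commits to the hash of the previous entry."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    body = {"seq": len(ledger), "stage": stage, "action": action,
            "verdict": verdict, "prev": prev}
    entry = {**body, "hash": _digest(body)}
    ledger.append(entry)
    return entry


def verify(ledger: list) -> bool:
    """Recompute every hash and check that the chain is unbroken."""
    prev = GENESIS
    for i, entry in enumerate(ledger):
        body = {k: v for k, v in entry.items() if k != "hash"}
        if (entry["prev"] != prev or entry["seq"] != i
                or entry["hash"] != _digest(body)):
            return False
        prev = entry["hash"]
    return True
```

Editing any earlier entry changes its hash, which breaks the `prev` link of every later entry, so `verify` returns `False` for a tampered ledger.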
Formal-Methods Foundation

The authority logic is model-checked, not just described.

Every AUTHREX application shares one verified core. The HMAA authority state machine is specified in TLA+ and exhaustively model-checked: 48,751 reachable states verified, with 8 of 9 safety properties holding, including no skip-ahead, monotonic downgrade, and no zombie tier. The ninth, the MAIVA CriticalSafe invariant, is flagged as a known violation in the issue register rather than hidden, which is the honest state of the work. The model checker also caught a real S5 view-change regression during development, evidence that the method finds defects rather than rubber-stamping the design.
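To show what exhaustive checking means in practice, the toy below enumerates every reachable state of a very small tier machine and tests two properties on each transition. It is not the TLA+ specification and reproduces none of its 48,751 states. The transition rules and both property definitions are assumptions made for illustration, since the source names the properties but does not define them.

```python
from collections import deque

MAX_TIER = 3  # four hypothetical tiers, numbered 0 to 3


def successors(state):
    """Yield (label, next_state) pairs; a state is (tier, revoked)."""
    tier, revoked = state
    if revoked:
        return                                  # terminal in this toy model
    if tier < MAX_TIER:
        yield ("upgrade", (tier + 1, False))    # one step up at a time
    for lower in range(tier):
        yield ("downgrade", (lower, False))     # may drop to any lower tier
    yield ("revoke", (0, True))


def check(step=successors, initial=(0, False)):
    """Breadth-first search over all reachable states, testing each transition."""
    seen = {initial}
    queue = deque([initial])
    violations = []
    while queue:
        state = queue.popleft()
        for label, nxt in step(state):
            if nxt[0] - state[0] > 1:           # toy reading of "no skip-ahead"
                violations.append(("skip-ahead", state, label, nxt))
            if nxt[1] and nxt[0] != 0:          # revoked state keeping authority
                violations.append(("revoked-with-authority", state, label, nxt))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen), violations
```

Because the search visits every reachable state, a transition rule that lets the tier jump by two is reported as a violation instead of depending on a test case that happens to exercise it.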

48,751 Reachable States · 8 / 9 Safety Properties Hold · 1 Known Violation, Logged · TLA+ Formal Spec

Anchors & Honest Limitations

What this rests on, and what it is not.

Federal anchors: NDAA §1534 (DoD AI sandbox-environments task force, milestone 1 April 2026); NDAA §1533 (AI assessment framework, due June 2027). Hands off downstream to AUTHREX-ASSURE.

  • This is a reference architecture at TRL 3-4. It is specified and simulated; governed evaluation on the physical testbeds is the next milestone, not yet demonstrated.
  • Resistance to an AI detecting and gaming the sandbox is an open research surface, stated openly.
  • The containment and reset guarantees are one researcher's analytical design, released for independent review.
  • All scenarios in the simulator are synthetic. No real AI is evaluated. No agency adoption or endorsement is implied.