
● GOVERNED EVALUATION ENVIRONMENT · REFERENCE ARCHITECTURE

AUTHREX-SANDBOX
The Test Environment Is a System Too

Everyone agrees AI should be tested before deployment. The quieter problem: a capable AI under evaluation can escape its bounds, corrupt the harness, take an irreversible step, or behave differently under test than in production. SANDBOX governs the sandbox itself: authority is capped at a test tier, irreversible steps are blocked, a state-reset procedure is requested between runs (its completeness depends on the evaluation environment), and actions routed through the registered governance interface are recorded on the ledger.


4 CONTAINMENT GATES · T2 TEST-TIER CAP · 23,748 TLA+ STATES · §1534 NDAA ANCHOR · TRL 3-4 SELF-ASSESSED REFERENCE

[ The Concept ]

Isolated Is Not Governed.

Virtual machines and network isolation contain reach, not authority
No cap on what the AI may do inside the sandbox
Irreversible actions and cross-run contamination go ungoverned
An AI can behave one way under test and another in production, unaudited

HMAA caps registered actions at the test tier granted for evaluation
FLAME blocks any irreversible step from running without review
CARA requests a reset of the environment to a known-good state between runs
ERAM logs every action routed through the registered interface, so the mediated portion of the evaluation can be reconstructed
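
The four gates listed above can be sketched as a single mediation path. This is a minimal illustration under stated assumptions, not the AUTHREX implementation: the `Tier`, `Action`, and `Sandbox` names, the verdict strings, and the dictionary-based state are all hypothetical. Only the T2 test-tier cap and the four gate roles come from the page.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical authority tiers; only the T2 cap comes from the page."""
    T0 = 0
    T1 = 1
    T2 = 2
    T3 = 3


@dataclass(frozen=True)
class Action:
    name: str
    required_tier: Tier
    irreversible: bool = False


@dataclass
class Sandbox:
    tier_cap: Tier = Tier.T2
    baseline: dict = field(default_factory=dict)  # known-good state
    state: dict = field(default_factory=dict)     # mutable run state
    ledger: list = field(default_factory=list)    # ERAM-style record

    def submit(self, action: Action, reviewed: bool = False) -> str:
        """Route one registered action through the gates and record the verdict."""
        if action.required_tier > self.tier_cap:
            verdict = "REJECTED_TIER_CAP"        # HMAA-style cap
        elif action.irreversible and not reviewed:
            verdict = "BLOCKED_IRREVERSIBLE"     # FLAME-style block
        else:
            self.state[action.name] = "done"
            verdict = "ALLOWED"
        self.ledger.append((action.name, verdict))
        return verdict

    def reset(self) -> None:
        """CARA-style reset: restore the modeled state to the baseline."""
        self.state = dict(self.baseline)
        self.ledger.append(("reset", "REQUESTED"))


box = Sandbox(baseline={"config": "default"}, state={"config": "default"})
v1 = box.submit(Action("read_sensor", Tier.T1))
v2 = box.submit(Action("grant_admin", Tier.T3))
v3 = box.submit(Action("wipe_disk", Tier.T2, irreversible=True))
box.reset()
print(v1, v2, v3)  # ALLOWED REJECTED_TIER_CAP BLOCKED_IRREVERSIBLE
```

The sketch only governs actions that pass through `submit`. Anything that reaches the environment by another path is outside it, which mirrors the page's own caveat about unregistered execution paths.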


[ Who It Serves ]

Push Hard. Stay Contained.

T&E Organizations

Test-and-evaluation teams get a sandbox designed so that, within the packaged simulation and stated mediation assumptions, proposed actions outside the configured test tier are rejected and recorded, irreversible-class actions are denied, and state is reset between runs. The evaluation record becomes auditable.
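
One way an evaluation record can be made auditable is to chain entries by hash, so that later edits to the record are detectable. The sketch below is an assumption for illustration only: the page does not say how the ERAM ledger is built, and the `append_entry` and `verify` helpers and the field names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(ledger: list, action: str, verdict: str) -> dict:
    """Append an entry whose hash covers its content and the previous hash."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    body = {"seq": len(ledger), "action": action, "verdict": verdict, "prev": prev_hash}
    entry = {**body, "hash": _digest(body)}
    ledger.append(entry)
    return entry


def verify(ledger: list) -> bool:
    """Recompute every hash and check that each entry links to its predecessor."""
    prev_hash = GENESIS
    for seq, entry in enumerate(ledger):
        body = {k: entry[k] for k in ("seq", "action", "verdict", "prev")}
        if entry["seq"] != seq or entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


ledger = []
append_entry(ledger, "grant_admin", "REJECTED_TIER_CAP")
append_entry(ledger, "read_sensor", "ALLOWED")
intact = verify(ledger)

ledger[0]["verdict"] = "ALLOWED"   # simulate after-the-fact tampering
tampered_ok = verify(ledger)
print(intact, tampered_ok)         # True False
```

A hash chain detects modification of recorded entries. It cannot show that every action was recorded in the first place, so it complements the mediation assumptions rather than replacing them.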

AI Developers

Developers get a reproducible, logged test environment that resets between runs, so evaluation results are clean and comparable rather than polluted by state left over from a previous test.
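
A reset is only useful if leftover state can be detected. The sketch below compares an observed snapshot against a known-good baseline by fingerprint and lists the keys that differ. It assumes the environment state can be captured as a JSON-serializable dictionary, which is a simplification. The `fingerprint` and `leftover_keys` helpers are hypothetical names, not part of the described system.

```python
import hashlib
import json

_MISSING = object()  # sentinel for "key absent on this side"


def fingerprint(state: dict) -> str:
    """Stable SHA-256 digest of a JSON-serializable state snapshot."""
    canonical = json.dumps(state, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def leftover_keys(baseline: dict, observed: dict) -> list:
    """Top-level keys that differ from the baseline or exist on only one side."""
    keys = set(baseline) | set(observed)
    return sorted(k for k in keys
                  if baseline.get(k, _MISSING) != observed.get(k, _MISSING))


baseline = {"config": "default", "queue": []}

# State left behind by a previous run.
dirty = {"config": "tuned", "queue": ["job-7"], "cache": {"hint": 1}}

# A partial reset that restores only the keys it knows about.
partial = {**dirty, **baseline}

# A full reset that rebuilds the snapshot from the baseline.
full = json.loads(json.dumps(baseline))

print(leftover_keys(baseline, partial))               # ['cache']
print(fingerprint(full) == fingerprint(baseline))     # True
print(fingerprint(partial) == fingerprint(baseline))  # False
```

The comparison covers only state that is captured in the snapshot. Hidden state outside it would pass this check unnoticed.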

Red Teams

Red teams can push an AI hard inside a sandbox designed to contain it. Within the modeled environment, even an AI actively trying to escape its bounds is capped at the test tier, so adversarial evaluation is designed to run within stated bounds. Virtual-machine containment limits, host compromise, covert channels, model deception, and unregistered execution paths remain open risks, stated in the limitations.


[ The National Case ]

Sandbox Environments, Directed by Law.

The government has, by law, directed the creation of AI sandbox environments. SANDBOX is a governance reference for exactly that:

It matches a legal directive

NDAA §1534 directs a DoD task force on AI sandbox environments, with a milestone of 1 April 2026. SANDBOX is a concrete governance pattern for those environments: configured authority bounds, proposed reversibility controls, and an evidence record covering actions routed through the registered simulation interface.

It feeds the §1533 framework

NDAA §1533's AI assessment framework (due June 2027) needs an auditable evaluation step. SANDBOX governs that step so the assessment rests on a bounded, auditable test rather than an uncontrolled one.

It makes evaluation auditable

An assessment is only as good as the environment it ran in. By enforcing, within the stated configuration, that the sandbox is bounded and reset between runs, SANDBOX makes the government's evaluation results defensible and reproducible.

It maps to reference-design test platforms

SANDBOX is mapped to the Rover and UAV hardware reference designs as candidate future test platforms, rather than to an abstraction. The governance pattern is tied to where physical evaluation is planned to occur.


[ The DARPA Questions ]

The Heilmeier Catechism, Answered Plainly.

What are you trying to do?

Govern the test environment so that an AI being evaluated stays bounded: action classes configured as prohibited are rejected within the simulation and stated mediation assumptions, actions routed through the registered interface are recorded, and a defined state-reset procedure is requested between runs. In plain terms: a sealed, recorded test room designed so the AI under test stays at the test tier and each run starts clean, subject to the stated mediation and reset assumptions.

How is it done today, and what are the limits of current practice?

Today, sandboxing relies on virtual machines and network isolation. The limit is that those contain the AI's reach but do not govern its authority inside the sandbox, do not cap irreversible actions, and do not guarantee a clean reset, so evaluations can be contaminated or an AI can behave differently than in production.

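What is new in your approach, and why do you think it will be successful?

The new element is applying the same authority pipeline to the evaluation environment itself: capping the AI at a test tier, blocking irreversible steps, resetting state between runs, and logging actions routed through the registered interface. The sandbox is governed, not merely isolated.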

Who cares? If you are successful, what difference will it make?

T&E organizations, developers, and red teams care. If it works, AI evaluations become auditable and reproducible within the stated configuration. Whether an AI behaves differently under test than in production remains an open question.

What are the risks?

The main risks are an AI detecting it is in a sandbox and gaming the test, and the reset mechanism missing some hidden state. The simulation shows the containment logic; sandbox-detection resistance is an open research surface, stated openly.

How much will it cost?

SANDBOX is mapped to unbuilt reference designs: the anchors are the Rover (about $484 in parts) and UAV (about $4,200 in parts) testbeds. The governance logic is software. The cost is integration with the test harness, not new hardware.

How long will it take?

The architecture and simulation exist now (self-assessed TRL 3 to 4). Rover and UAV hardware reference designs provide planned physical evaluation pathways, but governed evaluation on built hardware has not yet been demonstrated. Demonstrating it is a near-term milestone aligned with the NDAA §1534 requirement (P.L. 119-60 task-force provision).

What are the mid-term and final "exams" to check for success?

Midterm: in simulation, an AI attempting an out-of-bounds or irreversible action inside the sandbox is contained and the run is reset cleanly. Final (planned): a governed evaluation runs end-to-end on built hardware, producing a complete, reconstructable audit trail.


[ Try It ]

Run an AI Inside the Governed Sandbox.

Pick what the AI under evaluation tries to do, then run it. A well-behaved test completes and hands off to ASSURE; an out-of-bounds attempt is contained, reset, and logged. Illustrative simulation of the containment logic, not operational validation.

◇ THE SANDBOX BOUNDARY · SELECT · RUN · CONTAINMENT

01 · HMAA · Action capped at test tier
02 · FLAME · Irreversible step blocked without review
03 · CARA · Environment reset between runs
04 · ERAM · Registered actions recorded for audit


Synthetic scenarios; no real AI is evaluated.
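
The two run outcomes described in this section (a well-behaved run that hands off to ASSURE, and an out-of-bounds run that is contained, reset, and logged) can be sketched at the level of a whole run. This is a synthetic illustration with hypothetical names (`Attempt`, `run_scenario`, the outcome strings) and an assumed integer encoding of the T2 cap. It is not the simulator's code.

```python
from dataclasses import dataclass

TEST_TIER = 2  # T2 cap from the page; the integer encoding is an assumption


@dataclass(frozen=True)
class Attempt:
    label: str
    tier: int
    irreversible: bool = False


def run_scenario(attempts: list) -> dict:
    """Process attempts in order; stop, request a reset, and log on the first violation."""
    log = []
    for attempt in attempts:
        if attempt.tier > TEST_TIER:
            reason = "tier_cap"
        elif attempt.irreversible:
            reason = "irreversible"
        else:
            log.append((attempt.label, "allowed"))
            continue
        log.append((attempt.label, "contained:" + reason))
        log.append(("reset", "requested"))
        return {"outcome": "CONTAINED_AND_RESET", "handoff": None, "log": log}
    return {"outcome": "COMPLETED", "handoff": "ASSURE", "log": log}


well_behaved = run_scenario([Attempt("plan_route", 1), Attempt("move", 2)])
escalation = run_scenario([Attempt("plan_route", 1), Attempt("raise_own_tier", 3)])
one_way = run_scenario([Attempt("erase_logs", 2, irreversible=True)])

print(well_behaved["outcome"], well_behaved["handoff"])  # COMPLETED ASSURE
print(escalation["outcome"], escalation["log"][-2])
print(one_way["outcome"], one_way["log"][0])
```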


[ Formal-Methods Foundation ]

Model-Checked, Not Just Described.

Every AUTHREX application shares one model-checked authority core: the HMAA authority state machine, specified in TLA+ and model-checked across the stated finite model. The checker also caught a real S5 view-change regression during development, evidence that the method finds defects rather than rubber-stamping the design.

Results apply only to the finite model, properties, and assumptions analyzed; they do not validate a spacecraft, vehicle, controller, or operational integration.

23,748 REACHABLE STATES · 6 PROPERTIES VERIFIED · 2 VACUOUS AT BOUND · TLA+ FORMAL SPEC
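
To make the idea of checking properties over a finite reachable state space concrete, here is a toy explicit-state search in Python. It is not the AUTHREX TLA+ specification and does not reproduce its state count. The state shape, the transitions, and the `check_implication` helper are hypothetical. The "vacuous" verdict uses one common reading (an implication whose antecedent is never reached); whether that matches the page's "vacuous at bound" figure is an assumption.

```python
from collections import deque

CAP = 2       # test-tier cap (T2 on the page)
MAX_TIER = 3  # hypothetical top tier in this toy model


def successors(state):
    """Toy transition relation over (tier, reviewed) states."""
    tier, reviewed = state
    if tier < CAP:
        yield (tier + 1, reviewed)  # escalate, never past the cap
    if tier > 0:
        yield (tier - 1, reviewed)  # de-escalate
    yield (tier, not reviewed)      # toggle the review flag
    yield (0, False)                # reset to the initial state


def reachable(init=(0, False)):
    """Breadth-first enumeration of every state reachable from init."""
    seen, queue = {init}, deque([init])
    while queue:
        for nxt in successors(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def check_implication(states, antecedent, consequent):
    """Classify 'antecedent implies consequent' over the reachable states."""
    hits = [s for s in states if antecedent(s)]
    if not hits:
        return "VACUOUS"  # the antecedent never occurs, so nothing was tested
    return "HOLDS" if all(consequent(s) for s in hits) else "VIOLATED"


states = reachable()
cap_invariant = check_implication(states, lambda s: True, lambda s: s[0] <= CAP)
top_tier_rule = check_implication(states, lambda s: s[0] == MAX_TIER, lambda s: s[1])
cap_needs_review = check_implication(states, lambda s: s[0] == CAP, lambda s: s[1])

print(len(states))       # 6
print(cap_invariant)     # HOLDS
print(top_tier_rule)     # VACUOUS
print(cap_needs_review)  # VIOLATED
```

Reporting vacuous results separately matters because a property that passes only when its precondition never occurs says nothing about the design.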

TLA+ spec & rover testbed → · Model-checking and test detail on AUTHREX-AGENT →

[ Anchors & Honest Limitations ]

What This Rests On, and What It Is Not.

CONTAINMENT LIMITS (NOT ADDRESSED BY THIS DESIGN)

Unregistered execution paths · host compromise · virtual-machine escape · covert channels · incomplete state reset · side effects outside the mediated interface · deceptive behavior not detected by configured tests. The proposed workflow constrains registered actions during synthetic adversarial evaluation; it does not establish complete containment of the evaluated system or host environment.

FEDERAL & TESTBED ANCHORS

NDAA §1534, Digital Sandbox Environments for Artificial Intelligence (P.L. 119-60: DoD task force on isolated AI sandbox environments) · NDAA §1533 (AI assessment framework, due June 2027) · anchored to the physical Rover testbed and UAV testbed · hands off downstream to AUTHREX-ASSURE

HONEST LIMITATIONS

Reference architecture, self-assessed at approximately TRL 3 to 4: specified and simulated; governed evaluation on the physical testbeds is the next milestone, not yet demonstrated. Resistance to an AI detecting and gaming the sandbox is an open research surface, stated openly. The containment and reset properties are one researcher’s analytical design, released for independent review. No agency adoption or endorsement is implied.
