Why We Treated Our AI Orchestrator Like Terraform

We built a coding agent. Then we realized we built it wrong. AI Agents aren't 'Chatbots'; they are 'Infrastructure State Machines'.

Posted on January 30, 2025 by the Cabin Crew team

We built a coding agent. Then we realized we had built it wrong: AI Agents aren’t “Chatbots”, they are “Infrastructure State Machines”.

The Failure of Chat-Based Coding

When we started building Cabin Crew, we did what everyone else was doing: we built a chat interface for AI coding.

The workflow looked like this:

  1. User opens a chat window
  2. User types: “Add user authentication to the app”
  3. AI generates code and pastes it into the chat
  4. User copies the code and pastes it into their editor
  5. User runs tests, finds bugs, goes back to chat
  6. Repeat until it works

This felt natural. It’s how we interact with ChatGPT, Claude, and every other LLM interface.

But it was fundamentally broken for production use.

Problem 1: No Audit Trail

The conversation was ephemeral. If the AI generated buggy code, we had no record of:

  • What prompt led to that code
  • What context the AI had access to
  • What alternative solutions it considered
  • Why it chose this specific implementation

The chat log was just text. It wasn’t structured, versioned, or cryptographically signed.

Problem 2: Manual Copy-Paste

The human was the integration layer. The AI generated code, but the human had to:

  • Copy it from the chat
  • Paste it into the right file
  • Resolve merge conflicts
  • Commit it to git
  • Open a PR

This introduced errors. A typo during copy-paste could break the entire application.

Problem 3: No Rollback

If the AI-generated code broke production, how do you roll back? You can’t just “undo” a chat conversation. You have to manually revert the changes, but you don’t have a clean diff of what the AI changed.

Problem 4: No Policy Enforcement

The AI could generate anything. There was no way to enforce:

  • “Don’t commit secrets”
  • “Don’t destroy the database”
  • “Don’t refactor authentication logic without approval”

The human was supposed to catch these issues during review. But humans are slow, and AI is fast.

The Terraform Epiphany

One day, we were debugging a failed deployment, and someone said:

“This feels like running terraform apply without running terraform plan first.”

That’s when it clicked.

AI Agents aren’t chatbots. They’re infrastructure state machines.

Think about how Terraform works:

  1. You declare the desired state (in .tf files)
  2. Terraform generates a plan (what will change)
  3. You review the plan
  4. You approve the plan
  5. Terraform applies the plan (makes the changes)

This is a two-phase commit:

  • Phase 1 (Plan): Calculate what needs to change, but don’t change anything
  • Phase 2 (Apply): Execute the changes, but only if the plan was approved

We realized: This is exactly what AI coding needs.

Adopting the Plan / Apply Model

We redesigned Cabin Crew to follow Terraform’s architecture:

Phase 1: Flight Plan (Read-Only)

The AI Agent runs in safe mode. It:

  • Reads the codebase
  • Analyzes the issue
  • Generates code diffs
  • Produces a Terraform-like “plan” of what will change

But it performs no side effects. It doesn’t:

  • Write to files
  • Commit to git
  • Deploy to production
  • Call external APIs

The output is a structured artifact:

{
  "plan_id": "plan-12345",
  "issue": "Add user authentication",
  "changes": [
    {
      "file": "src/auth.ts",
      "action": "create",
      "diff": "...",
      "hash": "sha256:a1b2c3..."
    },
    {
      "file": "src/routes.ts",
      "action": "modify",
      "diff": "...",
      "hash": "sha256:d4e5f6..."
    }
  ],
  "dependencies": ["bcrypt", "jsonwebtoken"],
  "tests": ["auth.test.ts"]
}

This is the Flight Plan—a declaration of intent.
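To keep the rest of this post concrete, here is a minimal TypeScript sketch of how that artifact might be typed. The field names mirror the JSON above; the type names themselves (FlightPlan, PlannedChange) are ours, not part of any published spec.

type ChangeAction = "create" | "modify" | "delete";

interface PlannedChange {
  file: string;          // path relative to the repository root
  action: ChangeAction;  // what the Agent intends to do to the file
  diff: string;          // unified diff of the proposed change
  hash: string;          // e.g. "sha256:..." digest of the change
}

interface FlightPlan {
  plan_id: string;           // e.g. "plan-12345"
  issue: string;             // the task being addressed
  changes: PlannedChange[];  // every file the Agent wants to touch
  dependencies: string[];    // new packages the change would pull in
  tests: string[];           // test files expected to cover the change
}

We’ll reuse this FlightPlan type in the sketches below.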

Phase 2: Pre-Flight Check (Governance)

The Orchestrator pauses execution. It feeds the plan into a Policy Engine (OPA):

package code_review

import rego.v1

# Check for secrets
deny contains msg if {
  some change in input.changes
  contains(change.diff, "API_KEY")
  msg := "Code contains potential secret"
}

# Check for destructive changes
deny contains msg if {
  some change in input.changes
  change.file == "database/schema.sql"
  change.action == "delete"
  msg := "Cannot delete database schema without approval"
}

# Check for new dependencies
deny contains msg if {
  some new_dep in input.dependencies
  not new_dep in data.approved_packages
  msg := sprintf("Unapproved dependency: %s", [new_dep])
}

If any policy fails, the workflow halts. The plan is rejected.
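In practice, this check can be a single call to OPA’s Data API. Here’s a hedged TypeScript sketch, assuming OPA runs as a local sidecar on its default port and the policy lives in the code_review package shown above:

async function preFlightCheck(plan: FlightPlan): Promise<void> {
  // Ask OPA to evaluate the deny rules in the code_review package against the plan.
  const res = await fetch("http://localhost:8181/v1/data/code_review/deny", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ input: plan }),
  });

  // For a partial set rule like deny, OPA returns the set of matching messages.
  const { result = [] } = (await res.json()) as { result?: string[] };

  if (result.length > 0) {
    // Nothing has been written yet, so rejecting here is free.
    throw new Error(`Flight Plan rejected:\n${result.join("\n")}`);
  }
}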

Phase 3: Take-off (Execute)

Only if the plan passes all policies does the Agent re-run in write mode. It receives:

  • The approved plan ID
  • A sealed “State Token” (cryptographic proof of approval)
  • The exact diffs that were approved

The Agent then:

  • Writes the files
  • Commits to git
  • Opens a PR
  • Runs tests

But it can only execute the exact plan that was approved. If it tries to make additional changes, the State Token validation fails.
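The post doesn’t pin down the State Token format, so treat the following as one plausible shape: an HMAC over the plan ID and the approved change hashes, issued by the Orchestrator at approval time and re-verified by the Git Engine before it writes anything.

import { createHmac, timingSafeEqual } from "node:crypto";

// Canonical digest of what was approved: the plan id plus every change hash.
function planDigest(plan: FlightPlan): string {
  return [plan.plan_id, ...plan.changes.map((c) => `${c.file}:${c.hash}`)].join("|");
}

// Issued by the Orchestrator once the Policy Engine approves the plan.
function issueStateToken(plan: FlightPlan, secret: string): string {
  return createHmac("sha256", secret).update(planDigest(plan)).digest("hex");
}

// Checked by the Git Engine before any side effect. Any extra or altered change
// produces a different digest, so the token no longer matches.
function verifyStateToken(plan: FlightPlan, token: string, secret: string): boolean {
  const expected = issueStateToken(plan, secret);
  return (
    expected.length === token.length &&
    timingSafeEqual(Buffer.from(expected), Buffer.from(token))
  );
}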

Separating the Generator from the Executor

This led to a key architectural decision: separate the Generator (Dev Engine) from the Executor (Git Engine).

Dev Engine (The Generator)

Responsible for:

  • Reading the codebase
  • Analyzing the issue
  • Generating code diffs
  • Producing the Flight Plan

This is a pure function. Given the same input, it should produce the same output (as much as an LLM can).

Git Engine (The Executor)

Responsible for:

  • Writing files to disk
  • Committing changes
  • Pushing to GitHub
  • Opening PRs

This is a side-effecting function. It modifies the external world.
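A minimal sketch of the split, with hypothetical interface names (the post doesn’t prescribe these): the Generator only returns a Flight Plan, and the Executor is the only component allowed to touch the outside world, gated on an approved plan and its State Token.

interface Generator {
  // Dev Engine: read-only analysis; no file writes, no network side effects.
  plan(issue: string, repoSnapshot: string): Promise<FlightPlan>;
}

interface Executor {
  // Git Engine: all side effects live here, and only run against an approved plan.
  apply(plan: FlightPlan, stateToken: string): Promise<void>;
}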

By separating these, we gain:

  1. Testability: We can test the Dev Engine without touching git
  2. Auditability: We can log what the Dev Engine wanted to do, even if we block it
  3. Replayability: We can re-run the Dev Engine with the same inputs to debug issues

The Architecture of Our Open Core Platform

This led to the current Cabin Crew architecture:

Figure: Architecture of the Cabin Crew Open Core Platform

The Orchestrator

The brain. It:

  • Coordinates engines
  • Enforces policies
  • Signs audit logs
  • Manages state tokens

This is the trust layer. It’s commercial (BSL 1.1) because enterprises need guarantees that governance is enforced.

The Engines

The workers. They:

  • Generate code (Dev Engine)
  • Plan infrastructure (Infra Engine)
  • Execute git operations (Git Engine)
  • Run tests (Test Engine)

These are commodities. We want the community to build better, faster engines. They’re Apache 2.0 licensed.

Why This Matters

This architecture gives us:

1. Separation of Concerns

The Dev Engine doesn’t need to know about git. The Git Engine doesn’t need to know about LLMs. The Orchestrator doesn’t need to know about either.

Each component has a single responsibility.

2. Pluggability

Don’t like our Dev Engine? Build your own. As long as it speaks the Cabin Crew Protocol (JSON over stdin/stdout), the Orchestrator can use it.

Want to use a different LLM? Swap out the model. The protocol doesn’t care.
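As a rough sketch of what “speaking the protocol” could look like, here’s a custom engine reading newline-delimited JSON requests on stdin and writing Flight Plans to stdout. The framing and field names are assumptions; the post only specifies JSON over stdin/stdout.

import * as readline from "node:readline";

const rl = readline.createInterface({ input: process.stdin });

rl.on("line", async (line) => {
  const request = JSON.parse(line);          // one JSON request per line from the Orchestrator
  const plan = await generatePlan(request);  // your engine's own logic goes here
  process.stdout.write(JSON.stringify(plan) + "\n"); // one JSON Flight Plan per line back
});

async function generatePlan(request: unknown): Promise<FlightPlan> {
  // Placeholder: call whatever model or analysis you like, as long as the output
  // conforms to the Flight Plan structure the Orchestrator expects.
  return { plan_id: "plan-0", issue: JSON.stringify(request), changes: [], dependencies: [], tests: [] };
}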

3. Auditability

Every step is logged:

  • What the Dev Engine generated (the Flight Plan)
  • What the Policy Engine decided (pass/fail)
  • What the Git Engine executed (the actual changes)

And it’s all cryptographically signed.
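The post doesn’t say which signature scheme the Orchestrator uses, so here’s a hedged sketch using Ed25519 from Node’s built-in crypto module: each audit entry is serialized, signed by the Orchestrator, and can be verified later against its public key.

import { generateKeyPairSync, sign, verify } from "node:crypto";

// In production this key pair would live in the Orchestrator's key management,
// not be generated on the fly.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

interface AuditEntry {
  step: "plan" | "policy" | "apply";  // which stage of the workflow is being recorded
  plan_id: string;
  detail: string;                     // e.g. the Flight Plan, the policy verdict, the commit SHA
  timestamp: string;
}

function signEntry(entry: AuditEntry): string {
  return sign(null, Buffer.from(JSON.stringify(entry)), privateKey).toString("base64");
}

function verifyEntry(entry: AuditEntry, signature: string): boolean {
  return verify(null, Buffer.from(JSON.stringify(entry)), publicKey, Buffer.from(signature, "base64"));
}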

4. Governance

Policies are enforced before execution. The AI can’t “accidentally” commit secrets or delete the database. The policy blocks it.

The Shift in Mindset

Building Cabin Crew taught us that AI coding isn’t about conversation. It’s about orchestration.

You don’t “chat” with Terraform. You declare state, review plans, and apply changes.

AI Agents should work the same way.


Stop building chatbots. Start building state machines.


Interested in the architecture? Check out The Orchestrator or read the Cabin Crew Protocol.

End of Transmission

Questions? Insights? Join the crew in the briefing room.

Discuss on GitHub
