How is this different from just sharing a screen?

Chaos Mode gives both engineers full terminal access to the same live system, with individual tracking and reporting, rather than one person driving while others watch.

Can we use this for internal training, not just hiring?

Yes. Chaos Mode is a strong fit for internal training, on-call readiness checks, and pair-based incident practice, not only hiring.

How many engineers can join a room?

It is currently optimised for two engineers so the collaboration signal stays clear.

What scenarios work best for Chaos Mode?

Scenarios with multiple root causes or branching failure paths work best because they reward coordination and splitting investigation threads.

What does the manager actually see in the report?

Managers see per-engineer command history, timestamps, presence data, AI analysis, and a full session replay of the collaboration.

Chaos Mode · Multi-engineer incidents

Production incidents aren't solved alone

Two engineers enter the same live incident. One broken system. Real-time coordination under pressure. Practice the communication, task splitting, and shared situational awareness that make on-call rotations work.

War Rooms · Pair Debugging · Team Training · Final Interviews

War Room Active

Live

Engineer A - IC Lead

engineer-1

Engineer B - Responder

engineer-2

shared terminal

engineer-a$ kubectl describe pod api-gateway -n edge
Observation: readiness failing after dependency timeout

engineer-b$ kubectl get svc,endpoints -n edge
Thread: endpoints missing for auth backend

engineer-a$ kubectl describe networkpolicy -n edge
Decision: isolate policy drift before restart

Both engineers see the same system.
The report shows who did what.

Why Chaos Mode

Solo tests miss the most important signal

Real outages are team events. The best SRE in the world is useless if they can't coordinate with another engineer under pressure.

Solo assessment

Tests individual knowledge, not team behaviour
No signal on communication or delegation
Can't observe leadership under pressure
Doesn't reflect how production incidents actually work

With Chaos Mode

See how engineers split the problem and coordinate
Who narrates findings vs. who stays silent
Who drives the room vs. who adds noise
Who verifies before declaring "fixed"

How It Works

Lobby, readiness, incident, review

The room model mirrors real incident management, not a chat room with a shared terminal bolted on.

Create a room

Pick a scenario: Kubernetes cascading failure, Azure networking, or GPU diagnostics. The system provisions a shared container and generates paired handoff tokens for Engineer A and Engineer B.
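To make the idea of paired handoff tokens concrete, here is a minimal sketch of what issuing one token per seat could look like. It is not the platform's implementation: the `Room` record, the seat names, and the `create_room` and `seat_for_token` helpers are all hypothetical, and the sketch assumes tokens are plain random strings.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Room:
    """Hypothetical room record; field names are illustrative only."""
    scenario: str
    room_id: str = field(default_factory=lambda: f"chaos-room-{secrets.token_hex(4)}")
    # One token per seat, so activity can later be attributed to the
    # seat an engineer joined with.
    tokens: dict[str, str] = field(default_factory=dict)


def create_room(scenario: str) -> Room:
    """Create a room and issue a separate random token for each seat."""
    room = Room(scenario=scenario)
    for seat in ("engineer-a", "engineer-b"):
        room.tokens[seat] = secrets.token_urlsafe(24)
    return room


def seat_for_token(room: Room, token: str) -> Optional[str]:
    """Return the seat a token belongs to, or None if it matches no seat."""
    for seat, expected in room.tokens.items():
        # compare_digest keeps the comparison time independent of the content.
        if secrets.compare_digest(expected, token):
            return seat
    return None
```

Issuing a distinct token per seat, rather than one shared room link, is what would let everything an engineer does be tied back to Engineer A or Engineer B.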

Both engineers join

Each engineer enters the lobby with their token. Presence indicators show who's connected. Engineer A starts the session when both are ready. It's a deliberate readiness gate, not an accidental race.
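The readiness gate can be pictured as a tiny state machine. The sketch below is an assumption about how such a gate might behave, not a description of the real service: the `Lobby` class, its seat names, and the `start` rule are invented for illustration.

```python
from enum import Enum


class RoomState(Enum):
    LOBBY = "lobby"
    ACTIVE = "active"


class Lobby:
    """Hypothetical lobby; seats and rules are assumptions for illustration."""

    SEATS = ("engineer-a", "engineer-b")

    def __init__(self) -> None:
        self.state = RoomState.LOBBY
        self.connected: set[str] = set()

    def join(self, seat: str) -> None:
        if seat not in self.SEATS:
            raise ValueError(f"unknown seat: {seat}")
        self.connected.add(seat)

    def leave(self, seat: str) -> None:
        self.connected.discard(seat)

    def start(self, requested_by: str) -> bool:
        """Start only if Engineer A asks while both seats are connected."""
        both_present = self.connected == set(self.SEATS)
        if (
            self.state is RoomState.LOBBY
            and requested_by == "engineer-a"
            and both_present
        ):
            self.state = RoomState.ACTIVE
            return True
        return False
```

With this rule, a start request from Engineer B, or from Engineer A before Engineer B has connected, simply returns `False` and the room stays in the lobby.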

Debug together, review separately

Both engineers work the same live system. The platform tracks each engineer's commands independently. After the session, managers get per-engineer replays showing decision patterns, not just commands.
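One way to think about independent tracking on a shared system is a single timeline of attributed events that is split per engineer afterwards. The following is a rough sketch under that assumption; `CommandEvent` and `split_by_engineer` are hypothetical names, and the timestamps are made up. Only the commands themselves come from the example session shown earlier on this page.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandEvent:
    """One command in the shared session; fields are illustrative."""
    at: float       # seconds since session start (invented values below)
    engineer: str   # seat that typed the command
    command: str


def split_by_engineer(events: list[CommandEvent]) -> dict[str, list[CommandEvent]]:
    """Group a shared timeline into one time-ordered replay per engineer."""
    replays: dict[str, list[CommandEvent]] = defaultdict(list)
    for event in sorted(events, key=lambda e: e.at):
        replays[event.engineer].append(event)
    return dict(replays)


timeline = [
    CommandEvent(19.5, "engineer-b", "kubectl get svc,endpoints -n edge"),
    CommandEvent(12.0, "engineer-a", "kubectl describe pod api-gateway -n edge"),
    CommandEvent(41.2, "engineer-a", "kubectl describe networkpolicy -n edge"),
]
replays = split_by_engineer(timeline)
```

Keeping the seat on every event means the same log can serve both views: the interleaved session and each engineer's own sequence of decisions.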

What It Reveals

The signals you can't get from a solo test

Chaos Mode exposes collaboration quality, the one dimension most technical assessments completely ignore.

Coordination Quality

Do they split the problem sensibly? One engineer narrows blast radius while the other validates dependencies. Or do they both run the same commands?
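As a rough illustration of the "same commands" question, duplicated effort can be approximated by comparing the sets of commands each engineer ran. This is only a crude proxy chosen for the sketch (a Jaccard overlap), not a description of how Chaos Mode scores coordination; `command_overlap` is a hypothetical helper.

```python
def command_overlap(a_commands: list[str], b_commands: list[str]) -> float:
    """Jaccard overlap of the distinct commands each engineer ran (0.0 to 1.0)."""
    a, b = set(a_commands), set(b_commands)
    if not a and not b:
        return 0.0  # nobody ran anything, so there is nothing duplicated
    return len(a & b) / len(a | b)


split_work = command_overlap(
    ["kubectl describe pod api-gateway -n edge"],
    ["kubectl get svc,endpoints -n edge"],
)
duplicated_work = command_overlap(
    ["kubectl get pods -n edge"],
    ["kubectl get pods -n edge"],
)
```

A value near 0.0 suggests the pair divided the investigation, while a value near 1.0 suggests both engineers walked the same path. Some overlap is healthy, for example when one engineer verifies the other's fix, so a number like this would need to be read alongside the replay.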

Leadership Under Pressure

Who takes ownership of the incident? Who proposes a plan, assigns threads, and keeps the room focused? Or does nobody step up?

Decision Discipline

Do they verify changes before moving on? Do they test the user path, not just the symptom? Or do they declare victory at the first green light?

Communication Clarity

How well do they narrate what they're finding? Can Engineer B understand what Engineer A discovered without asking? Silence is data too.

Delegation Patterns

"You take the network policy, I'll trace the service mesh." Clean delegation under pressure is a signal you cannot get from a multiple-choice quiz.

Recovery Sequencing

In cascading failures, the order of fixes matters. Do they understand dependency chains? Or do they fix the loudest symptom first?
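Fixing in dependency order is, in effect, a topological sort. The sketch below uses Python's standard `graphlib` module on a made-up dependency map; the service names echo the examples on this page, but the relationships between them are assumptions for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists what must be healthy first.
depends_on = {
    "api-gateway": {"auth-backend"},
    "auth-backend": {"cluster-dns"},
    "cluster-dns": set(),
}


def fix_order(deps: dict[str, set[str]]) -> list[str]:
    """Order services so that dependencies are restored before their dependants."""
    return list(TopologicalSorter(deps).static_order())


order = fix_order(depends_on)
```

In this example the loudest symptom would likely be the failing api-gateway, yet the computed order starts with cluster-dns, because restarting the gateway achieves nothing while the services beneath it are still broken.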

Use Cases

Four ways teams use Chaos Mode

Final-stage interviews

Put your top two candidates in the same incident room. See who actually leads under pressure instead of who interviews better. One 20-minute session replaces hours of panel interviews.

Kubernetes adoption

Your team just migrated to Kubernetes. Instead of hoping they'll learn from runbooks, throw them into a cascading pod failure together. They'll learn faster under pressure, and a colleague is there to catch mistakes.

Onboarding war games

A new SRE joins the team. Pair them with a senior engineer in a Chaos Mode session. The senior watches methodology, the new hire builds confidence, and managers get a real read on readiness.

Team readiness checks

Before the next change freeze or on-call rotation, run your team through a shared incident. Same pressure, none of the customer impact.

K8s Cascading Failure

Incidents that evolve

Our most advanced scenario doesn't end when you fix the first problem. Each resolution triggers the next hidden failure, just like production.

PHASE 01 · Pod crash-loop · $15K/hr
api-gateway pods in CrashLoopBackOff. Diagnose and fix the liveness probe.

PHASE 02 · Node goes down · $45K/hr
Worker node flips to NotReady. SSH in, diagnose kubelet, restore the node.

PHASE 03 · DNS breaks · $120K/hr
Network policy blocks DNS cluster-wide. Services can't resolve each other.

PHASE 04 · Memory surge · $180K/hr
Backed-up traffic floods recovered services. OOMKilled pods everywhere.

PHASE 05 · Etcd split-brain · $350K/hr
Clock skew on control-plane-02 causes etcd leader election instability.

PHASE 06 · Drain storm · $500K/hr
Autoscaler panic triggers aggressive cordon and drain across the fleet.

Each fix triggers the next failure
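The chain of phases can be modelled as an ordered list where resolving the active phase reveals the next. The sketch below is a simplified assumption, not the scenario engine itself: `Phase` and `CascadingIncident` are hypothetical names, and the rate field just carries the per-hour figure listed for each phase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Phase:
    number: int
    title: str
    rate_k_per_hr: int  # the "$...K/hr" figure listed for the phase


PHASES = [
    Phase(1, "Pod crash-loop", 15),
    Phase(2, "Node goes down", 45),
    Phase(3, "DNS breaks", 120),
    Phase(4, "Memory surge", 180),
    Phase(5, "Etcd split-brain", 350),
    Phase(6, "Drain storm", 500),
]


class CascadingIncident:
    """Hypothetical driver: fixing the active phase triggers the next one."""

    def __init__(self, phases: list[Phase]) -> None:
        self._phases = phases
        self._index = 0

    @property
    def active(self) -> Optional[Phase]:
        """The phase currently in progress, or None once all are resolved."""
        if self._index < len(self._phases):
            return self._phases[self._index]
        return None

    def resolve_active(self) -> Optional[Phase]:
        """Mark the active phase fixed and return the phase it triggers, if any."""
        if self._index < len(self._phases):
            self._index += 1
        return self.active
```

Because the next phase only appears after the current one is resolved, a pair cannot plan the whole incident up front and has to keep re-splitting the work as the situation changes.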

Works With

CLI + Chaos Mode is the strongest combination

Both engineers can join through the browser or the CLI. Each person picks whichever interface they work fastest in, and both connect to the same shared container over the same WebSocket. Commands, presence, and replay all work identically regardless of surface.

Browser and CLI attach to the same room container
Either engineer can use either interface, or switch mid-session
Per-engineer command replay works across both surfaces
Use Team Drills for structured internal training programs

chaos room lobby

Chaos Mode

  PARIUM / war room

  SCENARIO  K8s Cascading Failure
  ROOM      chaos-room-42
  STATUS    Waiting for start

  ────────────────────────────────
  ● Engineer A  connected (you)
  ● Engineer B  connected
  ────────────────────────────────

  Press S to start the session
  Press Tab to cycle themes
  Press Ctrl+C to leave

FAQ

Common questions about Chaos Mode

How is this different from just sharing a screen?

Screen sharing lets one person drive while others watch. Chaos Mode gives both engineers full terminal access to the same live system. Both can run commands, both get tracked independently, and the report shows exactly who contributed what. It's the difference between watching someone cook and both being in the kitchen.

Can we use this for internal training, not just hiring?

Yes, and it's one of the best fits. Pair a new SRE with a senior engineer. Run your platform team through a Kubernetes failure before the next migration. Use it for on-call readiness checks. Same incident engine, different purpose.

How many engineers can join a room?

Currently optimised for two engineers, labelled Engineer A and Engineer B. This keeps the signal clean: you can clearly see who led, who investigated, and who verified. Two is enough to expose collaboration patterns without the noise of a large group.

What scenarios work best for Chaos Mode?

Scenarios with multiple root causes or branching failure paths. The Kubernetes cascading failure (6 phases) is designed for exactly this. It rewards engineers who split threads and coordinate. Simple single-fix scenarios work better as solo assessments.

What does the manager actually see in the report?

Per-engineer command history with timestamps: who ran what, when, and in what order. Presence data: connection times and disconnects. AI analysis of each engineer's approach. And the full session replay, so you can watch the collaboration unfold like a recording.