Site Reliability Engineers
Incident response, system debugging, and production troubleshooting. The scenarios they'd actually face on-call.
Real terminal-based incident simulations. See how candidates actually troubleshoot before you trust them on-call.
Respect your candidates' time - and your engineers' too.
Use our ready-made scenarios or let us build custom assessments for your stack.
Pick from our ready-made scenarios (GPU debugging, server performance, Kubernetes) or tell us your stack and we'll build custom assessments.
Share a link. Candidates enter their details and drop straight into a live terminal. No downloads, no accounts, no friction.
See exactly how they debug: time to resolution, commands used, investigation path. Your hiring manager gets a scored report without reviewing a single line of output.
The tools your team needs to assess real engineering skills
Full Linux VMs, accessed from the browser or through our CLI for chaos room sessions. Not a sandbox. A real system to debug.
Automatic timing from first command to incident resolution. Compare candidates against your team's benchmarks or against each other.
Real SOPs, like the ones your team uses. Track whether candidates follow procedures independently or need guidance, and how much.
Paste events, tab switches, and timing patterns are captured and surfaced in the report. Your team decides what matters.
Every command and keystroke recorded with timestamps. Replay the entire session or export the full log for review.
Azure networking, K8s cascading failures, GPU driver conflicts, and more. Match the scenario to the role you're hiring for.
Four products on one incident engine. Each one feeds the next.
Screening Links: One branded link per role. Drop it into your ATS, job spec, or recruiter outreach. Candidates self-serve. You get a scored shortlist.
Assessments: Role-matched incident scenarios with configurable difficulty. Send a link, get a scored report with full evidence and session replay.
Chaos Mode: Two engineers in the same live incident. Tests coordination, communication, and leadership under pressure. Per-engineer replay.
Team Drills: Run your engineers through practice incidents. Onboarding exercises, on-call readiness checks, and team calibration before you hire.
npm install -g @parium.ai/cli
Real logs, real configs, real system state. The tools are the ones your team already uses: dmesg, journalctl, kubectl, nvidia-smi. Health check endpoints validate the fix.
═══════════════════════════════════════════════
INCIDENT ALERT - SEV 1
═══════════════════════════════════════════════
INCIDENT ID: INC-2026-0315-LB503
SEVERITY: Critical - Production
AFFECTED: edge-api.parium.internal
IMPACT: $8,200/hr revenue at risk
───────────────────────────────────────────────
Production Edge API is returning HTTP 503 through the Azure Load Balancer. The VM appears to be running, but zero backend health probes succeed.
Active escalations: 3 customer tickets
Executive visibility: Yes - CTO notified
═══════════════════════════════════════════════
YOUR TASK
═══════════════════════════════════════════════
1. Investigate why the LB returns 503
2. Identify all root causes (there may be more than one)
3. Apply fixes using approved remediation tools
4. Verify health check returns 200 OK
═══════════════════════════════════════════════
INCIDENT ALERT - SEV 1
═══════════════════════════════════════════════
INCIDENT ID: INC-2026-WAR-ROOM
SEVERITY: Critical - Cascading
CLUSTER: prod-us-east-1 (18 nodes)
IMPACT: $15K/hr → escalating
───────────────────────────────────────────────
api-gateway pods are in CrashLoopBackOff. Customer-facing traffic is failing. SLA budget is burning. This incident has executive visibility.
WARNING: This incident will escalate. Each fix you apply may reveal the next failure. Prioritise methodically.
SLA budget remaining: 47 minutes
Oncall team: Platform Engineering
Incident room: Active - you are IC
═══════════════════════════════════════════════
YOUR TASK
═══════════════════════════════════════════════
1. Restore api-gateway service availability
2. Investigate and resolve cascading failures
3. Validate cluster health at each phase
4. Maintain SLA budget - time matters
═══════════════════════════════════════════════
INCIDENT ALERT - SEV 2
═══════════════════════════════════════════════
INCIDENT ID: INC-2026-0119-GPU
SEVERITY: High - Production ML
AFFECTED: gpu-node-01.neocloud.internal
IMPACT: $4,200/hr compute waste
───────────────────────────────────────────────
GPU compute jobs are failing on gpu-node-01. The node has 8x NVIDIA A100 80GB GPUs but only 7 of 8 devices are detected by monitoring.
Queued jobs: 3 LLM fine-tuning runs
Last healthy: 08:00 UTC today
Kernel log: Xid 79 - GPU fallen off bus
═══════════════════════════════════════════════
YOUR TASK
═══════════════════════════════════════════════
1. Investigate why nvidia-smi shows fewer GPUs
2. Identify the root cause (driver vs hardware)
3. Restore GPU functionality if possible
4. Escalate to hardware team if necessary
Scenarios matched to every role on your team
Incident response, system debugging, and production troubleshooting. The scenarios they'd actually face on-call.
Configuration errors, container failures, API gateway issues, and log-driven debugging.
Hardware diagnostics, bare metal troubleshooting, GPU driver issues, and knowing when to escalate.
Runaway processes, disk issues, service recovery, and the fundamentals that senior hires still get wrong.
The Parium CLI connects you directly to shared incident sessions from your own terminal. No browser, no context switching. Just parium open and you're in.
$ npm install -g @parium.ai/cli@latest
$ parium open
█▀█ ▄▀█ █▀█ █ █ █ █▀▄▀█
█▀▀ █▀█ █▀▄ █ █▄█ █ ▀ █
Chaos Terminal Client v0.1.0-alpha.2
Paste handoff token: ••••••••••••
✓ Token validated
✓ Session resolved - k8s-chaos-war-room
⟳ Attaching to terminal...
──────────────────────────────────────
SESSION K8s Cascading Failure
STATUS ● LIVE
PHASE 3 of 6 - DNS network policy
IMPACT $120K/hr
──────────────────────────────────────
candidate@prod-worker-07:~$ █
No unfamiliar IDEs. No artificial puzzles. Just a terminal and a real incident - the environment they work in every day.
How Parium works, what your team sees, and what candidates experience.
Candidates connect to a real, isolated Linux environment - not a browser simulation or multiple-choice sandbox. Each assessment spins up a fresh system with the incident pre-configured. They get full terminal access with real bash, real logs, and real system tools. It's the same experience as SSH'ing into a production server.
Parium is built for any role that requires hands-on Linux troubleshooting: Site Reliability Engineers (SRE), DevOps Engineers, Platform Engineers, Data Center Technicians, Linux System Administrators, Cloud Engineers, and Infrastructure Engineers. Our scenarios range from L1 support tasks (config errors, disk space) to L4 senior-level incidents (GPU driver conflicts, kernel modules, PCIe issues).
We monitor for patterns that suggest external help - things like leaving the terminal for extended periods, large paste events, and unusual command timing. Suspicious activity gets flagged in the hiring manager report with enough context for you to make an informed judgment. We can't catch everything, but the patterns are usually pretty obvious.
When the candidate clicks "Verify Fix," we run a health check against the scenario's success criteria (e.g., curl the API endpoint, check nvidia-smi output). If it passes, we record their time-to-resolution. The hiring manager gets a report with every command, timestamps, hints used, suspicious activity flags, and an analysis of how the candidate approached the problem.
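A verification step of this kind can be sketched in a few lines. This is a minimal illustration, not Parium's implementation: it assumes the scenario's success criterion is an HTTP endpoint that must answer 200 OK (as in the Edge API scenario), and the function names `health_check_passes`, `time_to_resolution`, and `verify_fix` are hypothetical.

```python
# Hypothetical sketch of a "Verify Fix" step; not Parium's actual implementation.
# Assumes the scenario's success criterion is an HTTP endpoint returning 200 OK.
import urllib.error
import urllib.request
from datetime import datetime
from typing import Optional


def health_check_passes(url: str, expected_status: int = 200, timeout: float = 5.0) -> bool:
    """Return True if a GET request to `url` answers with `expected_status`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == expected_status
    except urllib.error.HTTPError as err:
        # urlopen raises on non-2xx responses, e.g. a 503 from the load balancer.
        return err.code == expected_status
    except OSError:
        # Connection refused, DNS failure or timeout: the service is not back yet.
        # (urllib.error.URLError is a subclass of OSError.)
        return False


def time_to_resolution(first_command: datetime, verified_at: datetime) -> float:
    """Seconds from the candidate's first command to a passing health check."""
    return (verified_at - first_command).total_seconds()


def verify_fix(url: str, first_command: datetime) -> Optional[float]:
    """Run the health check; return time-to-resolution on success, else None."""
    if health_check_passes(url):
        return time_to_resolution(first_command, datetime.now())
    return None
```

While the load balancer still answers 503, `verify_fix` returns `None`; once the probe gets a 200 it returns the elapsed seconds since the first command. A GPU scenario would swap the HTTP probe for a check on nvidia-smi output, which this sketch does not cover.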
HackerRank, Codility, and similar platforms test algorithmic coding in sandboxed editors. Parium tests operational skills in real Linux environments. Your SRE candidates don't need to reverse a linked list - they need to figure out why nginx won't start or why the GPU driver isn't loading. We measure how they investigate, not whether they memorised the answer.
Yes. We can build scenarios that mirror your actual production environment - your monitoring tools, your deployment setup, your common failure modes. Whether it's Kubernetes on EKS, GPU clusters with SLURM, or legacy systems with custom daemons, we'll create assessments that test exactly what your team deals with day-to-day. Get in touch to discuss.
Beyond pass/fail, we give you session replay - watch exactly how candidates approached the problem. You'll see every command they ran, when they pasted content (and what they pasted), when they switched tabs, how long they were away, and when they used hints. It's like watching over their shoulder, but asynchronously. You see how they think, not just whether they got the answer.
Every candidate gets the same scenario, the same environment, the same success criteria. No more "it depends on who reviewed it."
No variation between candidates. Everyone faces the same incident with the same tools available.
Clear pass/fail based on whether the fix works, not on how well someone writes a README or formats their code.
Time-to-resolution, commands used, hints requested. Compare candidates on the metrics that matter.
Whether you need a custom scenario for your stack, want to discuss enterprise pricing, or just have questions, we'd love to hear from you.
See real incident performance before you hire.