Data centre roles need a different kind of assessment. Your technicians work with storage controllers, IPMI, BMC interfaces, RAID arrays, and operational runbooks. They need to follow procedures safely, understand escalation points, and make the right call when a drive fails or a power event occurs. Parium has scenarios built specifically for this environment.
Data centre operations is as much about process discipline as technical skill. Our scenarios test both: can the candidate use the right tools and follow the right procedures under pressure?
Each scenario includes an operational runbook or telemetry view. Junior candidates get guided procedures. Senior candidates get the incident brief and drive the investigation independently.
A storage system has gone read-only. Applications can read data but writes are failing. The candidate investigates the storage controller state, RAID health, drive SMART data, and filesystem mount options to identify the cause and restore write access without data loss.
A RAID array is running in degraded mode. One drive has failed and another is showing predictive failure indicators. The candidate identifies the failed and failing drives, assesses rebuild risk, follows the replacement procedure, and initiates rebuild while the system is live.
Environmental sensors have triggered alerts. Inlet temperatures are rising across a row of racks. The candidate reviews IPMI sensor data, identifies the affected zone, checks CRAC unit status, and determines whether to start migrating workloads or wait for facilities to respond.
A PDU has lost a phase. Half the servers in a rack have lost redundant power. UPS is holding but runtime is limited. The candidate must assess the impact, identify which servers need immediate attention (single-corded vs dual-corded), and execute the emergency power procedure safely.
A remote server is unresponsive. SSH is down but the BMC is reachable. The candidate uses IPMI tools to check power state, review SEL logs, inspect sensor readings, and determine whether a graceful reboot, a hard reset, or a console session is needed. Tests remote hands decision-making.
A GPU in a DC rack has reported Xid 79 errors. The candidate investigates using the DC runbook, checks dmesg, nvidia-smi, and PCIe state, and must decide whether to attempt a driver reset, drain the node, or escalate for hardware RMA. Tests the boundary between operational fix and hardware escalation.
Every data centre has its own procedures, ticketing systems, escalation paths, and hardware configurations. If your team uses specific runbooks, specific hardware vendors (Dell, HPE, Supermicro, Lenovo), or specific monitoring tools (Nagios, Zabbix, Prometheus, Datadog), we can build scenarios that match your operational environment.