Parium Kubernetes assessments put candidates into real cluster environments with multi-service failures, networking issues, and resource pressure. These are the kind of problems where reading the events section is just the starting point, and the fix requires understanding how the control plane, CNI, DNS, and your workloads interact.
The assessments focus on the tools and concepts that matter during a real Kubernetes incident, not during a certification exam.
Each scenario replicates a real K8s incident inside a live cluster. Candidates get kubectl access and must diagnose and resolve the issue.
External traffic is returning 502/504 errors. The ingress controller is running but routes are broken. The candidate debugs annotations, backend service selectors, TLS termination, and endpoint readiness to restore traffic flow. This scenario tests real-world debugging across the ingress, service, and pod layers.
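One common way a backend ends up with no endpoints is a Service selector that no longer matches any pod labels. The sketch below illustrates that matching rule on hypothetical label data rather than a live cluster; the function names, pod names, and labels are assumptions made for illustration.

```python
def selector_matches(selector, pod_labels):
    """Return True if every key/value pair in the selector appears in the pod's labels."""
    if not selector:
        # A Service without a selector gets no automatically managed endpoints.
        return False
    return all(pod_labels.get(key) == value for key, value in selector.items())


def matching_pods(selector, pods):
    """Return the names of pods (in input order) whose labels satisfy the selector."""
    return [name for name, labels in pods.items() if selector_matches(selector, labels)]


# Hypothetical pods: one was relabelled and silently dropped out of the Service.
PODS = {
    "web-1": {"app": "web", "tier": "frontend"},
    "web-2": {"app": "web", "tier": "frontend"},
    "web-3": {"app": "web-v2", "tier": "frontend"},
}

if __name__ == "__main__":
    print(matching_pods({"app": "web"}, PODS))
    print(matching_pods({"app": "webapp"}, PODS))
```

An empty result for the selector the ingress backend points at is a strong hint that the route is broken at the Service layer rather than in the ingress controller itself.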
Services can't resolve each other by name. Pods are up, but inter-service communication is broken. The candidate investigates CoreDNS pods, ConfigMaps, upstream resolvers, and the kube-dns Service endpoints to find and fix the DNS breakdown.
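When name resolution fails, it helps to know which fully qualified names a pod's resolver would actually try. The sketch below models the usual search-list expansion, assuming the default `cluster.local` cluster domain, the default pod search list, and `ndots:5`; real pods may also inherit extra search domains from the node. The function name and the sample service and namespace names are hypothetical.

```python
def candidate_names(name, namespace, cluster_domain="cluster.local", ndots=5):
    """List the fully qualified names a resolver would try, in order (simplified model)."""
    if name.endswith("."):
        # A trailing dot marks the name as already fully qualified: no search list applied.
        return [name]
    search = [
        f"{namespace}.svc.{cluster_domain}",
        f"svc.{cluster_domain}",
        cluster_domain,
    ]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    # With fewer dots than ndots, the search list is tried before the name as given.
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


if __name__ == "__main__":
    for candidate in candidate_names("payments.billing", "shop"):
        print(candidate)
```

In this model, a short name such as `payments` called from the `shop` namespace is tried first as `payments.shop.svc.cluster.local.`, while an external name such as `example.com` goes through three cluster-internal lookups before being tried as written.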
Pods are being evicted from nodes with memory or disk pressure. New pods can't be scheduled. The candidate identifies which resources are exhausted, understands the eviction priority order, and takes action to stabilise the node without losing critical workloads.
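The eviction priority order can be sketched as a sort. This is a simplified model of the ranking described in the Kubernetes node-pressure eviction documentation (whether usage exceeds requests, then pod priority, then usage relative to requests), not the kubelet's actual implementation. The `PodUsage` type and the sample pods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PodUsage:
    name: str
    priority: int            # pod priority value; lower is evicted sooner
    memory_request_mib: int  # 0 models a pod with no memory request
    memory_usage_mib: int


def eviction_order(pods):
    """Return pod names from first-evicted to last-evicted under memory pressure (simplified)."""
    def rank(pod):
        over = pod.memory_usage_mib - pod.memory_request_mib
        exceeds = 0 if over > 0 else 1       # pods over their request are considered first
        return (exceeds, pod.priority, -over)  # then lower priority, then larger overage
    return [pod.name for pod in sorted(pods, key=rank)]


# Hypothetical workloads on a node under memory pressure.
SAMPLE = [
    PodUsage("db", priority=1000, memory_request_mib=1024, memory_usage_mib=900),
    PodUsage("api", priority=1000, memory_request_mib=256, memory_usage_mib=400),
    PodUsage("cache", priority=0, memory_request_mib=512, memory_usage_mib=600),
    PodUsage("batch-job", priority=0, memory_request_mib=0, memory_usage_mib=300),
]

if __name__ == "__main__":
    print(eviction_order(SAMPLE))  # ['batch-job', 'cache', 'api', 'db']
```

In this model the database survives longest because it stays under its memory request, which is why setting realistic requests and priorities on critical workloads matters when stabilising a node.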
A single misconfiguration has caused a chain of failures across multiple services. Pods are crashing, DNS is intermittent, and the HPA is fighting the eviction manager. The candidate must triage, identify the root cause (not just the symptoms), and restore the cluster in the right order.
A StatefulSet-backed database is unhealthy. PVCs are bound but the application can't write. The candidate investigates the storage driver, PV reclaim policy, filesystem permissions, and node affinity to understand why the stateful workload has stalled.
A microservices application has degraded. Some API endpoints work, while others time out or return errors. The issue spans multiple Deployments, Services, and ConfigMaps. The candidate must trace requests across services, read logs from multiple pods, and identify a configuration drift that broke the dependency chain.
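Configuration drift of this kind can often be spotted by comparing the intended ConfigMap data with what is actually deployed. The sketch below does that on plain dictionaries; the function name, keys, and values are hypothetical, and in practice the two inputs would come from version control and from the live cluster.

```python
def config_drift(expected, actual):
    """Return {key: (expected_value, actual_value)} for every key that differs.

    None marks a key that is absent on that side.
    """
    drift = {}
    for key in sorted(expected.keys() | actual.keys()):
        if expected.get(key) != actual.get(key):
            drift[key] = (expected.get(key), actual.get(key))
    return drift


# Hypothetical ConfigMap data: the deployed copy has a mistyped upstream hostname.
EXPECTED = {"PAYMENTS_URL": "http://payments:8080", "TIMEOUT_MS": "2000"}
ACTUAL = {"PAYMENTS_URL": "http://payment:8080", "TIMEOUT_MS": "2000", "RETRIES": "3"}

if __name__ == "__main__":
    for key, (want, got) in config_drift(EXPECTED, ACTUAL).items():
        print(f"{key}: expected {want!r}, found {got!r}")
```

A one-character difference in an upstream URL is enough to explain why only the endpoints that depend on that service fail while the rest keep working.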
Running a specific cluster architecture? Service mesh (Istio, Linkerd)? Custom operators? GitOps with ArgoCD or Flux? We can build scenarios around your exact Kubernetes setup, your deployment patterns, and the failure modes your team actually encounters.