
Fix Kubernetes CrashLoopBackOff: Causes and Solutions

Karandeep Singh

Summary

A practical playbook for Kubernetes CrashLoopBackOff — the diagnosis workflow (describe, previous logs, events) plus fixes for the six causes you will actually hit, including OOMKilled, failing liveness probes, and missing config.

CrashLoopBackOff is the error every Kubernetes user meets sooner or later. A pod starts, the container dies, Kubernetes restarts it, and it dies again. After each crash the kubelet backs off, doubling the wait before the next restart (10s, 20s, 40s, and so on, capped at 5 minutes). The status itself is not the problem; it is a symptom. This guide gives you a repeatable workflow to find the real cause of a CrashLoopBackOff and fix it.
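To make the back-off schedule concrete, here is a minimal sketch that prints the wait before each restart. It assumes the default behavior described above (start at 10s, double each time, cap at 300s); the function name `backoff_delay` is hypothetical and exists only for illustration.

```shell
#!/bin/sh
# Illustrative model of the restart delay described above:
# starts at 10s, doubles per crash, capped at 300s (5 minutes).
backoff_delay() {
  attempt=$1          # 1 = first restart
  delay=10
  i=1
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt 300 ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt 300 ]; then
    delay=300
  fi
  echo "$delay"
}

for n in 1 2 3 4 5 6 7; do
  printf 'restart %s: wait %ss\n' "$n" "$(backoff_delay "$n")"
done
```

From the sixth restart onward the delay stays at the 5-minute cap, which is why a pod that has been crashing for a while appears to restart only every few minutes.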

The mistake most people make is staring at the CrashLoopBackOff status and guessing. Don’t guess. The cluster already recorded exactly why the container died — you just need the right three commands to read it.

[Figure: CrashLoopBackOff diagnosis workflow and common causes]

The CrashLoopBackOff Diagnosis Workflow

Whatever the underlying cause, you start the same way every time. These three commands resolve the vast majority of cases (the full list is in the kubectl cheat sheet):

kubectl get pods                       # confirm the status and restart count
kubectl describe pod <pod-name>        # read Events and 'Last State' (reason + exit code)
kubectl logs <pod-name> --previous     # the crashed container's actual output

graph TD
  A[Pod in CrashLoopBackOff] --> B[kubectl describe pod]
  B --> C{Last State reason / Events?}
  C -->|OOMKilled / exit 137| D[Raise memory limit / fix leak]
  C -->|Error / exit 1| E[kubectl logs --previous]
  C -->|Liveness probe failed| F[Fix probe path, port, delay]
  C -->|ImagePullBackOff first| G[Fix image name / registry auth]
  E --> H{What does the log say?}
  H -->|Stack trace / panic| I[Fix app bug]
  H -->|Missing env / cannot connect| J[Fix ConfigMap / Secret / dependency]
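
The decision tree above can be sketched as a small shell helper. The function names `triage` and `last_state` and the hint messages are hypothetical. The jsonpath fields (`lastState.terminated.reason` and `lastState.terminated.exitCode`) are standard pod status fields, and the sketch assumes a single-container pod, so it reads index 0.

```shell
#!/bin/sh
# Hypothetical helper: maps a container's last termination reason and
# exit code to the next step in the workflow above.
triage() {
  reason=$1
  code=$2
  case "$reason:$code" in
    OOMKilled:*) echo "raise memory limit or fix leak" ;;
    *:137)       echo "killed by SIGKILL: check Events for OOM or a failed liveness probe" ;;
    *:126|*:127) echo "check command/args against the image entrypoint" ;;
    *)           echo "read kubectl logs --previous" ;;
  esac
}

# Prints "<reason> <exitCode>" for the first container of a pod.
# Assumes kubectl is already configured for the cluster; not called here.
last_state() {
  kubectl get pod "$1" -o \
    jsonpath='{.status.containerStatuses[0].lastState.terminated.reason} {.status.containerStatuses[0].lastState.terminated.exitCode}'
}

triage OOMKilled 137
triage Error 1
```

Against a live cluster the two pieces combine as `triage $(last_state <pod-name>)`, relying on word splitting to pass the reason and exit code as separate arguments.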
  

Cause 1: The Application Crashes on Startup

The most common cause is the simplest: your code throws on boot. An unhandled exception, a panic, a failed migration. The --previous logs show it directly:

kubectl logs <pod-name> --previous
# e.g. panic: runtime error: invalid memory address or nil pointer dereference

Fix: it is an application bug, not a Kubernetes problem. Reproduce locally with the same image and arguments, fix the code, rebuild, and redeploy.

Cause 2: Missing or Wrong Configuration

A container that runs fine locally often crashes in-cluster because an environment variable, ConfigMap, or Secret is missing or wrong — think a DATABASE_URL that points nowhere. The logs usually say something like connection refused or required env var not set.

kubectl describe pod <pod-name> | grep -A10 Environment
kubectl get configmap,secret -n <namespace>

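Fix: confirm the ConfigMap/Secret exists in the same namespace as the pod and that the keys match what the app expects. A mistyped key name is a classic culprit. (A ConfigMap or Secret that is referenced but missing entirely usually surfaces as CreateContainerConfigError rather than CrashLoopBackOff, so a crash loop more often means the value is present but wrong.)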
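
One way to make a missing variable obvious in the logs is to fail fast in the container's entrypoint. The sketch below is an assumption about how an app might be wrapped, not something any particular image does; `require_env` is a hypothetical helper, and `DATABASE_URL` is the example variable from above with a made-up value.

```shell
#!/bin/sh
# Hypothetical entrypoint guard: reports every missing variable on stderr
# and returns non-zero, so the cause is plain in `kubectl logs --previous`.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect variable lookup that works in POSIX sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "required env var not set: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# DATABASE_URL is set here only so the example runs on its own.
DATABASE_URL="postgres://db:5432/app"
require_env DATABASE_URL && echo "config ok"
```

Checking all names before exiting means one crash reports every missing variable, instead of one per restart.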

Cause 3: A Failing Liveness Probe

This one fools people because the app is healthy. If a liveness probe checks the wrong path or port, or its initialDelaySeconds is too short for a slow-starting app, the kubelet decides the container is unhealthy, kills it, and restarts it, over and over.

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 20   # give the app time to boot before probing
  periodSeconds: 10

Fix: verify the probe path and port actually respond, and raise initialDelaySeconds for apps that need warm-up time. kubectl describe pod will show Liveness probe failed in Events when this is the cause.
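
To judge whether initialDelaySeconds is long enough, it helps to estimate how long the app has before the kubelet gives up. The arithmetic below is a rough sketch: it assumes the default failureThreshold of 3 (the manifest above does not set one) and ignores timeoutSeconds and kubelet scheduling jitter. The function name `probe_deadline` is hypothetical.

```shell
#!/bin/sh
# Rough estimate: seconds after container start at which the last
# allowed consecutive probe failure triggers a restart.
probe_deadline() {
  initial_delay=$1
  period=$2
  failure_threshold=${3:-3}   # Kubernetes default is 3
  echo $((initial_delay + (failure_threshold - 1) * period))
}

probe_deadline 20 10   # initialDelaySeconds and periodSeconds from the manifest above
```

With those values the app has roughly 40 seconds to start answering on /healthz; if it needs longer, raise initialDelaySeconds accordingly.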

Cause 4: OOMKilled (Exit Code 137)

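If kubectl describe pod shows Last State: Terminated with Reason: OOMKilled (usually alongside exit code 137), the container exceeded its memory limit and was killed. Exit code 137 on its own only means the process received SIGKILL (128 + 9), so confirm the reason before assuming a memory problem.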

kubectl describe pod <pod-name> | grep -i -A2 "last state"

Fix: raise the memory limit, or fix the leak that makes the app exceed it. Set realistic requests and limits:

resources:
  requests:
    memory: "128Mi"
  limits:
    memory: "256Mi"

Cause 5: Wrong Command or Entrypoint

Exit code 127 means “command not found” and 126 means “not executable.” These point to a bad command/args override or a binary that isn’t where the container expects it.

Fix: check the command and args in your manifest against the image’s real entrypoint. Run the image locally with docker run to confirm the command works before deploying.
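
Both exit codes are ordinary shell conventions, so they can be reproduced without a cluster. The sketch below triggers each one locally; the binary name and the temporary script are placeholders made up for the demonstration.

```shell
#!/bin/sh
# Demonstrates the shell conventions behind exit codes 127 and 126.

# 127: the command does not exist on PATH.
not_found=0
sh -c 'definitely-not-a-real-binary' 2>/dev/null || not_found=$?

# 126: the file exists but has no execute permission.
script=$(mktemp)
echo 'echo hello' > "$script"
chmod 644 "$script"
not_executable=0
sh -c "$script" 2>/dev/null || not_executable=$?
rm -f "$script"

echo "not found: $not_found, not executable: $not_executable"
```

In a container the same two failures typically come from a typo in command, a binary missing from a slimmed-down image, or a script copied into the image without its execute bit.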

Cause 6: A Dependency Isn’t Ready

Sometimes the app is correct but starts before its database or an upstream API is reachable, crashes, and loops. The fix is to make startup resilient rather than fragile.

  • Add retry-with-backoff logic to the app’s startup connection.
  • Use an init container to wait for the dependency before the main container starts.
  • Use a readiness probe so traffic only arrives once the app is truly ready.

```yaml
initContainers:
  - name: wait-for-db
    image: busybox:1.36
    command: ['sh', '-c', 'until nc -z db 5432; do echo waiting for db; sleep 2; done']
```

The main container will not start until the `db` service accepts connections on port 5432.
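
The retry-with-backoff bullet above can be sketched in shell for apps that start through a wrapper script. This is a minimal illustration under stated assumptions, not a drop-in implementation: `retry_with_backoff` is a hypothetical helper, and the attempt count and base delay are arbitrary.

```shell
#!/bin/sh
# Minimal retry-with-backoff sketch.
# Usage: retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached,
# doubling the sleep between attempts.
retry_with_backoff() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# `true` stands in for a real connectivity check, such as: nc -z db 5432
retry_with_backoff 5 1 true && echo "dependency reachable"
```

Retrying inside the process keeps the container alive while the dependency comes up, so the pod avoids the kubelet's much longer restart back-off.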

A Reusable Mental Model

CrashLoopBackOff almost always reduces to one of two questions: did the container die because of something inside it (app bug, OOM, bad command) or something around it (config, probes, dependencies)? kubectl describe pod plus kubectl logs --previous answers that in under a minute.

If you are still learning the objects involved — pods, probes, and the controllers that restart them — start with Kubernetes fundamentals: pods, deployments, and services.

Question

What is the strangest root cause of a CrashLoopBackOff you have had to track down?
