
Fix Kubernetes CrashLoopBackOff: Causes and Solutions

Karandeep Singh

Jun 18, 2026 • 5 minutes

Summary

A practical playbook for Kubernetes CrashLoopBackOff — the diagnosis workflow (describe, previous logs, events) plus fixes for the six causes you will actually hit, including OOMKilled, failing liveness probes, and missing config.

CrashLoopBackOff is the error every Kubernetes user meets sooner or later. A pod starts, the container dies, Kubernetes restarts it, it dies again — and the kubelet backs off, waiting longer between each restart (10s, 20s, 40s, up to 5 minutes). The status itself is not the problem; it is a symptom. This guide gives you a repeatable workflow to find the real cause of Kubernetes CrashLoopBackOff and fix it.
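The restart delays quoted above follow a doubling rule with a cap. Here is a minimal shell sketch of that arithmetic, assuming the 10-second base and 5-minute cap mentioned in this guide. The function name `backoff_delay` is made up for illustration; it models the schedule and is not kubelet code.

```shell
# Delay in seconds before restart number N (counting from 0),
# assuming a 10s base that doubles each time and is capped at 300s (5 minutes).
backoff_delay() {
  n=$1
  delay=10
  while [ "$n" -gt 0 ] && [ "$delay" -lt 300 ]; do
    delay=$((delay * 2))
    n=$((n - 1))
  done
  if [ "$delay" -gt 300 ]; then
    delay=300
  fi
  echo "$delay"
}

# Print the first seven delays, one per line: 10 20 40 80 160 300 300
for i in 0 1 2 3 4 5 6; do
  backoff_delay "$i"
done
```

In practice this means that after about five crashes you wait the full five minutes between attempts, so a fix can take a while to show up unless you delete the pod and let it be recreated.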

The mistake most people make is staring at the CrashLoopBackOff status and guessing. Don’t guess. The cluster already recorded exactly why the container died — you just need the right three commands to read it.


The CrashLoopBackOff Diagnosis Workflow

Whatever the underlying cause, you start the same way every time. These three commands resolve the vast majority of cases (the full list is in the kubectl cheat sheet):

kubectl get pods                       # confirm the status and restart count
kubectl describe pod <pod-name>        # read Events and 'Last State' (reason + exit code)
kubectl logs <pod-name> --previous     # the crashed container's actual output

graph TD
  A[Pod in CrashLoopBackOff] --> B[kubectl describe pod]
  B --> C{Last State reason?}
  C -->|OOMKilled / exit 137| D[Raise memory limit / fix leak]
  C -->|Error / exit 1| E[kubectl logs --previous]
  C -->|Liveness probe failed| F[Fix probe path, port, delay]
  C -->|ImagePullBackOff first| G[Fix image name / registry auth]
  E --> H{What does the log say?}
  H -->|Stack trace / panic| I[Fix app bug]
  H -->|Missing env / cannot connect| J[Fix ConfigMap / Secret / dependency]
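
The first branch of the decision tree can be sketched as a small shell function. It assumes you have already read the reason and exit code from Last State; `next_step` is a hypothetical helper name, and the suggestions only restate the branches of the graph.

```shell
# Map a Last State reason and exit code to the next step from the decision tree.
# Usage: next_step <reason> <exit-code>
next_step() {
  reason=$1
  code=$2
  if [ "$reason" = "OOMKilled" ] || [ "$code" = "137" ]; then
    echo "raise the memory limit or fix the leak"
  elif [ "$reason" = "Error" ]; then
    echo "read kubectl logs --previous"
  else
    echo "check Events for probe failures or image pull errors"
  fi
}

next_step OOMKilled 137
next_step Error 1
```

A lookup like this is only a starting point: it tells you which command to run next, not what the root cause is.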

kubectl logs --previous is the single most important command here. While a pod is crash-looping, the current container has barely started, so kubectl logs is often empty. The --previous flag shows the container that actually died — that is where the stack trace lives.
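
If you want the Last State fields without scrolling through the full describe output, a jsonpath query can pull them out directly. This is a sketch that assumes a single-container pod (it reads `containerStatuses[0]`) and your current namespace; `last_state` is a made-up helper name.

```shell
# Print restart count, last termination reason, and exit code for the first
# container in a pod, tab-separated. Requires kubectl access to the cluster.
# Usage: last_state <pod-name>
last_state() {
  pod=$1
  kubectl get pod "$pod" -o jsonpath='{.status.containerStatuses[0].restartCount}{"\t"}{.status.containerStatuses[0].lastState.terminated.reason}{"\t"}{.status.containerStatuses[0].lastState.terminated.exitCode}{"\n"}'
}

# Example:
#   last_state <pod-name>
```

For pods with several containers, change the index or loop over `containerStatuses[*]`, and add `-n <namespace>` if the pod lives outside your current namespace.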

Cause 1: The Application Crashes on Startup

The most common cause is the simplest: your code throws on boot. An unhandled exception, a panic, a failed migration. The --previous logs show it directly:

kubectl logs <pod-name> --previous
# e.g. panic: runtime error: invalid memory address or nil pointer dereference

Fix: it is an application bug, not a Kubernetes problem. Reproduce locally with the same image and arguments, fix the code, rebuild, and redeploy.
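
When the previous logs are long, it can help to jump straight to the first line that looks like a crash. The filter below is a rough sketch: the keyword list is an assumption, and `first_crash_line` is a hypothetical helper, so adjust both for your language and framework.

```shell
# Print the first line (prefixed with its line number) that looks like a crash marker.
# The keyword list is an assumption; extend it for your stack.
first_crash_line() {
  grep -n -i -E 'panic|exception|traceback|fatal' | head -n 1
}

# Example with a two-line sample log. In a cluster you would pipe in
# `kubectl logs <pod-name> --previous` instead.
printf '%s\n' \
  'starting server' \
  'panic: runtime error: invalid memory address or nil pointer dereference' \
  | first_crash_line
```

The line number tells you where to start reading; the lines just before it usually show what the app was doing when it died.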

Cause 2: Missing or Wrong Configuration

A container that runs fine locally often crashes in-cluster because an environment variable, ConfigMap, or Secret is missing or wrong — think a DATABASE_URL that points nowhere. The logs usually say something like connection refused or required env var not set.

kubectl describe pod <pod-name> | grep -A10 Environment
kubectl get configmap,secret -n <namespace>

Fix: confirm the ConfigMap/Secret exists in the same namespace and that the keys match what the app expects. A typo’d key name is a classic culprit.
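
Comparing key names by eye is error-prone, so here is a small sketch that lists the keys the app expects but the ConfigMap or Secret does not provide. `missing_keys` and the key names in the example are hypothetical.

```shell
# Print every expected key that is absent from the actual key list.
# Usage: missing_keys "<expected keys>" "<actual keys>"  (whitespace-separated)
missing_keys() {
  for key in $1; do
    found=no
    for have in $2; do
      if [ "$key" = "$have" ]; then
        found=yes
      fi
    done
    if [ "$found" = "no" ]; then
      echo "$key"
    fi
  done
}

# Example with made-up key names: the second list has a typo'd key.
missing_keys "DATABASE_URL LOG_LEVEL" "DATABSE_URL LOG_LEVEL"

# In a cluster, the actual keys could come from something like:
#   kubectl get configmap <name> -n <namespace> -o go-template='{{range $k, $v := .data}}{{$k}} {{end}}'
```

The example prints the expected key that has no exact match, which is how a one-character typo shows up.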

Cause 3: A Failing Liveness Probe

This one fools people because the app is healthy. If a liveness probe checks the wrong path or port, or its initialDelaySeconds is too short for a slow-starting app, the kubelet decides the container is unhealthy, kills it, and restarts it, over and over.

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 20   # give the app time to boot before probing
  periodSeconds: 10

Fix: verify the probe path and port actually respond, and raise initialDelaySeconds for apps that need warm-up time. kubectl describe pod will show Liveness probe failed in Events when this is the cause.
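
To check whether the endpoint would satisfy the probe, you can request it yourself and look at the status code. An httpGet probe passes on any status from 200 up to, but not including, 400. The helper name `probe_status_ok` is made up, and the commented commands assume the port and path from the example above.

```shell
# Return success if an HTTP status code would count as a passing httpGet probe
# (any code from 200 up to, but not including, 400).
probe_status_ok() {
  [ "$1" -ge 200 ] && [ "$1" -lt 400 ]
}

# Manual check against a forwarded port (run the two commands in separate terminals):
#   kubectl port-forward pod/<pod-name> 8080:8080
#   code=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8080/healthz)
#   probe_status_ok "$code" && echo "probe would pass" || echo "probe would fail ($code)"

for code in 200 204 404 503; do
  if probe_status_ok "$code"; then
    echo "$code passes"
  else
    echo "$code fails"
  fi
done
```

A 404 here usually means the probe path is wrong, while a connection error points at the port.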

Cause 4: OOMKilled (Exit Code 137)

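If kubectl describe pod shows Last State: Terminated with Reason: OOMKilled (usually alongside exit code 137), the container exceeded its memory limit and was killed. Exit code 137 on its own only says the process received SIGKILL, so check the Reason field to confirm.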

kubectl describe pod <pod-name> | grep -i -A2 "last state"
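
Where does 137 come from? Exit codes above 128 encode the signal that ended the process: 137 is 128 + 9, and signal 9 is SIGKILL. A small sketch of that decoding, using the shell's own `kill -l` to look up the signal name (`signal_for_exit` is a hypothetical helper):

```shell
# Decode a container exit code above 128 into the signal that ended the process.
# Exit codes of 128 + N mean "terminated by signal N".
signal_for_exit() {
  code=$1
  if [ "$code" -gt 128 ]; then
    kill -l "$((code - 128))"
  else
    echo "none"
  fi
}

signal_for_exit 137   # 137 - 128 = 9, which is SIGKILL
signal_for_exit 143   # 143 - 128 = 15, which is SIGTERM
```

The same trick explains 143, which you may see when a container is asked to stop and exits on SIGTERM.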

Fix: raise the memory limit, or fix the leak that makes the app exceed it. Set realistic requests and limits:

resources:
  requests:
    memory: "128Mi"
  limits:
    memory: "256Mi"

Setting the memory limit too low is a self-inflicted CrashLoopBackOff. If an app legitimately needs 300Mi and you cap it at 128Mi, it will be OOMKilled on every startup no matter how clean the code is.

Cause 5: Wrong Command or Entrypoint

Exit code 127 means “command not found” and 126 means “not executable.” These point to a bad command/args override or a binary that isn’t where the container expects it.

Fix: check the command and args in your manifest against the image’s real entrypoint. Run the image locally with docker run to confirm the command works before deploying.
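
You can reproduce both exit codes in a local shell, no container required, which makes them easier to recognize later. This is only a demonstration of the shell convention; the command name and temporary file are throwaway examples.

```shell
# Reproduce the two exit codes locally, without a container.

# 127: the shell cannot find the command at all.
missing_rc=0
sh -c 'this-command-does-not-exist' 2>/dev/null || missing_rc=$?

# 126: the file exists but is not executable.
script=$(mktemp)
echo 'echo hello' > "$script"
chmod a-x "$script"
noexec_rc=0
"$script" 2>/dev/null || noexec_rc=$?
rm -f "$script"

echo "command not found -> $missing_rc"
echo "not executable    -> $noexec_rc"
```

In a container, 127 usually means a typo in command or a binary missing from the image, and 126 often means a script was copied in without its execute bit.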

Cause 6: A Dependency Isn’t Ready

Sometimes the app is correct but starts before its database or an upstream API is reachable, crashes, and loops. The fix is to make startup resilient rather than fragile.

Add retry-with-backoff logic to the app’s startup connection.
Use an init container to wait for the dependency before the main container starts.
Use a readiness probe so traffic only arrives once the app is truly ready.

initContainers:
- name: wait-for-db
  image: busybox:1.36
  command: ['sh', '-c', 'until nc -z db 5432; do echo waiting for db; sleep 2; done']

The main container will not start until the db service accepts connections on port 5432.
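
The first option in the list, retry with backoff, can be sketched as a shell wrapper for an entrypoint script. This is a minimal sketch rather than a production implementation: `retry_with_backoff` is a made-up helper, and real apps usually do this inside their own connection code.

```shell
# Retry a command with exponential backoff.
# Usage: retry_with_backoff <max-attempts> <initial-delay-seconds> <command> [args...]
retry_with_backoff() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Example: wait for the database from the init container above, then start
# the app (./start-app is a hypothetical entrypoint):
#   retry_with_backoff 5 1 nc -z db 5432 && exec ./start-app
```

With 5 attempts and a 1-second initial delay, the wrapper waits 1, 2, 4, and 8 seconds between tries before giving up, which rides out a short dependency outage without an instant crash.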

A Reusable Mental Model

CrashLoopBackOff almost always reduces to one of two questions: did the container die because of something inside it (app bug, OOM, bad command) or something around it (config, probes, dependencies)? kubectl describe pod plus kubectl logs --previous answers that in under a minute.

If you are still learning the objects involved — pods, probes, and the controllers that restart them — start with Kubernetes fundamentals: pods, deployments, and services.

Question

What is the strangest root cause of a CrashLoopBackOff you have had to track down?

