Linux Process States

R, S, D, Z — what the letters in ps and top actually mean.

At a Glance

Two fields, one story — a task's state lives in task_struct->__state (runtime) and task_struct->exit_state (post-do_exit). Tools like ps collapse both into a single letter.
The letters — R running/runnable, S interruptible sleep, D uninterruptible sleep, I idle, T stopped, t traced, Z zombie, X dead. Plus flag suffixes: < high-prio, N low-prio, s session leader, l multi-threaded, + foreground process group.
R is "runnable," not "on CPU" — it means "on a runqueue." A task shown as R may not be executing right now; it's just eligible. ps reports the scheduler's view, not the CPU's.
S vs D — both are blocked. S wakes on a signal or event; D ignores signals entirely and waits on a specific completion (usually a storage driver or NFS server). D is why kill -9 sometimes "doesn't work."
K (TASK_KILLABLE) — the sane cousin of D: handled and non-fatal signals still do not interrupt the wait, but a fatal signal such as SIGKILL does rouse it. Used by NFS and FUSE so tasks on stuck mounts don't become unkillable. ps still prints D.
I (TASK_IDLE) — added in 4.2. Behaves like D for signals but doesn't count toward load average. Kernel threads waiting idly for work (NFS nfsd, XFS workers) use it so they stop spuriously inflating the load average.
Load average counts R + D — this is why a box with nothing on-CPU but a stuck NFS client can report load of 50. The number is "runnable + uninterruptibly sleeping," not "CPU busy."
Z is not a leak — a zombie holds only its task_struct and exit code. It's waiting for the parent to wait(). The file descriptors, memory, and threads are already gone. The problem is PID-space pressure, not RAM.
Inspect via /proc — /proc/PID/status gives the letter and name; /proc/PID/stat is the one-line machine-readable form; /proc/PID/wchan names the kernel function the task is sleeping in; /proc/PID/stack dumps the in-kernel stack.

The Model

What a Linux kernel actually stores, and why tooling sometimes disagrees.

struct task_struct {
    ...
    unsigned int            __state;       /* TASK_RUNNING, TASK_INTERRUPTIBLE, ... */
    int                     exit_state;    /* EXIT_ZOMBIE, EXIT_DEAD */
    ...
};

/* include/linux/sched.h */
#define TASK_RUNNING            0x00000000
#define TASK_INTERRUPTIBLE      0x00000001
#define TASK_UNINTERRUPTIBLE    0x00000002
#define __TASK_STOPPED          0x00000004
#define __TASK_TRACED           0x00000008
#define TASK_DEAD               0x00000080
#define TASK_WAKEKILL           0x00000100  /* combined with D to form "killable" */
#define TASK_NOLOAD             0x00000400  /* combined with D to form TASK_IDLE   */

#define TASK_KILLABLE   (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
#define TASK_IDLE       (TASK_NOLOAD   | TASK_UNINTERRUPTIBLE)

#define EXIT_ZOMBIE             0x00000020
#define EXIT_DEAD               0x00000010

Runtime vs exit — while the task is alive, __state is authoritative. Once do_exit() runs, the kernel records EXIT_ZOMBIE or EXIT_DEAD in exit_state, sets __state to TASK_DEAD, and never schedules the task again.
Composable flags — KILLABLE and IDLE are not separate states; they're UNINTERRUPTIBLE plus a flag that changes wake-up and accounting behaviour. That's why ps still prints D for both.
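ps codes are not kernel names — the kernel maps its state constants to a single letter when it formats /proc/PID/stat and /proc/PID/status, flattening some distinctions along the way; ps simply prints that letter. The State: line in /proc/PID/status adds a short name next to the letter but carries no extra detail.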
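The flattening described above can be sketched in a few lines. This is an illustrative approximation, not the kernel's code: the constants are copied from the excerpt, while `REPORT_MASK`, `LETTERS`, and `state_letter` are hypothetical names chosen for this example, and the "highest reported bit wins" rule is an assumption about how the letter is picked.

```python
# Constants copied from the sched.h excerpt above.
TASK_RUNNING = 0x0000
TASK_INTERRUPTIBLE = 0x0001
TASK_UNINTERRUPTIBLE = 0x0002
TASK_STOPPED = 0x0004
TASK_TRACED = 0x0008
EXIT_DEAD = 0x0010
EXIT_ZOMBIE = 0x0020
TASK_DEAD = 0x0080
TASK_WAKEKILL = 0x0100
TASK_NOLOAD = 0x0400

TASK_KILLABLE = TASK_WAKEKILL | TASK_UNINTERRUPTIBLE
TASK_IDLE = TASK_NOLOAD | TASK_UNINTERRUPTIBLE

# Hypothetical helper names: only these bits are assumed to reach userspace.
REPORT_MASK = 0x003F
LETTERS = {0x01: "S", 0x02: "D", 0x04: "T", 0x08: "t", 0x10: "X", 0x20: "Z"}


def state_letter(state, exit_state=0):
    """Collapse the two kernel fields into one ps-style letter."""
    # TASK_IDLE is reported as its own letter even though it contains the D bit.
    if (state & TASK_IDLE) == TASK_IDLE:
        return "I"
    reported = (state | exit_state) & REPORT_MASK
    if reported == 0:
        return "R"
    # Assumption: the highest reported bit decides the letter.
    highest = 1 << (reported.bit_length() - 1)
    return LETTERS[highest]
```

Under these assumptions `state_letter(TASK_KILLABLE)` yields `D`, because the WAKEKILL modifier lies outside the reported bits, which is the flattening the text describes.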

The States

Every letter you'll see, its kernel-side name, and what it means for scheduling, signals, and load.

Letter	Kernel constant	Meaning	Wakes on	Load avg?
`R`	`TASK_RUNNING`	On a CPU or on a runqueue waiting for one.	already runnable	Yes
`S`	`TASK_INTERRUPTIBLE`	Sleeping; most idle processes live here (`read` on a socket, `epoll_wait`, `futex`).	event or any unblocked signal	No
`D`	`TASK_UNINTERRUPTIBLE`	Blocked on a specific completion, typically disk I/O or a driver. Signals are ignored.	only the awaited event	Yes
`D` (K)	`TASK_KILLABLE`	`D` plus: SIGKILL can wake it. Used by NFS, FUSE, and anything that used to trap users in `D` forever.	awaited event or SIGKILL	Yes
`I`	`TASK_IDLE`	`D` plus `NOLOAD`: excluded from load-average calculation. Used by kernel worker threads.	awaited event	No
`T`	`TASK_STOPPED`	Paused by SIGSTOP, SIGTSTP, SIGTTIN, or SIGTTOU. Resumes on SIGCONT.	SIGCONT	No
`t`	`TASK_TRACED`	Stopped by a tracer (`ptrace`, `gdb`, `strace`) at a syscall, signal, or breakpoint.	tracer `PTRACE_CONT`	No
`Z`	`EXIT_ZOMBIE`	Exited; `task_struct` kept around so the parent can read exit status.	parent `wait()`	No
`X`	`EXIT_DEAD` / `TASK_DEAD`	Being reaped; the `task_struct` is on its way out. Rarely seen by tooling.	—	No

State Diagram

The common transitions a userspace task goes through.

stateDiagram-v2
    [*] --> R: fork / clone
    R --> S: blocking syscall (interruptible)
    R --> D: blocking I/O (uninterruptible)
    S --> R: event or signal
    D --> R: I/O completes
    R --> T: SIGSTOP / SIGTSTP
    T --> R: SIGCONT
    R --> t: ptrace stop
    t --> R: PTRACE_CONT
    R --> Z: exit / killed
    S --> Z: killed
    Z --> [*]: parent wait()

R — Running or Runnable

R means "the scheduler would pick this task if given the chance," not "this task is executing now."

Runqueue membership — each CPU has a struct rq with red-black trees (for CFS/EEVDF) or FIFO lists (for SCHED_FIFO/SCHED_RR). A task in any of these is TASK_RUNNING.
On-CPU is a subset — only one task per CPU is actually on-CPU at any moment. Neither ps nor top separates the two cases: top's "running" count and ps's R both include tasks that are only waiting for a CPU.
No timeout — CFS/EEVDF preempts by virtual runtime, not a fixed timeslice. A task can stay R indefinitely while being periodically descheduled; it never transitions to S unless it blocks.
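The gap between "runnable" and "running" can be estimated from /proc/PID/schedstat, which on kernels built with scheduler statistics holds three numbers: nanoseconds spent on a CPU, nanoseconds spent waiting on a runqueue, and the number of timeslices run. The sketch below parses two snapshots and reports what share of the interval was spent waiting; `parse_schedstat` and `run_delay_share` are hypothetical helper names.

```python
def parse_schedstat(text):
    """Parse the three whitespace-separated fields of /proc/PID/schedstat."""
    on_cpu_ns, run_delay_ns, timeslices = (int(field) for field in text.split())
    return {
        "on_cpu_ns": on_cpu_ns,
        "run_delay_ns": run_delay_ns,
        "timeslices": timeslices,
    }


def run_delay_share(before, after):
    """Fraction of (on-CPU + waiting) time spent waiting between two snapshots."""
    cpu = after["on_cpu_ns"] - before["on_cpu_ns"]
    delay = after["run_delay_ns"] - before["run_delay_ns"]
    total = cpu + delay
    return delay / total if total else 0.0


def read_schedstat(pid):
    """Read a live snapshot; Linux only, and the file may be absent."""
    with open(f"/proc/{pid}/schedstat") as handle:
        return parse_schedstat(handle.read())
```

A share close to 1.0 for a task that stays in R suggests it is mostly queued behind other runnable tasks rather than executing.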

S vs D — The Two Sleeps

Both are off-CPU and blocked. The difference is what it takes to wake them.

	S — Interruptible	D — Uninterruptible
Kernel helper	`wait_event_interruptible()`	`wait_event()` / `io_schedule()`
Wakes on signal?	Yes — syscall returns `-EINTR`	No — signal stays pending
Typical callers	sockets, pipes, futex, epoll, sleep(), wait()	block-layer I/O, page fault on disk, NFS, direct disk read
Counts toward load	No	Yes
Can `kill -9`?	Yes	Not in plain `D`; SIGKILL works only when the wait is TASK_KILLABLE (see below)
Why it exists	Most things. Default for well-written drivers.	The task holds a kernel-allocated resource (buffer, lock, reference) that a signal handler cannot safely release mid-flight.
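The interruptible half of the table can be observed from userspace. In the sketch below a timer signal cuts a five-second sleep short. Python normally restarts an interrupted sleep on its own, so the handler raises an exception to make the early wake-up visible; `Interrupted` and `interrupted_sleep` are names invented for this example, and it must run in the main thread. There is no comparable userspace demonstration for D, since that state only arises inside kernel waits.

```python
import signal
import time


class Interrupted(Exception):
    """Raised by the signal handler to abort the sleep."""


def interrupted_sleep(alarm_after=0.1, sleep_for=5.0):
    """Sleep interruptibly and report whether a signal ended the sleep early."""

    def on_alarm(signum, frame):
        raise Interrupted()

    previous = signal.signal(signal.SIGALRM, on_alarm)
    start = time.monotonic()
    woke_early = False
    try:
        # Ask the kernel to deliver SIGALRM while we are blocked in sleep.
        signal.setitimer(signal.ITIMER_REAL, alarm_after)
        time.sleep(sleep_for)
    except Interrupted:
        woke_early = True
    finally:
        # Cancel the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)
    return woke_early, time.monotonic() - start
```

The call returns well before the requested five seconds, which is the "wakes on signal" behaviour in the S column.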

TASK_KILLABLE — The Fix for Stuck D

Added in 2.6.25 specifically to escape "D forever on NFS."

The problem: an NFS server goes away while a client task is blocked on read(2). Inside the kernel, the task is in wait_event() with TASK_UNINTERRUPTIBLE. Neither SIGTERM nor SIGKILL can wake it, because an uninterruptible wait leaves every signal pending. The task stays stuck until the server answers, its PID stays allocated, and nobody can kill the processes holding the mount.

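The fix: wait_event_killable(), which sleeps in TASK_UNINTERRUPTIBLE | TASK_WAKEKILL. A fatal signal wakes it: SIGKILL, or a signal whose default action terminates the process and which the process does not handle (the kernel turns that into a pending SIGKILL). Signals with a handler installed are still left pending. Callers preserve the "don't return -EINTR from a random syscall" guarantee while still letting the admin reap a truly stuck process.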

ps reports D for both plain TASK_UNINTERRUPTIBLE and TASK_KILLABLE; you can't tell them apart from userspace without reading /proc/PID/stack and recognising the wait function.

TASK_IDLE — Why Your Load Average Doesn't Spike

A late addition (4.2) to stop kernel threads from inflating load.

Before 4.2, kernel worker threads like nfsd, loop*, and various XFS workers used TASK_UNINTERRUPTIBLE while waiting for work. That meant an idle file server with 16 NFS threads reported a load average of 16. "Load average" in Linux is runnable + uninterruptibly-sleeping, a legacy of the days when D was a rare, short-lived state.

TASK_IDLE = TASK_UNINTERRUPTIBLE | TASK_NOLOAD. The NOLOAD flag excludes the task from the load-average tick. Signal behaviour is unchanged: still uninterruptible, still not killable. New code should use wait_event_idle() / schedule_timeout_idle() for "I'm a kernel thread waiting patiently for work."
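The accounting rule in this section reduces to a small predicate. This is a simplified sketch using the flag values from the header excerpt earlier; `counts_toward_load` is a hypothetical name, and the real kernel check considers more than the state word.

```python
# Flag values as listed in the sched.h excerpt.
TASK_RUNNING = 0x0000
TASK_INTERRUPTIBLE = 0x0001
TASK_UNINTERRUPTIBLE = 0x0002
TASK_WAKEKILL = 0x0100
TASK_NOLOAD = 0x0400

TASK_KILLABLE = TASK_WAKEKILL | TASK_UNINTERRUPTIBLE
TASK_IDLE = TASK_NOLOAD | TASK_UNINTERRUPTIBLE


def counts_toward_load(state):
    """Return True if a task in this state is counted in the load average."""
    if state == TASK_RUNNING:
        # Runnable tasks always count.
        return True
    # Uninterruptible sleepers count unless they carry the NOLOAD flag.
    return bool(state & TASK_UNINTERRUPTIBLE) and not (state & TASK_NOLOAD)
```

Note that a killable wait still counts: WAKEKILL changes who can wake the task, not how it is accounted.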

Z — Zombie

The task is dead. The PID is not yet freed.

What's left — just task_struct and exit info (exit code, resource usage from rusage, signal that killed it). Memory, open FDs, signal handlers, and threads are all already freed in do_exit().
Why it exists — so the parent can call wait() / waitpid() / waitid() and read the exit status. Without this, a fast-exiting child could disappear before the parent has a chance to look.
How it's reaped — the parent calls one of the wait family, sets SIGCHLD to SIG_IGN, or installs its SIGCHLD disposition with the SA_NOCLDWAIT flag. In every case the task ends up in EXIT_DEAD and its task_struct is released; with the last two, the child skips the visible zombie stage.
Orphan handling — if the parent dies first, the child is reparented (to the nearest PR_SET_CHILD_SUBREAPER ancestor, or PID 1). That new parent is responsible for reaping. This is why init / systemd quietly reaps an unending stream of zombies.
When it's a bug — a long-running parent that forks children and never wait()s. Zombies accumulate, the PID space fills (/proc/sys/kernel/pid_max; the kernel default is 32768, and many systemd-based distributions raise it to 4194304 on 64-bit), and eventually fork() starts failing with EAGAIN.
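The zombie lifecycle above can be reproduced in a few lines. The sketch forks a child that exits immediately, polls /proc until the child shows up as Z, and only then reaps it. It assumes a Linux /proc; `read_state` and `observe_zombie` are hypothetical helper names.

```python
import os
import time


def read_state(pid):
    """Return the state letter from /proc/PID/stat (the field after the last ')')."""
    with open(f"/proc/{pid}/stat") as handle:
        data = handle.read()
    return data[data.rindex(")") + 2]


def observe_zombie(exit_code=7, timeout=2.0):
    """Fork a child, watch it become a zombie, then reap it."""
    pid = os.fork()
    if pid == 0:
        # Child: exit at once without running any Python cleanup.
        os._exit(exit_code)

    # Parent: deliberately do not wait yet, so the child lingers as Z.
    deadline = time.monotonic() + timeout
    state = read_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = read_state(pid)

    # Reaping collects the exit status and releases the PID.
    _, status = os.waitpid(pid, 0)
    still_listed = os.path.exists(f"/proc/{pid}")
    return state, os.WEXITSTATUS(status), still_listed
```

The exit code survives in the zombie until `waitpid` collects it, and the /proc entry disappears once it has.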

T and t — Stopped and Traced

Two related states with different causes.

	T (TASK_STOPPED)	t (TASK_TRACED)
Cause	Job-control signal: SIGSTOP, SIGTSTP (`^Z`), SIGTTIN, SIGTTOU	Tracer attached (`ptrace`): stopped at a syscall boundary, signal, or breakpoint
Resume	SIGCONT	Tracer issues `PTRACE_CONT`, `PTRACE_SYSCALL`, etc.
Who can resume	anyone who can signal it	only the tracer
Typical tools	shell job control (`fg`, `bg`)	`gdb`, `strace`, `ltrace`
ptrace interaction	a `T` task can still be attached by a tracer	cannot be signalled through normal `kill` except SIGKILL
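The T column is easy to observe with a throwaway child process. The sketch below stops a sleeping child with SIGSTOP, confirms the state letter, and resumes it with SIGCONT. It assumes a Linux /proc; `read_state`, `wait_for_state`, and `stop_and_continue` are hypothetical helper names.

```python
import os
import signal
import subprocess
import sys
import time


def read_state(pid):
    """Return the state letter from /proc/PID/stat (the field after the last ')')."""
    with open(f"/proc/{pid}/stat") as handle:
        data = handle.read()
    return data[data.rindex(")") + 2]


def wait_for_state(pid, predicate, timeout=2.0):
    """Poll until predicate(state) holds or the timeout expires; return the last state."""
    deadline = time.monotonic() + timeout
    state = read_state(pid)
    while not predicate(state) and time.monotonic() < deadline:
        time.sleep(0.01)
        state = read_state(pid)
    return state


def stop_and_continue():
    """Stop a child with SIGSTOP, then resume it with SIGCONT."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        os.kill(child.pid, signal.SIGSTOP)
        stopped = wait_for_state(child.pid, lambda s: s == "T")
        os.kill(child.pid, signal.SIGCONT)
        resumed = wait_for_state(child.pid, lambda s: s != "T")
        return stopped, resumed
    finally:
        # Clean up the helper process regardless of the outcome.
        child.kill()
        child.wait()
```

The t state cannot be shown this simply, because it requires a tracer attached through ptrace.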

Load Average — What Actually Counts

The number is not "CPU busy %." It's nr_running + nr_uninterruptible, averaged with three exponential decays (1, 5, 15 minutes).

R contributes — every task on any CPU's runqueue at the sampling tick (every 5 seconds).
D contributes — every task in TASK_UNINTERRUPTIBLE that isn't flagged NOLOAD.
S does not contribute — ordinary sleeping tasks are invisible to load.
Why D is in there — the metric predates TASK_IDLE and was originally meant to capture "work queued up." Heavy disk I/O is work even if no CPU is busy, so Linux counts it.
Consequence — on a system with a failed SAN or a slow NFS server, load can be huge while CPU is 100% idle. Cross-reference with mpstat, iostat, or vmstat before concluding you need more CPU.
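The averaging itself is an exponentially decayed mean, updated once per 5-second sample. The sketch below uses floating point for clarity (the kernel uses fixed-point arithmetic) and hypothetical names `decay_factor`, `step`, and `simulate`; `active` stands for the R plus D count at each sample.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between load samples


def decay_factor(window_seconds):
    """Weight kept from the previous value for a given averaging window."""
    return math.exp(-SAMPLE_INTERVAL / window_seconds)


def step(load, active, window_seconds):
    """Fold one sample of `active` (runnable + uninterruptible tasks) into the average."""
    decay = decay_factor(window_seconds)
    return load * decay + active * (1.0 - decay)


def simulate(active, seconds, window_seconds=60.0, load=0.0):
    """Apply a constant `active` count for `seconds` and return the resulting average."""
    for _ in range(int(seconds // SAMPLE_INTERVAL)):
        load = step(load, active, window_seconds)
    return load
```

Under this model, 16 tasks stuck in D on an otherwise idle machine push the 1-minute figure to roughly 10.1 after one minute, and it approaches 16 only after several minutes.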

How to Inspect

Every answer ultimately comes from /proc/PID.

Source	Gives you	Example
`/proc/PID/status`	Human-readable; `State: S (sleeping)`	`grep State /proc/1234/status`
`/proc/PID/stat`	One-line, tool-parseable; state letter is field 3, but the command name in field 2 may contain spaces, so split after the last `)`	`sed 's/.*) //' /proc/1234/stat | cut -d' ' -f1`
`/proc/PID/wchan`	Kernel function the task is sleeping in	`cat /proc/1234/wchan` → `futex_wait_queue`
`/proc/PID/stack`	Full in-kernel stack (needs `CONFIG_STACKTRACE` and root)	`sudo cat /proc/1234/stack`
`ps -eo pid,stat,wchan,cmd`	State + flags + sleep point in one line	see all `D` tasks: `ps -eo stat,pid,cmd | awk '$1 ~ /^D/'`
`top` / `htop`	Live view; column `S` is the state letter	press `t` in htop to toggle task tree
`bpftrace` / `perf sched`	Tracks transitions (`sched_switch`, `sched_wakeup`) with nanosecond timestamps	`bpftrace -e 'tracepoint:sched:sched_switch { @[args->prev_state] = count(); }'`
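When parsing /proc/PID/stat programmatically, the command name in parentheses needs care, because it can contain spaces and parentheses of its own. A minimal sketch that splits on the last closing parenthesis follows; `parse_stat` is a hypothetical helper, and only the first few fields are extracted.

```python
def parse_stat(line):
    """Extract pid, comm, state, and ppid from one /proc/PID/stat line."""
    open_paren = line.index("(")
    # The command name may itself contain ')' or spaces, so use the last ')'.
    close_paren = line.rindex(")")
    rest = line[close_paren + 2:].split()
    return {
        "pid": int(line[:open_paren]),
        "comm": line[open_paren + 1:close_paren],
        "state": rest[0],
        "ppid": int(rest[1]),
    }


def read_stat(pid):
    """Read and parse a live process entry; Linux only."""
    with open(f"/proc/{pid}/stat") as handle:
        return parse_stat(handle.read())
```

For a line such as `1234 (tmux: server) S 1 ...`, naive whitespace splitting returns `server)` as the third field, while this parser returns `S`.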

ps State Suffix Flags

The state letter is often followed by one or more flag characters.

Suffix	Meaning
`<`	High-priority (negative nice)
`N`	Low-priority (positive nice)
`L`	Has pages locked into memory (`mlock`)
`s`	Session leader
`l`	Multi-threaded (uses `CLONE_THREAD`)
`+`	In the foreground process group of its tty

So Ssl+ = interruptible sleep, session leader, multi-threaded, foreground. A very typical shell-launched server.
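The same decoding can be automated. The sketch below expands a STAT string using only the letters and suffixes listed in this document; `STATE_NAMES`, `FLAG_NAMES`, and `describe_stat` are hypothetical names for this example.

```python
STATE_NAMES = {
    "R": "running or runnable",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep",
    "I": "idle",
    "T": "stopped",
    "t": "traced",
    "Z": "zombie",
    "X": "dead",
}

FLAG_NAMES = {
    "<": "high priority",
    "N": "low priority",
    "L": "pages locked in memory",
    "s": "session leader",
    "l": "multi-threaded",
    "+": "foreground process group",
}


def describe_stat(stat):
    """Expand a ps STAT string such as 'Ssl+' into a list of descriptions."""
    if not stat:
        raise ValueError("empty STAT string")
    # The first character is the state; everything after it is a flag.
    parts = [STATE_NAMES.get(stat[0], f"unknown state {stat[0]!r}")]
    parts += [FLAG_NAMES.get(ch, f"unknown flag {ch!r}") for ch in stat[1:]]
    return parts
```

The state and flag tables are kept separate because `s`, `l`, and `L` only have these meanings after the first character.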

Gotchas

"I sent SIGKILL and it's still there." — task is in plain D (not killable). Check /proc/PID/wchan: if it's a network FS function, the server is gone. Your options are waiting for the server to come back or rebooting.
"Load is 80 but CPU is idle." — count your D tasks (ps -eo stat | grep -c '^D'). If that roughly matches load, it's disk/NFS, not CPU.
"Zombies piling up." — parent isn't reaping. Options: fix the parent to call wait(); set SIGCHLD to SIG_IGN; use SA_NOCLDWAIT; or make a subreaper via prctl(PR_SET_CHILD_SUBREAPER) so something else reaps.
"ps shows R but top shows 0% CPU." — R is "runnable." If it's not getting scheduled, it's waiting behind other runnable tasks or pinned off a CPU that's saturated. Look at /proc/PID/schedstat for run-delay.
"Traced process won't respond to kill." — a t task can only be resumed by its tracer. If the tracer crashed without detaching, the tracee is stuck. kill -9 still works; ordinary signals are held until detach.
"Process state changes mid-read of /proc." — /proc/PID files are snapshots taken at read time, not atomic. Between two reads you can see R → S → R. For accurate cross-field views, read a single file once and parse it.

References

ps(1) — see the PROCESS STATE CODES section for the full letter inventory.
proc(5) — fields of /proc/PID/stat, /proc/PID/status, /proc/loadavg.
include/linux/sched.h — authoritative state definitions.
LWN: TASK_KILLABLE — the original article introducing killable uninterruptible sleep.
LWN: TASK_IDLE — Peter Zijlstra's patch stopping kernel threads from inflating load.
Brendan Gregg: Linux Load Averages — archaeology of why D is in the load metric.
wait(2) — how zombies get reaped.