How Linux Executes a Binary
From ./prog to the first instruction of main().
At a Glance
- One syscall: execve(2) does the whole thing. The shell fork()s and the child execve()s; the child's old address space is discarded, a new one is set up, and execution transfers to the entry point. The PID is unchanged.
- Format detection: The kernel reads the first few bytes and dispatches to a linux_binfmt handler: ELF (\x7fELF), shebang scripts (#!), or user-registered handlers via binfmt_misc.
- ELF at runtime: Only the program headers matter; section headers are for the linker. The kernel mmaps each PT_LOAD segment at the address it requests (offset by a random base if PIE).
- Interpreter: If the binary has a PT_INTERP segment, the kernel also loads that ELF (/lib64/ld-linux-x86-64.so.2) and jumps into its entry point. ld.so does the real work of resolving shared libraries.
- _start → main: _start is assembly from crt1.o. It calls __libc_start_main(main, argc, argv, ...), which initializes libc, runs .init_array constructors, then calls main().
- Address space: .text/.rodata/.data/.bss from PT_LOAD, heap growing up via brk, shared libs in the mmap region, stack growing down from near the top of userspace. ASLR randomizes all of them.
- Shebang: #!/usr/bin/env python3 is handled by binfmt_script: the kernel rewrites argv and recursively execs the interpreter.
- Security: setuid/setgid change the effective UID/GID at exec time, file capabilities attach privileges without root, and ASLR + NX + RELRO reduce the reliability of memory-corruption exploits.
The Full Sequence
The end-to-end path from a shell command to the first line of main() for a dynamically linked ELF.
- Kernel: pick the binary format handler (ELF / script / binfmt_misc)
- Kernel: tear down old address space
- Kernel: mmap each PT_LOAD segment
- Kernel: mmap the PT_INTERP interpreter
- Kernel: build initial stack (argv, envp, auxv)
- Kernel → ld.so: jump to the interpreter's entry point (the program's own entry point is passed in auxv as AT_ENTRY)
- ld.so: resolve DT_NEEDED, mmap each shared object
- ld.so: apply relocations (GOT now, PLT lazily)
- ld.so: run DT_INIT / .init_array of libs
- ld.so → program: jump to the program's _start
- Program → libc: __libc_start_main(main, argc, argv, ...)
- libc: init TLS, run program's .init_array
- libc → program: call main(argc, argv, envp)
execve(2)
The one syscall that replaces the current process image. Never returns on success.
int execve(const char *pathname,
           char *const argv[],
           char *const envp[]);
What happens during a successful execve:
| Category | Preserved | Reset / replaced |
|---|---|---|
| Address space | (none) | All mappings torn down; a fresh address space is built from the binary. |
| Process identity | PID, PPID, PGID, SID, controlling terminal | (none) |
| File descriptors | Open FDs without FD_CLOEXEC | FDs with FD_CLOEXEC set (for example, opened with O_CLOEXEC) are closed. |
| Signals | Signal mask, pending signals, ignored dispositions | Caught signals are reset to SIG_DFL. |
| Credentials | Real UID/GID | Effective UID/GID change only if the file is setuid/setgid; the saved set-IDs are then copied from the effective IDs. |
| Memory locks, timers | alarm() and setitimer() timers | mlock() locks are dropped; POSIX timers (timer_create) are deleted. |
| argv, envp, auxv | (none) | Rebuilt on the new stack for the new program. |
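The fork-then-exec pattern described above can be sketched in a few lines of C. This is a minimal illustration, not how any particular shell is implemented; `run_program` is a hypothetical helper name, and the exit code 127 for a failed exec is a convention borrowed from shells rather than something the kernel mandates.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Fork, execve(path) in the child, and return the child's exit status,
 * or -1 if fork/waitpid fails or the child did not exit normally.
 * run_program is a hypothetical helper name used for illustration. */
int run_program(const char *path, char *const args[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* fork failed */

    if (pid == 0) {
        /* Child: keeps its PID, but its address space is replaced. */
        execve(path, args, environ);
        /* Reached only if execve failed; on success it never returns. */
        perror("execve");
        _exit(127);
    }

    /* Parent: its own address space is untouched; wait for the child. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `run_program("/bin/sh", args)` with `args` set to `{ "sh", "-c", "exit 7", NULL }` should return 7, assuming `/bin/sh` exists on the system.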
Binary Format Handlers
The kernel offers the file to each registered struct linux_binfmt handler in turn; the handler that recognizes the first bytes of the file takes over. Each handler owns the job of setting up the new process image.
| Handler | Matches | Role |
|---|---|---|
| binfmt_elf | Magic \x7fELF | Parse ELF header + program headers, mmap PT_LOAD, load PT_INTERP, build stack, jump to entry. |
| binfmt_script | First two bytes #! | Parse the shebang line, rewrite argv, recursively execve the interpreter. |
| binfmt_misc | User-registered magic / extension | Runs a configured interpreter for the file (e.g. java for .class, qemu-user for foreign-arch binaries). |
| binfmt_flat | FLAT header | Legacy format for MMU-less systems (uClinux). |
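The magic-byte checks in the table can be illustrated with a small userspace classifier. This is only a sketch of the idea: `classify_magic` and the `exec_format` enum are hypothetical names, and the real handlers live in the kernel and do far more validation. binfmt_misc and binfmt_flat are not modeled.

```c
#include <stddef.h>
#include <string.h>

enum exec_format { FMT_UNKNOWN, FMT_ELF, FMT_SCRIPT };

/* Classify a file by its leading bytes, roughly mirroring the first check
 * made by binfmt_elf and binfmt_script. Hypothetical helper; rules that
 * binfmt_misc would apply are configured at runtime and not modeled. */
enum exec_format classify_magic(const unsigned char *buf, size_t len)
{
    /* "\x7f" and "ELF" are separate literals so that the 'E' is not
     * swallowed by the hexadecimal escape sequence. */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return FMT_ELF;
    if (len >= 2 && buf[0] == '#' && buf[1] == '!')
        return FMT_SCRIPT;
    return FMT_UNKNOWN;
}
```

In practice the buffer would hold the first bytes read from the file being executed.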
ELF Layout
Two parallel views of an ELF file: section headers describe what the linker uses to assemble the binary; program headers describe what the kernel should put in memory. Both are valid at the same time.
$ readelf -l /bin/ls | head -20
Elf file type is DYN (Position-Independent Executable file)
Entry point 0x67d0
There are 13 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0x00000040 0x00000040 0x0002d8 0x0002d8 R 0x8
INTERP 0x000318 0x00000318 0x00000318 0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x000000 0x00000000 0x00000000 0x003558 0x003558 R 0x1000
LOAD 0x004000 0x00004000 0x00004000 0x013e95 0x013e95 R E 0x1000
LOAD 0x018000 0x00018000 0x00018000 0x008190 0x008190 R 0x1000
LOAD 0x021010 0x00022010 0x00022010 0x001248 0x0025c8 RW 0x1000
DYNAMIC 0x021a58 0x00022a58 0x00022a58 0x000200 0x000200 RW 0x8
...
| Program header | Purpose |
|---|---|
| PT_LOAD | A segment to mmap into the process image. One per permission combination (often 4: R, RX, R, RW). |
| PT_INTERP | Path to the dynamic linker/loader (usually /lib64/ld-linux-x86-64.so.2). |
| PT_DYNAMIC | Points to the DT_* table used by ld.so: DT_NEEDED, DT_SYMTAB, DT_RELA, etc. |
| PT_NOTE | Build-ID, ABI info, auxiliary metadata. Used by gdb, perf, debuginfod. |
| PT_GNU_STACK | Its flags tell the kernel whether the stack may be executable; toolchains have marked it non-executable by default since the mid-2000s. |
| PT_GNU_RELRO | After relocations finish, ld.so mprotects this range read-only, which hardens the GOT. |
| PT_TLS | Thread-local storage initialization image. |
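To make the program-header view concrete, the following sketch walks the program headers of an ELF file using the structures from `<elf.h>`. It is a simplified reader, not a substitute for readelf: `count_pt_load` is a hypothetical helper, and it assumes a 64-bit ELF whose byte order matches the host.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Count the PT_LOAD program headers in a 64-bit ELF file and report via
 * *has_interp whether a PT_INTERP header is present. Returns -1 if the
 * file cannot be read or is not a 64-bit ELF. Hypothetical helper that
 * assumes the file's byte order matches the host's. */
int count_pt_load(const char *path, int *has_interp)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    Elf64_Ehdr eh;
    int loads = -1;
    *has_interp = 0;

    /* Accept only 64-bit ELF files whose header layout matches <elf.h>. */
    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64 &&
        eh.e_phentsize == sizeof(Elf64_Phdr)) {
        loads = 0;
        for (unsigned i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            long off = (long)(eh.e_phoff + (Elf64_Off)i * sizeof ph);
            if (fseek(f, off, SEEK_SET) != 0 ||
                fread(&ph, sizeof ph, 1, f) != 1) {
                loads = -1;             /* truncated header table */
                break;
            }
            if (ph.p_type == PT_LOAD)
                loads++;
            else if (ph.p_type == PT_INTERP)
                *has_interp = 1;
        }
    }
    fclose(f);
    return loads;
}
```

Pointing it at `/proc/self/exe` inspects the running program itself; for the /bin/ls output shown above it would count four PT_LOAD headers and report an interpreter.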
Process Address Space after exec
Post-exec layout of a typical x86-64 Linux process. Arrows show growth direction; all base offsets are randomized by ASLR.
High address (0x00007fff...)
┌──────────────────────────────┐
│  kernel space (not mapped)   │
├──────────────────────────────┤
│            stack             │  grows ↓  argv, envp, auxv at top
│              ↓               │
│                              │
│         ... gap ...          │
│                              │
├──────────────────────────────┤
│         mmap region          │  grows ↓  ld.so, shared libs, mmap() calls
│              ↓               │
│                              │
│         ... gap ...          │
│                              │
│              ↑               │
│             heap             │  grows ↑  brk() / glibc's arenas
├──────────────────────────────┤
│             .bss             │  zero-initialized globals
├──────────────────────────────┤
│            .data             │  initialized globals (writable)
├──────────────────────────────┤
│           .rodata            │  string literals, const data
├──────────────────────────────┤
│            .text             │  executable code (read-only)
└──────────────────────────────┘
Low address (randomized PIE base, e.g. 0x55...)
Inspect a live process at /proc/<pid>/maps.
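As a rough way to check this layout from inside a process, the sketch below looks up which mapping in /proc/self/maps contains a given address and returns its permission string. `mapping_perms` is a hypothetical helper; it relies only on the documented "start-end perms ..." prefix of each maps line.

```c
#include <stdio.h>
#include <string.h>

/* Find the mapping in /proc/self/maps that contains addr and copy its
 * permission string (for example "r-xp") into perms, which must hold at
 * least 5 bytes. Returns 1 if found, 0 if not, -1 if the file cannot be
 * opened. Hypothetical helper for illustration. */
int mapping_perms(const void *addr, char perms[5])
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;

    unsigned long target = (unsigned long)addr, start, end;
    char line[512], p[5];
    int found = 0;

    while (fgets(line, sizeof line, f)) {
        /* Each line begins with "start-end perms", addresses in hex. */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, p) == 3 &&
            target >= start && target < end) {
            strcpy(perms, p);
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Passing the address of a local variable should land in a read-write stack mapping, while the address of a function should land in an executable mapping.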
Dynamic Linking (ld.so)
For a dynamic binary, control actually starts in ld-linux.so. It loads the libraries the program needs, patches up addresses, then transfers control to the program's entry point.
- DT_NEEDED: List of shared objects to load (libc.so.6, libpthread.so.0, etc.). ld.so walks this list recursively, honoring RPATH/RUNPATH and LD_LIBRARY_PATH.
- GOT & PLT: The Global Offset Table holds the real addresses of external symbols after relocation. Procedure Linkage Table stubs lazily resolve function addresses on first call (unless LD_BIND_NOW is set or the binary was linked with -z now).
- Relocations: For PIE binaries and every shared library, absolute addresses baked in at link time are wrong until ld.so adjusts them to the actual load address.
- RELRO: After relocations are applied, the PT_GNU_RELRO range (the GOT, plus the PLT's GOT slots under full RELRO) is mprotected read-only, denying attackers a trivial function-pointer overwrite.
- LD_PRELOAD: Forces a shared object to be loaded first; its symbols take precedence. Handy for interposing malloc/free, mocking libc's syscall wrappers, or debugging. Heavily restricted for setuid binaries (secure-execution mode): entries containing a slash are ignored, and only setuid libraries from the standard search directories are loaded.
- Static binaries: No PT_INTERP; the kernel jumps straight to the program's _start. No ld.so step at all.
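One way to observe the hand-off between the kernel, ld.so, and the program is to read the auxiliary vector the kernel placed on the initial stack. The sketch below uses glibc's `getauxval` from `<sys/auxv.h>`; `print_auxv_summary` is a hypothetical helper name, and the exact addresses printed vary from run to run under ASLR.

```c
#include <stdio.h>
#include <sys/auxv.h>

/* Print a few auxiliary-vector entries set up by the kernel at exec time.
 * Returns 1 if an interpreter (ld.so) was mapped, 0 otherwise.
 * Hypothetical helper for illustration. */
int print_auxv_summary(void)
{
    unsigned long base  = getauxval(AT_BASE);   /* ld.so load address, 0 if none */
    unsigned long entry = getauxval(AT_ENTRY);  /* the program's own entry point */
    unsigned long phdr  = getauxval(AT_PHDR);   /* program headers in memory */
    unsigned long phnum = getauxval(AT_PHNUM);  /* number of program headers */

    printf("AT_BASE  = %#lx\n", base);
    printf("AT_ENTRY = %#lx\n", entry);
    printf("AT_PHDR  = %#lx (%lu headers)\n", phdr, phnum);
    return base != 0;
}
```

For a dynamically linked program AT_BASE is where ld.so was mapped, and AT_ENTRY is the address ld.so jumps to once it has finished; for a static binary AT_BASE is expected to be zero.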
_start → main()
Between the kernel's jump and your main() there are several pieces of libc glue. Simplified x86-64 version:
/* crt1.o (from glibc), simplified */
_start:
    xor  %ebp, %ebp                   /* zero the frame pointer (end of call chain) */
    mov  %rdx, %r9                    /* rtld_fini: ld.so's cleanup hook, passed in %rdx */
    pop  %rsi                         /* argc from the kernel's stack */
    mov  %rsp, %rdx                   /* argv (envp follows argv's NULL terminator) */
    and  $-16, %rsp                   /* 16-byte align */
    push %rax                         /* padding so the next push keeps alignment */
    push %rsp                         /* stack_end, the seventh argument */
    lea  __libc_csu_fini(%rip), %r8   /* finalizer (glibc before 2.34; newer passes 0) */
    lea  __libc_csu_init(%rip), %rcx  /* initializer (glibc before 2.34; newer passes 0) */
    lea  main(%rip), %rdi             /* pointer to user's main() */
    call __libc_start_main            /* never returns */
    hlt                               /* unreachable */
__libc_start_main then:
- Initializes TLS and stack guard cookies.
- Runs the init callback (__libc_csu_init), which calls every constructor in the executable's .init_array (C++ globals, __attribute__((constructor)), etc.).
- Registers fini with atexit so destructors run at exit.
- Calls main(argc, argv, envp).
- Calls exit(ret), which runs atexit handlers and .fini_array, then invokes the exit_group syscall.
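The constructor and destructor steps above can be seen with the GCC/Clang `constructor` and `destructor` attributes. This is a minimal sketch; the function and variable names are made up for illustration.

```c
#include <stdio.h>

static int ctor_ran;   /* set by the constructor before main() starts */

/* The compiler places a pointer to this function in .init_array
 * (GCC/Clang extension), so it runs before main(). */
__attribute__((constructor))
static void before_main(void)
{
    ctor_ran = 1;
}

/* Placed in .fini_array; runs from exit() after main() returns. */
__attribute__((destructor))
static void after_main(void)
{
    fputs("destructor ran\n", stderr);
}

/* Report whether the constructor has already run; call from main(). */
int constructor_has_run(void)
{
    return ctor_ran;
}
```

Calling `constructor_has_run()` as the first statement of main() should already return 1, and the destructor's message should appear only after main() has returned.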
Shebang (#!) Resolution
Scripts aren't magic — the kernel recognizes #! as a binary format and rewrites the execve in-flight.
- The script's first line is #!/usr/bin/env python3.
- Kernel: rewrite argv to ["/usr/bin/env", "python3", "./deploy.py", "prod"].
- Kernel: recursive execve on /usr/bin/env.
- /usr/bin/env is ELF → binfmt_elf takes over.
Quirks worth knowing:
- Line length limit: BINPRM_BUF_SIZE is 256 bytes (128 before Linux 5.1); the kernel reads only that much of the first line, so long shebang lines break.
- At most one argument: Linux (unlike some BSDs) splits the shebang into the interpreter and a single argument; everything after the interpreter, spaces included, is passed as one string. #!/usr/bin/env -S foo bar works around this with env's -S option.
- Relative paths: the kernel never searches PATH for the interpreter, so give it as an absolute path (a relative one resolves against the current directory); hence the /usr/bin/env trick to keep it portable while still resolving via PATH.
- Nested shebangs: If the interpreter is itself a script, the kernel will re-resolve recursively, up to 4 levels deep.
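The single-argument quirk is easier to see in code. The sketch below splits a shebang line the way the list above describes; it is a userspace approximation, not the kernel's binfmt_script code, and `parse_shebang` is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Split a shebang line into an interpreter path and at most ONE argument.
 * Returns 0 on success, -1 if the line is not a usable shebang or a
 * buffer is too small. arg is set to "" when no argument is present.
 * Hypothetical helper approximating the behavior described above. */
int parse_shebang(const char *line, char *interp, size_t interp_len,
                  char *arg, size_t arg_len)
{
    if (line[0] != '#' || line[1] != '!')
        return -1;

    const char *p = line + 2;
    size_t end = strcspn(p, "\n");              /* stop at end of line */
    while (end > 0 && (p[end - 1] == ' ' || p[end - 1] == '\t'))
        end--;                                  /* trim trailing blanks */

    size_t i = 0;
    while (i < end && (p[i] == ' ' || p[i] == '\t'))
        i++;                                    /* skip blanks after #! */

    size_t start = i;
    while (i < end && p[i] != ' ' && p[i] != '\t')
        i++;                                    /* interpreter path */
    size_t n = i - start;
    if (n == 0 || n >= interp_len)
        return -1;
    memcpy(interp, p + start, n);
    interp[n] = '\0';

    while (i < end && (p[i] == ' ' || p[i] == '\t'))
        i++;                                    /* blanks before argument */
    n = end - i;                                /* the rest is one argument */
    if (n >= arg_len)
        return -1;
    memcpy(arg, p + i, n);
    arg[n] = '\0';
    return 0;
}
```

For the line `#!/usr/bin/env -S foo bar` this yields the interpreter `/usr/bin/env` and the single argument `-S foo bar`, which is why env needs -S to split it again.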
Security & Hardening
Exec is also the point where credentials and memory protections are set up afresh for the new program. The kernel does most of the enforcement here.
| Feature | What it does |
|---|---|
| setuid / setgid bit | Effective UID/GID becomes the file's owner/group. Silently ignored if the filesystem is mounted nosuid or the process has no_new_privs set. |
| File capabilities | setcap cap_net_bind_service+ep lets a non-root binary bind to port 80 without full root. Replaces setuid for finer control. |
| no_new_privs | Once set (prctl(PR_SET_NO_NEW_PRIVS)), no subsequent exec can gain privileges: setuid bits and file capabilities are ignored. Required to install seccomp-bpf filters in unprivileged processes. |
| ASLR | Randomizes PIE base, mmap region, heap start, and stack. Controlled by /proc/sys/kernel/randomize_va_space. |
| NX (W^X) | The PT_GNU_STACK segment marks the stack non-executable. Data pages aren't executable; .text isn't writable. |
| RELRO | After relocations, .got / .init_array / .dynamic become read-only. "Full RELRO" (-z now) resolves all PLT entries up-front so the GOT can be hardened immediately. |
| Stack canaries | Compiler inserts a random guard value between locals and the return address; libc aborts if it changes before ret. |
| Close-on-exec | FDs with O_CLOEXEC are closed during exec, so a new program can't inherit sensitive handles. |
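Setting no_new_privs from the table above takes a single prctl call. The sketch below sets the flag and reads it back; `enable_no_new_privs` is a hypothetical helper name. Note that the setting is one-way: once set it cannot be cleared, and it is inherited across fork and exec.

```c
#include <sys/prctl.h>

/* Set no_new_privs for the calling thread and return the resulting flag
 * value (1 when set), or -1 if the prctl call fails. The flag cannot be
 * cleared afterwards. Hypothetical helper for illustration. */
int enable_no_new_privs(void)
{
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

A sandboxing launcher would typically call this before installing a seccomp filter and exec'ing the untrusted program.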
References
- execve(2): the syscall at the center of everything.
- elf(5): ELF file format reference on Linux.
- ld.so(8): the dynamic linker: search path, LD_PRELOAD, RELRO.
- dlopen(3): programmatic interface to the dynamic linker.
- prctl(2): PR_SET_NO_NEW_PRIVS and other exec-time controls.
- capabilities(7): file and process capabilities.
- binfmt_misc: kernel docs for user-registered binary formats.
- TIS ELF Specification: canonical ELF spec (PDF).
- Linux Foundation refspecs: ABIs, ELF supplements, LSB documents.
- fs/binfmt_elf.c: the kernel's ELF loader (the authoritative implementation).
- fs/binfmt_script.c: shebang handling in the kernel.
- glibc start.S: the real _start for x86-64.
- glibc __libc_start_main: what runs between _start and main.
- Drepper, How to Write Shared Libraries: the standard reference on dynamic linking internals.
- LWN, How programs get run: a tour of the kernel's exec path.
- LWN, ASLR & offset2lib: background on position-independent executables and ASLR.
- Wikipedia, ELF: format overview and history.
- Wikipedia, Dynamic linker: concepts across Unix-likes and other OSes.
- Wikipedia, ASLR: history and variants.
- Wikipedia, Shebang: cross-Unix quirks, portability notes.