Python Process Persistence: How to Continue Running Script in Background
Every engineer eventually kills a long-running job by closing an SSH session. You background a long-running Python script with a bare ampersand, close the terminal, and three hours later the migration is dead — no logs, no error, just a missing process. That's not bad luck. That's POSIX signal propagation working exactly as designed. The fix isn't a one-liner. It's understanding the TTY ownership chain, Python's buffering behavior, and choosing the right persistence layer for the workload. Everything else is a workaround you'll regret at 2 AM.
# The dangerous shortcut every junior engineer writes first
$ python3 long_job.py &
[1] 18432
$ exit
# SIGHUP propagates to process group — job is dead before you close the laptop
POSIX Signal Layer: What Actually Kills Your Process
When you open an SSH session, the kernel creates a session with a controlling terminal. Your shell is the session leader. Every process you launch from that shell inherits the same session ID and process group ID. When the SSH connection drops — cleanly or not — the kernel sends SIGHUP to the foreground process group. The default disposition of SIGHUP is termination. Unless the process explicitly ignores or handles SIGHUP, it dies. This is not configurable at the OS level. It is a fundamental part of how Unix process groups work, documented in POSIX.1-2008, and it behaves identically on Linux, macOS, and every BSD variant.
The three signals that matter: SIGHUP is the session-death signal sent when the controlling terminal closes. SIGTERM is the polite shutdown request — systemd sends this first when stopping a service. SIGKILL cannot be caught or ignored at the application level; the kernel terminates the process directly. Python handles none of these gracefully by default. You wire them up yourself, and most tutorials skip this because it requires understanding what the signals actually do.
import signal
import sys
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    handlers=[
        logging.FileHandler("/var/log/worker.log"),
        logging.StreamHandler(sys.stdout),
    ],
)

def _handle_sigterm(signum, frame):
    logging.info("SIGTERM received — flushing state and exiting")
    # release DB connections, close file handles, flush queues here
    sys.exit(0)

signal.signal(signal.SIGTERM, _handle_sigterm)
signal.signal(signal.SIGHUP, signal.SIG_IGN)  # survive terminal closure
Signal Registration: Why SIG_IGN for SIGHUP and a Real Handler for SIGTERM
Setting SIGHUP to signal.SIG_IGN tells the kernel to discard the signal rather than terminate the process. This is exactly what nohup does before execing your script — it sets the SIGHUP disposition to SIG_IGN, then replaces itself with your process. Writing it explicitly in Python is more portable and more readable. The SIGTERM handler needs actual cleanup logic: flush in-progress writes, acknowledge any queued messages, and close database connections. Skipping this means systemd's 90-second stop timeout expires and SIGKILL fires, leaving half-written files and open transactions. The root fix is treating signal handlers as first-class application logic, not optional boilerplate appended after the main loop runs cleanly.
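A quick way to see both dispositions in action is to spawn a child that installs exactly this wiring and signal it from the parent. A minimal POSIX-only sketch (the child script here is illustrative, not part of the original article):

```python
import os
import signal
import subprocess
import sys
import textwrap
import time

# Child: ignore SIGHUP, exit cleanly on SIGTERM, same wiring as above.
child_src = textwrap.dedent("""
    import signal, sys, time
    signal.signal(signal.SIGHUP, signal.SIG_IGN)
    signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))
    while True:
        time.sleep(0.1)
""")

proc = subprocess.Popen([sys.executable, "-c", child_src])
time.sleep(0.5)                    # give the child time to install handlers

os.kill(proc.pid, signal.SIGHUP)   # fatal by default, discarded here
time.sleep(0.2)
assert proc.poll() is None, "child should survive SIGHUP"

os.kill(proc.pid, signal.SIGTERM)  # triggers the handler, which exits 0
assert proc.wait(timeout=5) == 0
print("survived SIGHUP, clean exit on SIGTERM")
```

The same experiment with the SIG_IGN line removed kills the child at the SIGHUP step, which is exactly what happens to an unprotected backgrounded script when the terminal closes.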
Nohup and Disown: Tactical Tools With Real Limits
The nohup command is deliberately minimal: it sets SIGHUP to SIG_IGN before execing the target process, and if stdout is a terminal it redirects stdout to nohup.out (if stderr is a terminal, it is pointed at the same file). It doesn't daemonize, doesn't set up a PID file, doesn't rotate anything. The default nohup.out is a trap — on a busy worker it fills your root partition silently. I've watched this take down a staging server the night before a release.
The disown builtin is the retroactive escape hatch. If a process is already running under &, disown -h %1 removes it from the shell's job table and marks it to ignore SIGHUP. The problem: disown doesn't fix stdout. The process still holds a file descriptor pointing at the dying TTY. Any write to stdout after the terminal closes fails with an I/O error (EIO on Linux), which surfaces in Python as an unhandled OSError and terminates the process anyway.
# Correct nohup usage — explicit stderr, PID capture, unbuffered output
$ nohup python3 -u worker.py \
>> /var/log/worker/app.log \
2>> /var/log/worker/error.log \
& echo $! > /var/run/worker.pid
# Retroactive rescue — job already running
$ jobs -l
[1]+ 22109 Running python3 pipeline.py
$ disown -h %1
# Note: the job's stdout still points at this TTY — redirecting the shell's
# own stdout (exec > /dev/null 2>&1) does not change the child's descriptors
The -u Flag and Why stdout Redirection Without It Is Useless
The -u flag forces unbuffered mode: stdout and stderr are opened write-through, so every write reaches the OS immediately. Without it, when stdout is redirected to a file — which it always is in a proper nohup python script invocation — Python switches to fully buffered mode with an 8 KB block buffer. The process can run for minutes generating output while the log file stays empty, because nothing has flushed the buffer yet. You can also set PYTHONUNBUFFERED=1 in the environment for the same effect. Either works. The important thing is that you pick one and make it standard across all background invocations in your infrastructure. Relying on default buffering in background processes is a silent data-loss bug waiting for a specific timing condition to surface it.
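You can observe the difference from Python itself by asking a child interpreter how its stdout text layer is configured when piped. A small diagnostic sketch, assuming CPython's stdio setup:

```python
import subprocess
import sys

# The probe reports the child's stdout text-layer buffering flags on stderr.
probe = ("import sys; "
         "sys.stderr.write(f'{sys.stdout.line_buffering} {sys.stdout.write_through}')")

default = subprocess.run([sys.executable, "-c", probe],
                         capture_output=True, text=True)
unbuffered = subprocess.run([sys.executable, "-u", "-c", probe],
                            capture_output=True, text=True)

# With stdout piped, the default text layer is neither line-buffered nor
# write-through; -u flips write_through on so every write reaches the OS.
print("piped, default:", default.stderr)
print("piped, with -u:", unbuffered.stderr)
```

Running the same probe with stdout attached to a terminal would show line_buffering switch to True, which is the TTY-detection behavior the next section covers.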
Buffered vs Unbuffered I/O: The Python-Specific Trap
Python's I/O layer sits on top of the OS file descriptor layer and adds its own buffering logic. When stdout is connected to a TTY, it's line-buffered — each newline triggers a flush. When stdout is connected to a file or pipe, Python switches to block-buffered mode unless told otherwise. This is documented in the io module reference, but nobody reads it until they've stared at an empty log file for twenty minutes. The behavior is consistent and reproducible, which somehow makes it more frustrating.
import sys
import io
import logging

# Option 1: replace stdout with a line-buffered text wrapper
sys.stdout = io.TextIOWrapper(
    sys.stdout.buffer,
    line_buffering=True,
    write_through=False,
)

# Option 2: explicit flush after every write (works for mixed output)
print("Starting batch job", flush=True)

# Option 3: use logging exclusively — FileHandler flushes per record by default
logging.basicConfig(filename='/var/log/worker.log', level=logging.DEBUG)
logging.info("This reaches disk immediately")
sys.stdout.flush() vs line_buffering=True vs the logging Module
The three approaches have different tradeoffs. Explicit flush=True in every print() call works but is high-friction — it requires discipline across a codebase and breaks the moment someone adds a print statement without the flag. Replacing stdout with a line-buffered TextIOWrapper is a one-time fix at the top of the entrypoint that covers all downstream print calls, but it can interact badly with C extensions that write directly to stdout.fileno(). The python daemon process pattern using only the logging module is the cleanest: FileHandler calls flush() after every emit() by default, you get structured log records, and you never touch stdout at all. Treating print() as a logging primitive in background processes is legacy debt — swap it out at the architecture level, not with per-call flush arguments.
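That per-record flush is easy to verify: write one record through a FileHandler and read the file back without closing or flushing anything. A minimal check using a temp file (logger name and path are arbitrary):

```python
import logging
import pathlib
import tempfile

log_path = pathlib.Path(tempfile.mkdtemp()) / "worker.log"

logger = logging.getLogger("flush_demo")
logger.setLevel(logging.INFO)
logger.addHandler(logging.FileHandler(log_path))

logger.info("record one")

# No close(), no explicit flush: emit() already flushed the stream,
# so the record is on disk before this read.
print(log_path.read_text(), end="")
```

Compare this with a bare open() plus write(): the same immediate read would typically come back empty, because the record would still be sitting in the block buffer.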
Screen vs tmux for Python: Multiplexer Persistence and Its Ceiling
Terminal multiplexers solve the TTY problem by inserting a server process between your SSH connection and the shell. The tmux server runs as a daemon, holds pseudo-terminal sessions, and your SSH connection is just a client attaching to an existing session. When you disconnect, the session stays alive because it's attached to the tmux server's process, not your SSH session. For ad-hoc long-running scripts, interactive data exploration, or jobs you genuinely need to watch, this is the right tool. The debate around screen vs tmux for python largely ends at: screen is older and pre-installed everywhere, tmux has a better scripting API and saner key bindings. Use tmux unless you're on a locked-down box that only has screen.
The ceiling is clear: multiplexers don't survive reboots, don't auto-restart failed processes, and provide no resource limits. A tmux session holding a production worker is one reboot away from silent failure — I've seen entire background job infrastructures that were a single tmux session nobody knew how to recreate.
# Tmux: detached session with logging baked in
$ tmux new-session -d -s ml_worker \
'python3 -u train.py 2>&1 | tee -a /var/log/ml_worker.log'
# Reattach
$ tmux attach -t ml_worker
# Kill cleanly
$ tmux send-keys -t ml_worker C-c
$ tmux kill-session -t ml_worker
# Screen equivalent for locked-down servers
$ screen -dmS scraper python3 -u scraper.py
$ screen -r scraper
Pty Layer and Its Effect on Python Buffering
Both screen and tmux allocate a pty (pseudo-terminal) for the shell session they manage. When Python's stdout is a pty, it detects a TTY and switches back to line-buffered mode — so inside a multiplexer you don't need the -u flag as long as output goes straight to the terminal. The moment you pipe into tee, as in the example above, stdout becomes a pipe again and -u is required for real-time log output. This is one of the underappreciated practical differences between running a script with python background execution linux via nohup versus via tmux: the buffering behavior changes. The tradeoff is pty overhead, which is measurable on high-throughput, pipe-heavy workloads. Use multiplexers for visibility and interactivity; move to systemd or supervisord the moment the job needs to survive without human supervision.
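The pty effect is reproducible without tmux by allocating one directly with the standard pty module. A Linux/macOS-only sketch comparing the two attachment modes:

```python
import os
import pty
import subprocess
import sys

# The probe reports whether the child sees a TTY and line-buffers stdout.
probe = [sys.executable, "-c",
         "import sys; print(sys.stdout.isatty(), sys.stdout.line_buffering)"]

# Attached to a pty (what screen/tmux provide), Python sees a TTY.
master, slave = pty.openpty()
subprocess.run(probe, stdout=slave)
os.close(slave)
pty_report = os.read(master, 1024).decode().strip()
os.close(master)
print("pty :", pty_report)

# Through a plain pipe (nohup-style redirection), it does not.
piped = subprocess.run(probe, capture_output=True, text=True)
pipe_report = piped.stdout.strip()
print("pipe:", pipe_report)
```

The pty case reports a TTY with line buffering enabled; the pipe case reports neither, which is why the nohup examples earlier all carry the -u flag.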
Systemd Service Units: The Production Standard
On any modern Linux distro, systemd is the init system, and using anything else for persistent background workloads means hand-rolling the features systemd already provides: restart on failure, resource limits via cgroups, dependency ordering, log aggregation via journald, socket activation, and clean shutdown sequencing. A systemd service unit for a Python worker is twenty lines of INI that replaces a fragile mix of nohup, PID files, and cron-based health checks. There is no legitimate reason to avoid it on a non-containerized system. The "systemd is too complex" argument comes from engineers who've never actually written a unit file — it takes about ten minutes to read the man page for systemd.service and understand 90% of what you'll ever need.
[Unit]
Description=Python Background Worker
After=network.target postgresql.service
Requires=postgresql.service
StartLimitIntervalSec=60s
StartLimitBurst=3

[Service]
Type=simple
User=appuser
Group=appgroup
WorkingDirectory=/opt/myapp
ExecStart=/opt/myapp/.venv/bin/python -u worker.py
Restart=on-failure
RestartSec=10s
StandardOutput=journal
StandardError=journal
Environment=PYTHONUNBUFFERED=1
PrivateTmp=true
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target
Restart Policies, Resource Isolation, and User Units
Restart=on-failure restarts only on non-zero exit codes. Restart=always restarts on clean exits too — wrong for batch jobs, correct for persistent workers. StartLimitBurst=3 combined with StartLimitIntervalSec=60s prevents a crashing process from restart-looping and hammering a database or external API (on systemd 229 and later, these two settings belong in the [Unit] section). PrivateTmp=true gives the service its own /tmp namespace, preventing tmp-file collisions between services. For non-root deployments, systemd user units work identically: drop the unit file in ~/.config/systemd/user/ and use systemctl --user enable worker.service. loginctl enable-linger username ensures the user's systemd instance starts at boot without requiring a login session. Every production Python background worker that isn't managed by systemd or an equivalent supervisor is technical debt with an unknown expiry date.
Supervisord: The Non-Root Alternative
Supervisord fills the gap between tmux duct tape and full systemd ownership. It's a Python-based process control system that runs as a user process, requires no root access, and manages child processes with configurable restart policies, log rotation, and a basic HTTP API. For shared hosting environments, containers where systemd isn't running, or situations where you don't control the init system, supervisord is the correct tool. It's also reasonable for development environments where you want to manage background processes without touching system-level configuration.
; /etc/supervisor/conf.d/worker.conf
[program:python_worker]
command=/opt/myapp/.venv/bin/python -u /opt/myapp/worker.py
directory=/opt/myapp
user=appuser
autostart=true
autorestart=true
startretries=3
redirect_stderr=true
stdout_logfile=/var/log/supervisor/worker.log
stdout_logfile_maxbytes=50MB
stdout_logfile_backups=5
environment=PYTHONUNBUFFERED="1"
Supervisord vs systemd: When to Use Each
The practical distinction: systemd is the right choice when you own the machine and need process startup tied to system boot, dependency ordering against other services, or cgroup resource limits. Supervisord is correct when you're deploying to a machine where modifying systemd units requires a sysadmin ticket, or in Docker containers where systemd isn't the init process. The stdout_logfile_maxbytes and stdout_logfile_backups options handle log rotation internally — one less thing to wire up with logrotate. The redirect_stderr=true directive merges stderr into the same log stream, which is almost always what you want for a Python worker rather than managing two separate log files. Supervisord is not a fallback for engineers who find systemd intimidating — it's a legitimate tool with a specific use case, and conflating the two leads to wrong decisions at deployment time.
Docker Detached Mode: Persistence in Containers
In Docker, the TTY problem is solved by design — the container's PID 1 is your process, not a shell session. But process persistence in containers has its own failure modes. When Python is PID 1, it receives signals directly from the Docker daemon: docker stop sends SIGTERM and waits 10 seconds before escalating to SIGKILL. If Python doesn't handle SIGTERM — which it won't by default — every deployment kills the worker hard and leaves in-flight work abandoned. The buffering problem is identical to bare metal: PYTHONUNBUFFERED=1 is mandatory in the Dockerfile environment block, not optional.
# Dockerfile — Python as PID 1, unbuffered from the start
FROM python:3.11-slim
ENV PYTHONUNBUFFERED=1
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY worker.py .
CMD ["python", "-u", "worker.py"]
# Detached run with restart policy
$ docker run -d --name ml_worker --restart unless-stopped \
-v /var/log/workers:/app/logs myapp/worker:latest
PID 1 Signal Semantics vs Bare Metal
On bare metal, systemd is PID 1 and acts as a signal relay — your Python process is a child that receives SIGTERM from systemd with configurable timeout and cleanup ordering. In a container, Python is PID 1, which means zombie reaping and signal forwarding are now its responsibility. For multi-process Python workers using multiprocessing or subprocess, you need explicit child reaping logic or a minimal init like tini as the container entrypoint. The --restart unless-stopped policy handles container-level restarts but does nothing for SIGTERM handling inside the process — those are separate concerns that get conflated constantly. In containers, the absence of a proper init layer and missing SIGTERM handlers are the two most common causes of unclean shutdown data loss.
FAQ
What is the correct way to use nohup python script to prevent output loss?
Always combine nohup with the -u flag on the Python interpreter and explicit redirection of both stdout and stderr to separate log files. The default nohup.out file is unrotated and will fill your disk. Capture the PID with echo $! > /var/run/myjob.pid immediately after backgrounding so you can track or kill it later.
How does a python daemon process differ from a background process?
A daemonized process has detached from its original session entirely: it calls setsid() to create a new session, forks twice to ensure it can never reacquire a controlling terminal, and closes all inherited file descriptors. Python's python-daemon library handles this correctly. A simple background process with & or nohup is still a member of the original session — it just ignores SIGHUP. Daemonization is the proper approach for system-level services; nohup is a tactical shortcut for interactive sessions.
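The session-detach step is observable from the standard library: subprocess's start_new_session=True runs setsid() in the child, which is the first half of the daemonization dance. A quick comparison sketch:

```python
import os
import subprocess
import sys

# The probe prints the child's session ID.
probe = [sys.executable, "-c", "import os; print(os.getsid(0))"]

# A plain child inherits our session ID.
plain = subprocess.run(probe, capture_output=True, text=True)

# start_new_session=True calls setsid() after fork: a new session with
# no controlling terminal, the core of daemonization.
detached = subprocess.run(probe, capture_output=True, text=True,
                          start_new_session=True)

print("parent  :", os.getsid(0))
print("plain   :", plain.stdout.strip())
print("detached:", detached.stdout.strip())
```

The plain child reports the parent's session ID; the detached child reports its own, proving it can no longer be reached by session-wide SIGHUP.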
When does screen vs tmux for python actually change the outcome?
For most Python workloads, the choice doesn't affect correctness — both provide pty-based session persistence. Tmux becomes the better choice when you need scripted session creation, window layouts, or integration with tools like tmuxinator. Screen is the fallback when tmux isn't installed and you can't install packages. Neither survives a reboot without additional tooling.
How do I configure a systemd service unit for a Python script running in a virtualenv?
Point ExecStart directly at the Python binary inside the virtualenv path: /opt/myapp/.venv/bin/python worker.py. You do not need to source the activate script — using the absolute path to the interpreter implicitly selects that environment's site-packages. Set WorkingDirectory to the project root so relative imports resolve correctly.
What is the best approach to manage background processes for multiple Python workers?
Systemd template units are the correct abstraction: name the file worker@.service and instantiate it as worker@1.service, worker@2.service, and so on. Each instance gets its own log stream in journald and independent restart tracking. For dynamic worker counts, supervisord's numprocs directive is simpler to configure than managing multiple template unit instances manually.