Python Debugging: How to Find and Fix Errors from Code to Production
Most developers spend more time debugging than writing code — and that ratio gets worse as systems grow. Knowing how to debug Python code is not a soft skill, it is the difference between a two-hour fix and a two-day incident. The gap between a junior and a senior engineer is not syntax knowledge, it is the ability to read broken state, trace execution, and isolate root cause without guessing.
TL;DR: Quick Takeaways
- Print debugging is a trap — structured logging and pdb give you actual state, not guesses
- py-spy lets you profile a frozen process without touching the source or restarting
- asyncio errors are often silent — unhandled task exceptions vanish unless you explicitly handle them
- In Kubernetes, debugpy + port-forward is the only sane way to attach a debugger to a running pod
Common Python Debugging Mistakes That Cost You Hours
Bad debugging habits compound. One wrong assumption early in an investigation can send you down a rabbit hole for hours. These are the patterns that consistently slow engineers down — not because they are stupid mistakes, but because they feel productive in the moment.
The most common trap is print hell — scattering print() calls everywhere, running the script, adding more prints, running again. It feels like progress. It is not. You are flying blind with a flashlight instead of turning on the lights. Prints disappear when you need them in production, they add no context about call stack or variable scope, and they leave your codebase looking like a crime scene.
The second trap is chasing symptoms instead of root cause. Your API returns a 500. You immediately look at the endpoint handler. But the real issue is three layers down — a database connection pool that exhausted silently. This is what makes debugging Python hard for most people: the surface error and the actual cause are rarely in the same place.
Heisenbugs deserve a mention here — bugs that disappear when you try to observe them. Classic examples: a race condition that vanishes when you add logging (because logging slows things down enough to close the timing window), or a memory leak that only appears under production load. These require a different strategy entirely: monitoring and tracing instead of interactive debugging.
Finally: no reproducible test case. If you cannot reproduce the bug consistently, you are not debugging — you are gambling. Before touching a single line of code, your first job is always to make the failure deterministic.
Python Debugging Workflow: Step by Step
A structured approach cuts investigation time dramatically. Experienced engineers do not thrash randomly through a codebase — they follow a process, and that process is mostly the same every time.
Step one: reproduce the issue. Minimal, isolated, deterministic. If you cannot reproduce it in a unit test or a small script, you are not ready to fix it yet. Step two: read the traceback completely — not just the last line. The full call stack tells you the execution path that led to the failure. Step three: isolate the root cause by forming a hypothesis and testing it, not by changing random things and hoping.
```python
# Minimal reproduction pattern
def process_data(items):
    for item in items:
        result = transform(item)  # crash here?
        store(result)

# Isolate: does transform() fail on specific input?
# Test with known-bad input before touching store()
if __name__ == "__main__":
    process_data([None, "", {"key": None}])
```
Step four is where most people skip ahead: inspect program state before you fix anything. Use pdb, structured logs, or variable dumps to confirm your hypothesis. A fix based on a wrong hypothesis just adds another bug. Step five: apply the fix, write a regression test, verify under the same conditions that reproduced the original failure.
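A sketch of what that regression test can look like, assuming the original crash was `transform()` failing on `None` input (all names here are illustrative, not from a real codebase):

```python
# Hypothetical fixed function: the bug was a crash on None input
def transform(item):
    if item is None:  # the fix: handle the input that used to crash
        return ""
    return str(item).strip()

def test_transform_handles_none():
    # the exact inputs that reproduced the original failure
    assert transform(None) == ""
    assert transform("  ok  ") == "ok"

test_transform_handles_none()
print("regression test passed")
```

The test encodes the reproduction from step one, so the same failure can never silently return.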
Core Techniques: PDB, IDE Debuggers, and Logging
The standard toolset covers 80% of debugging scenarios. The issue is most developers never fully learn the tools they already have — they use 20% of pdb’s capabilities and wonder why debugging takes so long.
How to Use PDB in Python
PDB is the built-in Python debugger, and since PEP 553 (Python 3.7+), you drop into it with a single line: breakpoint(). No imports, no setup. Once inside, the core commands are: n (next line, stay in current frame), s (step into function call), c (continue to next breakpoint), p (print expression), and where (show current call stack). Frame inspection with up and down lets you walk the call stack and inspect state at each level — that alone solves a huge class of bugs.
```python
def calculate_total(orders):
    breakpoint()  # execution stops here
    total = 0
    for order in orders:
        # (Pdb) p order
        # (Pdb) p order.get('amount')
        # (Pdb) where   -> shows full call stack
        total += order['amount']
    return total
```
The key point: breakpoint() drops you into a live frame with access to all local variables. You can evaluate arbitrary expressions, modify variables mid-execution, and inspect the full call stack. This is not print debugging — you are interacting with the running process.
IDE Debugging: VS Code and PyCharm
Visual Studio Code debugger setup for Python requires the Python extension and a launch.json in .vscode/. Set breakpoints visually, use the watch panel for expressions, and the call stack panel replaces where. PyCharm’s debug mode is heavier but has better support for Django and async debugging out of the box. Both IDEs wrap pdb under the hood — the difference is just interface and project integration.
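A minimal attach configuration for launch.json might look like the sketch below. The `"debugpy"` type name applies to recent versions of the VS Code Python extension (older setups used `"python"`), and the `remoteRoot` path is an assumption that depends on where your code lives inside the container:

```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Attach",
      "type": "debugpy",
      "request": "attach",
      "connect": { "host": "localhost", "port": 5678 },
      "pathMappings": [
        { "localRoot": "${workspaceFolder}", "remoteRoot": "/app" }
      ]
    }
  ]
}
```

The pathMappings entry is what lets breakpoints set on local files bind to the same files running remotely.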
Logging Over Print: The Right Default
Switch to logging.debug() and never look back. The Python logging module gives you severity levels, timestamps, module names, and configurable output — all things print() cannot do. In production, you can set LOG_LEVEL=DEBUG without touching code. With print, you either have output or you don’t.
```python
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
log = logging.getLogger(__name__)

def process(payload):
    log.debug("Processing payload: %s", payload)
    result = transform(payload)
    log.debug("Transform result: %s", result)
    return result
```
The structured format here includes timestamp and module name automatically. When this runs in a container, your log aggregator can filter by level and module without any post-processing. That is not something you can retrofit onto print statements after the fact.
Advanced Debugging: When Basic Tools Fail
Standard breakpoints and logging cover most cases. But production systems have a class of problems that cannot be debugged interactively — processes that freeze, async tasks that silently die, and race conditions that only appear under load. These need a different set of tools.
Debugging a Frozen Python Process: GIL and CPU Issues
Your process is at 100% CPU. No traceback. No log output. This is one of the most disorienting scenarios in Python debugging — the process is clearly doing something, but you have no visibility into what. The GIL (Global Interpreter Lock) means only one thread executes Python bytecode at a time, but CPU can still spike from a tight loop in pure Python or from extension code that releases the GIL and thrashes.
The tool here is py-spy. It attaches to a running Python process by PID without modifying source or restarting.
```shell
# Install once
pip install py-spy

# Attach to a running process by PID
py-spy top --pid 12345

# Or generate a flame graph
py-spy record -o profile.svg --pid 12345 --duration 30
```
py-spy samples the call stack at regular intervals and shows you exactly which function is eating CPU. A flame graph from a 30-second sample can surface infinite loops, expensive serialization in hot paths, and deadlocked threads that would otherwise take hours to find. For deadlock detection specifically: py-spy dump --pid 12345 prints every thread's current stack in one shot, and if two threads stay blocked on lock acquisition across repeated dumps, that is your deadlock.
Async Bugs and Silent Failures in asyncio
Debugging asyncio tasks is painful because Python will swallow exceptions from tasks that are never awaited. A task raises, nobody catches it, the event loop logs a warning at shutdown — if you’re lucky. In production, that warning disappears into log noise. This is why asyncio tasks fail silently more often than any other Python concurrency primitive.
```python
import asyncio

async def risky_task():
    raise ValueError("something broke")

async def main():
    # BAD: fire and forget, the exception is lost
    asyncio.create_task(risky_task())
    await asyncio.sleep(1)

    # GOOD: use TaskGroup (Python 3.11+)
    async with asyncio.TaskGroup() as tg:
        tg.create_task(risky_task())  # ExceptionGroup raised here

asyncio.run(main())  # crashes with a visible ExceptionGroup traceback
```
TaskGroup (Python 3.11+) surfaces exceptions immediately via ExceptionGroup: the entire group cancels when any task raises. This converts silent failure into a visible traceback. For event loop lag and race conditions, enable asyncio debug mode with asyncio.run(main(), debug=True) or loop.set_debug(True); it warns about slow callbacks and records where each task was created. Setting PYTHONASYNCIODEBUG=1 as an environment variable before running gives the same diagnostics without touching code.
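For codebases still on Python 3.10 or earlier, a done callback on each task gives similar visibility. A minimal sketch — log_task_exception and the failures list are illustrative names, not a library API:

```python
import asyncio

failures = []

def log_task_exception(task: asyncio.Task) -> None:
    # Retrieving the exception stops asyncio from warning at shutdown
    # and lets us record it instead of losing it in log noise
    if not task.cancelled() and task.exception() is not None:
        failures.append(task.exception())
        print(f"task failed: {task.exception()!r}")

async def risky_task():
    raise ValueError("something broke")

async def main():
    task = asyncio.create_task(risky_task())
    task.add_done_callback(log_task_exception)
    await asyncio.sleep(0.01)  # give the task and its callback time to run

asyncio.run(main())
```

The callback fires whether the task succeeds, fails, or is cancelled, so it must check all three cases before touching task.exception().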
Runtime Monitoring Without Stopping the Application
Sometimes you cannot attach a debugger and cannot restart the process. In these cases, observing without interrupting execution is the only option. PEP 578 audit hooks let you register a callback that fires on specific runtime events — imports, file opens, exec calls — without modifying any application code. Audit hooks are comparatively cheap; the heavier alternative, sys.settrace, fires on every function call and line executed and adds measurable overhead. In production, use tracing surgically and for short observation windows, not as permanent instrumentation.
```python
import sys

def audit_hook(event, args):
    if event == "open":
        print(f"File access: {args[0]}")

sys.addaudithook(audit_hook)
# Now every file open in this process is logged
# Use for: tracing unexpected I/O, security auditing
# Avoid: high-frequency events in hot paths
```
This pattern is useful for tracing unexpected I/O or catching code paths you didn't know existed. The audit hook fires synchronously, so keep it fast or you will introduce latency into whatever you're observing. Note that audit hooks cannot be unregistered once added, so the cleanest approach for non-invasive Python debugging in constrained environments is a context manager that flips a gating flag on and off around the specific operation you want to trace.
Production Diagnostics: Remote Debugging in Kubernetes
Attaching a debugger to a running container is a legitimate production debugging strategy — not a hack. The tool is debugpy, the Python debug adapter that VS Code and PyCharm both support as a remote backend. The workflow: expose a debug port in your container, port-forward it with kubectl, attach from your IDE.
```python
import debugpy

# In your application entrypoint
debugpy.listen(("0.0.0.0", 5678))
print("Waiting for debugger attach...")
debugpy.wait_for_client()  # optional: block until IDE connects
```
Then in your cluster:
```shell
# Port-forward the debug port to localhost
kubectl port-forward pod/my-app-pod-xyz 5678:5678 -n production

# Then attach from VS Code launch.json:
# "request": "attach", "connect": { "host": "localhost", "port": 5678 }
```
Security is not optional here. Never expose port 5678 in a production Service or Ingress. Port-forward only: it creates a temporary tunnel through the Kubernetes API server, authenticated by your kubeconfig. The moment you expose a debug port publicly, you have given anyone with network access an interactive Python shell on your server. The same pattern works for attaching a debugger to a Python process in a local Docker container: map the port with docker run -p 5678:5678 instead of using kubectl.
Debugging Without Restarting the Application
In stateful systems — long-running workers, WebSocket servers, anything holding in-memory state — a restart means losing the context you need to debug. Safe live debugging strategies prioritize observability over interruption: attach py-spy to sample the stack, enable debug logging via a signal handler, or use audit hooks to trace specific operations. The principle is simple — get as much information as possible before you touch anything. Restarting is always an option, but it should be the last one, not the first.
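The signal-handler trick can be sketched like this; SIGUSR1 is a common choice on POSIX systems, and the handler name is illustrative:

```python
import logging
import signal

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("app")

def toggle_debug(signum, frame):
    # flip the root logger between INFO and DEBUG on each signal
    root = logging.getLogger()
    new_level = logging.DEBUG if root.level != logging.DEBUG else logging.INFO
    root.setLevel(new_level)
    log.warning("log level switched to %s", logging.getLevelName(new_level))

signal.signal(signal.SIGUSR1, toggle_debug)
# From a shell: kill -USR1 <pid> flips DEBUG logging on the live process
```

The worker keeps all of its in-memory state; you only change how much of it gets written to the logs.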
Python Stack Trace Analysis: Where Every Debug Session Starts
Before any tool, any breakpoint, any py-spy session — there is the traceback. Think of it as the autopsy report of your crash. It tells you the exact sequence of function calls that led to the failure, the file and line number at each frame, and the exception type with its message. Most developers read only the last line. That is a mistake. The interesting part is usually three or four frames up — where your code called into a library, or where a wrong assumption was passed down the call stack and finally exploded somewhere unrelated.
Effective Python stack trace analysis is more than just looking at the error message. Reading a Python stack trace correctly means starting from the bottom (the exception) and reading upward until you hit code you own. The frame where your code ends and library code begins is almost always where the real fix lives.
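A toy example of that reading order, with hypothetical function names standing in for your code and the layer below it:

```python
import traceback

def parse_config(raw):          # your code: accepts the caller's dict
    return apply_defaults(raw)

def apply_defaults(cfg):        # the frame where the error finally fires
    return cfg["timeout"] * 2   # KeyError explodes here

try:
    parse_config({})            # the wrong assumption actually starts here
except KeyError:
    tb = traceback.format_exc()
    print(tb)

# Read bottom-up: the KeyError fires in apply_defaults, but the real fix
# belongs in whatever built a config dict without a "timeout" key
```

The bottom frame names the symptom; the frame where your own code last appears names the fix.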
FAQ
Why is my Python code not working even though there are no errors?
No errors means no exception was raised — it does not mean the code did what you expected. Silent failures are common in Python: a function returns None when you expected a value, a condition is always False so a branch never executes, or an async task raises and nobody awaits it. Start by adding explicit assertions on your assumptions, then use pdb to inspect the actual values flowing through your code versus the values you think are there. The gap between those two is where the bug lives.
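A tiny illustration of that gap, with hypothetical names:

```python
def get_user(users, user_id):
    return users.get(user_id)  # returns None on a miss, no exception

user = get_user({}, 42)
print(user)  # prints None: nothing raised, but not what the caller expected

# An explicit assertion turns the silent gap into a loud failure:
# assert user is not None, "no user found for id 42"
```

The uncommented assert would crash immediately at the point of the bad assumption, instead of letting None flow three layers deeper before something breaks.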
How do I debug Python code faster without losing my mind?
The fastest debugging sessions follow a strict hypothesis-test loop: form one specific hypothesis, test it with the minimum change needed, confirm or reject, repeat. The slowest sessions are the ones where you change three things at once and then don’t know which one fixed it. Use breakpoint() to drop into pdb at the exact point of failure, inspect state before assuming anything, and write a failing test case before you write the fix. A regression test is not optional — it is the proof that you actually fixed the root cause.
What is the best Python debugger for production environments?
It depends on what you need. For interactive debugging of local or containerized code, debugpy with VS Code or PyCharm is the standard. For profiling a live process without touching it, py-spy is the best option — it is low-overhead, works on CPython and PyPy, and requires no code changes. For async-heavy applications, the built-in asyncio debug mode surfaces task exceptions and slow callbacks that would otherwise be invisible. There is no single best tool — a production debugging workflow uses all three depending on the symptom.
Why does my Python process freeze without any error output?
Three common causes: a deadlock where two threads are waiting on each other’s locks indefinitely, a tight loop in pure Python or in C extension code that releases the GIL, or a blocking I/O call with no timeout set. The GIL does not protect you from deadlocks — it only serializes Python bytecode execution. Use py-spy to sample the frozen process and look at what the threads are blocked on. If you see the same frame repeated across many samples, that is your hot spot. If you see threads blocked on lock acquisition with no samples showing the lock being released, that is a deadlock.
How do I debug asyncio tasks that fail silently in Python?
Set PYTHONASYNCIODEBUG=1 before running your application — this enables debug mode on the event loop, which logs warnings for slow callbacks (anything blocking the loop for over 100 ms) and coroutines that were never awaited. For Python 3.11+, migrate fire-and-forget create_task calls to TaskGroup, which raises an ExceptionGroup immediately when any task fails, making silent failures impossible. For older codebases, add a done callback to each task that retrieves and logs the exception; calling task.exception() inside the callback is what stops the failure from being silently discarded.
Is remote debugging in Kubernetes safe?
It is safe if you use port-forwarding and never expose the debug port through a Service or Ingress. kubectl port-forward creates an authenticated tunnel through the Kubernetes API — the port is only reachable from your local machine while the forward is active. The risk is configuration mistakes: a debug port accidentally added to a LoadBalancer Service, or a debugpy listener left active in a production image. Both can be mitigated with network policies that block inter-pod access to the debug port and build pipelines that strip debug instrumentation from production images.
Final Word
Debugging is not glamorous work, but it is where you actually learn how systems behave under real conditions — not the idealized version you designed. Every production incident is a free lesson in what your assumptions got wrong. The engineers who get fast at debugging are the ones who treat each session as an investigation with structure, not a desperate search with grep.
Use the right tool for the symptom: pdb for interactive inspection, py-spy for frozen processes, debugpy for remote sessions, asyncio debug mode for task failures. Stop using print as a debugging strategy — it scales to zero and tells you nothing about context.
The goal is simple: get from “something is broken” to “I know exactly where and why” as fast as possible. Everything else is noise.