PyFu

Python-based Vulnerabilities Anatomy

This section dissects Python vulnerability classes at the language level: the mechanics of why a technique works, independent of any web framework. It is the foundation the rest of the handbook builds on. The framework pages for Flask and FastAPI show attacker input reaching these same primitives through routes and dependencies, but the engine underneath is documented here.

The split is deliberate. When you see “command injection in a Flask app”, the Flask part is just plumbing; the part that actually gives you a shell is subprocess with shell=True, and that lives here. Learn the anatomy once and you recognize the bug in any framework, any codebase, even in code that never touches the web.
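A minimal sketch of that point, with no framework involved. The function names and the payload are hypothetical, and the example assumes a POSIX system where `echo` is available. The only difference between the two calls is whether a shell parses the string.

```python
import subprocess


def greet_with_shell(name: str) -> str:
    # Vulnerable pattern: the whole string is handed to /bin/sh, so shell
    # metacharacters inside `name` are interpreted as shell syntax.
    result = subprocess.run(
        f"echo hello {name}", shell=True, capture_output=True, text=True
    )
    return result.stdout


def greet_with_argv(name: str) -> str:
    # No shell involved: `name` is passed as one argv element and is never parsed.
    result = subprocess.run(
        ["echo", "hello", name], capture_output=True, text=True
    )
    return result.stdout


# Hypothetical attacker input; `echo INJECTED` stands in for a real command.
payload = "world; echo INJECTED"

shell_output = greet_with_shell(payload)
argv_output = greet_with_argv(payload)

print(repr(shell_output))
print(repr(argv_output))
```

With the shell, the payload's second command runs and its output appears on its own line. With the argv list, the payload is printed back as inert text.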

Two lenses run through every page: the sink (the dangerous call that turns data into action, such as os.system, pickle.loads, eval, or open) and the chain (how an attacker reaches that sink and escalates from “expression evaluates” to “code executes”). Offensively, you hunt for the sink and then build the chain; defensively, you cut the data path before it reaches the sink.
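The sink and chain idea can be sketched with eval as the sink. The function names below are hypothetical, and `os.getcwd()` is a harmless stand-in for what would be `os.system(...)` in a real chain. The second function shows one way to cut the data path, assuming the application only ever needed literal values.

```python
import ast
import os


def parse_value_unsafe(expression: str):
    # Sink: eval compiles and runs any Python expression it is given.
    return eval(expression)


def parse_value_literal(expression: str):
    # ast.literal_eval accepts only literals (numbers, strings, lists, dicts, ...)
    # and rejects calls, attribute access, and name lookups.
    return ast.literal_eval(expression)


benign = "[1, 2, 3]"
# The chain in one line: reach the import machinery, then call into os.
hostile = "__import__('os').getcwd()"

unsafe_benign = parse_value_unsafe(benign)
unsafe_hostile = parse_value_unsafe(hostile)  # code executed, not just evaluated

try:
    parse_value_literal(hostile)
    literal_rejected = False
except ValueError:
    literal_rejected = True

print(unsafe_benign, unsafe_hostile, literal_rejected)
```

The benign input behaves the same in both functions, which is why this class of bug survives testing: the difference only shows up when the input is hostile.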

What this section covers

  • Injection & code execution
  • Deserialization & object injection
  • File access
  • Python internals as attack surface

Every technique here has a runnable counterpart under generic-py-fu/ in the PyFu lab (these are standalone scripts, not web apps). The framework-specific exploitation of the same classes (the Flask SSTI/SQLi/XXE apps, the FastAPI auth bypasses) lives in the Web Application Attacks section and is wired into the Docker lab.

Why Python’s native attack surface matters from an offensive security perspective

I keep this section as the first stop on every Python assessment because the bugs documented here are framework-independent and they pay in full. The same dynamic features that make Python pleasant to write (runtime evaluation, rich serialization, a programmable import system, and an object graph that exposes everything to everything) are exactly the primitives that turn “attacker controls a string” into “attacker runs code”. When I find one of these sinks, I am usually not looking at information disclosure or a privilege nudge; I am looking at RCE, persistence, or a sandbox escape. That is why I hunt the sink first and worry about the framework second.
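To make “a programmable import system” concrete, here is a small sketch of a sys.meta_path finder that answers an import for a module that exists nowhere on disk. The class name, the module name pyfu_demo_settings, and the loaded_by attribute are all hypothetical and chosen for illustration; the finder is removed again at the end so the demo leaves no trace.

```python
import importlib
import importlib.abc
import importlib.util
import sys

TARGET = "pyfu_demo_settings"  # hypothetical module name, not a real package


class InjectedFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serves TARGET from memory; every other import falls through untouched."""

    def find_spec(self, fullname, path, target=None):
        if fullname == TARGET:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # let the normal finders handle everything else

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        # Arbitrary code could run here; the sketch only leaves a marker.
        module.loaded_by = "InjectedFinder"


finder = InjectedFinder()
sys.meta_path.insert(0, finder)  # consulted before the standard finders
try:
    demo_module = importlib.import_module(TARGET)
finally:
    # Clean up so the demo does not affect later imports.
    sys.meta_path.remove(finder)
    sys.modules.pop(TARGET, None)

print(demo_module.loaded_by)
```

Nothing in the importing code changes: the caller writes an ordinary import and receives whatever the finder decided to hand back.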

What makes this surface valuable to me is that it is native and quiet. There is no exploit binary, no memory corruption, no version-specific gadget that breaks on the next patch. The techniques abuse documented, intended language behavior, so they survive upgrades and rarely trip the controls teams actually deploy. A __subclasses__() climb works the same in a template engine, a pickle gadget, and a homemade eval jail, which means one mental model unlocks several bug classes at once.
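A harmless sketch of the first rungs of that climb, run inside a hypothetical jail that passes an empty __builtins__ to eval. It stops at enumerating classes; which of them leads onward to os or subprocess depends on the interpreter version and on what the process has imported, so the sketch makes no claim about a specific index.

```python
# Hypothetical "safe eval" namespace: all builtins stripped.
jail_globals = {"__builtins__": {}}

# Name-based access is blocked, which is what the jail's author tested.
try:
    eval("__import__('os')", jail_globals)
    import_blocked = False
except NameError:
    import_blocked = True

# Attribute access on a literal needs no names at all:
# tuple instance -> tuple class -> object -> every direct subclass of object.
root = eval("().__class__.__base__", jail_globals)
reachable = eval("().__class__.__base__.__subclasses__()", jail_globals)
reachable_names = {cls.__name__ for cls in reachable}

print(import_blocked, root is object, len(reachable))
```

The jail blocks the name `__import__` but not the object graph, and the graph is enough to reach every class loaded in the process.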

When I audit a Python codebase, these are the tells I grep for first:

  • Any sink that turns data into action: eval, exec, compile, os.system, subprocess.run(..., shell=True), pickle.loads, yaml.load, __import__. Each one is a direct line from input to execution, and I trace the data path backward to the nearest attacker source.
  • A “sandbox” or “safe eval” built inside the interpreter. Stripped __builtins__, name blocklists, and restricted namespaces signal a developer who knew the feature was dangerous and tried to contain it in the wrong layer. That is an escape waiting to happen, not a defense.
  • Deserialization of anything the user can influence. Session blobs, cache entries, message-queue payloads, and “config” files that round-trip objects are all __reduce__ territory.
  • Writable import locations. A .pth file in a site directory can run code at interpreter startup, before the program’s first line, and a sys.meta_path insertion can hijack a future import, with nothing visible in the application’s source.
  • Path values that reach open, os.path.join, or pathlib without canonicalization. Existence checks and prefix checks are not containment.
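The last tell is the easiest to show in a few lines. This sketch assumes POSIX-style paths and Python 3.9+ (for Path.is_relative_to); BASE_DIR and the function names are hypothetical. It compares a string prefix check against a check made after canonicalization.

```python
import os
from pathlib import Path

BASE_DIR = Path("/srv/app/uploads")  # hypothetical upload root


def naive_is_contained(user_path: str) -> bool:
    # Prefix check on the joined string: ".." segments are still in it,
    # so the string starts with the base even when the target does not live there.
    return os.path.join(BASE_DIR, user_path).startswith(str(BASE_DIR))


def resolved_is_contained(user_path: str) -> bool:
    # Canonicalize first (collapses ".." and follows symlinks), then compare
    # path components rather than raw characters.
    candidate = (BASE_DIR / user_path).resolve()
    return candidate.is_relative_to(BASE_DIR.resolve())


for attempt in ("avatars/me.png", "../../../etc/passwd", "../uploads-private/key"):
    print(attempt, naive_is_contained(attempt), resolved_is_contained(attempt))
```

The naive check accepts all three inputs. The resolved check accepts only the first, because the other two canonicalize to locations outside the base directory.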

For defenders the takeaway is blunt: Python’s expressiveness is the attack surface. The only durable control is to keep untrusted data away from these sinks entirely rather than to filter what reaches them, because the object model offers too many equivalent paths to the same capability.
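One way to read “keep untrusted data away from the sink” is to pick a format that cannot express behavior at all. The sketch below contrasts pickle with JSON for a hypothetical session blob. The Gadget class is illustrative, and `os.getcwd` is a harmless stand-in for the callable an attacker would actually choose.

```python
import json
import os
import pickle


class Gadget:
    # __reduce__ tells pickle how to rebuild the object: "call this callable
    # with these arguments". The blob's author picks the callable.
    def __reduce__(self):
        return (os.getcwd, ())


blob = pickle.dumps(Gadget())

# Loading does not return a Gadget; it returns whatever the callable returned,
# which means the callable ran inside pickle.loads.
restored = pickle.loads(blob)

# JSON can only describe data (dicts, lists, strings, numbers, booleans, null),
# so parsing it never invokes a callable named by the input.
session = json.loads('{"user": "alice", "role": "viewer"}')

print(type(restored).__name__, restored == os.getcwd(), session)
```

No filter on the pickle bytes was needed or attempted here: the second path is safe because the format has no way to name a callable, not because hostile input was detected.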