Insecure File Access and Path Traversal in Python
Path traversal, also known as directory traversal, is a common security vulnerability that allows attackers to access files and directories outside the intended scope of a web application.
This occurs when user input is used to construct file paths without proper sanitization or validation, enabling attackers to manipulate the path and access restricted areas of the file system.
In Python, this risk is especially relevant due to the availability of both safe and unsafe file handling functions in its standard library. If user input is passed directly to functions like open(), os.path.join(), or similar file access methods without proper validation, the application may inadvertently allow traversal to sensitive files.
While Python also provides utilities such as os.path.abspath() or pathlib for safer path handling, failure to implement these correctly can lead to exploitable conditions.
While path traversal is often associated with arbitrary file read attacks, its impact can escalate significantly when combined with file write operations.
If user input is used not only for reading files but also for generating file paths where files are written or uploaded, an attacker can use path traversal sequences to write malicious files outside the intended upload directories.
This section examines how various file access methods are implemented and how improper use of these methods can result in path traversal vulnerabilities.
Direct File Open
Passing user-controlled input to the open() function even when reading from a specific directory is not inherently safe and can lead to a path traversal vulnerability if the input is not properly validated.
Consider the following example:
import sys
def read_file(file_name):
f = open("files/%s" % file_name)
data = f.read()
print(data)
file_name = sys.argv[1]
read_file(file_name)
At first glance, it seems that the file access is limited to the files/ directory.
However, because the file_name parameter is derived directly from user input (via sys.argv), an attacker could supply values like ../../etc/passwd to traverse directories and read sensitive system files.
This works because the path is constructed through simple string formatting, without any checks or sanitization.
hackpad :: /opt/PyFu/generic-py-fu » python3 direct-file-open.py uploaded-data-march-2025.csv
name,email
Lolz, Lolz@pyfu.io
hackpad :: /opt/PyFu/generic-py-fu » python3 direct-file-open.py ../../../../../etc/os-release
PRETTY_NAME="Ubuntu 24.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.1 LTS (Noble Numbat)"
VERSION_CODENAME=noble
Check if a file exists using os.path.exists()
The os.path.exists() function in Python is used to check whether a given path points to an existing file or directory.
While useful for verifying file presence, it does not provide any protection against path traversal attacks.
If user input is used to construct the path, an attacker can still probe for files outside the intended directory.
>>> os.path.exists("/etc/passwd")
True
>>> os.path.exists("../../../../etc/passwd")
True
>>> os.path.exists("../../../../../etc/passwd")
True
>>> os.path.exists("../../../../../etc/passwd5")
False
As shown, os.path.exists() will return True for any file that exists on the system even if the path includes traversal patterns like ../.
This highlights the importance of sanitizing and validating user input before using it to build file paths, as checking for existence alone does not prevent unauthorized access.
Check if a File is Valid Using os.path.isfile()
The os.path.isfile() function in Python checks whether a given path points to a valid regular file, not a directory, symlink, or special file.
Like os.path.exists(), it can be useful for confirming that a file exists and is of the correct type, but, it does not offer protection against path traversal.
>>> os.path.isfile("/etc/passwd")
True
>>> os.path.isfile("../../../../../etc/passwd")
True
>>> os.path.isfile("../../../../../etc/passwd5")
False
>>>
As seen here, os.path.isfile() will return True for valid files even if the path includes traversal sequences like ../.
This reinforces that simply checking if a file exists or is valid is not sufficient for secure file access.
Input used to construct file paths must be properly validated and restricted to prevent access to unintended parts of the filesystem.
Check if a file exists using pathlib
The pathlib module provides an object-oriented approach to handling filesystem paths in Python.
It includes the Path.exists() method, which checks whether a given path exists similar to os.path.exists().
For Example:
>>> from pathlib import Path
>>> file = Path("../../../../../etc/passwd")
>>> file.exists()
True
>>>
While pathlib improves code readability and offers more robust path manipulation capabilities, calling Path.exists() alone does not mitigate path traversal risks.
As with other file access methods, if the path is constructed from user input, it must be properly sanitized and validated to prevent access to unauthorized files or directories.
Path Concatenation using os.path.join()
The os.path.join() function is commonly used in Python applications to construct file paths by joining multiple components together.
It is often assumed to provide safety against path traversal, but in reality, os.path.join() simply performs string concatenation with proper separators based on the operating system.
It does not validate or sanitize the individual path components.
It’s mostly used when you concatenate a base directory with user-supplied, for example:
import os
UPLOAD_FOLDER = "./uploads/"
user_input = "readable.csv"
file_path = os.path.join(UPLOAD_FOLDER, user_input)
print(file_path)
# Output: ./uploads/readable.csv
This will concatenate readable.csv with the path ./uploads to get the full path ./uploads/readable.csv.
If the user provides input such as ../../../../../etc/passwd, the following code:
import os
UPLOAD_FOLDER = "./uploads/"
# user_input = "readable.csv"
user_input = "../../../../../etc/passwd"
file_path = os.path.join(UPLOAD_FOLDER, user_input)
print(file_path)
# Output: ./uploads/../../../../../etc/passwd
This will result in a path that includes the traversal sequence, allowing the path to escape the intended uploads directory when resolved.
The os.path.join() Absolute-Path Trap
os.path.join() has a second, sharper failure mode beyond passing ../ through unchanged. If any component handed to it is an absolute path, os.path.join() discards everything before it and returns that absolute path on its own:
>>> import os
>>> os.path.join("/var/www/uploads", "user_supplied.txt")
'/var/www/uploads/user_supplied.txt'
>>> os.path.join("/var/www/uploads", "/etc/passwd")
'/etc/passwd'
The base directory vanishes entirely. Here an attacker does not even need traversal sequences, a single leading / is enough to point the resulting path anywhere on the filesystem. Any code that joins a trusted base with a user-controlled component and assumes the result stays under the base is vulnerable to this on its own.
Escalating from Read to Write
Every method above is shown reading files, but the same unvalidated path flows into write operations, and that is where impact escalates. If user input reaches the path in open(path, "w"), a file upload handler, or an archive-extraction routine, traversal sequences let an attacker write outside the intended directory, overwriting configuration, dropping a .pth file into site-packages for code execution on the next interpreter start (see Import System Abuse with .pth Files and sys.meta_path), or planting a web shell in a served directory.
def save_upload(file_name, data):
# file_name = "../../app/config.py"
with open("uploads/%s" % file_name, "w") as f:
f.write(data)
A write primitive turns path traversal from information disclosure into code execution, which is why write paths built from user input deserve even more scrutiny than read paths.
Resolving Paths Safely
The fix is to never trust the constructed path. Resolve it to an absolute, canonical form and confirm it still lives under the intended base. os.path.realpath() collapses .. segments and follows symlinks; os.path.commonpath() then verifies containment:
import os
def safe_open(base, user_path):
base = os.path.realpath(base)
full = os.path.realpath(os.path.join(base, user_path))
if os.path.commonpath([base, full]) != base:
raise ValueError("path escapes the base directory")
return open(full)
The pathlib equivalent uses Path.resolve() with Path.is_relative_to() (Python 3.9+):
from pathlib import Path
def safe_open(base, user_path):
base = Path(base).resolve()
full = (base / user_path).resolve()
if not full.is_relative_to(base):
raise ValueError("path escapes the base directory")
return full.open()
Resolving before the containment check is essential. Inspecting the raw string for .. is not enough, because absolute paths, symlinks, and encodings all defeat naive string filtering. The framework-level version of this attack, and the same defense applied to a Flask route, is covered in Path Traversal in Flask Applications.
Running it in the lab
docker compose run --rm generic-py-fu python3 os-path-join.py
./uploads/../../../../../etc/passwd
os.path.join("./uploads", user_input) did nothing to contain the traversal: the attacker string is appended verbatim, and had it been an absolute path, join would have discarded the base directory entirely.
Why path traversal matters from an offensive security perspective
I keep path traversal near the top of my list on any Python target because it is a primitive that scales with the privileges of the process. On read it leaks /etc/passwd, application source, .env secrets, and session stores. On write it stops being information disclosure and becomes code execution: I drop a .pth file into site-packages for execution on the next interpreter start (see Import System Abuse with .pth Files and sys.meta_path), overwrite a config the app re-imports, or plant a web shell in a served directory. The os.path.join absolute-path trap makes this cheaper still, since a single leading / discards the base directory with no ../ needed.
What makes it attractive is how often the validation looks present but is not. Developers reach for os.path.exists, os.path.isfile, or Path.exists and assume those gate access, but every one of them returns True for a traversed path. The check exists, so the bug hides in plain sight during review.
Where I hunt for a reachable sink in an assessment:
- Download and “view file” endpoints that take a filename, report id, or template name and feed it to
openunder a base directory. - Upload, export, and report-generation handlers where a user-supplied name reaches
open(path, "w"), since that is the write primitive that escalates to code execution. - Archive extraction (
zipfile,tarfile) that joins entry names onto a target directory, the classic Zip Slip flowing throughos.path.join. - Cache, queue, and session backends keyed on user-influenced strings that become filenames on disk.
- Config and plugin loaders that resolve a path from a request parameter or environment value.
The audit tell is any open, os.path.join, or pathlib path built from request data where the only guard is an existence or type check rather than a resolved-and-contained check. For defenders the takeaway is that existence checks prove nothing about location, so containment must be verified against a canonicalized absolute path.
Mitigation
The fix is to never build a filesystem path by concatenating user input. Resolve the candidate with os.path.realpath (or Path.resolve) and confirm it stays inside an allowlisted base directory using os.path.commonpath or Path.is_relative_to before opening it, rejecting anything that escapes, since existence checks like os.path.exists provide no protection. Better still, accept an opaque identifier from the user and map it server-side to a known-safe path so the raw filename never reaches the filesystem call at all.