Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Finding File Handling and os.path Operations

Path traversal (directory traversal) occurs when user input influences filesystem paths without proper validation.

Example:

filename = request.args.get("file")
with open("uploads/" + filename, "r") as f:
    data = f.read()

If filename is:

../../etc/passwd

The attacker reads arbitrary system files.

What to Look for in Code Review


Misuse of os.path.join()

Developers often assume:

safe_path = os.path.join("/app/uploads", filename)

But if filename starts with /:

"/etc/passwd"

The base path is ignored.

Archive Extraction Vulnerabilities

If applications extract ZIP or TAR files:

import tarfile
tar.extractall(path="uploads/")

Malicious archives may contain:

../../app.py

This overwrites application files.

Secure Pattern for Path Validation

Python applications should always validate user input and use secure path handling techniques to ensure that user-supplied paths cannot escape intended boundaries.

While os.path.realpath() is a valid approach, the modern standard in Python is using the pathlib module. It provides a more robust way to resolve paths and verify their integrity.

from pathlib import Path

# Resolve the base directory to an absolute, canonical path
BASE_DIR = Path("/var/www/data").resolve()

def save_file(filename, data):
    # Combine and resolve the path to remove symbols like '../' and symlinks
    target_path = (BASE_DIR / filename).resolve()

    # Enforce directory boundary: ensure the target is inside BASE_DIR
    if BASE_DIR not in target_path.parents and target_path != BASE_DIR:
        raise ValueError("Security Alert: Attempted directory traversal")

    with open(target_path, "wb") as f:
        f.write(data)

There are of course more mitigations possible. If possible the best approach is always to completely eliminate directory traversal possibilities by stripping all path components from user input.