CVE-2026-28350
Received - Intake
Base Tag Injection Vulnerability in lxml_html_clean Allows Link Hijacking

Publication date: 2026-03-05

Last updated on: 2026-03-09

Assigner: GitHub, Inc.

Description
lxml_html_clean is a project providing the HTML cleaning functionality copied from `lxml.html.clean`. Prior to version 0.4.4, the <base> tag passes through the default Cleaner configuration. Although page_structure=True removes html, head, and title tags, there is no specific handling for <base>, so an attacker can inject it and hijack relative links on the page. This issue has been patched in version 0.4.4.
Meta Information
Published: 2026-03-05
Last Modified: 2026-03-09
Generated: 2026-05-07
AI Q&A: 2026-03-05
EPSS Evaluated: 2026-05-05
Affected Vendors & Products
Vendor: fedoralovespython
Product: lxml_html_clean
Affected versions: before 0.4.4 (exclusive)
CWE
CWE-116: The product prepares a structured message for communication with another component, but encoding or escaping of the data is either missing or done incorrectly. As a result, the intended structure of the message is not preserved.
AI Powered Q&A
Can you explain this vulnerability to me?

CVE-2026-28350 is a vulnerability in the lxml_html_clean Python library, specifically in versions prior to 0.4.4. The issue arises because the library's default Cleaner configuration does not remove or handle the HTML <base> tag. Although the HTML specification requires the <base> tag to be inside the <head> element, browsers accept <base> tags placed outside the head. This allows an attacker to inject a <base> tag that changes the base URL for all relative URLs on the page.

Because the <base> tag sets the base URL for relative links, an attacker can hijack these links by injecting a malicious <base> tag, redirecting users to attacker-controlled domains. This can affect links, images, scripts, and form actions on the page. [1, 2]
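
The effect of an injected <base> tag on relative URLs can be approximated with Python's standard `urllib.parse.urljoin`. This is a minimal sketch rather than a model of full browser behavior; the page URL, the attacker domain, and the helper name `resolve` are hypothetical and chosen only for illustration.

```python
from urllib.parse import urljoin

# Hypothetical URLs, used only for illustration.
PAGE_URL = "https://app.example.com/profile/view"
INJECTED_BASE = "https://evil.example/"


def resolve(relative_url, document_url, base_href=None):
    """Resolve a relative URL roughly the way a browser would.

    Without a <base> element the document URL acts as the base. With one,
    the base href (itself resolved against the document URL) takes over.
    """
    base = urljoin(document_url, base_href) if base_href else document_url
    return urljoin(base, relative_url)


# Compare where each relative reference points with and without the injection.
for rel in ("/account", "static/app.js", "login"):
    print(rel)
    print("  without <base>:", resolve(rel, PAGE_URL))
    print("  with injected <base>:", resolve(rel, PAGE_URL, INJECTED_BASE))
```

Every relative reference, including the script path, ends up on the attacker's host once the injected base is in effect, which is why the issue reaches beyond plain links.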


How can this vulnerability impact me?

This vulnerability can impact you in several ways due to the hijacking of relative URLs via the injected <base> tag:

- Phishing and Redirection: Attackers can redirect user navigation and form submissions to malicious sites, potentially stealing credentials or exfiltrating data without the user's knowledge.
- Cross-Site Scripting (XSS): If the application loads JavaScript files via relative paths, the injected <base> tag can cause the browser to fetch scripts from an attacker-controlled domain, escalating the attack to stored XSS.
- Defacement: Relative references to images and stylesheets can be loaded from the attacker's server, allowing UI redressing or defacement of the website. [2]


How does this vulnerability affect compliance with common standards and regulations (like GDPR, HIPAA)?

I don't know


How can this vulnerability be detected on my network or system? Can you suggest some commands?

This vulnerability concerns <base> tags in HTML content processed by lxml_html_clean versions prior to 0.4.4. To detect it, inspect HTML inputs and outputs for unexpected <base> tags that could hijack relative URLs.

A practical approach is to analyze HTML content for <base> tags, especially those outside the <head> element or injected unexpectedly. For example, you can use command-line tools such as curl and grep, or an HTML parser, to find such tags.

- Use curl or wget to fetch HTML content from your web application and search it for <base> tags: curl -s http://yourapp.example.com | grep -i '<base '
- Use Python to test whether the installed lxml_html_clean version is vulnerable by cleaning HTML that contains a <base> tag and checking whether the tag remains:

```python
from lxml_html_clean import clean_html

# Clean a fragment that carries an injected <base> tag.
result = clean_html('<base href="http://evil.com/"><a href="/account">Account</a>')
print(result)
```

If the output still contains the <base> tag, the version is vulnerable. [2, 1]
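
A grep for the literal string `<base ` can miss variants such as a tag whose attributes start on the next line. As a rough alternative, the sketch below scans HTML with Python's standard `html.parser` module and collects the href of every <base> element it finds. The class name `BaseTagFinder`, the function name `find_base_hrefs`, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class BaseTagFinder(HTMLParser):
    """Collect the href attribute of every <base> element in a document."""

    def __init__(self):
        super().__init__()
        self.base_hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <BASE HREF=...>
        # is matched as well. Self-closing <base /> also arrives here.
        if tag == "base":
            self.base_hrefs.append(dict(attrs).get("href"))


def find_base_hrefs(html):
    """Return a list with one entry per <base> element (None if no href)."""
    finder = BaseTagFinder()
    finder.feed(html)
    finder.close()
    return finder.base_hrefs


sample = '<div><BASE\nhref="https://evil.example/"><a href="/account">Account</a></div>'
print(find_base_hrefs(sample))
```

A non-empty result for content that was supposed to be sanitized is a signal worth investigating; an empty result does not by itself prove the installed library version is patched.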


What immediate steps should I take to mitigate this vulnerability?

The primary mitigation is to upgrade the lxml_html_clean library to version 0.4.4 or later, where the vulnerability has been patched. The patch ensures that all <base> tags are removed whenever the <head> tag is removed during HTML cleaning, preventing attackers from injecting malicious <base> tags that hijack relative URLs.

- Upgrade lxml_html_clean to version 0.4.4 or newer.
- If upgrading is not immediately possible, configure the Cleaner to explicitly remove <base> tags by adding 'base' to the remove_tags or kill_tags sets when removing <head>.
- Review and sanitize any HTML inputs or outputs to ensure no malicious <base> tags are present. [1, 2]
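
For installations that cannot upgrade yet, the Cleaner workaround described above might look like the following sketch. It assumes the `Cleaner` class accepts a `kill_tags` argument, as the mitigation text states; the helper name `clean_without_base` and the sample markup are hypothetical. The import is guarded only so the snippet loads where the library is absent.

```python
try:
    from lxml_html_clean import Cleaner
except ImportError:  # library not installed in this environment
    Cleaner = None


def clean_without_base(html):
    """Clean untrusted HTML and drop every <base> element.

    Sketch of the interim workaround: 'base' is added to kill_tags so the
    element is removed regardless of the other Cleaner defaults.
    """
    if Cleaner is None:
        raise RuntimeError("lxml_html_clean is not installed")
    cleaner = Cleaner(kill_tags=["base"])
    return cleaner.clean_html(html)


if Cleaner is not None:
    untrusted = '<div><base href="https://evil.example/"><a href="/account">Account</a></div>'
    print(clean_without_base(untrusted))
```

This is a stopgap; upgrading to 0.4.4 or later remains the fix the advisory recommends.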

