CVE-2026-54293

Modified - Updated After Analysis

Path Traversal in NLTK via URL Scheme

Publication date: 2026-06-22

Last updated on: 2026-07-21

Assigner: GitHub, Inc.

Description

NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. Prior to 3.10.0-rc1, nltk.data.load() in NLTK is vulnerable to path traversal via URL-encoded path separators and traversal segments when using the nltk: URL scheme. The unsafe-path regex check is performed before url2pathname() decodes the %xx sequences (a classic decode-after-check / TOCTOU-style flaw), allowing an attacker to bypass the protection documented in NLTK's SECURITY.md and read arbitrary files from the filesystem. While literal traversal strings such as ../../../etc/passwd are correctly blocked, encoded variants such as %2fetc%2fpasswd, %2e%2e%2f..., and ..%2f..%2f slip past the regex and are subsequently decoded into a real filesystem path. This vulnerability is fixed in 3.10.0-rc1.

Meta Information

Published: 2026-06-22
Last Modified: 2026-07-21
Generated: 2026-08-02
AI Q&A: 2026-06-22
EPSS Evaluated: 2026-07-31
NVD: CVE-2026-54293
EUVD: EUVD-2026-38333

Affected Vendors & Products

Vendor	Product	Version / Range
nltk	nltk	before 3.10.0 (exclusive)

CWE

CWE ID	Description
CWE-22	The product uses external input to construct a pathname that is intended to identify a file or directory that is located underneath a restricted parent directory, but the product does not properly neutralize special elements within the pathname that can cause the pathname to resolve to a location that is outside of the restricted directory.

Executive Summary

CVE-2026-54293 is a path traversal vulnerability in the Natural Language Toolkit (NLTK) library, specifically in the nltk.data.load() function.

The vulnerability arises because the function performs unsafe-path regex checks on encoded URL paths before decoding them. This means that while literal traversal strings like ../../../etc/passwd are blocked, encoded variants such as %2fetc%2fpasswd or ..%2f..%2f can bypass these checks.

After bypassing the regex check, these encoded sequences are decoded into actual filesystem paths, allowing an attacker to read arbitrary files on the system.

This is a classic decode-after-check (TOCTOU-style) flaw where the security check is done before decoding, enabling attackers to circumvent protections.
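
The ordering problem can be illustrated with a minimal Python sketch. The regex and the function names are hypothetical stand-ins, not NLTK's actual code, and `urllib.parse.unquote` stands in for the percent-decoding that `url2pathname()` performs.

```python
import re
from urllib.parse import unquote

# Hypothetical unsafe-path pattern: a leading "/" or a ".." path segment.
# Illustrative stand-in only; this is not the regex NLTK uses.
UNSAFE_PATH = re.compile(r"^/|(?:^|/)\.\.(?:/|$)")


def is_unsafe(path: str) -> bool:
    """Return True if the path is absolute or contains a '..' segment."""
    return UNSAFE_PATH.search(path) is not None


def check_then_decode(resource: str) -> str:
    """Flawed ordering: validate the raw string, then percent-decode it."""
    if is_unsafe(resource):
        raise ValueError(f"unsafe path: {resource!r}")
    return unquote(resource)


def decode_then_check(resource: str) -> str:
    """Safer ordering: percent-decode first, then validate the result."""
    decoded = unquote(resource)
    if is_unsafe(decoded):
        raise ValueError(f"unsafe path: {decoded!r}")
    return decoded


if __name__ == "__main__":
    payloads = ("../../../etc/passwd", "%2fetc%2fpasswd", "..%2f..%2fetc%2fpasswd")
    for payload in payloads:
        for resolver in (check_then_decode, decode_then_check):
            try:
                print(f"{resolver.__name__}({payload!r}) -> {resolver(payload)!r}")
            except ValueError as exc:
                print(f"{resolver.__name__}({payload!r}) rejected: {exc}")
```

In this sketch the literal payload is rejected by both orderings, while the encoded payloads pass `check_then_decode` and are rejected only by `decode_then_check`.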

The vulnerability affects NLTK versions prior to 3.10.0-rc1 and has been fixed in that release.
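
To see whether an environment falls in the affected range, a rough check of the installed version might look like the sketch below. The helper names are hypothetical, and the comparison is deliberately simplified: it looks only at the numeric release part, so it assumes that any 3.10.0 build, including pre-releases, already contains the fix.

```python
import re
from importlib import metadata

# First release series that contains the fix, per the advisory (3.10.0-rc1).
FIXED_RELEASE = (3, 10, 0)


def release_tuple(version: str) -> tuple:
    """Extract up to three leading numeric components, e.g. '3.9.1' -> (3, 9, 1)."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_affected(version: str) -> bool:
    """Simplified check: affected if the numeric release is below 3.10.0."""
    return release_tuple(version) < FIXED_RELEASE


def installed_nltk_is_affected():
    """Return True/False for the installed NLTK, or None if it is not installed."""
    try:
        return is_affected(metadata.version("nltk"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("installed NLTK affected:", installed_nltk_is_affected())
```

A full PEP 440 comparison (for example with the third-party `packaging` library) would be more precise for unusual version strings.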

Detection Guidance

This vulnerability involves the `nltk.data.load()` function processing URL-encoded path traversal payloads that bypass regex checks before decoding. Detection involves monitoring or scanning for usage of `nltk.data.load()` with suspicious encoded inputs such as `%2fetc%2fpasswd` or `..%2f..%2fetc%2fpasswd`.

Since this is a local file read vulnerability triggered by specific encoded inputs, detection can be performed by searching logs or application inputs for suspicious URL-encoded traversal patterns targeting NLTK.

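Use grep or similar tools to find call sites of `nltk.data.load()` in code and encoded traversal strings in logs, e.g.:
grep -rnF "nltk.data.load(" /path/to/your/codebase
grep -rniF "%2fetc%2fpasswd" /var/log/your_application_logs
The `-F` flag matches the strings literally, and `-i` matches both `%2f` and `%2F`, which decode to the same character. Call sites that use an aliased import such as `from nltk.data import load` will not match the first pattern and need a separate search.
Monitor network or application logs for suspicious URL-encoded inputs resembling path traversal payloads such as `%2f`, `%2e%2e%2f`, or `..%2f`.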

Automated scanning tools or custom scripts can be developed to detect these encoded traversal patterns in inputs reaching the NLTK library.
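
A custom detection script of this kind could be sketched as follows. The function and pattern names are hypothetical, and the inclusion of `%5c` (an encoded backslash) is an assumption added to cover Windows-style separators; the advisory itself lists only `%2f` and `%2e` variants. The function expects a single candidate resource name, not a whole log line.

```python
import re
from urllib.parse import unquote

# Percent-encoded ".", "/" or "\" in either letter case.
ENCODED_DOT_OR_SEPARATOR = re.compile(r"%(?:2e|2f|5c)", re.IGNORECASE)
# A ".." path segment bounded by separators or the string ends.
DOT_DOT_SEGMENT = re.compile(r"(?:^|[/\\])\.\.(?:[/\\]|$)")


def looks_like_encoded_traversal(value: str) -> bool:
    """Flag values that hide an absolute path or '..' segment behind %xx encoding."""
    if value.lower().startswith("nltk:"):
        value = value[len("nltk:"):]
    if not ENCODED_DOT_OR_SEPARATOR.search(value):
        return False  # nothing encoded that could form a separator or dot
    decoded = unquote(value)
    return decoded.startswith(("/", "\\")) or DOT_DOT_SEGMENT.search(decoded) is not None


def flag_suspicious(values):
    """Return the subset of candidate resource names that look like payloads."""
    return [value for value in values if looks_like_encoded_traversal(value)]


if __name__ == "__main__":
    samples = ["corpora/brown", "%2fetc%2fpasswd", "..%2f..%2fetc%2fpasswd", "a%2fb"]
    print(flag_suspicious(samples))
```

The sketch decodes only once, so doubly encoded input such as `%252f` is not flagged; whether that matters depends on how many decoding layers sit in front of the library.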

Impact Analysis

This vulnerability allows attackers to read arbitrary files from the filesystem by exploiting encoded path traversal sequences. Potential consequences include:

Exposure of sensitive system files such as /etc/passwd.
Access to files that hold environment variables, which may contain credentials.
Disclosure of application source code.
Leakage of cloud secrets or other sensitive configuration files.

The vulnerability is especially critical in environments like web applications, hosted notebook services, and multi-tenant systems where untrusted input might reach nltk.data.load().

The CVSS score of 7.5 (High) reflects the significant confidentiality impact without requiring privileges or user interaction.

Compliance Impact

The vulnerability in NLTK allows attackers to read arbitrary files from the filesystem, including files that hold environment variables with credentials, application source code, and cloud secrets.

Exposure of such sensitive information could lead to violations of data protection regulations like GDPR and HIPAA, which mandate the protection of personal and sensitive data.

In environments such as web applications, hosted notebook services, and multi-tenant systems where untrusted input may reach the vulnerable function, this flaw increases the risk of unauthorized data disclosure, potentially impacting compliance with these standards.

Mitigation Strategies

The primary mitigation is to upgrade NLTK to version 3.10.0-rc1 or later, where this vulnerability has been fixed.

The fix decodes resource names before applying the safety checks, so encoded traversal sequences can no longer bypass the path traversal protections.

In addition to upgrading:
Audit every code path where untrusted input may reach `nltk.data.load()` and sanitize that input so it cannot contain encoded traversal sequences.
If upgrading immediately is not possible, implement input validation that rejects URL-encoded path traversal patterns before the input is passed to NLTK.

These steps reduce the risk of arbitrary local file reads via encoded path traversal payloads.
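
As an interim measure, input validation could take the form of an allow-list rather than a pattern block-list. The sketch below is one hypothetical approach, not part of NLTK: it accepts only relative resource names built from plain path segments and rejects anything containing `%`, `..`, a leading `/`, or other unexpected characters. It assumes the application needs only simple names such as `corpora/brown`; legitimate names with other characters would also be rejected.

```python
import re

# One path segment: starts with a letter, digit or underscore, then letters,
# digits, underscores, dots or hyphens. "%" and ":" are not allowed, and ".."
# cannot match because a segment may not start with a dot.
SAFE_SEGMENT = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]*")


def validate_resource_name(name: str) -> str:
    """Return name unchanged if every '/'-separated segment is allow-listed."""
    if not name:
        raise ValueError("empty resource name")
    for segment in name.split("/"):
        # Empty segments (leading, trailing or doubled '/') fail the match too.
        if SAFE_SEGMENT.fullmatch(segment) is None:
            raise ValueError(f"rejected resource name: {name!r}")
    return name


if __name__ == "__main__":
    for candidate in ("corpora/brown", "%2fetc%2fpasswd", "../secret", "/etc/passwd"):
        try:
            print("accepted:", validate_resource_name(candidate))
        except ValueError as exc:
            print(exc)
```

A caller would then wrap the library call, for example `nltk.data.load(validate_resource_name(user_input))`, so that rejected names never reach NLTK.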
