CVE-2026-54293
Received Received - Intake
Path Traversal in NLTK via URL Scheme

Publication date: 2026-06-22

Last updated on: 2026-06-22

Assigner: GitHub, Inc.

Description
NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. Prior to 3.10.0-rc1, nltk.data.load() in NLTK is vulnerable to path traversal via URL-encoded path separators and traversal segments when using the nltk: URL scheme. The unsafe-path regex check is performed before url2pathname() decodes the %xx sequences (a classic decode-after-check / TOCTOU-style flaw), allowing an attacker to bypass the protection documented in NLTK's SECURITY.md and read arbitrary files from the filesystem. While literal traversal strings such as ../../../etc/passwd are correctly blocked, encoded variants such as %2fetc%2fpasswd, %2e%2e%2f..., and ..%2f..%2f slip past the regex and are subsequently decoded into a real filesystem path. This vulnerability is fixed in 3.10.0-rc1.
CVSS Scores
EPSS Scores
Probability:
Percentile:
Meta Information
Published
2026-06-22
Last Modified
2026-06-22
Generated
2026-06-23
AI Q&A
2026-06-22
EPSS Evaluated
N/A
NVD
EUVD
Affected Vendors & Products
Showing 2 associated CPEs
Vendor Product Version / Range
nltk nltk to 3.10.0-rc1 (exc)
nltk nltk 3.10.0-rc1
Helpful Resources
Exploitability
CWE
CWE Icon
KEV
KEV Icon
CWE ID Description
CWE-22 The product uses external input to construct a pathname that is intended to identify a file or directory that is located underneath a restricted parent directory, but the product does not properly neutralize special elements within the pathname that can cause the pathname to resolve to a location that is outside of the restricted directory.
Attack-Flow Graph
AI Quick Actions
Instant insights powered by AI
Executive Summary

CVE-2026-54293 is a path traversal vulnerability in the Natural Language Toolkit (NLTK) library, specifically in the nltk.data.load() function.

The vulnerability arises because the function performs unsafe-path regex checks on encoded URL paths before decoding them. This means that while literal traversal strings like ../../../etc/passwd are blocked, encoded variants such as %2fetc%2fpasswd or ..%2f..%2f can bypass these checks.

After bypassing the regex check, these encoded sequences are decoded into actual filesystem paths, allowing an attacker to read arbitrary files on the system.

This is a classic decode-after-check (TOCTOU-style) flaw where the security check is done before decoding, enabling attackers to circumvent protections.

The vulnerability affects NLTK versions prior to 3.10.0-rc1 and has been fixed in that release.

Impact Analysis

This vulnerability allows attackers to read arbitrary files from the filesystem by exploiting encoded path traversal sequences.

  • Exposure of sensitive system files such as /etc/passwd.
  • Access to environment variables that may contain credentials.
  • Disclosure of application source code.
  • Leakage of cloud secrets or other sensitive configuration files.

The vulnerability is especially critical in environments like web applications, hosted notebook services, and multi-tenant systems where untrusted input might reach nltk.data.load().

The CVSS score of 7.5 (High) reflects the significant confidentiality impact without requiring privileges or user interaction.

Detection Guidance

This vulnerability involves the `nltk.data.load()` function processing URL-encoded path traversal payloads that bypass regex checks before decoding. Detection involves monitoring or scanning for usage of `nltk.data.load()` with suspicious encoded inputs such as `%2fetc%2fpasswd` or `..%2f..%2fetc%2fpasswd`.

Since this is a local file read vulnerability triggered by specific encoded inputs, detection can be performed by searching logs or application inputs for suspicious URL-encoded traversal patterns targeting NLTK.

  • Use grep or similar tools to search code or logs for calls to `nltk.data.load()` with encoded traversal strings, e.g.:
  • grep -r "nltk.data.load(" /path/to/your/codebase
  • grep -r "%2fetc%2fpasswd" /var/log/your_application_logs
  • Monitor network or application logs for suspicious URL-encoded inputs resembling path traversal payloads such as `%2f`, `%2e%2e%2f`, or `..%2f`.

Automated scanning tools or custom scripts can be developed to detect these encoded traversal patterns in inputs reaching the NLTK library.

Mitigation Strategies

The primary mitigation is to upgrade NLTK to version 3.10.0-rc1 or later, where this vulnerability has been fixed.

The fix involves decoding resource names before applying safety checks to prevent bypass of path traversal protections.

  • Upgrade NLTK to version 3.10.0-rc1 or newer.
  • Audit and sanitize any untrusted input that may reach `nltk.data.load()` to ensure it does not contain encoded traversal sequences.
  • If upgrading immediately is not possible, implement input validation to reject URL-encoded path traversal patterns before passing inputs to NLTK.

These steps reduce the risk of arbitrary local file reads via encoded path traversal payloads.

Chat Assistant
Ask questions about this CVE
Hi! I’m here to help you understand CVE-2026-54293. Ask me anything about the vulnerability, its impact, or mitigation strategies.
0/70
EPSS Chart