CVE-2026-33929
Path Traversal in Apache PDFBox ExtractEmbeddedFiles Example
Publication date: 2026-04-14
Last updated on: 2026-04-20
Assigner: Apache Software Foundation
Description
Meta Information
Affected Vendors & Products
| Vendor | Product | Version / Range |
|---|---|---|
| apache | pdfbox | From 2.0.24 (inclusive) to 2.0.37 (exclusive) |
| apache | pdfbox | From 3.0.0 (inclusive) to 3.0.8 (exclusive) |
Exploitability
| CWE ID | Description |
|---|---|
| CWE-22 | The product uses external input to construct a pathname that is intended to identify a file or directory that is located underneath a restricted parent directory, but the product does not properly neutralize special elements within the pathname that can cause the pathname to resolve to a location that is outside of the restricted directory. |
AI Powered Q&A
How can this vulnerability impact me?
This vulnerability can allow an attacker to write or overwrite files outside the intended directory on a system where the vulnerable ExtractEmbeddedFiles example is used, potentially leading to unauthorized file modifications.
If an attacker crafts a malicious PDF, they could exploit this flaw to place files in arbitrary locations within the file system where the user has write permissions, which could lead to data corruption, privilege escalation, or other malicious activities depending on the files written.
How does this vulnerability affect compliance with common standards and regulations (like GDPR, HIPAA)?
The vulnerability allows a malicious PDF to write files outside the intended restricted directory, potentially leading to unauthorized file writes or overwrites.
Such unauthorized file operations could result in exposure or modification of sensitive data, which may impact compliance with data protection standards and regulations like GDPR or HIPAA that require strict control over data access and integrity.
If the vulnerable ExtractEmbeddedFiles example is used in production without the fix, it can therefore put compliance with these regulations at risk by enabling data breaches or unauthorized data manipulation.
Can you explain this vulnerability to me?
This vulnerability is a path traversal issue in the ExtractEmbeddedFiles example of Apache PDFBox. It allows a malicious PDF to cause files to be written outside the intended restricted directory by exploiting insufficient checks on file paths.
Specifically, the original code did not reliably confirm that extracted files stayed within the target directory. A path that merely began with the target directory's path was accepted even when it pointed outside that directory: for example, "/home/ABCDEF" was accepted when the intended directory was "/home/ABC".
The fix improves the directory boundary check by using canonical paths and ensuring that the extracted file's parent directory is either exactly the target directory or a subdirectory within it, preventing unauthorized file writes outside the intended location.
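The difference between the two kinds of check can be illustrated with a minimal Java sketch. It is not the PDFBox code: the class and method names are hypothetical, and the safer variant uses `java.nio.file.Path` normalization and element-wise comparison as a stand-in for the canonical-path check the fix is described as using (normalization does not resolve symbolic links).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration only; not taken from PDFBox.
final class PrefixCheckSketch {

    // Flawed: a plain string prefix test treats "/home/ABCDEF/..." as
    // being inside "/home/ABC" because the characters happen to match.
    static boolean naiveIsInside(String targetDir, String candidate) {
        return candidate.startsWith(targetDir);
    }

    // Safer: normalize both paths (collapsing "." and ".." segments), then
    // compare whole path elements, so "ABCDEF" is never mistaken for "ABC".
    static boolean componentWiseIsInside(String targetDir, String candidate) {
        Path base = Paths.get(targetDir).toAbsolutePath().normalize();
        Path file = Paths.get(candidate).toAbsolutePath().normalize();
        return file.startsWith(base);
    }

    public static void main(String[] args) {
        String target = "/home/ABC";
        String sibling = "/home/ABCDEF/evil.txt";
        System.out.println("naive check accepts sibling: "
                + naiveIsInside(target, sibling));
        System.out.println("component-wise check accepts sibling: "
                + componentWiseIsInside(target, sibling));
    }
}
```

In this sketch the naive check accepts the sibling directory from the example above, while the element-wise check rejects it and still accepts genuine subdirectories such as "/home/ABC/sub".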
How can this vulnerability be detected on my network or system? Can you suggest some commands?
This vulnerability is related to a path traversal issue in the ExtractEmbeddedFiles example of Apache PDFBox, which allows malicious PDFs to write files outside the intended directory.
Detection would involve monitoring or analyzing PDF files processed by the ExtractEmbeddedFiles example, especially looking for attempts to extract embedded files to paths outside the intended directory.
Because the flaw lies in code logic rather than network traffic, network-level scanning is unlikely to detect it. Instead, check the system for suspicious file writes, or attempted writes, outside the expected directories when PDFs are processed.
Suggested approaches include:
- Use file system monitoring tools (e.g., inotifywait on Linux) to watch the directory where embedded files are extracted for unexpected file creations outside the intended directory.
- Search logs or audit trails for file write operations triggered by PDFBox ExtractEmbeddedFiles, especially those targeting paths starting with but extending beyond the intended directory (e.g., paths like /home/ABCDEF if /home/ABC is the intended directory).
- Manually review or scan PDF files for embedded file names containing path traversal sequences (e.g., ../) before processing.
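The last item, screening embedded file names before processing, could be approximated with a small helper such as the one below. This is a sketch under assumptions: the class and method names are hypothetical, the rules (parent-directory segments, absolute paths, drive prefixes) are one possible heuristic rather than an official detection rule, and obtaining the names from a PDF is left out.

```java
// Hypothetical screening helper; the rules here are a heuristic, not an
// official detection signature for this CVE.
final class EmbeddedNameScreen {

    // Flags names that deserve manual review: null or empty names, absolute
    // paths, Windows drive prefixes, and any ".." path segment.
    static boolean looksSuspicious(String name) {
        if (name == null || name.isEmpty()) {
            return true;
        }
        // Treat both separator styles alike so "..\\x" is caught as well.
        String unified = name.replace('\\', '/');
        if (unified.startsWith("/")) {
            return true;
        }
        if (unified.length() >= 2
                && Character.isLetter(unified.charAt(0))
                && unified.charAt(1) == ':') {
            return true;
        }
        for (String segment : unified.split("/")) {
            if (segment.equals("..")) {
                return true;
            }
        }
        return false;
    }
}
```

Names that only contain consecutive dots inside a segment, such as "report..final.pdf", are not flagged, since they do not form a parent-directory segment.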
What immediate steps should I take to mitigate this vulnerability?
The primary mitigation step is to update Apache PDFBox to version 2.0.37 or 3.0.8 once they are available, as these versions include the fix for this vulnerability.
Until the official update is available, users who have copied the ExtractEmbeddedFiles example into their production code should apply the fix provided in GitHub Pull Request 427.
The fix involves improving the directory boundary check by verifying the canonical path of the target extraction directory to ensure embedded files cannot be extracted outside the intended directory.
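For code that was copied from the example, a guard along these lines could be applied before any file is written. It is a minimal sketch based on the description of the fix, not the contents of the pull request: `ExtractionGuard` and `resolveInside` are hypothetical names, and the only library calls used are `java.io.File` methods.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical guard; a sketch of the described approach, not the actual patch.
final class ExtractionGuard {

    // Resolves embeddedName against targetDir and returns the canonical file
    // only if one of its ancestor directories is the canonical target
    // directory, i.e. its parent is the target directory or lies below it.
    static File resolveInside(File targetDir, String embeddedName) throws IOException {
        File canonicalDir = targetDir.getCanonicalFile();
        File candidate = new File(canonicalDir, embeddedName).getCanonicalFile();
        // Walk up from the candidate's parent, comparing whole File objects
        // rather than string prefixes.
        for (File dir = candidate.getParentFile(); dir != null; dir = dir.getParentFile()) {
            if (dir.equals(canonicalDir)) {
                return candidate;
            }
        }
        throw new IOException(
                "Refusing to extract outside " + canonicalDir + ": " + embeddedName);
    }
}
```

A caller would write only to the `File` returned by `resolveInside` and treat the `IOException` as a signal to skip or quarantine the PDF.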
Additionally, restrict write permissions on directories used by ExtractEmbeddedFiles to minimize the impact of any exploitation attempts.