CVE-2026-23907
Path Traversal in Apache PDFBox ExtractEmbeddedFiles Example
Publication date: 2026-03-10
Last updated on: 2026-03-13
Assigner: Apache Software Foundation
Description
CVSS Scores
EPSS Scores
Meta Information
Affected Vendors & Products
| Vendor | Product | Version / Range |
|---|---|---|
| apache | pdfbox | From 2.0.24 (inc) to 2.0.35 (inc) |
| apache | pdfbox | From 3.0.0 (inc) to 3.0.6 (inc) |
Helpful Resources
Exploitability
| CWE ID | Description |
|---|---|
| CWE-22 | The product uses external input to construct a pathname that is intended to identify a file or directory that is located underneath a restricted parent directory, but the product does not properly neutralize special elements within the pathname that can cause the pathname to resolve to a location that is outside of the restricted directory. |
Attack-Flow Graph
AI Powered Q&A
Can you explain this vulnerability to me?
CVE-2026-23907 is a path traversal vulnerability in the ExtractEmbeddedFiles example code of Apache PDFBox. The issue occurs because the filename obtained from PDComplexFileSpecification.getFilename() is directly appended to the extraction path without proper validation.
This allows an attacker to craft filenames that can traverse directories, potentially causing embedded files to be extracted outside the intended directory.
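To make the mechanism concrete, the following is a minimal sketch of how an unvalidated filename can escape the extraction directory. It is not the PDFBox example itself: the class `TraversalDemo`, the method `naiveTarget`, and the sample paths are hypothetical and only mimic the pattern described above, in which the embedded filename is appended to the extraction path as-is.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: class and method names are hypothetical, not PDFBox code.
class TraversalDemo {

    // Mimics the unsafe pattern: the embedded filename is appended to the
    // extraction directory without validation. normalize() is applied here
    // only to show where the resulting path actually points.
    static Path naiveTarget(String extractionDir, String embeddedName) {
        return Paths.get(extractionDir + "/" + embeddedName).normalize();
    }

    public static void main(String[] args) {
        Path base = Paths.get("/tmp/extract");

        // A benign name stays below the extraction directory.
        Path benign = naiveTarget("/tmp/extract", "invoice.xml");
        // A crafted name with ".." segments climbs out of it.
        Path hostile = naiveTarget("/tmp/extract", "../../etc/cron.d/job");

        System.out.println(benign + " inside base: " + benign.startsWith(base));
        System.out.println(hostile + " inside base: " + hostile.startsWith(base));
    }
}
```

In this sketch the crafted name resolves to a location outside `/tmp/extract`, which is the condition the vulnerable pattern never checks for.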
The vulnerability affects Apache PDFBox versions 2.0.24 through 2.0.35 and 3.0.0 through 3.0.6. The example code has been updated to convert both the initial extraction path and the target extraction paths into canonical paths and verify that the extraction path is contained within the initial path, preventing directory traversal.
How can this vulnerability impact me?
This vulnerability can impact you if you have copied the vulnerable ExtractEmbeddedFiles example code into your production environment without proper validation of extraction paths.
An attacker could exploit this vulnerability by crafting filenames that cause files to be extracted outside the intended directory, potentially overwriting or placing files in unauthorized locations.
This could lead to unauthorized file creation or modification, which might be used to compromise system integrity or facilitate further attacks.
How does this vulnerability affect compliance with common standards and regulations (like GDPR, HIPAA)?
The available advisory information does not address compliance implications.
How can this vulnerability be detected on my network or system? Can you suggest some commands?
This vulnerability arises from the ExtractEmbeddedFiles example code in Apache PDFBox where filenames obtained from PDComplexFileSpecification.getFilename() are appended to extraction paths without validation, allowing path traversal.
To detect this vulnerability on your system, you should review your codebase for usage of the ExtractEmbeddedFiles example or similar code that extracts embedded files using PDComplexFileSpecification.getFilename() without validating or canonicalizing the extraction path.
There are no specific network detection commands provided in the resources. However, you can search your source code for the vulnerable pattern using commands like:
- grep -r 'PDComplexFileSpecification.getFilename()' /path/to/your/code
- grep -r 'ExtractEmbeddedFiles' /path/to/your/code
Additionally, monitor file extraction operations for unexpected directory traversal by checking if files are extracted outside intended directories. [1]
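As a complement to code review, embedded filenames can be screened before extraction and suspicious ones logged. The sketch below is a heuristic under stated assumptions, not part of PDFBox: the class `EmbeddedNameCheck` and the method `looksLikeTraversal` are hypothetical, and the input is assumed to be the string your code obtained from getFilename(). It does not replace validating the resolved extraction path.

```java
import java.util.Arrays;

// Illustrative heuristic only: names are hypothetical, not a PDFBox API.
class EmbeddedNameCheck {

    // Flags names that contain a parent-directory segment or that look
    // absolute (leading separator or Windows drive letter).
    static boolean looksLikeTraversal(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        // Treat both separator styles alike, since the PDF may come from any OS.
        String unified = name.replace('\\', '/');
        if (unified.startsWith("/")) {
            return true;
        }
        if (unified.matches("^[A-Za-z]:.*")) {
            return true;
        }
        // Only a whole ".." segment counts; "report..final.pdf" is harmless.
        return Arrays.asList(unified.split("/")).contains("..");
    }

    public static void main(String[] args) {
        String[] samples = {
            "invoice.xml", "report..final.pdf", "../../etc/passwd",
            "docs/../../secret", "/etc/passwd", "C:\\Windows\\evil.dll"
        };
        for (String s : samples) {
            System.out.println(s + " -> suspicious: " + looksLikeTraversal(s));
        }
    }
}
```

A check like this is useful mainly for auditing and alerting; a name that passes it can still be unsafe in combination with symbolic links, which is why the canonical-path comparison remains the actual safeguard.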
What immediate steps should I take to mitigate this vulnerability?
Immediate mitigation steps include reviewing any use of the ExtractEmbeddedFiles example code in your production environment and ensuring that the extraction path is properly validated.
Specifically, modify your code to convert both the initial extraction path and the target extraction paths into canonical paths and verify that the extraction path is contained within the initial path before extracting files.
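The canonical-path check described above could look roughly like the following. This is a sketch of the approach, not the actual change made to the PDFBox example: the class `SafeExtractionPath` and the method `resolveInside` are hypothetical names, and the code assumes the extraction directory is passed as a `java.io.File` and the embedded filename as a string.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;

// Illustrative only: names are hypothetical, not the patched PDFBox example.
class SafeExtractionPath {

    // Returns the canonical target file, or throws if the embedded filename
    // would place it outside (or exactly at) the extraction directory.
    static File resolveInside(File baseDir, String embeddedName) throws IOException {
        // Canonicalize both sides so ".." segments and symbolic links are resolved.
        File canonicalBase = baseDir.getCanonicalFile();
        File target = new File(canonicalBase, embeddedName).getCanonicalFile();

        // Path.startsWith compares whole path elements, so a sibling such as
        // "/out-evil" is not mistaken for a child of "/out".
        Path basePath = canonicalBase.toPath();
        Path targetPath = target.toPath();
        if (!targetPath.startsWith(basePath) || targetPath.equals(basePath)) {
            throw new IOException(
                "Embedded filename escapes extraction directory: " + embeddedName);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        File base = new File(System.getProperty("java.io.tmpdir"), "pdfbox-extract-demo");
        System.out.println(resolveInside(base, "invoice.xml"));
        try {
            resolveInside(base, "../../evil.sh");
        } catch (IOException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Calling such a helper before opening any output stream means a rejected name never results in a file being created; whether to skip the entry or abort the whole extraction is a policy decision for the calling code.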
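If possible, update to a fixed release of Apache PDFBox, meaning a version later than 2.0.35 on the 2.x line or later than 3.0.6 on the 3.x line. Because the flaw is in example code, upgrading the library does not change any copy of the example already present in your own codebase; that code still needs to be reviewed.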
Also, review the updated documentation for guidance on safe extraction path handling.