CVE-2026-40682

Modified - Updated After Analysis

XML External Entity Injection in Apache OpenNLP

Publication date: 2026-05-04

Last updated on: 2026-07-15

Assigner: Apache Software Foundation

Description

XML External Entity (XXE) via Unsanitized Dictionary Parsing in Apache OpenNLP DictionaryEntryPersistor

Versions Affected: before 2.5.9, before 3.0.0-M3

Description: The DictionaryEntryPersistor class initializes a static SAXParserFactory at class-load time without enabling FEATURE_SECURE_PROCESSING or disabling DTD processing. When create(InputStream, EntryInserter) is invoked, the only feature set on the XMLReader is namespace support; external entity resolution and DOCTYPE declarations remain fully enabled. An attacker who can supply a crafted dictionary file (e.g., a stop-word list or domain dictionary) containing a malicious DOCTYPE declaration can trigger local file disclosure via file:// entity references or server-side request forgery via http:// entity references during SAX parsing, before the application processes a single dictionary entry.

This is inconsistent with the project's own XmlUtil.createSaxParser() helper, which correctly sets FEATURE_SECURE_PROCESSING and disallow-doctype-decl and is used by all other XML parsing paths in the codebase. The public Dictionary(InputStream) constructor delegates directly to this method and is the documented API for loading user-supplied dictionaries, making untrusted input a realistic scenario.

Mitigation: 2.x users should upgrade to 2.5.9. 3.x users should upgrade to 3.0.0-M3. Users who cannot upgrade immediately should ensure that all dictionary files are sourced from trusted origins and should consider wrapping the Dictionary(InputStream) constructor with input validation that rejects any XML containing a DOCTYPE declaration before it reaches the parser.
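The hardening that the description says is missing can be sketched in plain JAXP. This is a minimal illustration only, not the actual OpenNLP patch and not the body of XmlUtil.createSaxParser(): the class HardenedSaxSketch and its methods are hypothetical names, the sample dictionary XML is illustrative, and the sketch assumes the JDK's built-in parser, which recognizes the disallow-doctype-decl feature.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Apache OpenNLP.
class HardenedSaxSketch {

    /** Builds a namespace-aware SAX parser that refuses any DOCTYPE declaration. */
    static SAXParser newHardenedParser() throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        // The two settings the advisory names as absent from the vulnerable code path.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        return factory.newSAXParser();
    }

    /**
     * Returns true if the hardened parser reads the document without error.
     * Returns false for any SAX failure, which includes a rejected DOCTYPE
     * as well as ordinary malformed XML.
     */
    static boolean parsesCleanly(String xml) {
        SAXParser parser;
        try {
            parser = newHardenedParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("XML parser does not support the hardening features", e);
        }
        try {
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                         new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String benign = "<dictionary><entry><token>the</token></entry></dictionary>";
        String crafted = "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE dictionary [<!ENTITY x SYSTEM \"file:///nonexistent\">]>"
            + "<dictionary><entry><token>&x;</token></entry></dictionary>";
        System.out.println("benign accepted:  " + parsesCleanly(benign));
        System.out.println("crafted accepted: " + parsesCleanly(crafted));
    }
}
```

With DOCTYPE declarations disallowed, the parser fails as soon as it reaches the declaration, so the entity is never resolved and no file or network access is attempted.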

Meta Information

Published: 2026-05-04
Last Modified: 2026-07-15
Generated: 2026-07-26
AI Q&A: 2026-05-05
EPSS Evaluated: 2026-07-25
NVD: CVE-2026-40682

Affected Vendors & Products

Vendor	Product	Version / Range
apache	opennlp	before 2.5.9 (exclusive)
apache	opennlp	3.0.0
apache	opennlp	3.0.0

CWE

CWE ID	Description
CWE-611	The product processes an XML document that can contain XML entities with URIs that resolve to documents outside of the intended sphere of control, causing the product to embed incorrect documents into its output.

Executive Summary

This is an XML External Entity (XXE) vulnerability in the Apache OpenNLP DictionaryEntryPersistor class. The class initializes its XML parser without enabling secure processing or disabling DTD processing. As a result, an attacker who can supply a dictionary file containing a malicious DOCTYPE declaration can cause local file disclosure or server-side request forgery through external entities that are resolved during parsing.

Specifically, the only feature set on the XMLReader is namespace support. External entity resolution and DOCTYPE declarations remain enabled, so a crafted dictionary file triggers these attacks before any dictionary entry is processed.

Detection Guidance

Because the vulnerability is triggered by crafted XML dictionary files, potential exploitation attempts can be detected by scanning the dictionary files used by Apache OpenNLP for DOCTYPE declarations or external entity references before they are parsed. Either one is an indicator of a possible XML External Entity (XXE) payload.

Use grep or similar tools to search for DOCTYPE declarations in dictionary files, for example: grep -i '<!DOCTYPE' /path/to/dictionaries/*
Check for entity declarations and external URI references such as file://, http://, or https:// in XML files: grep -iE '<!ENTITY|(file|https?|ftp)://' /path/to/dictionaries/*
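The same check can be run from Java, for instance as a pre-deployment step. The sketch below is a text-level heuristic equivalent to the grep commands above, not a parser: the class name DictionaryDoctypeScan and its methods are hypothetical, and it assumes the dictionaries use an ASCII-compatible encoding such as UTF-8 (a UTF-16 file would not match).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; not part of Apache OpenNLP.
class DictionaryDoctypeScan {

    // Matches the start of a DOCTYPE or ENTITY declaration, ignoring case.
    private static final Pattern SUSPICIOUS =
        Pattern.compile("<!(DOCTYPE|ENTITY)", Pattern.CASE_INSENSITIVE);

    /** Returns true if the text contains a DOCTYPE or ENTITY declaration marker. */
    static boolean looksSuspicious(String content) {
        return SUSPICIOUS.matcher(content).find();
    }

    /** Walks a directory tree and returns every regular file that looks suspicious. */
    static List<Path> scan(Path dir) throws IOException {
        List<Path> candidates;
        try (Stream<Path> files = Files.walk(dir)) {
            candidates = files.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        List<Path> hits = new ArrayList<>();
        for (Path file : candidates) {
            // ISO-8859-1 maps every byte to a character, so decoding never fails.
            String content = Files.readString(file, StandardCharsets.ISO_8859_1);
            if (looksSuspicious(content)) {
                hits.add(file);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DictionaryDoctypeScan.java /path/to/dictionaries
        for (Path hit : scan(Path.of(args[0]))) {
            System.out.println("DOCTYPE or ENTITY declaration found in: " + hit);
        }
    }
}
```

A hit is a reason to inspect the file, not proof of an attack; a file with no hits can still be unsafe if it uses an encoding the scan does not cover.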

Additionally, monitoring network traffic for unexpected outbound HTTP requests originating from the application during dictionary parsing may help detect server-side request forgery attempts triggered by this vulnerability.

Impact Analysis

This vulnerability can lead to local file disclosure, where an attacker reads files that the application process can access on the server, and to server-side request forgery (SSRF), where the server is tricked into making HTTP requests to internal or external systems on the attacker's behalf.

Both impacts primarily compromise confidentiality: attackers can gather sensitive information from the host or probe internal services that are reachable from the affected system.

Compliance Impact

The vulnerability allows an attacker to supply a crafted dictionary file containing a malicious DOCTYPE declaration, which can lead to local file disclosure or server-side request forgery during XML parsing. This exposure of sensitive files or unauthorized requests could potentially lead to unauthorized access to personal or protected data.

Such unauthorized data disclosure or access could impact compliance with data protection regulations like GDPR or HIPAA, which require safeguarding personal and sensitive information against unauthorized access or breaches.

Mitigations include upgrading to fixed versions or validating input to reject XML containing DOCTYPE declarations, which helps maintain compliance by preventing exploitation.

Mitigation Strategies

To mitigate this vulnerability, users should upgrade Apache OpenNLP to version 2.5.9 if using the 2.x branch, or to 3.0.0-M3 if using the 3.x branch.

If immediate upgrade is not possible, ensure that all dictionary files are sourced from trusted origins.

Additionally, consider wrapping the Dictionary(InputStream) constructor with input validation that rejects any XML containing a DOCTYPE declaration before it reaches the parser.
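One way to implement such a wrapper is to buffer the input, parse it once with DOCTYPE declarations disallowed, and hand the original bytes to OpenNLP only if that parse succeeds. The following is a sketch under stated assumptions, not an official workaround: the class DoctypeRejectingInput, the method requireNoDoctype, and the 16 MiB size cap are all hypothetical choices, and the sketch assumes the JDK's built-in parser, which recognizes the disallow-doctype-decl feature.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; not part of Apache OpenNLP.
class DoctypeRejectingInput {

    // Assumed upper bound on dictionary size; adjust to the deployment.
    static final int MAX_BYTES = 16 * 1024 * 1024;

    /**
     * Buffers the stream, validates it with a parser that rejects DOCTYPE
     * declarations, and returns a fresh stream over the same bytes.
     * Throws IOException if the input is too large, malformed, or has a DOCTYPE.
     */
    static InputStream requireNoDoctype(InputStream untrusted) throws IOException {
        byte[] bytes = untrusted.readNBytes(MAX_BYTES + 1);
        if (bytes.length > MAX_BYTES) {
            throw new IOException("Dictionary exceeds " + MAX_BYTES + " bytes");
        }
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            // The parse fails at the DOCTYPE, before any entity is resolved.
            factory.newSAXParser().parse(new ByteArrayInputStream(bytes), new DefaultHandler());
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException("Rejected dictionary XML: DOCTYPE present or document malformed", e);
        }
        return new ByteArrayInputStream(bytes);
    }

    public static void main(String[] args) throws IOException {
        byte[] benign = "<dictionary><entry><token>the</token></entry></dictionary>"
            .getBytes(StandardCharsets.UTF_8);
        InputStream safe = requireNoDoctype(new ByteArrayInputStream(benign));
        System.out.println("validated bytes: " + safe.readAllBytes().length);
    }
}
```

The returned stream would then be passed to the Dictionary(InputStream) constructor in place of the raw input. Validating with a parser rather than a substring search avoids false negatives from alternate encodings, at the cost of holding the whole dictionary in memory and parsing it twice.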
