CVE-2026-40682
Status: Undergoing Analysis - In Progress
XML External Entity Injection in Apache OpenNLP

Publication date: 2026-05-04

Last updated on: 2026-05-06

Assigner: Apache Software Foundation

Description
XML External Entity (XXE) via Unsanitized Dictionary Parsing in Apache OpenNLP DictionaryEntryPersistor

Versions Affected: before 2.5.9, before 3.0.0-M3

Description: The DictionaryEntryPersistor class initializes a static SAXParserFactory at class-load time without enabling FEATURE_SECURE_PROCESSING or disabling DTD processing. When create(InputStream, EntryInserter) is invoked, the only feature set on the XMLReader is namespace support; external entity resolution and DOCTYPE declarations remain fully enabled. An attacker who can supply a crafted dictionary file (e.g., a stop-word list or domain dictionary) containing a malicious DOCTYPE declaration can trigger local file disclosure via file:// entity references or server-side request forgery via http:// entity references during SAX parsing, before the application processes a single dictionary entry.

This is inconsistent with the project's own XmlUtil.createSaxParser() helper, which correctly sets FEATURE_SECURE_PROCESSING and disallow-doctype-decl and is used by all other XML parsing paths in the codebase. The public Dictionary(InputStream) constructor delegates directly to this method and is the documented API for loading user-supplied dictionaries, making untrusted input a realistic scenario.

Mitigation: 2.x users should upgrade to 2.5.9. 3.x users should upgrade to 3.0.0-M3. Users who cannot upgrade immediately should ensure that all dictionary files are sourced from trusted origins and should consider wrapping the Dictionary(InputStream) constructor with input validation that rejects any XML containing a DOCTYPE declaration before it reaches the parser.
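The two settings named in the description can be illustrated with plain JAXP. The following is a minimal sketch, not the source of XmlUtil.createSaxParser(): the class and method names (HardenedSaxSketch, newHardenedParser, isRejected) are hypothetical, and the feature URI is the one recognized by the JDK's built-in Xerces-derived parser. It assumes only that a parser configured this way should refuse any document that carries a DOCTYPE.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Apache OpenNLP.
final class HardenedSaxSketch {

    // Feature URI understood by the JDK's built-in parser (assumption: default JAXP provider).
    static final String DISALLOW_DOCTYPE =
            "http://apache.org/xml/features/disallow-doctype-decl";

    /** Builds a namespace-aware SAX parser with both hardening features enabled. */
    static SAXParser newHardenedParser() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            factory.setFeature(DISALLOW_DOCTYPE, true);
            return factory.newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            // Fail loudly if the provider cannot be hardened.
            throw new IllegalStateException("XML parser does not support hardening", e);
        }
    }

    /** Returns true if the hardened parser refuses the XML (DOCTYPE present or malformed). */
    static boolean isRejected(String xml) {
        SAXParser parser = newHardenedParser();
        try {
            parser.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler());
            return false;
        } catch (SAXException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String withDoctype =
                "<!DOCTYPE dictionary [<!ENTITY e \"x\">]><dictionary>&e;</dictionary>";
        String plain = "<dictionary><entry><token>the</token></entry></dictionary>";
        System.out.println("DOCTYPE document rejected: " + isRejected(withDoctype));
        System.out.println("Plain document rejected:   " + isRejected(plain));
    }
}
```

With disallow-doctype-decl enabled, the parser stops at the DOCTYPE declaration itself, so no entity, internal or external, is ever resolved.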
Meta Information
Published: 2026-05-04
Last Modified: 2026-05-06
Generated: 2026-05-27
AI Q&A: 2026-05-05
EPSS Evaluated: 2026-05-25
Affected Vendors & Products
Showing 3 associated CPEs

Vendor    Product    Version / Range
apache    opennlp    up to 2.5.9 (exclusive)
apache    opennlp    3.0.0
apache    opennlp    3.0.0
CWE
CWE ID: CWE-611
Description: The product processes an XML document that can contain XML entities with URIs that resolve to documents outside of the intended sphere of control, causing the product to embed incorrect documents into its output.
AI Powered Q&A
How does this vulnerability affect compliance with common standards and regulations (like GDPR, HIPAA)?

An attacker who can supply a crafted dictionary file containing a malicious DOCTYPE declaration can cause local file disclosure or server-side request forgery during XML parsing. Disclosed files or forged requests could in turn expose personal or protected data.

Such unauthorized data disclosure or access could impact compliance with data protection regulations like GDPR or HIPAA, which require safeguarding personal and sensitive information against unauthorized access or breaches.

Mitigations include upgrading to fixed versions or validating input to reject XML containing DOCTYPE declarations, which helps maintain compliance by preventing exploitation.


Can you explain this vulnerability to me?

This vulnerability is an XML External Entity (XXE) injection flaw in the Apache OpenNLP DictionaryEntryPersistor class. The class initializes an XML parser without enabling secure processing features or disabling DTD processing. As a result, when a dictionary file containing a malicious DOCTYPE declaration is parsed, the external entities it references are resolved, which an attacker can use for local file disclosure or server-side request forgery.

Specifically, the vulnerability arises because the XMLReader used only has namespace support enabled, but external entity resolution and DOCTYPE declarations remain enabled, allowing crafted dictionary files to trigger these attacks before any dictionary entries are processed.
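To make the namespace-only configuration concrete, the sketch below parses a document with an XMLReader on which nothing but namespace support has been set, and shows that an entity declared in the DOCTYPE is expanded into the character data. It deliberately uses a harmless internal entity; an external entity declared with a SYSTEM identifier would be declared through the same DOCTYPE mechanism. The class name, method name, and sample document are assumptions for illustration and are not taken from the OpenNLP source.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Apache OpenNLP.
final class NamespaceOnlyParserDemo {

    /** Parses xml with only namespace support set and returns all character data seen. */
    static String parseText(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // The only feature configured, mirroring the situation the advisory describes.
            reader.setFeature("http://xml.org/sax/features/namespaces", true);
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            reader.parse(new InputSource(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Internal entity only: nothing is read from disk or fetched over the network.
        String xml = "<!DOCTYPE dictionary [<!ENTITY marker \"expanded-by-parser\">]>"
                + "<dictionary><entry><token>&marker;</token></entry></dictionary>";
        System.out.println(parseText(xml));
    }
}
```

The token text reaches the content handler with the entity already substituted, which is why the expansion happens before the application sees any dictionary entry.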


How can this vulnerability impact me?

This vulnerability can lead to serious security impacts including local file disclosure, where sensitive files on the server can be read by an attacker, and server-side request forgery (SSRF), where the server can be tricked into making unauthorized HTTP requests to internal or external systems.

These impacts can compromise confidentiality and potentially allow attackers to gather sensitive information or perform unauthorized actions within the affected system.


What immediate steps should I take to mitigate this vulnerability?

To mitigate this vulnerability, users should upgrade Apache OpenNLP to version 2.5.9 if using the 2.x branch, or to 3.0.0-M3 if using the 3.x branch.

If immediate upgrade is not possible, ensure that all dictionary files are sourced from trusted origins.

Additionally, consider wrapping the Dictionary(InputStream) constructor with input validation that rejects any XML containing a DOCTYPE declaration before it reaches the parser.
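One way to implement such a wrapper is sketched below. It buffers the untrusted stream, pre-parses the bytes with a SAX parser that has disallow-doctype-decl enabled, and only then hands back a fresh stream over the same bytes. Pre-parsing is used rather than a plain text search so that the check does not depend on the file's character encoding. DoctypeGuard, requireNoDoctype, and isAccepted are hypothetical names, and the feature URI assumes the JDK's built-in parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; not part of Apache OpenNLP.
final class DoctypeGuard {

    /**
     * Buffers the stream, rejects it if it contains a DOCTYPE (or is not
     * well-formed XML), and returns a replayable copy of the validated bytes.
     */
    static InputStream requireNoDoctype(InputStream untrusted) throws IOException {
        byte[] bytes = untrusted.readAllBytes(); // whole file is held in memory
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(
                    "http://apache.org/xml/features/disallow-doctype-decl", true);
            factory.newSAXParser().parse(new ByteArrayInputStream(bytes), new DefaultHandler());
        } catch (SAXException e) {
            // Fails closed: also taken if the provider does not recognize the feature.
            throw new IOException("Rejected dictionary XML: " + e.getMessage(), e);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("XML parser cannot be configured", e);
        }
        return new ByteArrayInputStream(bytes);
    }

    /** Convenience check for text input: true if the guard lets the XML through. */
    static boolean isAccepted(String xml) {
        try {
            requireNoDoctype(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAccepted("<!DOCTYPE dictionary><dictionary/>"));
        System.out.println(isAccepted("<dictionary><entry><token>the</token></entry></dictionary>"));
        // Intended use (requires OpenNLP on the classpath):
        // Dictionary dict = new Dictionary(DoctypeGuard.requireNoDoctype(in));
    }
}
```

Because the guard reads the entire file into memory, callers handling untrusted uploads may also want to enforce a size limit before invoking it.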


How can this vulnerability be detected on my network or system? Can you suggest some commands?

Vulnerable installations can be identified by checking the deployed Apache OpenNLP version: releases before 2.5.9 (2.x) and before 3.0.0-M3 (3.x) are affected.

Because the vulnerability is triggered by crafted XML dictionary files, potential exploitation attempts can be detected by scanning the dictionary files the application loads for DOCTYPE declarations or external entity references before they are parsed.

  • Use grep or similar tools to search for DOCTYPE declarations in dictionary files, for example: grep -i '<!DOCTYPE' /path/to/dictionaries/*
  • Check for entity declarations and external references such as file://, http://, or https:// in XML files: grep -iE '<!ENTITY|file://|https?://' /path/to/dictionaries/*
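Where grep is not available, or the check needs to run inside a Java build or deployment step, the same scan can be sketched in Java. The class below walks a directory and lists files containing a DOCTYPE or ENTITY declaration. DictionaryScan and its methods are hypothetical names, and the match is a plain case-insensitive text search on single-byte-compatible encodings, so files stored as UTF-16 would not be caught.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical scanning helper; not part of Apache OpenNLP.
final class DictionaryScan {

    /** True if the file's text contains a DOCTYPE or ENTITY declaration. */
    static boolean looksSuspicious(Path file) {
        try {
            // ISO-8859-1 maps every byte to a character, so decoding never fails.
            String text = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1)
                    .toLowerCase(Locale.ROOT);
            return text.contains("<!doctype") || text.contains("<!entity");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Recursively lists regular files under root that look suspicious, sorted by path. */
    static List<Path> findSuspicious(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(DictionaryScan::looksSuspicious)
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Directory to scan; defaults to the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path hit : findSuspicious(root)) {
            System.out.println("DOCTYPE/ENTITY declaration found: " + hit);
        }
    }
}
```

A hit is only an indicator that a file deserves review, not proof of an attack.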

Additionally, monitoring network traffic for unexpected outbound HTTP requests originating from the application during dictionary parsing may help detect server-side request forgery attempts triggered by this vulnerability.

No specific commands for runtime detection or network scanning are provided in the available resources.

