CVE-2026-42440
Undergoing Analysis - In Progress
OOM Denial of Service in Apache OpenNLP

Publication date: 2026-05-04

Last updated on: 2026-05-06

Assigner: Apache Software Foundation

Description
OOM Denial of Service via Unbounded Array Allocation in Apache OpenNLP AbstractModelReader

Versions Affected:
* before 2.5.9
* before 3.0.0-M3

Description: The AbstractModelReader methods getOutcomes(), getOutcomePatterns(), and getPredicates() each read a 32-bit signed integer count field from a binary model stream and pass that value directly to an array allocation (new String[numOutcomes], new int[numOCTypes][], new String[NUM_PREDS]) without validating that the value is non-negative or within a reasonable bound. The count is therefore fully attacker-controlled when the model file originates from an untrusted source.

A crafted .bin model file in which any of these count fields is set to Integer.MAX_VALUE (or any value large enough to exhaust the available heap) triggers an OutOfMemoryError at the array allocation itself, before the corresponding label or pattern data is consumed from the stream. The error occurs very early in deserialization: for a GIS model, getOutcomes() is reached after only the model-type string, the correction constant, and the correction parameter have been read; so the attacker pays no meaningful size cost to weaponize a payload, and a single small file can crash a JVM that loads it.

Any code path that deserializes a .bin model is affected, including direct use of GenericModelReader and any higher-level component that delegates to it during model load. The practical impact is denial of service against processes that load model files from untrusted or semi-trusted origins.

Mitigation:
* 2.x users should upgrade to 2.5.9.
* 3.x users should upgrade to 3.0.0-M3.

Note: The fix introduces an upper bound on each of the three count fields, checked before array allocation; counts that are negative or exceed the bound cause an IllegalArgumentException to be thrown and the read to fail fast with no large allocation. The default bound is 10,000,000, which is well above the entry counts of legitimate OpenNLP models but far below any value that would threaten heap exhaustion. Deployments that legitimately need to load models with more entries than the default can raise the limit at JVM startup by setting the OPENNLP_MAX_ENTRIES system property to the desired positive integer (e.g. -DOPENNLP_MAX_ENTRIES=50000000); invalid or non-positive values fall back to the default.

Users who cannot upgrade immediately should treat all .bin model files as untrusted input unless their provenance is verified, and should avoid loading models supplied by end users or fetched from third-party repositories without integrity checks.
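The bounded-read behavior described in the note can be sketched roughly as follows. This is an illustration of the technique only, not the actual OpenNLP patch: the class name BoundedCountSketch and the methods parseLimit, readBoundedCount, and readLabels are hypothetical, and the count-prefixed string layout is a simplification. Only the default of 10,000,000, the OPENNLP_MAX_ENTRIES property name, the fallback rule, and the IllegalArgumentException come from the advisory text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Illustrative sketch only; names are hypothetical and this is not the OpenNLP fix itself. */
final class BoundedCountSketch {

    /** Default bound quoted in the advisory. */
    static final int DEFAULT_MAX_ENTRIES = 10_000_000;

    /** Parses a limit; null, non-numeric, or non-positive input falls back to the default. */
    static int parseLimit(String raw) {
        if (raw == null) {
            return DEFAULT_MAX_ENTRIES;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : DEFAULT_MAX_ENTRIES;
        } catch (NumberFormatException e) {
            return DEFAULT_MAX_ENTRIES;
        }
    }

    /** Reads a 32-bit count and rejects it before any array is allocated. */
    static int readBoundedCount(DataInputStream in, int limit, String field) throws IOException {
        int count = in.readInt();
        if (count < 0 || count > limit) {
            throw new IllegalArgumentException(
                field + " count " + count + " is outside [0, " + limit + "]");
        }
        return count;
    }

    /** Reads a count-prefixed list of strings; allocation happens only after validation. */
    static String[] readLabels(DataInputStream in, int limit) throws IOException {
        int n = readBoundedCount(in, limit, "outcome");
        String[] labels = new String[n];
        for (int i = 0; i < n; i++) {
            labels[i] = in.readUTF();
        }
        return labels;
    }

    public static void main(String[] args) throws IOException {
        int limit = parseLimit(System.getProperty("OPENNLP_MAX_ENTRIES"));
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(Integer.MAX_VALUE); // oversized count with no data behind it
        }
        try {
            readLabels(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())), limit);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the ordering is that the check sits between readInt() and the new String[n] expression, so a hostile count costs the reader four bytes of input and one comparison rather than a multi-gigabyte allocation attempt.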
Meta Information
Published: 2026-05-04
Last Modified: 2026-05-06
Generated: 2026-05-07
AI Q&A: 2026-05-05
EPSS Evaluated: 2026-05-05
Affected Vendors & Products
Showing 3 associated CPEs
Vendor   Product   Version / Range
apache   opennlp   up to 2.5.9 (excluding)
apache   opennlp   3.0.0
apache   opennlp   3.0.0
CWE
CWE ID Description
CWE-789 The product allocates memory based on an untrusted, large size value, but it does not ensure that the size is within expected limits, allowing arbitrary amounts of memory to be allocated.
AI Powered Q&A
What immediate steps should I take to mitigate this vulnerability?

To mitigate this vulnerability, users should upgrade Apache OpenNLP to version 2.5.9 if using the 2.x series, or to 3.0.0-M3 if using the 3.x series.

If immediate upgrade is not possible, treat all .bin model files as untrusted input unless their provenance is verified.

Avoid loading models supplied by end users or fetched from third-party repositories without integrity checks.

The fix introduces an upper bound on count fields during array allocation to prevent excessive memory allocation, and this bound can be adjusted via the OPENNLP_MAX_ENTRIES system property if needed.
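One way to implement the integrity check mentioned above is to pin a SHA-256 digest for each model file and refuse to load anything that does not match. The sketch below uses only the standard JDK; the class name ModelIntegritySketch and its methods are hypothetical, and where the expected digest comes from (a signed manifest, a deployment config, and so on) is left to the deployment.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative sketch only; class and method names are hypothetical. */
final class ModelIntegritySketch {

    /** Streams the file through SHA-256 and returns the digest as lowercase hex. */
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Returns true only if the file's digest equals the pinned value (case-insensitive hex). */
    static boolean matchesPinnedDigest(Path file, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        return sha256Hex(file).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("usage: ModelIntegritySketch <model.bin> <expected-sha256-hex>");
            return;
        }
        Path model = Paths.get(args[0]);
        if (!matchesPinnedDigest(model, args[1])) {
            throw new SecurityException("digest mismatch, refusing to load " + model);
        }
        System.out.println("digest matches for " + model);
    }
}
```

A digest check of this kind only establishes that the file is the one that was pinned; it does not replace upgrading, since a model that was malicious when its digest was recorded will still pass.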


Can you explain this vulnerability to me?

This vulnerability occurs in Apache OpenNLP's AbstractModelReader component, specifically in the methods getOutcomes(), getOutcomePatterns(), and getPredicates(). These methods read a 32-bit signed integer count from a binary model file and use that value directly to allocate arrays without validating if the count is non-negative or within a reasonable size.

An attacker can craft a malicious .bin model file with extremely large count values (such as Integer.MAX_VALUE) that cause the program to allocate huge arrays, exhausting the Java Virtual Machine's heap memory and triggering an OutOfMemoryError very early during deserialization.

This means that loading a single small malicious model file can crash the JVM process that attempts to load it, leading to a denial of service.


How can this vulnerability impact me?

The primary impact of this vulnerability is a denial of service (DoS) condition against any process that loads model files from untrusted or semi-trusted sources.

An attacker can supply a crafted model file that causes the application to run out of memory and crash, disrupting service availability.

This affects any code path that deserializes .bin model files, including direct use of GenericModelReader or any higher-level components that load models.


How does this vulnerability affect compliance with common standards and regulations (like GDPR, HIPAA)?

The vulnerability causes a denial of service (DoS) via out-of-memory errors when loading untrusted model files, which can disrupt availability of services relying on Apache OpenNLP.

However, there is no information provided about any direct impact on confidentiality, integrity, or data protection that would relate to compliance with standards such as GDPR or HIPAA.

Therefore, based on the provided information, this vulnerability primarily affects service availability but does not explicitly affect compliance with common data protection regulations.


How can this vulnerability be detected on my network or system? Can you suggest some commands?

This vulnerability is triggered by loading a crafted .bin model file with maliciously large count fields in Apache OpenNLP versions before 2.5.9 and before 3.0.0-M3. Detection involves monitoring for OutOfMemoryError exceptions during model loading or inspecting model files for suspiciously large count values.

Since the vulnerability arises from untrusted or malformed .bin model files, detection can include verifying the provenance and integrity of model files before loading them.

There are no specific commands provided in the context to detect this vulnerability on a network or system.
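As a rough illustration of inspecting a model file for a suspiciously large count, the sketch below reads the header fields the description lists for a GIS model (model-type string, correction constant, correction parameter) and then reports the first count field without allocating anything. Several things here are assumptions rather than facts from the advisory: that the stream is a raw, uncompressed maxent stream written with DataOutputStream conventions, that the type string is "GIS", and that the constant is an int and the parameter a double. Packaged or compressed model files would need unwrapping first. The class name GisHeaderPeekSketch and its methods are hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Illustrative sketch only; the header layout below is an assumption, not a specification. */
final class GisHeaderPeekSketch {

    /** Skips the assumed GIS header and returns the outcome count without allocating arrays. */
    static int peekOutcomeCount(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        String type = in.readUTF();           // assumed: model-type string
        if (!"GIS".equals(type)) {
            throw new IllegalArgumentException("not a GIS stream: " + type);
        }
        in.readInt();                          // assumed: correction constant
        in.readDouble();                       // assumed: correction parameter
        return in.readInt();                   // first count field (outcomes)
    }

    /** Flags counts that are negative or above the given limit. */
    static boolean looksSuspicious(int count, int limit) {
        return count < 0 || count > limit;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: GisHeaderPeekSketch <raw-gis-model-stream>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            int count = peekOutcomeCount(in);
            // 10,000,000 is the default bound quoted in the advisory.
            System.out.println("outcome count = " + count
                + (looksSuspicious(count, 10_000_000) ? " (suspicious)" : " (within default bound)"));
        }
    }
}
```

This only examines the first of the three count fields; the outcome-pattern and predicate counts sit further into the stream and would need the intervening data parsed to reach them.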

