CVE-2026-42440

Modified - Updated After Analysis

OOM Denial of Service in Apache OpenNLP

Publication date: 2026-05-04

Last updated on: 2026-07-15

Assigner: Apache Software Foundation

Description

OOM Denial of Service via Unbounded Array Allocation in Apache OpenNLP AbstractModelReader

Versions Affected:
* before 1.9.5
* before 2.5.9
* before 3.0.0-M3

Description: The AbstractModelReader methods getOutcomes(), getOutcomePatterns(), and getPredicates() each read a 32-bit signed integer count field from a binary model stream and pass that value directly to an array allocation (new String[numOutcomes], new int[numOCTypes][], new String[NUM_PREDS]) without validating that the value is non-negative or within a reasonable bound. The count is therefore fully attacker-controlled when the model file originates from an untrusted source.

A crafted .bin model file in which any of these count fields is set to Integer.MAX_VALUE (or any value large enough to exhaust the available heap) triggers an OutOfMemoryError at the array allocation itself, before the corresponding label or pattern data is consumed from the stream. The error occurs very early in deserialization: for a GIS model, getOutcomes() is reached after only the model-type string, the correction constant, and the correction parameter have been read, so the attacker pays no meaningful size cost to weaponize a payload, and a single small file can crash a JVM that loads it.

Any code path that deserializes a .bin model is affected, including direct use of GenericModelReader and any higher-level component that delegates to it during model load. The practical impact is denial of service against processes that load model files from untrusted or semi-trusted origins.

Mitigation:
* 2.x users should upgrade to 2.5.9.
* 3.x users should upgrade to 3.0.0-M3.

Note: The fix introduces an upper bound on each of the three count fields, checked before array allocation; counts that are negative or exceed the bound cause an IllegalArgumentException to be thrown and the read to fail fast with no large allocation. The default bound is 10,000,000, which is well above the entry counts of legitimate OpenNLP models but far below any value that would threaten heap exhaustion. Deployments that legitimately need to load models with more entries than the default can raise the limit at JVM startup by setting the OPENNLP_MAX_ENTRIES system property to the desired positive integer (e.g. -DOPENNLP_MAX_ENTRIES=50000000); invalid or non-positive values fall back to the default.

Users who cannot upgrade immediately should treat all .bin model files as untrusted input unless their provenance is verified, and should avoid loading models supplied by end users or fetched from third-party repositories without integrity checks.
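
The check-before-allocate pattern described in the note can be sketched as follows. This is a minimal illustration, not the actual OpenNLP patch: the class name BoundedCountSketch and the methods checkCount and readLabels are hypothetical, and the stream is assumed to hold a 4-byte count followed by that many strings in java.io.DataInput encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical sketch of validating a count field before allocating an array.
final class BoundedCountSketch {
    // Mirrors the default bound of 10,000,000 quoted in the advisory.
    static final int DEFAULT_MAX_ENTRIES = 10_000_000;

    // Rejects negative or oversized counts before any allocation happens.
    static int checkCount(int count, int maxEntries, String field) {
        if (count < 0 || count > maxEntries) {
            throw new IllegalArgumentException(
                field + " count " + count + " is outside [0, " + maxEntries + "]");
        }
        return count;
    }

    // Reads a count, validates it, and only then allocates and fills the array.
    // The unvalidated form would be: new String[in.readInt()].
    static String[] readLabels(DataInputStream in, int maxEntries) throws IOException {
        int n = checkCount(in.readInt(), maxEntries, "outcome");
        String[] labels = new String[n]; // n is now known to be within the bound
        for (int i = 0; i < n; i++) {
            labels[i] = in.readUTF();
        }
        return labels;
    }

    public static void main(String[] args) throws IOException {
        // Four bytes encoding Integer.MAX_VALUE as a big-endian int.
        byte[] hostile = {0x7f, (byte) 0xff, (byte) 0xff, (byte) 0xff};
        try {
            readLabels(new DataInputStream(new ByteArrayInputStream(hostile)),
                       DEFAULT_MAX_ENTRIES);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the ordering is that the exception is raised while the process has allocated nothing larger than the 4-byte count itself.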

Meta Information

Published: 2026-05-04
Last Modified: 2026-07-15
Generated: 2026-07-26
AI Q&A: 2026-05-05
EPSS Evaluated: 2026-07-25
NVD: CVE-2026-42440

Affected Vendors & Products

Vendor	Product	Version / Range
apache	opennlp	up to 2.5.9 (excluding)
apache	opennlp	3.0.0

CWE

CWE ID	Description
CWE-789	The product allocates memory based on an untrusted, large size value, but it does not ensure that the size is within expected limits, allowing arbitrary amounts of memory to be allocated.
CWE-770	The product allocates a reusable resource or group of resources on behalf of an actor without imposing any intended restrictions on the size or number of resources that can be allocated.

Executive Summary

This vulnerability is in Apache OpenNLP's AbstractModelReader component, specifically in the methods getOutcomes(), getOutcomePatterns(), and getPredicates(). Each method reads a 32-bit signed integer count from a binary model file and uses that value directly to allocate an array, without checking that the count is non-negative and within a reasonable size.

An attacker can craft a malicious .bin model file with extremely large count values (such as Integer.MAX_VALUE) that cause the program to allocate huge arrays, exhausting the Java Virtual Machine's heap memory and triggering an OutOfMemoryError very early during deserialization.

This means that loading a single small malicious model file can crash the JVM process that attempts to load it, leading to a denial of service.
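
A rough calculation shows why a single count field is enough. The sketch below estimates the memory needed just for the reference slots of a String array of length Integer.MAX_VALUE. The 4-byte and 8-byte reference sizes are assumptions about a typical 64-bit JVM with and without compressed references; array header overhead is ignored, and the class name is hypothetical.

```java
// Hypothetical helper estimating the heap demanded by an attacker-chosen count.
final class AllocationEstimate {
    // Bytes needed for the reference slots alone; widened to long to avoid overflow.
    static long referenceArrayBytes(int length, int bytesPerReference) {
        return (long) length * bytesPerReference;
    }

    static double toGiB(long bytes) {
        return bytes / (double) (1L << 30);
    }

    public static void main(String[] args) {
        long compressed = referenceArrayBytes(Integer.MAX_VALUE, 4);   // assumed 4-byte refs
        long uncompressed = referenceArrayBytes(Integer.MAX_VALUE, 8); // assumed 8-byte refs
        System.out.printf("4-byte references: %d bytes (about %.1f GiB)%n",
                          compressed, toGiB(compressed));
        System.out.printf("8-byte references: %d bytes (about %.1f GiB)%n",
                          uncompressed, toGiB(uncompressed));
    }
}
```

Under these assumptions the request is roughly 8 GiB or 16 GiB for one array, which exceeds the heap of many deployed JVMs, while the field that requests it occupies four bytes in the file.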

Detection Guidance

This vulnerability is triggered by loading a crafted .bin model file with maliciously large count fields in Apache OpenNLP versions before 1.9.5, before 2.5.9, and before 3.0.0-M3. Detection involves monitoring for OutOfMemoryError being thrown during model loading, or inspecting model files for suspiciously large or negative count values.
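
One way to inspect a count field without allocating anything is to read only the header fields that precede it. The sketch below is hypothetical and rests on two assumptions that should be verified against the OpenNLP version in use: that the raw GIS model stream has the field order given in the description (model-type string, correction constant, correction parameter, outcome count), and that those fields use java.io.DataInput encoding. A packaged .bin file may wrap the raw stream in a container, which this sketch does not unpack.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical pre-load inspector for the first count field of a GIS model stream.
final class GisHeaderInspector {
    // Reads the assumed header layout and returns the outcome count unallocated.
    static int peekOutcomeCount(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        in.readUTF();        // model-type string (length-prefixed, at most 65535 bytes)
        in.readInt();        // correction constant
        in.readDouble();     // correction parameter
        return in.readInt(); // outcome count, returned without allocating an array
    }

    // Flags counts that are negative or above the chosen limit.
    static boolean looksSuspicious(int count, int maxEntries) {
        return count < 0 || count > maxEntries;
    }

    // Builds an in-memory header in the assumed layout, for exercising the inspector.
    static byte[] sampleHeader(int outcomeCount) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF("GIS");
            out.writeInt(1);
            out.writeDouble(1.0);
            out.writeInt(outcomeCount);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        int limit = 10_000_000; // default bound quoted in the advisory
        for (int count : new int[] {3, Integer.MAX_VALUE}) {
            int seen = peekOutcomeCount(new ByteArrayInputStream(sampleHeader(count)));
            System.out.println(seen + " suspicious=" + looksSuspicious(seen, limit));
        }
    }
}
```

This covers only the first of the three count fields; the later ones sit behind variable-length data and would require walking more of the stream.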

Since the vulnerability arises from untrusted or malformed .bin model files, detection can include verifying the provenance and integrity of model files before loading them.

The advisory does not include specific commands for detecting this vulnerability on a network or system.

Impact Analysis

The primary impact of this vulnerability is a denial of service (DoS) condition against any process that loads model files from untrusted or semi-trusted sources.

An attacker can supply a crafted model file that causes the application to run out of memory and crash, disrupting service availability.

This affects any code path that deserializes .bin model files, including direct use of GenericModelReader or any higher-level components that load models.

Compliance Impact

The vulnerability causes a denial of service (DoS) via out-of-memory errors when loading untrusted model files, which can disrupt availability of services relying on Apache OpenNLP.

The advisory provides no information about any direct impact on confidentiality, integrity, or data protection that would bear on compliance with standards such as GDPR or HIPAA.

Based on that information, the vulnerability primarily affects service availability and is not stated to affect compliance with common data protection regulations.

Mitigation Strategies

To mitigate this vulnerability, users should upgrade Apache OpenNLP to version 2.5.9 if using the 2.x series, or to 3.0.0-M3 if using the 3.x series.

If immediate upgrade is not possible, treat all .bin model files as untrusted input unless their provenance is verified.

Avoid loading models supplied by end users or fetched from third-party repositories without integrity checks.
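
An integrity check of this kind can be as simple as comparing a file's SHA-256 digest with a value published by a trusted source before the file is handed to any model loader. The sketch below uses only the JDK; the class name, method names, and command-line arguments are hypothetical, and how the expected digest is obtained and trusted is outside its scope.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical pre-load integrity check for a model file.
final class ModelIntegrityCheck {
    // Streams the input through SHA-256 and returns the lowercase hex digest.
    static String sha256Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(Character.forDigit((b >> 4) & 0xf, 16));
            hex.append(Character.forDigit(b & 0xf, 16));
        }
        return hex.toString();
    }

    // Compares two hex digests case-insensitively.
    static boolean digestsMatch(String actualHex, String expectedHex) {
        byte[] a = actualHex.trim().toLowerCase(Locale.ROOT).getBytes(StandardCharsets.US_ASCII);
        byte[] e = expectedHex.trim().toLowerCase(Locale.ROOT).getBytes(StandardCharsets.US_ASCII);
        return MessageDigest.isEqual(a, e);
    }

    // Returns true only if the file's digest equals the expected one.
    static boolean verify(Path file, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        try (InputStream in = Files.newInputStream(file)) {
            return digestsMatch(sha256Hex(in), expectedHex);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: ModelIntegrityCheck <model-file> <expected-sha256-hex>");
            return;
        }
        System.out.println(verify(Path.of(args[0]), args[1]) ? "match" : "MISMATCH");
    }
}
```

A digest check establishes only that the file is the one the publisher intended; it does not by itself show that the publisher's file is well-formed.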

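The fix checks each of the three count fields against an upper bound before array allocation; a count that is negative or exceeds the bound causes an IllegalArgumentException instead of a large allocation. The default bound is 10,000,000 and can be raised at JVM startup through the OPENNLP_MAX_ENTRIES system property (e.g. -DOPENNLP_MAX_ENTRIES=50000000); invalid or non-positive values fall back to the default.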
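
The fallback behaviour for the property can be illustrated with a short sketch. This is not the library's code: the class and method names are hypothetical, and trimming whitespace before parsing is an added assumption. Only the property name, the default of 10,000,000, and the rule that invalid or non-positive values fall back to the default come from the advisory.

```java
// Hypothetical sketch of resolving the entry limit from a system property.
final class MaxEntriesProperty {
    static final String PROPERTY = "OPENNLP_MAX_ENTRIES";
    static final int DEFAULT_MAX_ENTRIES = 10_000_000;

    // Returns the parsed value if it is a positive int, otherwise the default.
    static int parseLimit(String raw) {
        if (raw == null) {
            return DEFAULT_MAX_ENTRIES;
        }
        try {
            int value = Integer.parseInt(raw.trim()); // trimming is an assumption
            return value > 0 ? value : DEFAULT_MAX_ENTRIES;
        } catch (NumberFormatException e) {
            return DEFAULT_MAX_ENTRIES; // non-numeric or out of int range
        }
    }

    // Reads the property as set with -DOPENNLP_MAX_ENTRIES=... at JVM startup.
    static int resolve() {
        return parseLimit(System.getProperty(PROPERTY));
    }

    public static void main(String[] args) {
        System.out.println("effective limit: " + resolve());
    }
}
```

Falling back silently keeps a mistyped value from disabling the protection, at the cost of the operator not being told the setting was ignored.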
