CVE-2026-10803

Analyzed - Analysis Complete

Weak Hash Usage in MLflow Dataset Digest Computation

Publication date: 2026-06-04

Last updated on: 2026-06-04

Assigner: VulDB

Description

A flaw has been found in MLflow up to 3.10.0. This issue affects the function mlflow.data.digest_utils of the file mlflow/data/digest_utils.py of the component Dataset Digest Computation. This manipulation causes use of weak hash. It is possible to launch the attack on the local host. The attack is considered to have high complexity. The exploitability is assessed as difficult. The exploit has been published and may be used. The project was informed of the problem early through a pull request but has not reacted yet.

Meta Information

Published: 2026-06-04
Last Modified: 2026-06-04
Generated: 2026-07-15
AI Q&A: 2026-06-04
EPSS Evaluated: 2026-07-13
NVD: CVE-2026-10803
EUVD: EUVD-2026-34245

Affected Vendors & Products

Vendor	Product	Version / Range
lfprojects	mlflow	up to and including 3.10.0

CWE

CWE ID	Description
CWE-328	The product uses an algorithm that produces a digest (output value) that does not meet security expectations for a hash function that allows an adversary to reasonably determine the original input (preimage attack), find another input that can produce the same hash (2nd preimage attack), or find multiple inputs that evaluate to the same hash (birthday attack).
CWE-327	The product uses a broken or risky cryptographic algorithm or protocol.

Executive Summary

The vulnerability in MLflow up to version 3.10.0 affects the dataset digest computation, which uses a weak hashing method. The digest is computed from a deterministic sample of only the first 10,000 rows, and it excludes certain column types (datetime, boolean, and categorical) from the hash. The result is an MD5 hash truncated to its first 8 hexadecimal characters (32 bits), which makes it susceptible to collisions.

These flaws allow an attacker to manipulate datasets beyond the sampled rows or in excluded column types without changing the digest, enabling predictable collisions. This means adversaries can craft colliding datasets that appear identical to the system, potentially bypassing integrity checks.
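
The collision mechanism described above can be illustrated with a simplified model. The sketch below is not MLflow's code: `toy_digest`, `MAX_ROWS`, and the row layout are hypothetical stand-ins that mimic only two properties named in this summary (hashing the first 10,000 rows and keeping 8 hex characters of MD5).

```python
import hashlib

MAX_ROWS = 10_000  # head-only sample size named in the advisory


def toy_digest(rows):
    """Hypothetical stand-in for a head-only, truncated-MD5 digest."""
    md5 = hashlib.md5()
    for row in rows[:MAX_ROWS]:  # rows past the sample never reach the hash
        md5.update(repr(row).encode("utf-8"))
    return md5.hexdigest()[:8]  # 8 hex characters = 32 bits


original = [(i, float(i)) for i in range(10_500)]
tampered = list(original)
tampered[10_200] = (10_200, -1.0)  # edit a row outside the sampled head

# The two datasets differ, yet the toy digest cannot tell them apart.
print(toy_digest(original) == toy_digest(tampered))  # True
```

In this model, any change confined to rows after the first 10,000 leaves the digest untouched, which is the behavior the summary attributes to the vulnerable computation.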

Impact Analysis

This vulnerability can lead to serious impacts such as data poisoning attacks where manipulated datasets go undetected, compromising the integrity of machine learning workflows.

Adversaries can modify data beyond the first 10,000 rows or in excluded column types without detection.
It can cause reproducibility failures in research or production environments relying on MLflow.
The weak hash increases the likelihood of collisions, allowing attackers to craft malicious datasets that appear legitimate.

Overall, this undermines trust in dataset integrity and can lead to compromised AI model performance or incorrect outcomes.
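
To put the collision risk of a 32-bit digest in perspective, the standard birthday approximation can be evaluated directly. This is a rough estimate that assumes digests behave like uniformly random values; the function name `birthday_threshold` is illustrative and not part of MLflow.

```python
import math

DIGEST_BITS = 32  # 8 hex characters of MD5, as described in this advisory


def birthday_threshold(bits, probability=0.5):
    """Approximate number of random inputs needed before two of them
    share a digest with the given probability (birthday approximation)."""
    space = 2 ** bits
    return math.sqrt(2 * space * math.log(1 / (1 - probability)))


# Roughly 77 thousand random datasets give a 50% chance of a collision.
print(round(birthday_threshold(DIGEST_BITS)))

# For comparison, a full 256-bit digest needs on the order of 10**38 inputs.
print(f"{birthday_threshold(256):.2e}")
```

A few tens of thousands of candidate datasets is well within reach of a single machine, which is why a 32-bit digest offers little protection even before the sampling and column-exclusion issues are considered.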

Compliance Impact

The vulnerability poses compliance risks under regulations such as GDPR and HIPAA because it allows undetected manipulation of datasets, which can lead to inaccurate or tampered data being used in AI models.

Such data integrity failures can violate requirements for data accuracy, auditability, and security mandated by these standards, potentially resulting in legal and regulatory consequences.

Detection Guidance

This vulnerability involves the use of a weak hash in MLflow's dataset digest computation, which allows adversaries to craft colliding datasets undetected. Detection involves verifying the MLflow version in use and checking if the dataset digest computation uses the vulnerable hashing method (MD5[:8]) and deterministic sampling (head-only sampling).

Since the vulnerability is local and related to MLflow's internal dataset digest computation, network detection is limited. Instead, detection should focus on identifying MLflow instances running versions up to 3.10.0 and inspecting the digest computation method in the mlflow/data/digest_utils.py file.

Suggested commands to detect vulnerable MLflow versions or configurations include:

Check MLflow version installed: `mlflow --version` or `pip show mlflow`
Search for usage of weak hash functions in the MLflow source code: `grep -r 'md5' /path/to/mlflow`
Inspect the digest_utils.py file for sampling method (e.g., head-only sampling): `head -40 /path/to/mlflow/mlflow/data/digest_utils.py`
Monitor logs or audit dataset digest computations for unexpected collisions or anomalies.
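
The version check above can also be scripted, for example to run across many environments. The sketch below uses only the Python standard library; `release_tuple`, `in_reported_range`, and `check_installed` are hypothetical helper names, and the comparison assumes that the reported range ("up to 3.10.0") is the only affected range, which this advisory does not confirm for later releases.

```python
import re
from importlib import metadata

LAST_REPORTED_AFFECTED = (3, 10, 0)  # "up to 3.10.0" per this advisory


def release_tuple(version):
    """Extract the leading numeric release, e.g. '3.10.0rc1' -> (3, 10, 0)."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def in_reported_range(version):
    """True if the version falls inside the range this advisory reports."""
    return release_tuple(version) <= LAST_REPORTED_AFFECTED


def check_installed(package="mlflow"):
    """Return (version, in_range) for the installed package, or None."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return version, in_reported_range(version)


print(check_installed())
```

A result outside the reported range is not proof of a fix; inspecting `mlflow/data/digest_utils.py` as described above remains the more reliable check.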

Mitigation Strategies

The primary mitigation is to update MLflow to a release that includes the fix for CVE-2026-10803 as soon as one is available. The fix replaces the truncated MD5 hash with a stronger SHA-256 based hash, changes sampling from head-only to head+tail sampling, and includes all column types in the digest computation.

If an immediate upgrade is not possible, consider the following steps:

Restrict access to MLflow instances to trusted users only, as the attack requires local access.
Audit datasets for suspicious modifications beyond the first 10,000 rows or in excluded column types (datetime, boolean, categorical).
Implement additional data integrity checks outside MLflow's digest computation to detect tampering.
Monitor MLflow repositories and update promptly once the patch release including the fix is available.
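
One way to implement the additional integrity check listed above is to hash the complete dataset file outside MLflow and compare it against a value recorded at ingestion time. The following is a minimal sketch under that assumption; `file_sha256` and `verify` are hypothetical helpers, not MLflow APIs, and they cover file-based datasets only.

```python
import hashlib
import hmac
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash every byte of the file with SHA-256, reading in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Compare the current file hash against a previously recorded value."""
    return hmac.compare_digest(file_sha256(path), expected_hex)
```

Because the hash covers the whole file rather than a sample, a change to any row or column alters the result. The recorded value should be stored somewhere an attacker with access to the dataset cannot also modify.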
