CVE-2026-10803
Weak Hash Usage in MLflow Dataset Digest Computation
Publication date: 2026-06-04
Last updated on: 2026-06-04
Assigner: VulDB
Description
Meta Information
Affected Vendors & Products
| Vendor | Product | Version / Range |
|---|---|---|
| mlflow | mlflow | up to and including 3.10.0 |
Helpful Resources
Exploitability
| CWE ID | Description |
|---|---|
| CWE-328 | The product uses an algorithm that produces a digest (output value) that does not meet security expectations for a hash function that allows an adversary to reasonably determine the original input (preimage attack), find another input that can produce the same hash (2nd preimage attack), or find multiple inputs that evaluate to the same hash (birthday attack). |
| CWE-327 | The product uses a broken or risky cryptographic algorithm or protocol. |
AI Powered Q&A
Can you explain this vulnerability to me?
The vulnerability affects the dataset digest computation in MLflow up to and including version 3.10.0. The digest is computed from a deterministic sample of only the first 10,000 rows, and certain column types (datetime, boolean, and categorical) are excluded from the hash. The result is an MD5 digest truncated to its first 8 hexadecimal characters (32 bits), which makes collisions practical.
Because of these flaws, an attacker can change rows beyond the sampled range, or values in excluded column types, without changing the digest. Adversaries can therefore craft colliding datasets that the system treats as identical, potentially bypassing integrity checks.
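The head-only sampling problem can be illustrated with a small standalone sketch. This is a simplified model, not MLflow's actual code: the function names `weak_digest` and `full_digest`, the row layout, and the `repr`-based serialization are all assumptions made for illustration. Only the 10,000-row sample size and the 8-character MD5 truncation come from the advisory.

```python
import hashlib

MAX_ROWS = 10_000  # sample size stated in the advisory


def weak_digest(rows):
    """Simplified model of the flawed scheme (hypothetical helper):
    hash only the first MAX_ROWS rows and keep 8 hex chars of MD5."""
    md5 = hashlib.md5()
    for row in rows[:MAX_ROWS]:
        md5.update(repr(row).encode("utf-8"))
    return md5.hexdigest()[:8]


def full_digest(rows):
    """Reference digest for comparison: SHA-256 over every row."""
    sha = hashlib.sha256()
    for row in rows:
        sha.update(repr(row).encode("utf-8"))
    return sha.hexdigest()


# A 12,000-row dataset and a copy with one row changed past the sample.
clean = [(i, float(i) * 0.5) for i in range(12_000)]
poisoned = list(clean)
poisoned[11_000] = (11_000, -999.0)

print(weak_digest(clean) == weak_digest(poisoned))  # True: tampering is invisible
print(full_digest(clean) == full_digest(poisoned))  # False: full hash detects it
```

No hash collision search is needed in this case: the modified row is simply never read by the sampled digest.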
How can this vulnerability impact me?
This vulnerability can lead to serious impacts such as data poisoning attacks where manipulated datasets go undetected, compromising the integrity of machine learning workflows.
- Adversaries can modify data beyond the first 10,000 rows or in excluded column types without detection.
- It can cause reproducibility failures in research or production environments relying on MLflow.
- The weak hash increases the likelihood of collisions, allowing attackers to craft malicious datasets that appear legitimate.
Overall, this undermines trust in dataset integrity and can lead to compromised AI model performance or incorrect outcomes.
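To give a rough sense of why a 32-bit digest is collision-prone, the standard birthday approximation can be evaluated directly. This is a back-of-the-envelope sketch that assumes digests behave like uniformly random 32-bit values; the function names are illustrative.

```python
import math


def collision_probability(n_items, bits):
    """Birthday approximation: chance that at least two of n_items
    uniformly random digests of the given bit length collide."""
    space = 2 ** bits
    return 1.0 - math.exp(-n_items * (n_items - 1) / (2.0 * space))


def items_for_probability(p, bits):
    """Approximate number of items needed to reach collision probability p."""
    space = 2 ** bits
    return math.ceil(math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p))))


n_32 = items_for_probability(0.5, 32)
n_256 = items_for_probability(0.5, 256)

print(f"32-bit digest: ~{n_32:,} items for a 50% collision chance")
print(f"256-bit digest: ~{n_256:.3e} items for a 50% collision chance")
print(f"P(collision) among 10,000 items at 32 bits: "
      f"{collision_probability(10_000, 32):.4f}")
```

Under this model a 50% chance of some collision is reached at roughly 77,000 distinct datasets. Matching one specific 32-bit digest takes about 2**32 attempts on average, which is well within reach of commodity hardware.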
How does this vulnerability affect compliance with common standards and regulations (like GDPR, HIPAA)?
The vulnerability poses compliance risks under regulations such as GDPR and HIPAA because it allows undetected manipulation of datasets, which can lead to inaccurate or tampered data being used in AI models.
Such data integrity failures can violate requirements for data accuracy, auditability, and security mandated by these standards, potentially resulting in legal and regulatory consequences.
How can this vulnerability be detected on my network or system? Can you suggest some commands?
Detection comes down to two checks: confirming which MLflow version is in use, and confirming whether the dataset digest computation uses the vulnerable hashing method (MD5[:8]) together with deterministic head-only sampling.
Because the vulnerability is local and lies in MLflow's internal dataset digest computation, network-based detection is of limited use. Focus instead on identifying MLflow instances running versions up to and including 3.10.0 and on inspecting the digest computation in the mlflow/data/digest_utils.py file.
Suggested commands to detect vulnerable MLflow versions or configurations include:
- Check MLflow version installed: `mlflow --version` or `pip show mlflow`
- Search for usage of weak hash functions in the MLflow source code: `grep -r 'md5' /path/to/mlflow`
- Inspect the digest_utils.py file for sampling method (e.g., head-only sampling): `head -40 /path/to/mlflow/mlflow/data/digest_utils.py`
- Monitor logs or audit dataset digest computations for unexpected collisions or anomalies.
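The version check can also be scripted from Python, for example when auditing many environments. The following is a minimal sketch using only the standard library; `parse_release`, `is_affected`, and `installed_mlflow_version` are hypothetical helper names, and the affected range is taken from the table above. Pre-release suffixes such as `rc1` are ignored by this simple parser.

```python
import re
from importlib import metadata

LAST_AFFECTED = (3, 10, 0)  # "up to and including 3.10.0" per the advisory


def parse_release(version):
    """Extract the leading numeric release, e.g. '3.10.0rc1' -> (3, 10, 0).
    Suffixes (rc, dev, post) are ignored in this simplified parser."""
    match = re.match(r"(\d+(?:\.\d+)*)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group(1).split("."))
    # Pad short versions such as '3.10' to three components.
    return parts + (0,) * (3 - len(parts))


def is_affected(version):
    """True if the version falls in the range the advisory lists as affected."""
    return parse_release(version) <= LAST_AFFECTED


def installed_mlflow_version():
    """Return the installed mlflow version string, or None if not installed."""
    try:
        return metadata.version("mlflow")
    except metadata.PackageNotFoundError:
        return None


version = installed_mlflow_version()
if version is None:
    print("mlflow is not installed in this environment")
else:
    status = "affected" if is_affected(version) else "not in the affected range"
    print(f"mlflow {version}: {status}")
```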
What immediate steps should I take to mitigate this vulnerability?
The primary mitigation is to update MLflow to a release that includes the fix for CVE-2026-10803. The fix replaces the truncated MD5 hash with a SHA-256 based hash, changes sampling from head-only to head+tail sampling, and includes all column types in the digest computation.
If an immediate upgrade is not possible, consider the following steps:
- Restrict access to MLflow instances to trusted users only, as the attack requires local access.
- Audit datasets for suspicious modifications beyond the first 10,000 rows or in excluded column types (datetime, boolean, categorical).
- Implement additional data integrity checks outside MLflow's digest computation to detect tampering.
- Monitor MLflow repositories and update promptly once the patch release including the fix is available.
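As one possible form of the additional integrity check mentioned above, a full-file SHA-256 can be recorded when a dataset is registered and re-verified before use. This is a minimal sketch that assumes datasets are stored as files on disk; `sha256_file` and `verify_file` are hypothetical helper names and are not part of MLflow.

```python
import hashlib
import hmac


def sha256_file(path, chunk_size=1 << 20):
    """Hash the entire file in chunks so large datasets fit in memory."""
    sha = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()


def verify_file(path, expected_hex):
    """Compare the current file hash with a previously recorded one."""
    return hmac.compare_digest(sha256_file(path), expected_hex.strip().lower())
```

A typical use would be to store the output of `sha256_file("train.csv")` alongside the run metadata and call `verify_file("train.csv", recorded_hash)` before training; the file name here is only an example. Unlike a sampled digest, this covers every byte of the file, so changes to any row or column are detected.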