CVE-2026-35346

Received - Intake

Lossy UTF-8 Conversion in uutils comm Causes Data Corruption

Publication date: 2026-04-22

Last updated on: 2026-04-27

Assigner: Canonical Ltd.

Description

The comm utility in uutils coreutils silently corrupts data by performing lossy UTF-8 conversion on all output lines. The implementation uses String::from_utf8_lossy(), which replaces invalid UTF-8 byte sequences with the Unicode replacement character (U+FFFD). This behavior differs from GNU comm, which processes raw bytes and preserves the original input. This results in corrupted output when the utility is used to compare binary files or files using non-UTF-8 legacy encodings.

Meta Information

Published: 2026-04-22

Last Modified: 2026-04-27

Generated: 2026-07-27

AI Q&A: 2026-04-22

EPSS Evaluated: 2026-07-25

NVD: CVE-2026-35346

EUVD: EUVD-2026-24978

Affected Vendors & Products

Vendor	Product	Version / Range
uutils	coreutils	all versions before 0.6.0 (0.6.0 itself excluded)

CWE

CWE ID	Description
CWE-176	The product does not properly handle when an input contains Unicode encoding.

Executive Summary

The vulnerability exists in the comm utility of uutils coreutils, where it silently corrupts data by performing a lossy UTF-8 conversion on all output lines.

This happens because the implementation uses String::from_utf8_lossy(), which replaces invalid UTF-8 byte sequences with the Unicode replacement character (U+FFFD).

Unlike GNU comm, which processes raw bytes and preserves the original input, uutils coreutils' comm corrupts output when comparing binary files or files with non-UTF-8 legacy encodings.
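The replacement behavior described above can be reproduced in isolation. The following is a minimal sketch, not the uutils source: the helper name `lossy_line` is hypothetical and exists only to show what `String::from_utf8_lossy()` does to a line that is not valid UTF-8.

```rust
/// Hypothetical helper: converts a raw line the lossy way and returns
/// the bytes that would end up in the output.
fn lossy_line(raw: &[u8]) -> Vec<u8> {
    // from_utf8_lossy substitutes U+FFFD for every invalid sequence.
    String::from_utf8_lossy(raw).into_owned().into_bytes()
}

fn main() {
    // 0xFE and 0xFF can never appear in well-formed UTF-8.
    let raw: &[u8] = b"\xFE\xFF\n";
    let converted = lossy_line(raw);

    // Print both byte sequences in hex for comparison.
    println!("raw:       {:02x?}", raw);
    println!("converted: {:02x?}", converted);
}
```

Each invalid byte is replaced by the three-byte UTF-8 encoding of U+FFFD (ef bf bd), so the original bytes cannot be recovered from the output. Input that is already valid UTF-8 passes through unchanged.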

Detection Guidance

This vulnerability can be detected by testing the behavior of the uutils comm utility when processing files containing non-UTF-8 byte sequences. Specifically, you can create test files with invalid UTF-8 bytes (such as 0xFE and 0xFF) and compare their output using uutils comm versus GNU comm.

If the output from uutils comm replaces invalid UTF-8 bytes with the Unicode replacement character (U+FFFD), it indicates the presence of the vulnerability, as this corrupts the original data silently.

Suggested steps to detect the issue:

Create two test files containing non-UTF-8 bytes, for example with printf or a hex editor.
Compare the files with GNU comm and confirm that the output preserves the original bytes.
Compare the same files with uutils comm and check whether the invalid bytes are replaced by U+FFFD (displayed as � or as the byte sequence ef bf bd).
Example commands:
printf '\376\377\n' > file1 # bytes 0xFE 0xFF; octal escapes work in shells whose printf lacks \x
printf '\376\377\n' > file2
comm file1 file2 | od -An -tx1 # GNU comm: the hex dump still contains fe ff
uutils comm file1 file2 | od -An -tx1 # vulnerable uutils comm: ef bf bd replaces the invalid bytes (adjust the command name to how uutils is invoked on your system)
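To find out in advance whether a given input would be affected, one option is to look for lines that are not valid UTF-8, since only those are changed by a lossy conversion. The sketch below assumes newline-terminated lines. The function name `at_risk_lines` is hypothetical and not part of any uutils tooling.

```rust
/// Hypothetical helper: returns the 1-based numbers of all lines in `data`
/// that are not valid UTF-8 and would therefore be altered by a lossy
/// conversion.
fn at_risk_lines(data: &[u8]) -> Vec<usize> {
    data.split(|&b| b == b'\n')
        .enumerate()
        // Keep only the lines that fail strict UTF-8 validation.
        .filter(|(_, line)| std::str::from_utf8(line).is_err())
        .map(|(i, _)| i + 1)
        .collect()
}

fn main() {
    // Sample input: a plain ASCII line, a line with the bytes 0xFE 0xFF,
    // and a line with a correctly encoded UTF-8 character.
    let sample: &[u8] = b"ok\n\xFE\xFF\ncaf\xC3\xA9\n";
    for n in at_risk_lines(sample) {
        println!("line {} is not valid UTF-8", n);
    }
}
```

In practice the sample slice would be replaced by the contents of the file under test, for example read with `std::fs::read`.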

Impact Analysis

This vulnerability can lead to silent data corruption when using the comm utility to compare binary files or files encoded in non-UTF-8 legacy formats.

The corrupted output may cause incorrect comparison results, potentially leading to data integrity issues or erroneous processing based on the corrupted output.

Since the corruption is silent, users may not be aware that the data has been altered, which can affect workflows relying on accurate file comparisons.
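One concrete way the corruption can mislead later processing is that different inputs become indistinguishable. The sketch below is an illustration only, with a hypothetical helper `lossy`. It takes two distinct ISO-8859-1 encoded words and shows that both produce the same text after lossy conversion.

```rust
/// Hypothetical helper: the lossy conversion named in the advisory.
fn lossy(raw: &[u8]) -> String {
    String::from_utf8_lossy(raw).into_owned()
}

fn main() {
    // Two different lines in ISO-8859-1: 0xE9 is e-acute, 0xE8 is e-grave.
    let a: &[u8] = b"caf\xE9";
    let b: &[u8] = b"caf\xE8";

    // The raw bytes differ ...
    println!("raw bytes equal:    {}", a == b);
    // ... but the converted text does not, because each invalid byte
    // is replaced by the same character, U+FFFD.
    println!("lossy output equal: {}", lossy(a) == lossy(b));
}
```

Anything that consumes the converted output can no longer tell such lines apart.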

Compliance Impact

The vulnerability causes silent data corruption when the uutils comm utility processes non-UTF-8 input by replacing invalid byte sequences with the Unicode replacement character. This can lead to data integrity issues when handling arbitrary byte streams.

The CVE description and linked resources do not mention specific regulations such as GDPR or HIPAA, and no direct compliance impact is documented. Because such regulations treat data accuracy and integrity as requirements, corrupted output could still be relevant wherever comm output feeds the processing of regulated data.

Mitigation Strategies

To mitigate this vulnerability, update the uutils coreutils package to a version that includes the fix for CVE-2026-35346. The affected range covers versions before 0.6.0, so 0.6.0 or later should contain the fix.

The fix involves changes to the comm utility to handle output without lossy UTF-8 conversion, aligning its behavior with GNU comm by writing raw bytes directly to stdout and improving error handling.
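The general technique of writing raw bytes can be sketched as follows. This is an illustration under stated assumptions, not the actual uutils patch: the function `write_line`, its parameters, and the column layout are hypothetical. The line is never converted to a `String`, so invalid UTF-8 passes through unchanged, and write errors are returned to the caller.

```rust
use std::io::{self, Write};

/// Hypothetical helper: writes `line` prefixed by `column` copies of `delim`,
/// followed by a newline, without any text conversion.
fn write_line<W: Write>(
    out: &mut W,
    column: usize,
    delim: &[u8],
    line: &[u8],
) -> io::Result<()> {
    for _ in 0..column {
        out.write_all(delim)?;
    }
    // The bytes are written exactly as they were read.
    out.write_all(line)?;
    out.write_all(b"\n")
}

fn main() {
    let stdout = io::stdout();
    let mut out = stdout.lock();
    // Emit a line with two leading tab delimiters; the bytes 0xFE 0xFF
    // reach stdout unmodified.
    write_line(&mut out, 2, b"\t", b"\xFE\xFF").expect("write to stdout failed");
}
```

Taking a generic `Write` rather than writing to stdout directly keeps the function testable against an in-memory buffer such as `Vec<u8>`.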

If an immediate update is not possible, avoid using the uutils comm utility to compare files containing non-UTF-8 or binary data, and instead use GNU comm or other tools that preserve raw byte output.