CVE-2026-35346
Status: Received - Intake
Lossy UTF-8 Conversion in uutils comm Causes Data Corruption

Publication date: 2026-04-22

Last updated on: 2026-04-27

Assigner: Canonical Ltd.

Description
The comm utility in uutils coreutils silently corrupts data by performing lossy UTF-8 conversion on all output lines. The implementation uses String::from_utf8_lossy(), which replaces invalid UTF-8 byte sequences with the Unicode replacement character (U+FFFD). This behavior differs from GNU comm, which processes raw bytes and preserves the original input. This results in corrupted output when the utility is used to compare binary files or files using non-UTF-8 legacy encodings.
Meta Information
Published: 2026-04-22
Last Modified: 2026-04-27
Generated: 2026-05-07
AI Q&A: 2026-04-22
EPSS Evaluated: 2026-05-05
Affected Vendors & Products
Vendor: uutils
Product: coreutils
Version / Range: all versions before 0.6.0 (0.6.0 itself excluded)
CWE
CWE-176 (Improper Handling of Unicode Encoding): The product does not properly handle when an input contains Unicode encoding.
AI Powered Q&A
Can you explain this vulnerability to me?

The vulnerability exists in the comm utility of uutils coreutils, where it silently corrupts data by performing a lossy UTF-8 conversion on all output lines.

This happens because the implementation uses String::from_utf8_lossy(), which replaces invalid UTF-8 byte sequences with the Unicode replacement character (U+FFFD).

Unlike GNU comm, which processes raw bytes and preserves the original input, uutils coreutils' comm corrupts output when comparing binary files or files with non-UTF-8 legacy encodings.
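
The effect of the lossy conversion can be reproduced in a few lines of Rust. This is a minimal sketch, not the uutils source: the helper names `lossy_roundtrip` and `hex` are hypothetical, and the sample line is "café" encoded in Latin-1, where the byte 0xE9 is not valid UTF-8 on its own.

```rust
/// Mimics the pattern described above: decode a line lossily,
/// then re-encode the resulting text for output.
/// (Hypothetical helper; not taken from the uutils code base.)
fn lossy_roundtrip(line: &[u8]) -> Vec<u8> {
    String::from_utf8_lossy(line).into_owned().into_bytes()
}

/// Formats bytes as space-separated lowercase hex.
fn hex(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|b| format!("{b:02x}"))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // "caf\xE9\n" is "café" plus a newline in Latin-1.
    let line: &[u8] = b"caf\xE9\n";
    println!("input: {}", hex(line)); // 63 61 66 e9 0a
    // The single byte e9 comes back as the three bytes ef bf bd (U+FFFD).
    println!("lossy: {}", hex(&lossy_roundtrip(line))); // 63 61 66 ef bf bd 0a
}
```

Valid UTF-8 input passes through unchanged, which is why the problem goes unnoticed on ordinary text files.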


How can this vulnerability impact me?

This vulnerability can lead to silent data corruption when using the comm utility to compare binary files or files encoded in non-UTF-8 legacy formats.

The corrupted output may cause incorrect comparison results, potentially leading to data integrity issues or erroneous processing based on the corrupted output.

Since the corruption is silent, users may not be aware that the data has been altered, which can affect workflows relying on accurate file comparisons.
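
One concrete consequence is that lines which differ in the input can become indistinguishable in the output. The sketch below illustrates this with two Latin-1 byte strings; the function name `collide_after_lossy` is hypothetical and the example only demonstrates the conversion itself, not how comm performs its comparison.

```rust
/// Returns true when two byte strings differ, yet decode to the same
/// text under lossy UTF-8 conversion. (Hypothetical helper.)
fn collide_after_lossy(a: &[u8], b: &[u8]) -> bool {
    a != b && String::from_utf8_lossy(a) == String::from_utf8_lossy(b)
}

fn main() {
    // "pré" and "prè" in Latin-1: the final bytes e9 and e8 are both
    // invalid UTF-8 here, so each is replaced by U+FFFD.
    println!("{}", collide_after_lossy(b"pr\xE9", b"pr\xE8")); // true
    // Distinct valid UTF-8 strings stay distinct.
    println!("{}", collide_after_lossy(b"abc", b"abd")); // false
}
```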


How does this vulnerability affect compliance with common standards and regulations (like GDPR, HIPAA)?

The vulnerability causes silent data corruption when the uutils comm utility processes non-UTF-8 input by replacing invalid byte sequences with the Unicode replacement character. This can lead to data integrity issues when handling arbitrary byte streams.

The CVE description and its resources do not mention GDPR, HIPAA, or any other specific regulation, so there is no direct information about compliance impact. Data accuracy and integrity are nevertheless requirements under such regulations, and corrupted output could conflict with them if the affected data falls within their scope.


How can this vulnerability be detected on my network or system? Can you suggest some commands?

This vulnerability can be detected by testing the behavior of the uutils comm utility when processing files containing non-UTF-8 byte sequences. Specifically, you can create test files with invalid UTF-8 bytes (such as 0xFE and 0xFF) and compare their output using uutils comm versus GNU comm.

If the output from uutils comm replaces invalid UTF-8 bytes with the Unicode replacement character (U+FFFD), it indicates the presence of the vulnerability, as this corrupts the original data silently.

Suggested commands to detect the issue:

  • Create two test files with non-UTF-8 bytes, for example using a hex editor or echo with printf:
  • Use GNU comm to compare the files and observe that the output preserves the original bytes.
  • Use uutils comm to compare the same files and check if invalid UTF-8 bytes are replaced by U+FFFD (shown as οΏ½ or byte sequence ef bf bd).
  • Example commands:
  • printf '\xFE\xFF\n' > file1
  • printf '\xFE\xFF\n' > file2
  • comm file1 file2 # GNU comm output preserves bytes
  • uutils comm file1 file2 # vulnerable output replaces invalid bytes with U+FFFD
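
The check described in the steps above can also be automated. The following is a rough sketch under stated assumptions: the function names are hypothetical, and the sample output bytes in `main` assume comm's default layout, where a line common to both files is prefixed by two tabs. It flags output that contains more U+FFFD sequences than the input did.

```rust
/// UTF-8 encoding of U+FFFD, the Unicode replacement character.
const REPLACEMENT: [u8; 3] = [0xEF, 0xBF, 0xBD];

/// Counts occurrences of the byte sequence ef bf bd in `data`.
fn count_replacement(data: &[u8]) -> usize {
    data.windows(3).filter(|w| *w == &REPLACEMENT[..]).count()
}

/// Returns true when `output` contains more replacement characters than
/// `input`, which suggests a lossy conversion happened in between.
/// (Hypothetical helper; a heuristic, not an official detection method.)
fn introduces_replacement(input: &[u8], output: &[u8]) -> bool {
    count_replacement(output) > count_replacement(input)
}

fn main() {
    let input: &[u8] = b"\xFE\xFF\n";
    // Assumed sample outputs for two identical input files.
    let preserved: &[u8] = b"\t\t\xFE\xFF\n";
    let lossy: &[u8] = b"\t\t\xEF\xBF\xBD\xEF\xBF\xBD\n";

    println!("preserved output flagged: {}", introduces_replacement(input, preserved)); // false
    println!("lossy output flagged: {}", introduces_replacement(input, lossy)); // true
}
```

In practice the input and output bytes would be read from the test files and from the captured output of the comm binary under test.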

What immediate steps should I take to mitigate this vulnerability?

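To mitigate this vulnerability, update the uutils coreutils package to a version that includes the fix for CVE-2026-35346. The affected range listed for this CVE ends at 0.6.0 (exclusive), which indicates that 0.6.0 and later contain the fix.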

The fix involves changes to the comm utility to handle output without lossy UTF-8 conversion, aligning its behavior with GNU comm by writing raw bytes directly to stdout and improving error handling.
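
As an illustration of byte-preserving output, the sketch below writes a line to any writer without passing through a `String`. It is not the actual uutils patch: the function `write_column_line` is hypothetical, and it assumes comm's default layout, in which column 2 is prefixed by one tab and column 3 by two.

```rust
use std::io::{self, Write};

/// Writes `line` into the given 1-based comm column by emitting
/// `column - 1` tab characters followed by the raw, unmodified bytes.
/// Write errors are returned to the caller instead of being ignored.
/// (Hypothetical helper; not taken from the uutils code base.)
fn write_column_line<W: Write>(out: &mut W, column: usize, line: &[u8]) -> io::Result<()> {
    for _ in 1..column {
        out.write_all(b"\t")?;
    }
    out.write_all(line)
}

fn main() -> io::Result<()> {
    let mut out = io::stdout().lock();
    // The bytes fe ff reach stdout unchanged, even though they are not valid UTF-8.
    write_column_line(&mut out, 3, b"\xFE\xFF\n")?;
    out.flush()
}
```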

If an immediate update is not possible, avoid using the uutils comm utility to compare files containing non-UTF-8 or binary data, and instead use GNU comm or other tools that preserve raw byte output.
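
To decide which files are at risk, a strict UTF-8 validity check can be run before a file is handed to the affected tool. This is a minimal sketch: the function name `first_invalid_offset` is hypothetical, and the program simply reports each path given on the command line.

```rust
use std::{env, fs, io};

/// Returns the byte offset of the first invalid UTF-8 sequence in `data`,
/// or None when the whole buffer is valid UTF-8. (Hypothetical helper.)
fn first_invalid_offset(data: &[u8]) -> Option<usize> {
    match std::str::from_utf8(data) {
        Ok(_) => None,
        Err(e) => Some(e.valid_up_to()),
    }
}

fn main() -> io::Result<()> {
    // Each command-line argument is treated as a file path to check.
    for path in env::args().skip(1) {
        let data = fs::read(&path)?;
        match first_invalid_offset(&data) {
            None => println!("{path}: valid UTF-8"),
            Some(offset) => {
                println!("{path}: invalid UTF-8 at byte offset {offset}; use a byte-preserving tool")
            }
        }
    }
    Ok(())
}
```

Files reported as valid UTF-8 are not altered by the lossy conversion, so only the flagged files need to be routed to GNU comm or another byte-preserving tool.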

