@@ -201,6 +201,13 @@ class FilemapSeek(_FilemapBase):
_FilemapBase.__init__(self, image, log)
self._log.debug("FilemapSeek: initializing")
+ # Open a separate file handle for SEEK_DATA/SEEK_HOLE probes so
+ # that the lseek() calls do not disturb the BufferedReader state
+ # of self._f_image, which sparse_copy() uses for data reading.
+ # Sharing a single fd between os.lseek() and buffered read()
+ # causes intermittent data corruption on some kernels.
+ self._f_seek = open(self._image_path, 'rb')
+
self._probe_seek_hole()
def _probe_seek_hole(self):
@@ -244,7 +251,7 @@ class FilemapSeek(_FilemapBase):
def block_is_mapped(self, block):
"""Refer the '_FilemapBase' class for the documentation."""
- offs = _lseek(self._f_image, block * self.block_size, _SEEK_DATA)
+ offs = _lseek(self._f_seek, block * self.block_size, _SEEK_DATA)
if offs == -1:
result = False
else:
@@ -265,11 +272,11 @@ class FilemapSeek(_FilemapBase):
limit = end + count * self.block_size
while True:
- start = _lseek(self._f_image, end, whence1)
+ start = _lseek(self._f_seek, end, whence1)
if start == -1 or start >= limit or start == self.image_size:
break
- end = _lseek(self._f_image, start, whence2)
+ end = _lseek(self._f_seek, start, whence2)
if end == -1 or end == self.image_size:
end = self.blocks_cnt * self.block_size
if end > limit:
While working on splitting-out wic from oe-core, on my openSUSE Leap 16.0 machine, the moment I split wic out, 2 oe-selftests always failed with 100% reproducibility: - wic.ModifyTests.test_wic_cp_ext - wic.Wic2.test_expand_mbr_image In both cases the symptom is the same: the filesystem has inode tables that are completely zeroed out. Both issues are linked together to the same underlying fault. FilemapSeek._get_ranges() is a generator. Due to the nature of finding each hole/data extent one at a time using the lseek() system call, it calls os.lseek() on a raw file descriptor, then yields, then the caller, sparse_copy(), calls file.seek() + file.read() on a Python BufferedReader wrapping that same fd — then the generator resumes and calls os.lseek() again. This interleaving of raw os.lseek() and buffered I/O on the same fd is undefined behaviour from Python's perspective. The BufferedReader tracks its own idea of the fd's position and buffer contents; os.lseek() changes the position behind its back. This can corrupt its internal state and cause read() to return stale/zero data. This code, however, has existed in wic since it was written, so why was it not noticed before? It turns out this bug was being masked by a number of implementation details that changed, especially when wic was split out for oe-core. These changes conspired together to cause the bug to be triggered. One of the root causes of this bug is that Python 3.14 increased the default buffer size from 8KB to 128KB[1]. With 8 KB buffers, read()s either go through the direct-read path leaving the buffer empty, or if it fills in 8KB chunks the buffer is fully drained. Either way, with a small buffer, read()s do a real raw seek. No fast path. No corruption. With a 128KB buffer, however, a much larger window exists where BufferedReader.seek() can take the fast-path after the raw file descriptor has already been repositioned by os.lseek() in the generator. With the smaller buffer, this window was too narrow to hit in practice. This is fixed by opening a second file object in FilemapSeek.__init__() dedicated to SEEK_DATA/SEEK_HOLE probes, leaving the data-reading handle (self._f_image) untouched. This explains why the corruption is deterministic and tied to specific block boundaries, why it only manifests with the split-out version using Python 3.14 (on systems that are using Python versions less than 3.14 on the host), and why using a separate file descriptor for reading bypasses the issue entirely. This is not an intermittent bug. For a more detailed explanation including log files, in-depth analysis, and a standalone Python reproducer, please see the linked bugzilla entry. Fixes: [YOCTO #16197] [1] https://github.com/python/cpython/commit/b1b4f9625c5f2a6b2c32bc5ee91c9fef3894b5e6 b1b4f9625c5f ("gh-117151: IO performance improvement, increase io.DEFAULT_BUFFER_SIZE to 128k (GH-118144)") AI-Generated: codex/claude-opus-4.6 (xhigh) Signed-off-by: Trevor Woerner <twoerner@gmail.com> --- scripts/lib/wic/filemap.py | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)