image_types.bbclass: make fsck optional

Message ID 585429b3-5763-4c61-97ae-2c73266c887d@elder-tomes.com
State Changes Requested

Commit Message

Levi Shafter Dec. 31, 2025, 10:52 p.m. UTC
The fsck in oe_mkext234fs() was added to prevent an extra reboot on the
target:

https://git.openembedded.org/openembedded-core/commit/?id=a93d0059341

This has the side effect of increasing the delta between images, which
prevents reproducibility. In many cases, the added security provided by
image reproducibility is worth the extra reboot upon first booting the
target. The fsck should still run by default, but be left configurable.

[YOCTO #16110]

Signed-off-by: Levi Shafter <levi.shafter@elder-tomes.com>
---
Sponsor: 21SoftWare LLC

 meta/classes-recipe/image_types.bbclass | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

+	fi
 }

 IMAGE_CMD:ext2 = "oe_mkext234fs ext2 ${EXTRA_IMAGECMD}"

Comments

Ross Burton Jan. 9, 2026, 5:03 p.m. UTC | #1
On 31 Dec 2025, at 22:52, Levi Shafter via lists.openembedded.org <levi.shafter=elder-tomes.com@lists.openembedded.org> wrote:
> 
> The fsck in oe_mkext234fs() was added to prevent an extra reboot on the
> target:
> 
> https://git.openembedded.org/openembedded-core/commit/?id=a93d0059341
> 
> This has the side effect of increasing delta between images which
> prevents reproducibility. In many cases, the added security provided by
> image reproducibility is worth the extra reboot upon first booting the
> target. The use of fsck should be included by default, but left
> configurable.
> 
> [YOCTO #16110]

There’s slightly more information in the bug report than here, which links to a video from YPS 2024.12 talking about how fsck will modify timestamps in file systems, so whilst the initial ext4 from mkfs is reproducible, we do a fsck to clear flags and they’re no longer bit-identical.  Is this the only source of non-determinism that you’re observing?

I don’t think adding an option is the right thing here as you’re swapping one problem (non-reproducible ext4) with another (filesystem dirty, needs a reboot).  As the video shows, we should be passing timestamp information at construction time to avoid this problem. Would you be able to work on a patch to do that instead?

Thanks,
Ross
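
To make "bit-identical" concrete, here is a minimal sketch of how two image builds could be compared. The file names are hypothetical stand-ins, not paths from this thread, and small dummy files are used so the sketch runs without a real build.

```shell
#!/bin/sh
# Sketch: report whether two image files are bit-identical.
# The paths passed in are hypothetical; substitute the two build outputs.
images_identical() {
    # cmp -s prints nothing and exits 0 only if both files match byte for byte.
    cmp -s "$1" "$2"
}

# Demo with stand-in files so the sketch runs without a real image build.
demo_dir=$(mktemp -d)
printf 'rootfs-contents' > "$demo_dir/build-a.ext4"
printf 'rootfs-contents' > "$demo_dir/build-b.ext4"

if images_identical "$demo_dir/build-a.ext4" "$demo_dir/build-b.ext4"; then
    echo "images are bit-identical"
else
    echo "images differ"
fi

rm -rf "$demo_dir"
```

Running the same comparison on images produced with and without the fsck step would show whether fsck alone accounts for the difference.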
Levi Shafter Jan. 12, 2026, 5:56 p.m. UTC | #2
Hi Ross,

I really appreciate the feedback! I agree that passing timestamp
information is the proper solution here. I'll create a new patch that
does this.

Be well,
- Levi

Levi Shafter Jan. 12, 2026, 9:06 p.m. UTC | #3
Hi again Ross,

After more testing with the image delta from differing timestamps, etc.
eliminated (detailed in the bug report), and after reviewing the
YPS 2024.12 video, it seems the fsck command causes further
non-determinism.

It may be possible to keep the fsck command and still eliminate the
delta it causes but, to my knowledge, nobody has documented how to do
so, and working it out could be time-consuming. While the video you
mentioned suggests removing fsck entirely, I would argue this patch is
an improvement: it retains the default for those whose use case favors
avoiding a reboot on the target, while providing an option for those
who prioritize reproducibility.

Let me know if you have ideas for keeping builds reproducible while
retaining the fsck. I'm unsure why additional delta is observed beyond
timestamp data. I'd love to investigate further, but I'm not sure it's
something I can dive deeper into at this time.
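
One possible starting point for narrowing down the remaining delta is to list which filesystem blocks differ between two images. The sketch below is only an illustration: the helper name `differing_blocks` is hypothetical, and the 4096-byte block size is an assumption that should be replaced with the real block size of the image.

```shell
#!/bin/sh
# Sketch: print the block numbers in which two image files differ.
# differing_blocks is a hypothetical helper; the default block size of
# 4096 bytes is an assumption, pass the real value as the third argument.
differing_blocks() {
    block_size=${3:-4096}
    # cmp -l prints one line per differing byte: the 1-based byte offset,
    # followed by the two differing byte values in octal.
    cmp -l "$1" "$2" \
        | awk -v bs="$block_size" '{ print int(($1 - 1) / bs) }' \
        | sort -n -u
}

# Demo with two zero-filled stand-in files that differ in a single byte.
demo_dir=$(mktemp -d)
dd if=/dev/zero of="$demo_dir/a.img" bs=4096 count=3 2>/dev/null
dd if=/dev/zero of="$demo_dir/b.img" bs=4096 count=3 2>/dev/null
printf 'X' | dd of="$demo_dir/b.img" bs=1 seek=5000 conv=notrunc 2>/dev/null

differing_blocks "$demo_dir/a.img" "$demo_dir/b.img"

rm -rf "$demo_dir"
```

With real images, the resulting block numbers could then be looked up with a tool such as debugfs (its `icheck` and `ncheck` commands map blocks to inodes and inodes to paths) to see which metadata or files fsck rewrote.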



Patch

diff --git a/meta/classes-recipe/image_types.bbclass b/meta/classes-recipe/image_types.bbclass
index e6ef0ce11e..63dc504f8c 100644
--- a/meta/classes-recipe/image_types.bbclass
+++ b/meta/classes-recipe/image_types.bbclass
@@ -92,8 +92,14 @@  oe_mkext234fs () {
 	bbdebug 1 "Actual Partition size: `stat -c '%s' ${IMGDEPLOYDIR}/${IMAGE_NAME}.$fstype`"
 	bbdebug 1 Executing "mkfs.$fstype -F $extra_imagecmd ${IMGDEPLOYDIR}/${IMAGE_NAME}.$fstype -d ${IMAGE_ROOTFS}"
 	mkfs.$fstype -F $extra_imagecmd ${IMGDEPLOYDIR}/${IMAGE_NAME}.$fstype -d ${IMAGE_ROOTFS}
-	# Error codes 0-3 indicate successfull operation of fsck (no errors or errors corrected)
-	fsck.$fstype -pvfD ${IMGDEPLOYDIR}/${IMAGE_NAME}.$fstype || [ $? -le 3 ]
+
+	if [ '${RUN_FSCK}' = "0" ]; then
+		bbdebug 1 "Skipping fsck for reduced image delta"
+	else
+		bbdebug 1 "Running fsck on image"
+		# Error codes 0-3 indicate successful operation of fsck (no errors or errors corrected)
+		fsck.$fstype -pvfD ${IMGDEPLOYDIR}/${IMAGE_NAME}.$fstype || [ $? -le 3 ]
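
The `|| [ $? -le 3 ]` idiom in the fsck line can be tried in isolation. In the sketch below, `fake_fsck` and `run_check` are hypothetical stand-ins that simply return a chosen exit status; no real filesystem check is run.

```shell
#!/bin/sh
# fake_fsck is a hypothetical stand-in for fsck.$fstype: it only returns
# the exit status it is given, so the idiom can be tested without an image.
fake_fsck() {
    return "$1"
}

run_check() {
    # Same pattern as in the patch: if the command fails, its status is
    # still accepted as long as it is 3 or lower.
    fake_fsck "$1" || [ $? -le 3 ]
}

for status in 0 1 2 3 4 8; do
    if run_check "$status"; then
        echo "status $status: accepted"
    else
        echo "status $status: rejected"
    fi
done
```

Statuses 0 through 3 are accepted and anything higher is rejected, matching the comment in the patch that codes 0-3 mean no errors or errors corrected.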