Message ID | 20241024172506.473232-1-ross.burton@arm.com |
---|---|
State | New |
Headers | show |
Series | lib/oe/package: scan B first when populating -src package | expand |
On Thu, 2024-10-24 at 18:25 +0100, Ross Burton via lists.openembedded.org wrote: > There's an interesting reproducibility problem triggered when a recipe > has a file in $S that is manipulated and then written into $B. As a > concrete example, zlib has zconf.h in the source tree which is then > processed by configure to create zconf.h in the build tree. > > When we populate the zlib-src package we have to work from the mapped > source paths in the binaries (of the form /usr/src/BP) and try to > identify where the file came from. To do this we iterate through the > mapping paths looking for files, and using cpio in a loop we end up with > the newest file with the name we're after in the -src package, which > tends to be the right thing to do. > > However, when building zlib from scratch we create the -src package by > hard linking files from the source and build trees. Then as the final > act of do_package the sstate archive is created, and this clamps the > file timestamps to SOURCE_DATE_EPOCH (see fixtimestamp in sstate.bbclass). > > The fundamental problem is that we clamp the mtimes in sstate_package in > a way that changes the source files. There are several paths through the > code: > > 1. Full build from clean > In this case B/zconf.h is freshly generated and has a current timestamp, > and the final zlib-src package in sstate has the correct file. > > 2. Repackage from existing tree > If we need to repackage but there is an existing build tree then we've > previously clamped the timestamp of zconf.h. When iterating the map paths > for files we first copy from $S and then $B, but as the file in $M has > an identical timestamp it isn't used. This results in the wrong file in > the final zlib-src package in sstate. > > 3. Repackage from sstate > Depending on whether the sstate has the correct or incorrect file, the > output will have the same contents. > > The long-term fix to this is to not clamp the actual files but instead > to clamp then when writing the tarballs, but this isn't trivial as you > need to handle the sstate and non-sstate paths separately. What I think > is an acceptable workaround is to reorder the prefix map list and move > ${B} to the front, on the assumption that generated files should be > preferred over non-generated. This results in ${B}/zconf.h having priority > over ${S}/zconf.h when they both have identical timestamps. > > [ YOCTO #15491 ] > > Signed-off-by: Ross Burton <ross.burton@arm.com> > --- > meta/lib/oe/package.py | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/meta/lib/oe/package.py b/meta/lib/oe/package.py > index c213a9a3ca6..8ec00b96b69 100644 > --- a/meta/lib/oe/package.py > +++ b/meta/lib/oe/package.py > @@ -1010,7 +1010,15 @@ def copydebugsources(debugsrcdir, sources, d): > bb.utils.mkdirhier(basepath) > cpath.updatecache(basepath) > > - for pmap in prefixmap: > + # Re-order the prefix list so that we sweep ${B} first, as the sstate creation will > + # clamp mtimes of hardlinks in the -src package, and that will change the source. > + # (see #15491) > + b = d.getVar("B") > + prefixmap_keys = list(prefixmap) > + prefixmap_keys.remove(b) > + prefixmap_keys.insert(0, b) > + > + for pmap in prefixmap_keys: > # Ignore files from the recipe sysroots (target and native) > cmd = "LC_ALL=C ; sort -z -u '%s' | egrep -v -z '((<internal>|<built-in>)$|/.*recipe-sysroot.*/)' | " % sourcefile > # We need to ignore files that are not actually ours Thanks for working on this. You have clearly explained the problem which is great and helps a lot. I am worried that this code is still fragile as some cpios seem to overwrite and some do not. Would adding the -u option and always overwriting not be safer until we properly resolve the sstate issue? ${B} isn't the only directory we might have generated sources in, the compiler in particular probably depends on this ordering and has other source files we'd alter timestamps of? If you're not convinced about the -u option, I'm starting to think we may need to fix this properly even if it is painful :(. Cheers, Richard
On Thu, 2024-10-24 at 18:25 +0100, Ross Burton via lists.openembedded.org wrote: > There's an interesting reproducibility problem triggered when a recipe > has a file in $S that is manipulated and then written into $B. As a > concrete example, zlib has zconf.h in the source tree which is then > processed by configure to create zconf.h in the build tree. > > When we populate the zlib-src package we have to work from the mapped > source paths in the binaries (of the form /usr/src/BP) and try to > identify where the file came from. To do this we iterate through the > mapping paths looking for files, and using cpio in a loop we end up with > the newest file with the name we're after in the -src package, which > tends to be the right thing to do. > > However, when building zlib from scratch we create the -src package by > hard linking files from the source and build trees. Then as the final > act of do_package the sstate archive is created, and this clamps the > file timestamps to SOURCE_DATE_EPOCH (see fixtimestamp in sstate.bbclass). > > The fundamental problem is that we clamp the mtimes in sstate_package in > a way that changes the source files. There are several paths through the > code: > > 1. Full build from clean > In this case B/zconf.h is freshly generated and has a current timestamp, > and the final zlib-src package in sstate has the correct file. > > 2. Repackage from existing tree > If we need to repackage but there is an existing build tree then we've > previously clamped the timestamp of zconf.h. When iterating the map paths > for files we first copy from $S and then $B, but as the file in $M has > an identical timestamp it isn't used. This results in the wrong file in > the final zlib-src package in sstate. > > 3. Repackage from sstate > Depending on whether the sstate has the correct or incorrect file, the > output will have the same contents. > > The long-term fix to this is to not clamp the actual files but instead > to clamp then when writing the tarballs, but this isn't trivial as you > need to handle the sstate and non-sstate paths separately. What I think > is an acceptable workaround is to reorder the prefix map list and move > ${B} to the front, on the assumption that generated files should be > preferred over non-generated. This results in ${B}/zconf.h having priority > over ${S}/zconf.h when they both have identical timestamps. > > [ YOCTO #15491 ] > > Signed-off-by: Ross Burton <ross.burton@arm.com> > --- > meta/lib/oe/package.py | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/meta/lib/oe/package.py b/meta/lib/oe/package.py > index c213a9a3ca6..8ec00b96b69 100644 > --- a/meta/lib/oe/package.py > +++ b/meta/lib/oe/package.py > @@ -1010,7 +1010,15 @@ def copydebugsources(debugsrcdir, sources, d): > bb.utils.mkdirhier(basepath) > cpath.updatecache(basepath) > > - for pmap in prefixmap: > + # Re-order the prefix list so that we sweep ${B} first, as the sstate creation will > + # clamp mtimes of hardlinks in the -src package, and that will change the source. > + # (see #15491) > + b = d.getVar("B") > + prefixmap_keys = list(prefixmap) > + prefixmap_keys.remove(b) > + prefixmap_keys.insert(0, b) > + > + for pmap in prefixmap_keys: > # Ignore files from the recipe sysroots (target and native) > cmd = "LC_ALL=C ; sort -z -u '%s' | egrep -v -z '((<internal>|<built-in>)$|/.*recipe-sysroot.*/)' | " % sourcefile > # We need to ignore files that are not actually ours I've sent out a change for do_package which stops the messing with the timestamps. It breaks reproducibility of the do_package sstate but that is probably the right compromise. See what you think... Does that mean we don't need to this patch (since this feels like a bit of a hack)? Cheers, Richard
diff --git a/meta/lib/oe/package.py b/meta/lib/oe/package.py index c213a9a3ca6..8ec00b96b69 100644 --- a/meta/lib/oe/package.py +++ b/meta/lib/oe/package.py @@ -1010,7 +1010,15 @@ def copydebugsources(debugsrcdir, sources, d): bb.utils.mkdirhier(basepath) cpath.updatecache(basepath) - for pmap in prefixmap: + # Re-order the prefix list so that we sweep ${B} first, as the sstate creation will + # clamp mtimes of hardlinks in the -src package, and that will change the source. + # (see #15491) + b = d.getVar("B") + prefixmap_keys = list(prefixmap) + prefixmap_keys.remove(b) + prefixmap_keys.insert(0, b) + + for pmap in prefixmap_keys: # Ignore files from the recipe sysroots (target and native) cmd = "LC_ALL=C ; sort -z -u '%s' | egrep -v -z '((<internal>|<built-in>)$|/.*recipe-sysroot.*/)' | " % sourcefile # We need to ignore files that are not actually ours
There's an interesting reproducibility problem triggered when a recipe has a file in $S that is manipulated and then written into $B. As a concrete example, zlib has zconf.h in the source tree which is then processed by configure to create zconf.h in the build tree. When we populate the zlib-src package we have to work from the mapped source paths in the binaries (of the form /usr/src/BP) and try to identify where the file came from. To do this we iterate through the mapping paths looking for files, and using cpio in a loop we end up with the newest file with the name we're after in the -src package, which tends to be the right thing to do. However, when building zlib from scratch we create the -src package by hard linking files from the source and build trees. Then as the final act of do_package the sstate archive is created, and this clamps the file timestamps to SOURCE_DATE_EPOCH (see fixtimestamp in sstate.bbclass). The fundamental problem is that we clamp the mtimes in sstate_package in a way that changes the source files. There are several paths through the code: 1. Full build from clean In this case B/zconf.h is freshly generated and has a current timestamp, and the final zlib-src package in sstate has the correct file. 2. Repackage from existing tree If we need to repackage but there is an existing build tree then we've previously clamped the timestamp of zconf.h. When iterating the map paths for files we first copy from $S and then $B, but as the file in $M has an identical timestamp it isn't used. This results in the wrong file in the final zlib-src package in sstate. 3. Repackage from sstate Depending on whether the sstate has the correct or incorrect file, the output will have the same contents. The long-term fix to this is to not clamp the actual files but instead to clamp then when writing the tarballs, but this isn't trivial as you need to handle the sstate and non-sstate paths separately. What I think is an acceptable workaround is to reorder the prefix map list and move ${B} to the front, on the assumption that generated files should be preferred over non-generated. This results in ${B}/zconf.h having priority over ${S}/zconf.h when they both have identical timestamps. [ YOCTO #15491 ] Signed-off-by: Ross Burton <ross.burton@arm.com> --- meta/lib/oe/package.py | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)