diff mbox series

lib/oe/package: scan B first when populating -src package

Message ID 20241024172506.473232-1-ross.burton@arm.com
State New
Headers show
Series lib/oe/package: scan B first when populating -src package | expand

Commit Message

Ross Burton Oct. 24, 2024, 5:25 p.m. UTC
There's an interesting reproducibility problem triggered when a recipe
has a file in $S that is manipulated and then written into $B.  As a
concrete example, zlib has zconf.h in the source tree which is then
processed by configure to create zconf.h in the build tree.

When we populate the zlib-src package we have to work from the mapped
source paths in the binaries (of the form /usr/src/BP) and try to
identify where the file came from. To do this we iterate through the
mapping paths looking for files, and using cpio in a loop we end up with
the newest file with the name we're after in the -src package, which
tends to be the right thing to do.

However, when building zlib from scratch we create the -src package by
hard linking files from the source and build trees. Then as the final
act of do_package the sstate archive is created, and this clamps the
file timestamps to SOURCE_DATE_EPOCH (see fixtimestamp in sstate.bbclass).

The fundamental problem is that we clamp the mtimes in sstate_package in
a way that changes the source files. There are several paths through the
code:

1. Full build from clean
In this case B/zconf.h is freshly generated and has a current timestamp,
and the final zlib-src package in sstate has the correct file.

2. Repackage from existing tree
If we need to repackage but there is an existing build tree then we've
previously clamped the timestamp of zconf.h. When iterating the map paths
for files we first copy from $S and then $B, but as the file in $M has
an identical timestamp it isn't used. This results in the wrong file in
the final zlib-src package in sstate.

3. Repackage from sstate
Depending on whether the sstate has the correct or incorrect file, the
output will have the same contents.

The long-term fix to this is to not clamp the actual files but instead
to clamp then when writing the tarballs, but this isn't trivial as you
need to handle the sstate and non-sstate paths separately.  What I think
is an acceptable workaround is to reorder the prefix map list and move
${B} to the front, on the assumption that generated files should be
preferred over non-generated. This results in ${B}/zconf.h having priority
over ${S}/zconf.h when they both have identical timestamps.

[ YOCTO #15491 ]

Signed-off-by: Ross Burton <ross.burton@arm.com>
---
 meta/lib/oe/package.py | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Richard Purdie Oct. 24, 2024, 8:59 p.m. UTC | #1
On Thu, 2024-10-24 at 18:25 +0100, Ross Burton via lists.openembedded.org wrote:
> There's an interesting reproducibility problem triggered when a recipe
> has a file in $S that is manipulated and then written into $B.  As a
> concrete example, zlib has zconf.h in the source tree which is then
> processed by configure to create zconf.h in the build tree.
> 
> When we populate the zlib-src package we have to work from the mapped
> source paths in the binaries (of the form /usr/src/BP) and try to
> identify where the file came from. To do this we iterate through the
> mapping paths looking for files, and using cpio in a loop we end up with
> the newest file with the name we're after in the -src package, which
> tends to be the right thing to do.
> 
> However, when building zlib from scratch we create the -src package by
> hard linking files from the source and build trees. Then as the final
> act of do_package the sstate archive is created, and this clamps the
> file timestamps to SOURCE_DATE_EPOCH (see fixtimestamp in sstate.bbclass).
> 
> The fundamental problem is that we clamp the mtimes in sstate_package in
> a way that changes the source files. There are several paths through the
> code:
> 
> 1. Full build from clean
> In this case B/zconf.h is freshly generated and has a current timestamp,
> and the final zlib-src package in sstate has the correct file.
> 
> 2. Repackage from existing tree
> If we need to repackage but there is an existing build tree then we've
> previously clamped the timestamp of zconf.h. When iterating the map paths
> for files we first copy from $S and then $B, but as the file in $M has
> an identical timestamp it isn't used. This results in the wrong file in
> the final zlib-src package in sstate.
> 
> 3. Repackage from sstate
> Depending on whether the sstate has the correct or incorrect file, the
> output will have the same contents.
> 
> The long-term fix to this is to not clamp the actual files but instead
> to clamp then when writing the tarballs, but this isn't trivial as you
> need to handle the sstate and non-sstate paths separately.  What I think
> is an acceptable workaround is to reorder the prefix map list and move
> ${B} to the front, on the assumption that generated files should be
> preferred over non-generated. This results in ${B}/zconf.h having priority
> over ${S}/zconf.h when they both have identical timestamps.
> 
> [ YOCTO #15491 ]
> 
> Signed-off-by: Ross Burton <ross.burton@arm.com>
> ---
>  meta/lib/oe/package.py | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/meta/lib/oe/package.py b/meta/lib/oe/package.py
> index c213a9a3ca6..8ec00b96b69 100644
> --- a/meta/lib/oe/package.py
> +++ b/meta/lib/oe/package.py
> @@ -1010,7 +1010,15 @@ def copydebugsources(debugsrcdir, sources, d):
>          bb.utils.mkdirhier(basepath)
>          cpath.updatecache(basepath)
>  
> -        for pmap in prefixmap:
> +        # Re-order the prefix list so that we sweep ${B} first, as the sstate creation will
> +        # clamp mtimes of hardlinks in the -src package, and that will change the source.
> +        # (see #15491)
> +        b = d.getVar("B")
> +        prefixmap_keys = list(prefixmap)
> +        prefixmap_keys.remove(b)
> +        prefixmap_keys.insert(0, b)
> +
> +        for pmap in prefixmap_keys:
>              # Ignore files from the recipe sysroots (target and native)
>              cmd =  "LC_ALL=C ; sort -z -u '%s' | egrep -v -z '((<internal>|<built-in>)$|/.*recipe-sysroot.*/)' | " % sourcefile
>              # We need to ignore files that are not actually ours

Thanks for working on this. You have clearly explained the problem
which is great and helps a lot.

I am worried that this code is still fragile as some cpios seem to
overwrite and some do not. Would adding the -u option and always
overwriting not be safer until we properly resolve the sstate issue?
${B} isn't the only directory we might have generated sources in, the
compiler in particular probably depends on this ordering and has other
source files we'd alter timestamps of?

If you're not convinced about the -u option, I'm starting to think we
may need to fix this properly even if it is painful :(.

Cheers,

Richard
Richard Purdie Oct. 25, 2024, 1:49 p.m. UTC | #2
On Thu, 2024-10-24 at 18:25 +0100, Ross Burton via lists.openembedded.org wrote:
> There's an interesting reproducibility problem triggered when a recipe
> has a file in $S that is manipulated and then written into $B.  As a
> concrete example, zlib has zconf.h in the source tree which is then
> processed by configure to create zconf.h in the build tree.
> 
> When we populate the zlib-src package we have to work from the mapped
> source paths in the binaries (of the form /usr/src/BP) and try to
> identify where the file came from. To do this we iterate through the
> mapping paths looking for files, and using cpio in a loop we end up with
> the newest file with the name we're after in the -src package, which
> tends to be the right thing to do.
> 
> However, when building zlib from scratch we create the -src package by
> hard linking files from the source and build trees. Then as the final
> act of do_package the sstate archive is created, and this clamps the
> file timestamps to SOURCE_DATE_EPOCH (see fixtimestamp in sstate.bbclass).
> 
> The fundamental problem is that we clamp the mtimes in sstate_package in
> a way that changes the source files. There are several paths through the
> code:
> 
> 1. Full build from clean
> In this case B/zconf.h is freshly generated and has a current timestamp,
> and the final zlib-src package in sstate has the correct file.
> 
> 2. Repackage from existing tree
> If we need to repackage but there is an existing build tree then we've
> previously clamped the timestamp of zconf.h. When iterating the map paths
> for files we first copy from $S and then $B, but as the file in $M has
> an identical timestamp it isn't used. This results in the wrong file in
> the final zlib-src package in sstate.
> 
> 3. Repackage from sstate
> Depending on whether the sstate has the correct or incorrect file, the
> output will have the same contents.
> 
> The long-term fix to this is to not clamp the actual files but instead
> to clamp then when writing the tarballs, but this isn't trivial as you
> need to handle the sstate and non-sstate paths separately.  What I think
> is an acceptable workaround is to reorder the prefix map list and move
> ${B} to the front, on the assumption that generated files should be
> preferred over non-generated. This results in ${B}/zconf.h having priority
> over ${S}/zconf.h when they both have identical timestamps.
> 
> [ YOCTO #15491 ]
> 
> Signed-off-by: Ross Burton <ross.burton@arm.com>
> ---
>  meta/lib/oe/package.py | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/meta/lib/oe/package.py b/meta/lib/oe/package.py
> index c213a9a3ca6..8ec00b96b69 100644
> --- a/meta/lib/oe/package.py
> +++ b/meta/lib/oe/package.py
> @@ -1010,7 +1010,15 @@ def copydebugsources(debugsrcdir, sources, d):
>          bb.utils.mkdirhier(basepath)
>          cpath.updatecache(basepath)
>  
> -        for pmap in prefixmap:
> +        # Re-order the prefix list so that we sweep ${B} first, as the sstate creation will
> +        # clamp mtimes of hardlinks in the -src package, and that will change the source.
> +        # (see #15491)
> +        b = d.getVar("B")
> +        prefixmap_keys = list(prefixmap)
> +        prefixmap_keys.remove(b)
> +        prefixmap_keys.insert(0, b)
> +
> +        for pmap in prefixmap_keys:
>              # Ignore files from the recipe sysroots (target and native)
>              cmd =  "LC_ALL=C ; sort -z -u '%s' | egrep -v -z '((<internal>|<built-in>)$|/.*recipe-sysroot.*/)' | " % sourcefile
>              # We need to ignore files that are not actually ours

I've sent out a change for do_package which stops the messing with the
timestamps. It breaks reproducibility of the do_package sstate but that
is probably the right compromise. See what you think...

Does that mean we don't need to this patch (since this feels like a bit
of a hack)?

Cheers,

Richard
diff mbox series

Patch

diff --git a/meta/lib/oe/package.py b/meta/lib/oe/package.py
index c213a9a3ca6..8ec00b96b69 100644
--- a/meta/lib/oe/package.py
+++ b/meta/lib/oe/package.py
@@ -1010,7 +1010,15 @@  def copydebugsources(debugsrcdir, sources, d):
         bb.utils.mkdirhier(basepath)
         cpath.updatecache(basepath)
 
-        for pmap in prefixmap:
+        # Re-order the prefix list so that we sweep ${B} first, as the sstate creation will
+        # clamp mtimes of hardlinks in the -src package, and that will change the source.
+        # (see #15491)
+        b = d.getVar("B")
+        prefixmap_keys = list(prefixmap)
+        prefixmap_keys.remove(b)
+        prefixmap_keys.insert(0, b)
+
+        for pmap in prefixmap_keys:
             # Ignore files from the recipe sysroots (target and native)
             cmd =  "LC_ALL=C ; sort -z -u '%s' | egrep -v -z '((<internal>|<built-in>)$|/.*recipe-sysroot.*/)' | " % sourcefile
             # We need to ignore files that are not actually ours