diff mbox series

kernel-yocto: make kernel commits reproducible

Message ID 20241217074547.1601328-1-ejo@pengutronix.de
State New
Headers show
Series kernel-yocto: make kernel commits reproducible | expand

Commit Message

Enrico Jörns Dec. 17, 2024, 7:45 a.m. UTC
The git commit hashes for the kernel checkout are not reproducible under
certain conditions:

- If the git repository is initialized on an archive (rather than a
  git), the initial git commit not only has the current user name set,
  it also uses the current system time as committer and author date.
  This will affect the initial git hash and thus all subsequent ones.

- The patches applied by the kern-tools have a valid author and date.
  However, their committer again depends on the user building the BSP.

This is an issue, for example, if one compiles a kernel with
CONFIG_LOCALVERSION_AUTO enabled where the commit hash lands into the
kernel and thus the package version. This not only makes the package
version non-reproducible, but also leads to version mismatches between
kernel modules built against a fresh kernel checkout and the kernel
retrieved from the sstate cache.

The class uses 'check_git_config', but this only sets the git user and
only if none existed before. Thus it doesn't really help here.

Since in Git the committer information can be set only from the
environment variables GIT_COMMITTER_NAME, GIT_COMMITTER_EMAIL, and
GIT_COMMITTER_DATE, we introduce the helper function
'reproducible_git_committer' here to set those.
As values simply use PATCH_GIT_USER_NAME, PATCH_GIT_USER_EMAIL (from
patch.bbclass) and SOURCE_DATE_EPOCH.

Using this helper makes the committer date/name/email for the initial
reproducible, as well as the committer name/email for the patches
applied with kern-tools.

What's still missing is the initial commit's date/name/email which we
just set explicitly via command line arguments.

Suggested-by: Felix Klöckner <F.Kloeckner@weinmann-emt.de>
Signed-off-by: Enrico Jörns <ejo@pengutronix.de>
---
 meta/classes-recipe/kernel-yocto.bbclass | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Bruce Ashfield Dec. 20, 2024, 2:59 p.m. UTC | #1
On Tue, Dec 17, 2024 at 2:46 AM Enrico Jörns <ejo@pengutronix.de> wrote:

> The git commit hashes for the kernel checkout are not reproducible under
> certain conditions:
>
> - If the git repository is initialized on an archive (rather than a
>   git), the initial git commit not only has the current user name set,
>   it also uses the current system time as committer and author date.
>   This will affect the initial git hash and thus all subsequent ones.
>
> - The patches applied by the kern-tools have a valid author and date.
>   However, their committer again depends on the user building the BSP.
>
> This is an issue, for example, if one compiles a kernel with
> CONFIG_LOCALVERSION_AUTO enabled where the commit hash lands into the
> kernel and thus the package version. This not only makes the package
> version non-reproducible, but also leads to version mismatches between
> kernel modules built against a fresh kernel checkout and the kernel
> retrieved from the sstate cache.
>
> The class uses 'check_git_config', but this only sets the git user and
> only if none existed before. Thus it doesn't really help here.
>
> Since in Git the committer information can be set only from the
> environment variables GIT_COMMITTER_NAME, GIT_COMMITTER_EMAIL, and
> GIT_COMMITTER_DATE, we introduce the helper function
> 'reproducible_git_committer' here to set those.
> As values simply use PATCH_GIT_USER_NAME, PATCH_GIT_USER_EMAIL (from
> patch.bbclass) and SOURCE_DATE_EPOCH.
>
> Using this helper makes the committer date/name/email for the initial
> reproducible, as well as the committer name/email for the patches
> applied with kern-tools.
>
> What's still missing is the initial commit's date/name/email which we
> just set explicitly via command line arguments.
>
> Suggested-by: Felix Klöckner <F.Kloeckner@weinmann-emt.de>
> Signed-off-by: Enrico Jörns <ejo@pengutronix.de>
> ---
>  meta/classes-recipe/kernel-yocto.bbclass | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/meta/classes-recipe/kernel-yocto.bbclass
> b/meta/classes-recipe/kernel-yocto.bbclass
> index a5d89dc2c8..1ba2f91e62 100644
> --- a/meta/classes-recipe/kernel-yocto.bbclass
> +++ b/meta/classes-recipe/kernel-yocto.bbclass
> @@ -9,6 +9,12 @@ SRCTREECOVEREDTASKS += "do_validate_branches
> do_kernel_configcheck do_kernel_che
>  PATCH_GIT_USER_EMAIL ?= "kernel-yocto@oe"
>  PATCH_GIT_USER_NAME ?= "OpenEmbedded"
>
> +reproducible_git_committer() {
> +       export GIT_COMMITTER_NAME="${PATCH_GIT_USER_NAME}"
> +       export GIT_COMMITTER_EMAIL="${PATCH_GIT_USER_EMAIL}"
> +       export GIT_COMMITTER_DATE="$(date -d @${SOURCE_DATE_EPOCH})"
> +}
>

These would be better set in the same class and I'd argue
that they should just be set in same function  as our other
settings check_git_config()

While the check_git_config name might not convey it, the
original intent of that routine was to make sure that if what we
needed wasn't set, that we'd configure/set it for the upcoming
git commands.

Is there any reason this wouldn't work when exported from
classes-global/utils.bbclass ?

We also have already have get_kernel_source_date_epoch(),
that should be used instead of going directly at
SOURCE_DATE_EPOCH (unless the routine is causing a
problem, and then I'd be interested in hearing what it is).

There are some builds that for debugging purposes
we want to inhibit all reproducibility and forced setting of
timestamps, so setting these options should be
conditional on KERNEL_DEBUG_TIMESTAMPS not being
set. I realize that this is a slightly different use of setting
the timestamp as when we build the kernel, but it would
still be useful to disable the forced timestamp in some
scenarios.

+
>  # The distro or local.conf should set this, but if nobody cares...
>  LINUX_KERNEL_TYPE ??= "standard"
>
> @@ -349,6 +355,7 @@ do_patch() {
>         cd ${S}
>
>         check_git_config
> +       reproducible_git_committer
>         meta_dir=$(kgit --meta)
>         (cd ${meta_dir}; ln -sf patch.queue series)
>         if [ -f "${meta_dir}/series" ]; then
> @@ -431,8 +438,9 @@ do_kernel_checkout() {
>                 rm -f .gitignore
>                 git init
>                 check_git_config
> +               reproducible_git_committer
>                 git add .
> -               git commit -q -n -m "baseline commit: creating repo for
> ${PN}-${PV}"
> +               git commit --author="${PATCH_GIT_USER_NAME}
> <${PATCH_GIT_USER_EMAIL}>" --date=${SOURCE_DATE_EPOCH} -q -n -m "baseline
> commit: creating repo for ${PN}-${PV}"
>

The author, author date and author email can also come from the environment
in my experiments.

Can we just set them the same way as the other variables for consistency ?

Bruce


>                 git clean -d -f
>         fi
>
> --
> 2.39.5
>
>
Enrico Jörns Jan. 16, 2025, 2:19 p.m. UTC | #2
Am Freitag, dem 20.12.2024 um 09:59 -0500 schrieb Bruce Ashfield:
> On Tue, Dec 17, 2024 at 2:46 AM Enrico Jörns <ejo@pengutronix.de> wrote:
> > The git commit hashes for the kernel checkout are not reproducible under
> > certain conditions:
> > 
> > - If the git repository is initialized on an archive (rather than a
> >   git), the initial git commit not only has the current user name set,
> >   it also uses the current system time as committer and author date.
> >   This will affect the initial git hash and thus all subsequent ones.
> > 
> > - The patches applied by the kern-tools have a valid author and date.
> >   However, their committer again depends on the user building the BSP.
> > 
> > This is an issue, for example, if one compiles a kernel with
> > CONFIG_LOCALVERSION_AUTO enabled where the commit hash lands into the
> > kernel and thus the package version. This not only makes the package
> > version non-reproducible, but also leads to version mismatches between
> > kernel modules built against a fresh kernel checkout and the kernel
> > retrieved from the sstate cache.
> > 
> > The class uses 'check_git_config', but this only sets the git user and
> > only if none existed before. Thus it doesn't really help here.
> > 
> > Since in Git the committer information can be set only from the
> > environment variables GIT_COMMITTER_NAME, GIT_COMMITTER_EMAIL, and
> > GIT_COMMITTER_DATE, we introduce the helper function
> > 'reproducible_git_committer' here to set those.
> > As values simply use PATCH_GIT_USER_NAME, PATCH_GIT_USER_EMAIL (from
> > patch.bbclass) and SOURCE_DATE_EPOCH.
> > 
> > Using this helper makes the committer date/name/email for the initial
> > reproducible, as well as the committer name/email for the patches
> > applied with kern-tools.
> > 
> > What's still missing is the initial commit's date/name/email which we
> > just set explicitly via command line arguments.
> > 
> > Suggested-by: Felix Klöckner <F.Kloeckner@weinmann-emt.de>
> > Signed-off-by: Enrico Jörns <ejo@pengutronix.de>
> > ---
> >  meta/classes-recipe/kernel-yocto.bbclass | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/meta/classes-recipe/kernel-yocto.bbclass b/meta/classes-recipe/kernel-yocto.bbclass
> > index a5d89dc2c8..1ba2f91e62 100644
> > --- a/meta/classes-recipe/kernel-yocto.bbclass
> > +++ b/meta/classes-recipe/kernel-yocto.bbclass
> > @@ -9,6 +9,12 @@ SRCTREECOVEREDTASKS += "do_validate_branches do_kernel_configcheck
> > do_kernel_che
> >  PATCH_GIT_USER_EMAIL ?= "kernel-yocto@oe"
> >  PATCH_GIT_USER_NAME ?= "OpenEmbedded"
> > 
> > +reproducible_git_committer() {
> > +       export GIT_COMMITTER_NAME="${PATCH_GIT_USER_NAME}"
> > +       export GIT_COMMITTER_EMAIL="${PATCH_GIT_USER_EMAIL}"
> > +       export GIT_COMMITTER_DATE="$(date -d @${SOURCE_DATE_EPOCH})"
> > +}
> 
> These would be better set in the same class and I'd argue
> that they should just be set in same function  as our other
> settings check_git_config()
> 
> While the check_git_config name might not convey it, the
> original intent of that routine was to make sure that if what we
> needed wasn't set, that we'd configure/set it for the upcoming
> git commands.

Yeah. Maybe it's useful for others, too.
Will move it to utils.bbclass.

> Is there any reason this wouldn't work when exported from
> classes-global/utils.bbclass ?

None that I would know of.

> We also have already have get_kernel_source_date_epoch(),
> that should be used instead of going directly at
> SOURCE_DATE_EPOCH (unless the routine is causing a
> problem, and then I'd be interested in hearing what it is).

Cannot find a method named 'get_kernel_source_date_epoch()' in any class.
Just find other parts of the kernel classes using SOURCE_DATE_EPOCH directly.
Could you give me a hint what I have to look at?

> There are some builds that for debugging purposes
> we want to inhibit all reproducibility and forced setting of
> timestamps, so setting these options should be
> conditional on KERNEL_DEBUG_TIMESTAMPS not being
> set. I realize that this is a slightly different use of setting
> the timestamp as when we build the kernel, but it would 
> still be useful to disable the forced timestamp in some
> scenarios.

I have no opinion on this and will do as suggested.
If timestamps change I'd assume that I can safely ignore all reproducibility features, right?
Would make it easier then by just guarding the call to 'reproducible_git_commiter'.

> > +
> >  # The distro or local.conf should set this, but if nobody cares...
> >  LINUX_KERNEL_TYPE ??= "standard"
> > 
> > @@ -349,6 +355,7 @@ do_patch() {
> >         cd ${S}
> > 
> >         check_git_config
> > +       reproducible_git_committer
> >         meta_dir=$(kgit --meta)
> >         (cd ${meta_dir}; ln -sf patch.queue series)
> >         if [ -f "${meta_dir}/series" ]; then
> > @@ -431,8 +438,9 @@ do_kernel_checkout() {
> >                 rm -f .gitignore
> >                 git init
> >                 check_git_config
> > +               reproducible_git_committer
> >                 git add .
> > -               git commit -q -n -m "baseline commit: creating repo for ${PN}-${PV}"
> > +               git commit --author="${PATCH_GIT_USER_NAME} <${PATCH_GIT_USER_EMAIL}>" --
> > date=${SOURCE_DATE_EPOCH} -q -n -m "baseline commit: creating repo for ${PN}-${PV}"
> > 
> 
> 
> The author, author date and author email can also come from the environment
> in my experiments.

Yes, this works, too.
Wasn't sure which way to go since e.g. patch.py also uses --author directly.

> Can we just set them the same way as the other variables for consistency ?

Maybe that's better anyway, yes.


Regards, Enrico

> Bruce
>  
> >                 git clean -d -f
> >         fi
> >
Bruce Ashfield Jan. 17, 2025, 4:24 a.m. UTC | #3
On Thu, Jan 16, 2025 at 9:19 AM Enrico Jörns <ejo@pengutronix.de> wrote:

> Am Freitag, dem 20.12.2024 um 09:59 -0500 schrieb Bruce Ashfield:
> > On Tue, Dec 17, 2024 at 2:46 AM Enrico Jörns <ejo@pengutronix.de> wrote:
> > > The git commit hashes for the kernel checkout are not reproducible
> under
> > > certain conditions:
> > >
> > > - If the git repository is initialized on an archive (rather than a
> > >   git), the initial git commit not only has the current user name set,
> > >   it also uses the current system time as committer and author date.
> > >   This will affect the initial git hash and thus all subsequent ones.
> > >
> > > - The patches applied by the kern-tools have a valid author and date.
> > >   However, their committer again depends on the user building the BSP.
> > >
> > > This is an issue, for example, if one compiles a kernel with
> > > CONFIG_LOCALVERSION_AUTO enabled where the commit hash lands into the
> > > kernel and thus the package version. This not only makes the package
> > > version non-reproducible, but also leads to version mismatches between
> > > kernel modules built against a fresh kernel checkout and the kernel
> > > retrieved from the sstate cache.
> > >
> > > The class uses 'check_git_config', but this only sets the git user and
> > > only if none existed before. Thus it doesn't really help here.
> > >
> > > Since in Git the committer information can be set only from the
> > > environment variables GIT_COMMITTER_NAME, GIT_COMMITTER_EMAIL, and
> > > GIT_COMMITTER_DATE, we introduce the helper function
> > > 'reproducible_git_committer' here to set those.
> > > As values simply use PATCH_GIT_USER_NAME, PATCH_GIT_USER_EMAIL (from
> > > patch.bbclass) and SOURCE_DATE_EPOCH.
> > >
> > > Using this helper makes the committer date/name/email for the initial
> > > reproducible, as well as the committer name/email for the patches
> > > applied with kern-tools.
> > >
> > > What's still missing is the initial commit's date/name/email which we
> > > just set explicitly via command line arguments.
> > >
> > > Suggested-by: Felix Klöckner <F.Kloeckner@weinmann-emt.de>
> > > Signed-off-by: Enrico Jörns <ejo@pengutronix.de>
> > > ---
> > >  meta/classes-recipe/kernel-yocto.bbclass | 10 +++++++++-
> > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/meta/classes-recipe/kernel-yocto.bbclass
> b/meta/classes-recipe/kernel-yocto.bbclass
> > > index a5d89dc2c8..1ba2f91e62 100644
> > > --- a/meta/classes-recipe/kernel-yocto.bbclass
> > > +++ b/meta/classes-recipe/kernel-yocto.bbclass
> > > @@ -9,6 +9,12 @@ SRCTREECOVEREDTASKS += "do_validate_branches
> do_kernel_configcheck
> > > do_kernel_che
> > >  PATCH_GIT_USER_EMAIL ?= "kernel-yocto@oe"
> > >  PATCH_GIT_USER_NAME ?= "OpenEmbedded"
> > >
> > > +reproducible_git_committer() {
> > > +       export GIT_COMMITTER_NAME="${PATCH_GIT_USER_NAME}"
> > > +       export GIT_COMMITTER_EMAIL="${PATCH_GIT_USER_EMAIL}"
> > > +       export GIT_COMMITTER_DATE="$(date -d @${SOURCE_DATE_EPOCH})"
> > > +}
> >
> > These would be better set in the same class and I'd argue
> > that they should just be set in same function  as our other
> > settings check_git_config()
> >
> > While the check_git_config name might not convey it, the
> > original intent of that routine was to make sure that if what we
> > needed wasn't set, that we'd configure/set it for the upcoming
> > git commands.
>
> Yeah. Maybe it's useful for others, too.
> Will move it to utils.bbclass.
>
> > Is there any reason this wouldn't work when exported from
> > classes-global/utils.bbclass ?
>
> None that I would know of.
>
> > We also have already have get_kernel_source_date_epoch(),
> > that should be used instead of going directly at
> > SOURCE_DATE_EPOCH (unless the routine is causing a
> > problem, and then I'd be interested in hearing what it is).
>
> Cannot find a method named 'get_kernel_source_date_epoch()' in any class.
> Just find other parts of the kernel classes using SOURCE_DATE_EPOCH
> directly.
> Could you give me a hint what I have to look at?
>

HAH!!!!! I'm not all that bright, I forgot that was my own local commit:

commit 5b2b393c70243cdfa04fccd13b1bf599e5d8abed
Author: Bruce Ashfield <bruce.ashfield@gmail.com>
Date:   Tue May 12 15:05:19 2022 -0400

    kernel/reproducibility: factor kernel source epoch for common use

    Rather than duplicating the code to generate the kbuild environment
    variable: KBUILD_BUILD_TIMESTAMP, we can factor the checks into a
    python function and call it from the main image and kernel module
    do_compile routines.

    All paths have been checked and the timestamp is identical to the
    previous shell variant.

    Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>

And a rather old one that that :)

So ignore that part of the comment, going at the variable itself
is the only option at the moment.


> > There are some builds that for debugging purposes
> > we want to inhibit all reproducibility and forced setting of
> > timestamps, so setting these options should be
> > conditional on KERNEL_DEBUG_TIMESTAMPS not being
> > set. I realize that this is a slightly different use of setting
> > the timestamp as when we build the kernel, but it would
> > still be useful to disable the forced timestamp in some
> > scenarios.
>
> I have no opinion on this and will do as suggested.
> If timestamps change I'd assume that I can safely ignore all
> reproducibility features, right?
>

Correct!

Would make it easier then by just guarding the call to
> 'reproducible_git_commiter'.
>

Whatever is easier/cleaner makes sense to me. As long at it is something
that can be enabled/disabled in times of need :)



>
> > > +
> > >  # The distro or local.conf should set this, but if nobody cares...
> > >  LINUX_KERNEL_TYPE ??= "standard"
> > >
> > > @@ -349,6 +355,7 @@ do_patch() {
> > >         cd ${S}
> > >
> > >         check_git_config
> > > +       reproducible_git_committer
> > >         meta_dir=$(kgit --meta)
> > >         (cd ${meta_dir}; ln -sf patch.queue series)
> > >         if [ -f "${meta_dir}/series" ]; then
> > > @@ -431,8 +438,9 @@ do_kernel_checkout() {
> > >                 rm -f .gitignore
> > >                 git init
> > >                 check_git_config
> > > +               reproducible_git_committer
> > >                 git add .
> > > -               git commit -q -n -m "baseline commit: creating repo
> for ${PN}-${PV}"
> > > +               git commit --author="${PATCH_GIT_USER_NAME}
> <${PATCH_GIT_USER_EMAIL}>" --
> > > date=${SOURCE_DATE_EPOCH} -q -n -m "baseline commit: creating repo for
> ${PN}-${PV}"
> > >
> >
> >
> > The author, author date and author email can also come from
> the environment
> > in my experiments.
>
> Yes, this works, too.
> Wasn't sure which way to go since e.g. patch.py also uses --author
> directly.
>

Organically and incrementally developed things aren't great for
consistency, that is for sure!

Cheers,

Bruce



> > Can we just set them the same way as the other variables for consistency
> ?
>
> Maybe that's better anyway, yes.
>
>
> Regards, Enrico
>
> > Bruce
> >
> > >                 git clean -d -f
> > >         fi
> > >
>
> --
> Pengutronix e.K.                           | Enrico Jörns                |
> Embedded Linux Consulting & Support        | https://www.pengutronix.de/ |
> Steuerwalder Str. 21                       | Phone: +49-5121-206917-180  |
> 31137 Hildesheim, Germany                  | Fax:   +49-5121-206917-9    |
>
diff mbox series

Patch

diff --git a/meta/classes-recipe/kernel-yocto.bbclass b/meta/classes-recipe/kernel-yocto.bbclass
index a5d89dc2c8..1ba2f91e62 100644
--- a/meta/classes-recipe/kernel-yocto.bbclass
+++ b/meta/classes-recipe/kernel-yocto.bbclass
@@ -9,6 +9,12 @@  SRCTREECOVEREDTASKS += "do_validate_branches do_kernel_configcheck do_kernel_che
 PATCH_GIT_USER_EMAIL ?= "kernel-yocto@oe"
 PATCH_GIT_USER_NAME ?= "OpenEmbedded"
 
+reproducible_git_committer() {
+	export GIT_COMMITTER_NAME="${PATCH_GIT_USER_NAME}"
+	export GIT_COMMITTER_EMAIL="${PATCH_GIT_USER_EMAIL}"
+	export GIT_COMMITTER_DATE="$(date -d @${SOURCE_DATE_EPOCH})"
+}
+
 # The distro or local.conf should set this, but if nobody cares...
 LINUX_KERNEL_TYPE ??= "standard"
 
@@ -349,6 +355,7 @@  do_patch() {
 	cd ${S}
 
 	check_git_config
+	reproducible_git_committer
 	meta_dir=$(kgit --meta)
 	(cd ${meta_dir}; ln -sf patch.queue series)
 	if [ -f "${meta_dir}/series" ]; then
@@ -431,8 +438,9 @@  do_kernel_checkout() {
 		rm -f .gitignore
 		git init
 		check_git_config
+		reproducible_git_committer
 		git add .
-		git commit -q -n -m "baseline commit: creating repo for ${PN}-${PV}"
+		git commit --author="${PATCH_GIT_USER_NAME} <${PATCH_GIT_USER_EMAIL}>" --date=${SOURCE_DATE_EPOCH} -q -n -m "baseline commit: creating repo for ${PN}-${PV}"
 		git clean -d -f
 	fi