diff mbox series

[kirkstone,2.0,1/1] siggen.py: Improve taskhash reproducibility

Message ID da5f41996687e18b78d9c9845e621d832115aa1e.1741808747.git.steve@sakoman.com
State Accepted, archived
Commit da5f41996687e18b78d9c9845e621d832115aa1e
Headers show
Series [kirkstone,2.0,1/1] siggen.py: Improve taskhash reproducibility | expand

Commit Message

Steve Sakoman March 12, 2025, 7:48 p.m. UTC
From: Paulo Neves <paulo@myneves.com>

file checksums are part of the data checksummed
to generate the task hash. The list of file checksums
was not ordered.

In this commit we make sure the task hash checksum takes
a list of checksum data that is ordered by unique file name
thus guaranteeing reproducibility.

Signed-off-by: Paulo Neves <paulo@myneves.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Signed-off-by: Martin Jansa <martin.jansa@gmail.com>
Signed-off-by: Steve Sakoman <steve@sakoman.com>
---
 lib/bb/siggen.py | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Ernst Persson March 19, 2025, 8:05 a.m. UTC | #1
I tried building in an old kirkstone workspace with this patch applied and
I got a lot of taskhash mismatch errors.
After deleting tmp and cache the build was fine, though it did rebuild
everything from scratch.
Does this need bumping of something like HASHEQUIV_HASH_VERSION or
SSTATE_VERSION... ?

Regards
//Ernst

Den ons 12 mars 2025 kl 20:48 skrev Steve Sakoman via lists.openembedded.org
<steve=sakoman.com@lists.openembedded.org>:

> From: Paulo Neves <paulo@myneves.com>
>
> file checksums are part of the data checksummed
> to generate the task hash. The list of file checksums
> was not ordered.
>
> In this commit we make sure the task hash checksum takes
> a list of checksum data that is ordered by unique file name
> thus guaranteeing reproducibility.
>
> Signed-off-by: Paulo Neves <paulo@myneves.com>
> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> Signed-off-by: Martin Jansa <martin.jansa@gmail.com>
> Signed-off-by: Steve Sakoman <steve@sakoman.com>
> ---
>  lib/bb/siggen.py | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/lib/bb/siggen.py b/lib/bb/siggen.py
> index 0a9ce0ede..828729d8b 100644
> --- a/lib/bb/siggen.py
> +++ b/lib/bb/siggen.py
> @@ -331,7 +331,7 @@ class SignatureGeneratorBasic(SignatureGenerator):
>          for dep in self.runtaskdeps[tid]:
>              data += self.get_unihash(dep)
>
> -        for (f, cs) in self.file_checksum_values[tid]:
> +        for (f, cs) in sorted(self.file_checksum_values[tid],
> key=clean_checksum_file_path):
>              if cs:
>                  if "/./" in f:
>                      data += "./" + f.split("/./")[1]
> @@ -393,7 +393,7 @@ class SignatureGeneratorBasic(SignatureGenerator):
>          if runtime and tid in self.taskhash:
>              data['runtaskdeps'] = self.runtaskdeps[tid]
>              data['file_checksum_values'] = []
> -            for f,cs in self.file_checksum_values[tid]:
> +            for f,cs in sorted(self.file_checksum_values[tid],
> key=clean_checksum_file_path):
>                  if "/./" in f:
>                      data['file_checksum_values'].append(("./" +
> f.split("/./")[1], cs))
>                  else:
> @@ -720,6 +720,12 @@ class
> SignatureGeneratorTestMulticonfigDepends(SignatureGeneratorBasicHash):
>      name = "TestMulticonfigDepends"
>      supports_multiconfig_datacaches = True
>
> +def clean_checksum_file_path(file_checksum_tuple):
> +    f, cs = file_checksum_tuple
> +    if "/./" in f:
> +        return "./" + f.split("/./")[1]
> +    return f
> +
>  def dump_this_task(outfile, d):
>      import bb.parse
>      fn = d.getVar("BB_FILENAME")
> --
> 2.43.0
>
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#17438):
> https://lists.openembedded.org/g/bitbake-devel/message/17438
> Mute This Topic: https://lists.openembedded.org/mt/111666877/4947266
> Group Owner: bitbake-devel+owner@lists.openembedded.org
> Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [
> ernstp@gmail.com]
> -=-=-=-=-=-=-=-=-=-=-=-
>
>
Steve Sakoman March 19, 2025, 1:43 p.m. UTC | #2
On Wed, Mar 19, 2025 at 1:05 AM Ernst Persson <ernstp@gmail.com> wrote:
>
> I tried building in an old kirkstone workspace with this patch applied and I got a lot of taskhash mismatch errors.
> After deleting tmp and cache the build was fine, though it did rebuild everything from scratch.
> Does this need bumping of something like HASHEQUIV_HASH_VERSION or SSTATE_VERSION... ?

I didn't see any taskhash mismatch errors in any of my testing, so I'm
not sure what happened in your build!

I don't think bumps to HASHEQUIV_HASH_VERSION or SSTATE_VERSION are
required, but perhaps Richard can confirm.

Steve

> Den ons 12 mars 2025 kl 20:48 skrev Steve Sakoman via lists.openembedded.org <steve=sakoman.com@lists.openembedded.org>:
>>
>> From: Paulo Neves <paulo@myneves.com>
>>
>> file checksums are part of the data checksummed
>> to generate the task hash. The list of file checksums
>> was not ordered.
>>
>> In this commit we make sure the task hash checksum takes
>> a list of checksum data that is ordered by unique file name
>> thus guaranteeing reproducibility.
>>
>> Signed-off-by: Paulo Neves <paulo@myneves.com>
>> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
>> Signed-off-by: Martin Jansa <martin.jansa@gmail.com>
>> Signed-off-by: Steve Sakoman <steve@sakoman.com>
>> ---
>>  lib/bb/siggen.py | 10 ++++++++--
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/bb/siggen.py b/lib/bb/siggen.py
>> index 0a9ce0ede..828729d8b 100644
>> --- a/lib/bb/siggen.py
>> +++ b/lib/bb/siggen.py
>> @@ -331,7 +331,7 @@ class SignatureGeneratorBasic(SignatureGenerator):
>>          for dep in self.runtaskdeps[tid]:
>>              data += self.get_unihash(dep)
>>
>> -        for (f, cs) in self.file_checksum_values[tid]:
>> +        for (f, cs) in sorted(self.file_checksum_values[tid], key=clean_checksum_file_path):
>>              if cs:
>>                  if "/./" in f:
>>                      data += "./" + f.split("/./")[1]
>> @@ -393,7 +393,7 @@ class SignatureGeneratorBasic(SignatureGenerator):
>>          if runtime and tid in self.taskhash:
>>              data['runtaskdeps'] = self.runtaskdeps[tid]
>>              data['file_checksum_values'] = []
>> -            for f,cs in self.file_checksum_values[tid]:
>> +            for f,cs in sorted(self.file_checksum_values[tid], key=clean_checksum_file_path):
>>                  if "/./" in f:
>>                      data['file_checksum_values'].append(("./" + f.split("/./")[1], cs))
>>                  else:
>> @@ -720,6 +720,12 @@ class SignatureGeneratorTestMulticonfigDepends(SignatureGeneratorBasicHash):
>>      name = "TestMulticonfigDepends"
>>      supports_multiconfig_datacaches = True
>>
>> +def clean_checksum_file_path(file_checksum_tuple):
>> +    f, cs = file_checksum_tuple
>> +    if "/./" in f:
>> +        return "./" + f.split("/./")[1]
>> +    return f
>> +
>>  def dump_this_task(outfile, d):
>>      import bb.parse
>>      fn = d.getVar("BB_FILENAME")
>> --
>> 2.43.0
>>
>>
>> -=-=-=-=-=-=-=-=-=-=-=-
>> Links: You receive all messages sent to this group.
>> View/Reply Online (#17438): https://lists.openembedded.org/g/bitbake-devel/message/17438
>> Mute This Topic: https://lists.openembedded.org/mt/111666877/4947266
>> Group Owner: bitbake-devel+owner@lists.openembedded.org
>> Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [ernstp@gmail.com]
>> -=-=-=-=-=-=-=-=-=-=-=-
>>
Richard Purdie March 19, 2025, 3:03 p.m. UTC | #3
On Wed, 2025-03-19 at 06:43 -0700, Steve Sakoman wrote:
> On Wed, Mar 19, 2025 at 1:05 AM Ernst Persson <ernstp@gmail.com>
> wrote:
> > 
> > I tried building in an old kirkstone workspace with this patch
> > applied and I got a lot of taskhash mismatch errors.
> > After deleting tmp and cache the build was fine, though it did
> > rebuild everything from scratch.
> > Does this need bumping of something like HASHEQUIV_HASH_VERSION or
> > SSTATE_VERSION... ?
> 
> I didn't see any taskhash mismatch errors in any of my testing, so
> I'm
> not sure what happened in your build!
> 
> I don't think bumps to HASHEQUIV_HASH_VERSION or SSTATE_VERSION are
> required, but perhaps Richard can confirm.

I think it probably needs a cache version bump, something like:

https://git.yoctoproject.org/poky/commit/bitbake/lib/bb/cache.py?id=fced0b8d7c852e52606fa8da321211a98a1dfaea

which would then trigger a reparse. Things should be ok after the
reparse.

Cheers,

Richard
Steve Sakoman March 19, 2025, 3:07 p.m. UTC | #4
On Wed, Mar 19, 2025 at 8:03 AM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:
>
> On Wed, 2025-03-19 at 06:43 -0700, Steve Sakoman wrote:
> > On Wed, Mar 19, 2025 at 1:05 AM Ernst Persson <ernstp@gmail.com>
> > wrote:
> > >
> > > I tried building in an old kirkstone workspace with this patch
> > > applied and I got a lot of taskhash mismatch errors.
> > > After deleting tmp and cache the build was fine, though it did
> > > rebuild everything from scratch.
> > > Does this need bumping of something like HASHEQUIV_HASH_VERSION or
> > > SSTATE_VERSION... ?
> >
> > I didn't see any taskhash mismatch errors in any of my testing, so
> > I'm
> > not sure what happened in your build!
> >
> > I don't think bumps to HASHEQUIV_HASH_VERSION or SSTATE_VERSION are
> > required, but perhaps Richard can confirm.
>
> I think it probably needs a cache version bump, something like:
>
> https://git.yoctoproject.org/poky/commit/bitbake/lib/bb/cache.py?id=fced0b8d7c852e52606fa8da321211a98a1dfaea
>
> which would then trigger a reparse. Things should be ok after the
> reparse.

Ok, thanks!  I'll send a patch.

Steve
diff mbox series

Patch

diff --git a/lib/bb/siggen.py b/lib/bb/siggen.py
index 0a9ce0ede..828729d8b 100644
--- a/lib/bb/siggen.py
+++ b/lib/bb/siggen.py
@@ -331,7 +331,7 @@  class SignatureGeneratorBasic(SignatureGenerator):
         for dep in self.runtaskdeps[tid]:
             data += self.get_unihash(dep)
 
-        for (f, cs) in self.file_checksum_values[tid]:
+        for (f, cs) in sorted(self.file_checksum_values[tid], key=clean_checksum_file_path):
             if cs:
                 if "/./" in f:
                     data += "./" + f.split("/./")[1]
@@ -393,7 +393,7 @@  class SignatureGeneratorBasic(SignatureGenerator):
         if runtime and tid in self.taskhash:
             data['runtaskdeps'] = self.runtaskdeps[tid]
             data['file_checksum_values'] = []
-            for f,cs in self.file_checksum_values[tid]:
+            for f,cs in sorted(self.file_checksum_values[tid], key=clean_checksum_file_path):
                 if "/./" in f:
                     data['file_checksum_values'].append(("./" + f.split("/./")[1], cs))
                 else:
@@ -720,6 +720,12 @@  class SignatureGeneratorTestMulticonfigDepends(SignatureGeneratorBasicHash):
     name = "TestMulticonfigDepends"
     supports_multiconfig_datacaches = True
 
+def clean_checksum_file_path(file_checksum_tuple):
+    f, cs = file_checksum_tuple
+    if "/./" in f:
+        return "./" + f.split("/./")[1]
+    return f
+
 def dump_this_task(outfile, d):
     import bb.parse
     fn = d.getVar("BB_FILENAME")