checksum: fix unstable sort of checksums

Message ID 20240514101257.31020-1-yang.xu@mediatek.com
State New

Commit Message

Yang Xu May 14, 2024, 10:12 a.m. UTC
From: Yang Xu <yang.xu@mediatek.com>

Using only the checksum as sorting key can lead to unstable results when
same content files exist in different subfolders, affecting the taskhash
calculation due to its sensitivity to the order of files in checksums.

This commit changes the sorting key to use both checksum and file path,
ensuring stable sorting and enhancing sstate cache hit rates.

Signed-off-by: Yang Xu <yang.xu@mediatek.com>
---
 lib/bb/checksum.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
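
The effect described in the commit message can be reproduced in isolation. The sketch below is illustrative only and is not part of the patch: the file names and checksum strings are made up, and it assumes that the same files may be collected in a different order from one run to the next. Python's sort preserves input order among equal keys, so sorting on the checksum alone carries that input order into the result, while the (checksum, path) key does not.

```python
import operator

# Hypothetical (path, checksum) pairs. The two defconfig files have
# identical content, and the two lists differ only in collection order.
walk_a = [("sub2/defconfig", "abc123"), ("sub1/defconfig", "abc123"), ("README", "0f0f0f")]
walk_b = [("sub1/defconfig", "abc123"), ("sub2/defconfig", "abc123"), ("README", "0f0f0f")]


def sort_by_checksum(pairs):
    # Old key: ties between equal checksums keep their input order.
    return sorted(pairs, key=operator.itemgetter(1))


def sort_by_checksum_then_path(pairs):
    # Key from the patch: the path breaks ties between equal checksums.
    return sorted(pairs, key=lambda x: (x[1], x[0]))


# Checksum-only key: the result depends on the collection order.
print(sort_by_checksum(walk_a) == sort_by_checksum(walk_b))  # False

# Checksum-and-path key: the result is the same for both orders.
print(sort_by_checksum_then_path(walk_a) == sort_by_checksum_then_path(walk_b))  # True
```

Any consumer that hashes the list in order would see two different inputs in the first case and one in the second.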

Comments

Richard Purdie May 14, 2024, 10:29 a.m. UTC | #1
On Tue, 2024-05-14 at 10:12 +0000, Yang Xu via lists.openembedded.org wrote:
> From: Yang Xu <yang.xu@mediatek.com>
> 
> Using only the checksum as sorting key can lead to unstable results when
> same content files exist in different subfolders, affecting the taskhash
> calculation due to its sensitivity to the order of files in checksums.
> 
> This commit changes the sorting key to use both checksum and file path,
> ensuring stable sorting and enhancing sstate cache hit rates.
> 
> Signed-off-by: Yang Xu <yang.xu@mediatek.com>
> ---
>  lib/bb/checksum.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/bb/checksum.py b/lib/bb/checksum.py
> index 557793d3..5294ddf1 100644
> --- a/lib/bb/checksum.py
> +++ b/lib/bb/checksum.py
> @@ -140,5 +140,5 @@ class FileChecksumCache(MultiProcessCache):
>                  if checksum:
>                      checksums.append((pth, checksum))
>  
> -        checksums.sort(key=operator.itemgetter(1))
> +        checksums.sort(key=lambda x:(x[1], x[0]))
>          return checksums
> 

Which version of the project are you testing this with?

The checksum data is sorted where it is used in siggen.py so sorting
here shouldn't make much difference?

Cheers,

Richard
Yang Xu May 14, 2024, 11:20 a.m. UTC | #2
Dear Richard,

Sorry, I am using an old version, "kirkstone". Yes, I see that the unstable sort problem is already fixed in siggen.py in newer versions.
I think the existing solution is also OK.

Please ignore the current patch.

Thank you

Patch

diff --git a/lib/bb/checksum.py b/lib/bb/checksum.py
index 557793d3..5294ddf1 100644
--- a/lib/bb/checksum.py
+++ b/lib/bb/checksum.py
@@ -140,5 +140,5 @@  class FileChecksumCache(MultiProcessCache):
                 if checksum:
                     checksums.append((pth, checksum))
 
-        checksums.sort(key=operator.itemgetter(1))
+        checksums.sort(key=lambda x:(x[1], x[0]))
         return checksums