
[3/3] bitbake: fetch2: Fix LFS object checkout in submodules

Message ID 20260309212125.3172717-4-michael.siebold@gmail.com
State New
Series Fix git lfs submodule expansion

Commit Message

Michael Siebold March 9, 2026, 9:21 p.m. UTC
From: Philip Lorenz <philip.lorenz@bmw.de>

Skipping smudging prevents the LFS objects from replacing their
placeholder files when `git submodule update` actually checks out the
target revision in the submodule. Smudging cannot happen earlier as the
clone stored in `.git/modules` is bare.

This should be fine as long as all LFS objects are available in the
download cache (which they are after the other fixes are applied).

(Bitbake rev: d270e33a07c50bb9c08861cf9a6dc51e1fd2d874)

Upstream-Status: Backport [from commit 3eeac69385]

Signed-off-by: Philip Lorenz <philip.lorenz@bmw.de>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit 3eeac69385e8f29a08d022a17b28b5d504deed66)
Signed-off-by: Michael Siebold <michael.siebold@gmail.com>
---
 bitbake/lib/bb/fetch2/gitsm.py | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)
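
For readers unfamiliar with the "placeholder files" mentioned above: when smudging is skipped, an LFS-tracked file is checked out as a small text pointer instead of its real content. The following is a minimal sketch of how such a pointer could be recognised, assuming the pointer layout from the Git LFS specification (a `version` line followed by `oid` and `size` keys). The helper name `is_lfs_pointer` is hypothetical and is not part of bitbake.

```python
# First line of every Git LFS pointer file, per the Git LFS pointer spec.
LFS_SPEC_LINE = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(data: bytes) -> bool:
    """Return True if `data` looks like an unsmudged Git LFS pointer.

    Hypothetical helper for illustration only; bitbake does not ship it.
    """
    lines = data.splitlines()
    if not lines or lines[0] != LFS_SPEC_LINE:
        return False
    # The remaining lines are "key value" pairs; oid and size are required.
    keys = {line.split(b" ", 1)[0] for line in lines[1:] if line}
    return {b"oid", b"size"} <= keys
```

A file that still matches this check after `git submodule update` has not been smudged, which is the symptom this patch addresses.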

Comments

Richard Purdie March 9, 2026, 9:32 p.m. UTC | #1
On Mon, 2026-03-09 at 14:21 -0700, Michael Siebold wrote:
> From: Philip Lorenz <philip.lorenz@bmw.de>
> 
> Skipping smudging prevents the LFS objects from replacing their
> placeholder files when `git submodule update` actually checks out the
> target revision in the submodule. Smudging cannot happen earlier as the
> clone stored in `.git/modules` is bare.
> 
> This should be fine as long as all LFS objects are available in the
> download cache (which they are after the other fixes are applied).
> 
> (Bitbake rev: d270e33a07c50bb9c08861cf9a6dc51e1fd2d874)
> 
> Upstream-Status: Backport [from commit 3eeac69385]
> 
> Signed-off-by: Philip Lorenz <philip.lorenz@bmw.de>
> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> (cherry picked from commit 3eeac69385e8f29a08d022a17b28b5d504deed66)
> Signed-off-by: Michael Siebold <michael.siebold@gmail.com>
> ---
>  bitbake/lib/bb/fetch2/gitsm.py | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/bitbake/lib/bb/fetch2/gitsm.py b/bitbake/lib/bb/fetch2/gitsm.py
> index 5c98991480..ef19053330 100644
> --- a/bitbake/lib/bb/fetch2/gitsm.py
> +++ b/bitbake/lib/bb/fetch2/gitsm.py
> @@ -243,12 +243,11 @@ class GitSM(Git):
>          ret = self.process_submodules(ud, ud.destdir, unpack_submodules, d)
>  
>          if not ud.bareclone and ret:
> -            # All submodules should already be downloaded and configured in the tree.  This simply
> -            # sets up the configuration and checks out the files.  The main project config should
> -            # remain unmodified, and no download from the internet should occur. As such, lfs smudge
> -            # should also be skipped as these files were already smudged in the fetch stage if lfs
> -            # was enabled.
> -            runfetchcmd("GIT_LFS_SKIP_SMUDGE=1 %s submodule update --recursive --no-fetch" % (ud.basecmd), d, quiet=True, workdir=ud.destdir)
> +            cmdprefix = ""
> +            # Avoid LFS smudging (replacing the LFS pointers with the actual content) when LFS shouldn't be used but git-lfs is installed.
> +            if not self._need_lfs(ud):
> +                cmdprefix = "GIT_LFS_SKIP_SMUDGE=1 "
> +            runfetchcmd("%s%s submodule update --recursive --no-fetch" % (cmdprefix, ud.basecmd), d, quiet=True, workdir=ud.destdir)
>      def clean(self, ud, d):
>          def clean_submodule(ud, url, module, modpath, workdir, d):
>              url += ";bareclone=1;nobranch=1"


We've had a lot of churn on this code and it isn't something I use or
fully understand myself, so I need to ask some questions to make sure we
get this right this time.

Is "git submodule update --recursive --no-fetch" going to access the
network? 

If I understand correctly, you say it shouldn't as things should
already be in DL_DIR. What happens if they're not? Where are the large
files stored in DL_DIR?

From the older comments in the code, it sounds like the smudging was
meant to happen at do_fetch time and this is now being changed to
happen at do_unpack.

Put differently, the fetcher code needs to:

* ensure software manifests are correct and only specifically
referenced things are fetched, no random revisions or accesses outside
of what is listed
* ensure mirroring works correctly and all artefacts needed (including
lfs ones) can be handled by a mirror setting
* be reproducible, the same thing will always be fetched for a given
url/revision

I'd like to be certain this change allows for that and the smudging
doesn't bypass things.

Also, do we have tests covering this from bitbake-selftest?

Cheers,

Richard

Michael Siebold March 9, 2026, 11:14 p.m. UTC | #2
Hi Richard,

I apologize in advance; I'm far from being an expert in this area myself.
These patches cherry-pick a fix from master into Scarthgap.

I also see that the [Scarthgap] tag made it into the cover letter but not
into the subsequent patches, sorry about that!

I thought about simply requesting this fix be backported into Scarthgap,
but my hope is this makes things easier.

Best,

Michael


Patch

diff --git a/bitbake/lib/bb/fetch2/gitsm.py b/bitbake/lib/bb/fetch2/gitsm.py
index 5c98991480..ef19053330 100644
--- a/bitbake/lib/bb/fetch2/gitsm.py
+++ b/bitbake/lib/bb/fetch2/gitsm.py
@@ -243,12 +243,11 @@  class GitSM(Git):
         ret = self.process_submodules(ud, ud.destdir, unpack_submodules, d)
 
         if not ud.bareclone and ret:
-            # All submodules should already be downloaded and configured in the tree.  This simply
-            # sets up the configuration and checks out the files.  The main project config should
-            # remain unmodified, and no download from the internet should occur. As such, lfs smudge
-            # should also be skipped as these files were already smudged in the fetch stage if lfs
-            # was enabled.
-            runfetchcmd("GIT_LFS_SKIP_SMUDGE=1 %s submodule update --recursive --no-fetch" % (ud.basecmd), d, quiet=True, workdir=ud.destdir)
+            cmdprefix = ""
+            # Avoid LFS smudging (replacing the LFS pointers with the actual content) when LFS shouldn't be used but git-lfs is installed.
+            if not self._need_lfs(ud):
+                cmdprefix = "GIT_LFS_SKIP_SMUDGE=1 "
+            runfetchcmd("%s%s submodule update --recursive --no-fetch" % (cmdprefix, ud.basecmd), d, quiet=True, workdir=ud.destdir)
     def clean(self, ud, d):
         def clean_submodule(ud, url, module, modpath, workdir, d):
             url += ";bareclone=1;nobranch=1"
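
The effect of the hunk above can be illustrated in isolation. This is only a sketch of the command-string logic, not the fetcher itself: `build_submodule_update_cmd` is a hypothetical helper, `need_lfs` stands in for the result of `self._need_lfs(ud)`, and the plain `"git"` passed as `basecmd` is an assumption (the real `ud.basecmd` may carry additional options).

```python
def build_submodule_update_cmd(basecmd: str, need_lfs: bool) -> str:
    """Build the shell command that checks out submodules during unpack.

    Hypothetical stand-in for the logic in GitSM.unpack(); it only builds
    the string and does not run anything.
    """
    cmdprefix = ""
    # GIT_LFS_SKIP_SMUDGE=1 tells git-lfs to leave pointer files in place.
    # Only set it when LFS is not wanted, so that LFS-enabled checkouts
    # get their real file content.
    if not need_lfs:
        cmdprefix = "GIT_LFS_SKIP_SMUDGE=1 "
    return "%s%s submodule update --recursive --no-fetch" % (cmdprefix, basecmd)


if __name__ == "__main__":
    print(build_submodule_update_cmd("git", need_lfs=True))
    print(build_submodule_update_cmd("git", need_lfs=False))
```

Before the patch, the prefix was applied unconditionally, which corresponds to always calling this helper with `need_lfs=False`.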