diff mbox series

[3/8] fetch2: Check for git-lfs existence before using it

Message ID 20250423151901.1300944-4-philip.lorenz@bmw.de
State Accepted, archived
Commit 5818367db9b261b7e07c347d38044e6cba8f9727
Headers show
Series Fix gitsm LFS support | expand

Commit Message

Philip Lorenz April 23, 2025, 3:18 p.m. UTC
So far, existence of `git-lfs` was only checked during unpacking. As the
binary is also used in earlier steps also check for its existence there.

Additionally, factor out the LFS existence check into a dedicated
function and call it wherever git-lfs is used for the first time.

Signed-off-by: Philip Lorenz <philip.lorenz@bmw.de>
---
 lib/bb/fetch2/git.py  | 26 ++++++++++++++++++--------
 lib/bb/tests/fetch.py | 26 ++++++++++----------------
 2 files changed, 28 insertions(+), 24 deletions(-)

Comments

Mathieu Dubois-Briand April 24, 2025, 7:10 a.m. UTC | #1
On Wed Apr 23, 2025 at 5:18 PM CEST, Philip Lorenz via lists.openembedded.org wrote:
> So far, existence of `git-lfs` was only checked during unpacking. As the
> binary is also used in earlier steps also check for its existence there.
>
> Additionally, factor out the LFS existence check into a dedicated
> function and call it wherever git-lfs is used for the first time.
>
> Signed-off-by: Philip Lorenz <philip.lorenz@bmw.de>
> ---

Hi Philip,

Thanks for your patch.

I was not able to test this series on the autobuilder so far, as I
believe git-lfs is not installed in there. Now maybe this has to be a
new requirement on the host and maybe this is fine. Just two points:
- Is that really a new requirement and is it needed?
- We probably need to update a bit of documentation about that.
Philip Lorenz April 24, 2025, 7:48 a.m. UTC | #2
Hi Mathieu,

On 24.04.25 09:10, Mathieu Dubois-Briand wrote:
> Sent from outside the BMW organization  - be CAUTIOUS, particularly with links and attachments.
> Absender außerhalb der BMW Organisation - Bitte VORSICHT beim Öffnen von Links und Anhängen.
> -----------------------------------------------------------------------------------------------
>
> On Wed Apr 23, 2025 at 5:18 PM CEST, Philip Lorenz via lists.openembedded.org wrote:
>> So far, existence of `git-lfs` was only checked during unpacking. As the
>> binary is also used in earlier steps also check for its existence there.
>>
>> Additionally, factor out the LFS existence check into a dedicated
>> function and call it wherever git-lfs is used for the first time.
>>
>> Signed-off-by: Philip Lorenz <philip.lorenz@bmw.de>
>> ---
>
>
> I was not able to test this series on the autobuilder so far, as I
> believe git-lfs is not installed in there. Now maybe this has to be a
> new requirement on the host and maybe this is fine. Just two points:
> - Is that really a new requirement and is it needed?

The requirement isn't new but simply was never checked before (so users 
would've seen a git-lfs not found error rather than the helpful error 
message that is raised during unpacking). Support for LFS can be 
explicitly disabled for per URI by adding lfs=0 to its parameter list. 
git-lfs is already needed during downloading as this is the point in 
time at which the large file objects need to be retrieved. During unpack 
pointers in the repository (this process is called smudging) are 
replaced by those downloaded objects.

However I just noticed that I was missing parentheses in patch 6 (it 
should have been `url += ";lfs=%s" % ("1" if self._need_lfs(ud) else 
"0")`) which would explain any possible failures (for example in 
vulkan-samples). If there are no further comments on this series I'll 
push a revised series in the next couple of days.

> - We probably need to update a bit of documentation about that.

I would've hoped that this is already documented since the LFS feature 
has been around for quite a while (but have to admit that I couldn't 
find anything on it inside the bitbake documentation). This patch series 
merely ensures that it also works correctly when used via the "gitsm" 
fetcher (and while at it improves some of the existing implementation). 
I'm happy to write up something for the documentation but I'd suggest to 
send this in via a dedicated patch series.

Philip
Alexander Kanavin April 24, 2025, 11:21 a.m. UTC | #3
We probably should install git-lfs on the autobuilder workers to
prevent this from regressing, but keep it as a 'soft' host tool
dependency otherwise?

Alex

On Thu, 24 Apr 2025 at 09:48, Philip Lorenz via lists.openembedded.org
<philip.lorenz=bmw.de@lists.openembedded.org> wrote:
>
> Hi Mathieu,
>
> On 24.04.25 09:10, Mathieu Dubois-Briand wrote:
> > Sent from outside the BMW organization  - be CAUTIOUS, particularly with links and attachments.
> > Absender außerhalb der BMW Organisation - Bitte VORSICHT beim Öffnen von Links und Anhängen.
> > -----------------------------------------------------------------------------------------------
> >
> > On Wed Apr 23, 2025 at 5:18 PM CEST, Philip Lorenz via lists.openembedded.org wrote:
> >> So far, existence of `git-lfs` was only checked during unpacking. As the
> >> binary is also used in earlier steps also check for its existence there.
> >>
> >> Additionally, factor out the LFS existence check into a dedicated
> >> function and call it wherever git-lfs is used for the first time.
> >>
> >> Signed-off-by: Philip Lorenz <philip.lorenz@bmw.de>
> >> ---
> >
> >
> > I was not able to test this series on the autobuilder so far, as I
> > believe git-lfs is not installed in there. Now maybe this has to be a
> > new requirement on the host and maybe this is fine. Just two points:
> > - Is that really a new requirement and is it needed?
>
> The requirement isn't new but simply was never checked before (so users
> would've seen a git-lfs not found error rather than the helpful error
> message that is raised during unpacking). Support for LFS can be
> explicitly disabled for per URI by adding lfs=0 to its parameter list.
> git-lfs is already needed during downloading as this is the point in
> time at which the large file objects need to be retrieved. During unpack
> pointers in the repository (this process is called smudging) are
> replaced by those downloaded objects.
>
> However I just noticed that I was missing parentheses in patch 6 (it
> should have been `url += ";lfs=%s" % ("1" if self._need_lfs(ud) else
> "0")`) which would explain any possible failures (for example in
> vulkan-samples). If there are no further comments on this series I'll
> push a revised series in the next couple of days.
>
> > - We probably need to update a bit of documentation about that.
>
> I would've hoped that this is already documented since the LFS feature
> has been around for quite a while (but have to admit that I couldn't
> find anything on it inside the bitbake documentation). This patch series
> merely ensures that it also works correctly when used via the "gitsm"
> fetcher (and while at it improves some of the existing implementation).
> I'm happy to write up something for the documentation but I'd suggest to
> send this in via a dedicated patch series.
>
> Philip
>
>
> --
> Philip Lorenz
> BMW Car IT GmbH, Software-Plattform, -Integration Connected Company, Lise-Meitner-Straße 14, 89081 Ulm
> -------------------------------------------------------------------------
> BMW Car IT GmbH
> Management: Chris Brandt, Michael Böttrich, Christian Salzmann
> Domicile and Court of Registry: München HRB 134810
> -------------------------------------------------------------------------
>
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#17552): https://lists.openembedded.org/g/bitbake-devel/message/17552
> Mute This Topic: https://lists.openembedded.org/mt/112416098/1686489
> Group Owner: bitbake-devel+owner@lists.openembedded.org
> Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [alex.kanavin@gmail.com]
> -=-=-=-=-=-=-=-=-=-=-=-
>
Richard Purdie April 24, 2025, 2:29 p.m. UTC | #4
On Thu, 2025-04-24 at 13:21 +0200, Alexander Kanavin via
lists.openembedded.org wrote:
> We probably should install git-lfs on the autobuilder workers to
> prevent this from regressing, but keep it as a 'soft' host tool
> dependency otherwise?

I'm a little reluctant. We try to keep the autobuilders "lean" so that
we test our minimal requirements work. Adding this in makes it all too
easy to then have lfs dependencies creep in.

The autobuilders install should in theory match our minimum
requirements as documented in our docs.

Cheers,

Richard
diff mbox series

Patch

diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
index 39c183927..b32b18797 100644
--- a/lib/bb/fetch2/git.py
+++ b/lib/bb/fetch2/git.py
@@ -324,6 +324,11 @@  class Git(FetchMethod):
         return False
 
     def lfs_need_update(self, ud, d):
+        if not self._need_lfs(ud):
+            return False
+
+        self._ensure_git_lfs(d, ud)
+
         if self.clonedir_need_update(ud, d):
             return True
 
@@ -507,7 +512,9 @@  class Git(FetchMethod):
     def lfs_fetch(self, ud, d, clonedir, revision, fetchall=False, progresshandler=None):
         """Helper method for fetching Git LFS data"""
         try:
-            if self._need_lfs(ud) and self._contains_lfs(ud, d, clonedir) and self._find_git_lfs(d) and len(revision):
+            if self._need_lfs(ud) and self._contains_lfs(ud, d, clonedir) and len(revision):
+                self._ensure_git_lfs(d, ud)
+
                 # Using worktree with the revision because .lfsconfig may exists
                 worktree_add_cmd = "%s worktree add wt %s" % (ud.basecmd, revision)
                 runfetchcmd(worktree_add_cmd, d, log=progresshandler, workdir=clonedir)
@@ -740,11 +747,11 @@  class Git(FetchMethod):
         runfetchcmd("%s remote set-url origin %s" % (ud.basecmd, shlex.quote(repourl)), d, workdir=destdir)
 
         if self._contains_lfs(ud, d, destdir):
-            if need_lfs and not self._find_git_lfs(d):
-                raise bb.fetch2.FetchError("Repository %s has LFS content, install git-lfs on host to download (or set lfs=0 to ignore it)" % (repourl))
-            elif not need_lfs:
+            if not need_lfs:
                 bb.note("Repository %s has LFS content but it is not being fetched" % (repourl))
             else:
+                self._ensure_git_lfs(d, ud)
+
                 runfetchcmd("%s lfs install --local" % ud.basecmd, d, workdir=destdir)
 
         if not ud.nocheckout:
@@ -807,7 +814,7 @@  class Git(FetchMethod):
         Verifies whether the LFS objects for requested revisions have already been downloaded
         """
         # Bail out early if this repository doesn't use LFS
-        if not self._need_lfs(ud) or not self._contains_lfs(ud, d, wd):
+        if not self._contains_lfs(ud, d, wd):
             return True
 
         # The Git LFS specification specifies ([1]) the LFS folder layout so it should be safe to check for file
@@ -859,11 +866,14 @@  class Git(FetchMethod):
             pass
         return False
 
-    def _find_git_lfs(self, d):
+    def _ensure_git_lfs(self, d, ud):
         """
-        Return True if git-lfs can be found, False otherwise.
+        Ensures that git-lfs is available, raising a FetchError if it isn't.
         """
-        return shutil.which("git-lfs", path=d.getVar('PATH')) is not None
+        if shutil.which("git-lfs", path=d.getVar('PATH')) is None:
+            raise bb.fetch2.FetchError(
+                "Repository %s has LFS content, install git-lfs on host to download (or set lfs=0 "
+                "to ignore it)" % self._get_repo_url(ud))
 
     def _get_repo_url(self, ud):
         """
diff --git a/lib/bb/tests/fetch.py b/lib/bb/tests/fetch.py
index f0c628524..a936af6fb 100644
--- a/lib/bb/tests/fetch.py
+++ b/lib/bb/tests/fetch.py
@@ -9,6 +9,7 @@ 
 import contextlib
 import shutil
 import unittest
+import unittest.mock
 import urllib.parse
 import hashlib
 import tempfile
@@ -2413,18 +2414,17 @@  class GitLfsTest(FetcherTest):
         # Careful: suppress initial attempt at downloading
         fetcher, ud = self.fetch(uri=None, download=False)
 
-        # Artificially assert that git-lfs is not installed, so
-        # we can verify a failure to unpack in it's absence.
-        old_find_git_lfs = ud.method._find_git_lfs
-        try:
-            # If git-lfs cannot be found, the unpack should throw an error
+        # If git-lfs cannot be found, the download should throw an error
+        with unittest.mock.patch("shutil.which", return_value=None):
             with self.assertRaises(bb.fetch2.FetchError):
                 fetcher.download()
-                ud.method._find_git_lfs = lambda d: False
+
+        fetcher.download()
+        # If git-lfs cannot be found, the unpack should throw an error
+        with self.assertRaises(bb.fetch2.FetchError):
+            with unittest.mock.patch("shutil.which", return_value=None):
                 shutil.rmtree(self.gitdir, ignore_errors=True)
                 fetcher.unpack(self.d.getVar('WORKDIR'))
-        finally:
-            ud.method._find_git_lfs = old_find_git_lfs
 
     def test_lfs_disabled_not_installed(self):
         uri = 'git://%s;protocol=file;lfs=0;branch=master' % self.srcdir
@@ -2433,17 +2433,11 @@  class GitLfsTest(FetcherTest):
         # Careful: suppress initial attempt at downloading
         fetcher, ud = self.fetch(uri=None, download=False)
 
-        # Artificially assert that git-lfs is not installed, so
-        # we can verify a failure to unpack in it's absence.
-        old_find_git_lfs = ud.method._find_git_lfs
-        try:
-            # Even if git-lfs cannot be found, the unpack should be successful
+        # Even if git-lfs cannot be found, the download / unpack should be successful
+        with unittest.mock.patch("shutil.which", return_value=None):
             fetcher.download()
-            ud.method._find_git_lfs = lambda d: False
             shutil.rmtree(self.gitdir, ignore_errors=True)
             fetcher.unpack(self.d.getVar('WORKDIR'))
-        finally:
-            ud.method._find_git_lfs = old_find_git_lfs
 
 class GitURLWithSpacesTest(FetcherTest):
     test_git_urls = {