| Message ID | 20260430123847.25046-2-marcio.henriques@ctw.bmwgroup.com |
|---|---|
| State | New |
| Series | Subject: [RFC PATCH] bitbake: fetch2/git: add switch to disable fast shallow path |
Hi,

On Thu Apr 30, 2026 at 2:38 PM CEST, Marcio Henriques via lists.openembedded.org wrote:
[...]
> diff --git a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> index 8d8e8b8b..1def16e7 100644
> --- a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> +++ b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> @@ -390,6 +390,25 @@ overview of their function and contents.
> # This defaults to enabled if both BB_GIT_SHALLOW and
> # BB_GENERATE_MIRROR_TARBALLS are enabled
> BB_GENERATE_SHALLOW_TARBALLS ?= "1"
> +

There are some whitespaces here you can remove.

> + :term:`BB_GIT_SHALLOW_SKIP_FAST`
> + When :term:`BB_GIT_SHALLOW` is enabled, BitBake will by default attempt a
> + fast initial shallow clone directly from the upstream repository, bypassing
> + the creation of a local full clone in :term:`DL_DIR`. Setting this variable
> + to ``"1"`` disables that fast path so the fetcher always creates and
> + maintains a local clone in :term:`DL_DIR`, allowing subsequent builds to
> + fetch only the delta of changes rather than re-downloading the full shallow
> + history.
> +
> + This is useful in CI environments where shallow tarballs are treated as
> + fallback artifacts and incremental network efficiency is preferred.
> +
> + Example usage::
> +
> + BB_GIT_SHALLOW ?= "1"
> + BB_GIT_SHALLOW_SKIP_FAST ?= "1"
> +
> + See also :term:`BB_GIT_SHALLOW` and :term:`BB_GIT_SHALLOW_DEPTH`.

This paragraph should be indented with 3 spaces below :term:`BB_GIT_SHALLOW_SKIP_FAST`.

Antonin
On Thu, 2026-04-30 at 13:38 +0100, Marcio Henriques via lists.openembedded.org wrote:
> In some CI environments, shallow tarballs are preferred as fallback artifacts,
> not as the primary source path.
>
> Today, enabling shallow mode also enables an initial fast shallow path that may
> avoid creating/using the regular local clone flow in DL_DIR. This can reduce
> the effectiveness of incremental fetches and lead to larger re-downloads.
>
> Add BB_GIT_SHALLOW_SKIP_FAST as an opt-in switch. When set to 1, the fetcher
> skips the fast shallow path from initialization and proceeds with the regular
> clone/update flow, while still allowing shallow behavior and tarball fallback.
>
> Default behavior is preserved when the variable is unset.
>
> Signed-off-by: Marcio Henriques <marcio.henriques@ctw.bmwgroup.com>
> ---
> .../bitbake-user-manual-ref-variables.rst | 19 +++++++++++++++++++
> lib/bb/fetch2/git.py | 4 +++-
> 2 files changed, 22 insertions(+), 1 deletion(-)

I'm wondering why you wouldn't just turn off shallow clones in this scenario?

I'm also conscious of not wanting to add too many options to the fetcher, it makes the code more of a labyrinth and harder to test/maintain.

Would it make sense to use BB_GENERATE_MIRROR_TARBALLS here to change behaviour? That option is usually set for people wanting to build source mirrors...

Cheers,

Richard
Hello,

We want to create a download mirror to store all sources needed to build a specific project.

The idea is to use Git shallow tarballs because each tarball has a unique name. This prevents files from being overwritten in the mirror. With full Git tarballs, the name is always the same, so the mirror file gets replaced each time. If something is removed from the upstream repository, we might upload a new tarball that no longer contains a reference used by a recipe that was already integrated.

With shallow tarballs, we can safely clean DL_DIR since it's not the source of truth. If something is removed from upstream, we fall back to the mirror and use the shallow tarball.

For most of the builds, we want to use the bare clone in DL_DIR/git2 and just update the repo when bumping a SRCREV. The DL_DIR/git2 repositories are available on the machine that triggers the build.

The shallow tarball is a fallback for cases where something was removed from the repository (for example, after a force push) and is not present in DL_DIR/git2 or upstream, but was merged at some point. This is why we use Git shallow and set shallow_skip_fast to True.

Best regards,

Marcio
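The naming argument above can be illustrated with a minimal sketch. The two name patterns below are hypothetical simplifications chosen for illustration; they are not BitBake's actual naming code, and `repo_id`, `full_tarball_name`, `shallow_tarball_name`, and `upload` are invented names. The only property modelled is the one described in the message: the full tarball name depends on the repository alone, while the shallow tarball name also depends on the revision.

```python
def full_tarball_name(repo_id: str) -> str:
    # Hypothetical pattern: the name depends only on the repository.
    return f"git2_{repo_id}.tar.gz"


def shallow_tarball_name(repo_id: str, revision: str) -> str:
    # Hypothetical pattern: the name also encodes the (abbreviated) revision.
    return f"gitshallow_{repo_id}_{revision[:7]}.tar.gz"


def upload(mirror: dict, name: str, revision: str) -> bool:
    """Store an entry in a dict-backed mirror; return True if it replaced one."""
    replaced = name in mirror
    mirror[name] = revision
    return replaced


repo = "example.com.project"  # made-up repository identifier
rev_a = "a" * 40              # two made-up 40-character revisions
rev_b = "b" * 40

# Full tarballs: the second upload reuses the same name and replaces the first.
full_mirror = {}
upload(full_mirror, full_tarball_name(repo), rev_a)
full_replaced = upload(full_mirror, full_tarball_name(repo), rev_b)

# Shallow tarballs: each revision gets its own name, so both entries survive.
shallow_mirror = {}
upload(shallow_mirror, shallow_tarball_name(repo, rev_a), rev_a)
shallow_replaced = upload(shallow_mirror, shallow_tarball_name(repo, rev_b), rev_b)
```

In this model the full-tarball mirror ends up with one entry and the shallow-tarball mirror with two, which is the overwrite behaviour the message is trying to avoid.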
On Tue, 2026-05-05 at 01:08 -0700, Marcio Henriques via lists.openembedded.org wrote:
> We want to create a download mirror to store all sources needed to build a
> specific project.
>
> The idea is to use Git shallow tarballs because each tarball has a unique name.
> This prevents files from being overwritten in the mirror. With full Git
> tarballs, the name is always the same, so the mirror file gets replaced each
> time. If something is removed from the upstream repository, we might upload a
> new tarball that no longer contains a reference used by a recipe that was
> already integrated.

FWIW the git fetcher is specifically coded to avoid removing references. It shouldn't remove old revisions or obsolete heads (e.g. for repos that changed master -> main). If it is doing that, that is something we should fix.

> With shallow tarballs, we can safely clean DL_DIR since it's not the source of
> truth. If something is removed from upstream, we fall back to the mirror and
> use the shallow tarball.
>
> For most of the builds, we want to use the bare clone in DL_DIR/git2 and just
> update the repo when bumping a SRCREV. The DL_DIR/git2 repositories are
> available on the machine that triggers the build.
>
> The shallow tarball is a fallback for cases where something was removed from
> the repository (for example, after a force push) and is not present in
> DL_DIR/git2 or upstream, but was merged at some point. This is why we use Git
> shallow and set shallow_skip_fast to True.

I can see why you might decide to do that but it will be a pretty inefficient use of space with an archive for every revision and means turning everything into shallow clones, which has its own challenges.

Cheers,

Richard
On Tue, May 5, 2026 at 10:41 AM, Richard Purdie wrote:
>
> FWIW the git fetcher is specifically coded to avoid removing
> references. It shouldn't remove old revisions or obsolete heads (e.g.
> for repos that changed master -> main). If it is doing that, that is
> something we should fix.

This is also about dealing with force pushes that may remove something that was previously merged and is no longer available in DL_DIR/git2 or upstream.

>
> I can see why you might decide to do that but it will be a pretty
> inefficient use of space with an archive for every revision and means
> turning everything into shallow clones, which has its own challenges.

It can actually also be more efficient as the git fetcher fetches all branches and this may require more space than just archiving the source code for the revisions used in the build.
On Mon, 2026-05-11 at 09:14 -0700, Marcio Henriques via lists.openembedded.org wrote:
> On Tue, May 5, 2026 at 10:41 AM, Richard Purdie wrote:
> >
> > FWIW the git fetcher is specifically coded to avoid removing
> > references. It shouldn't remove old revisions or obsolete heads (e.g.
> > for repos that changed master -> main). If it is doing that, that is
> > something we should fix.
> This is also about dealing with force pushes that may remove something that
> was previously merged and is no longer available in DL_DIR/git2 or upstream.

The fetcher should not be removing those things. If it is, we should fix that.

> > I can see why you might decide to do that but it will be a pretty
> > inefficient use of space with an archive for every revision and means
> > turning everything into shallow clones, which has its own challenges.
>
> It can actually also be more efficient as the git fetcher fetches all branches and this may
> require more space than just archiving the source code for the revisions used in the build.

Yes and no, in that you have a copy for each revision.

Cheers,

Richard
diff --git a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
index 8d8e8b8b..1def16e7 100644
--- a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
+++ b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
@@ -390,6 +390,25 @@ overview of their function and contents.
 # This defaults to enabled if both BB_GIT_SHALLOW and
 # BB_GENERATE_MIRROR_TARBALLS are enabled
 BB_GENERATE_SHALLOW_TARBALLS ?= "1"
+
+ :term:`BB_GIT_SHALLOW_SKIP_FAST`
+ When :term:`BB_GIT_SHALLOW` is enabled, BitBake will by default attempt a
+ fast initial shallow clone directly from the upstream repository, bypassing
+ the creation of a local full clone in :term:`DL_DIR`. Setting this variable
+ to ``"1"`` disables that fast path so the fetcher always creates and
+ maintains a local clone in :term:`DL_DIR`, allowing subsequent builds to
+ fetch only the delta of changes rather than re-downloading the full shallow
+ history.
+
+ This is useful in CI environments where shallow tarballs are treated as
+ fallback artifacts and incremental network efficiency is preferred.
+
+ Example usage::
+
+ BB_GIT_SHALLOW ?= "1"
+ BB_GIT_SHALLOW_SKIP_FAST ?= "1"
+
+ See also :term:`BB_GIT_SHALLOW` and :term:`BB_GIT_SHALLOW_DEPTH`.
 
 :term:`BB_GIT_SHALLOW_DEPTH`
 When used with :term:`BB_GENERATE_SHALLOW_TARBALLS`, this variable sets
diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
index 21f5f28a..f72173b9 100644
--- a/lib/bb/fetch2/git.py
+++ b/lib/bb/fetch2/git.py
@@ -196,7 +196,9 @@ class Git(FetchMethod):
         if ud.bareclone:
             ud.cloneflags += " --mirror"
 
-        ud.shallow_skip_fast = False
+        # Allow disabling the fast shallow path so DL_DIR keeps a local clone
+        # that can be incrementally updated in CI.
+        ud.shallow_skip_fast = d.getVar("BB_GIT_SHALLOW_SKIP_FAST") == "1"
         ud.shallow = d.getVar("BB_GIT_SHALLOW") == "1"
         ud.shallow_extra_refs = (d.getVar("BB_GIT_SHALLOW_EXTRA_REFS") or "").split()
         if 'tag' in ud.parm:
In some CI environments, shallow tarballs are preferred as fallback artifacts, not as the primary source path.

Today, enabling shallow mode also enables an initial fast shallow path that may avoid creating/using the regular local clone flow in DL_DIR. This can reduce the effectiveness of incremental fetches and lead to larger re-downloads.

Add BB_GIT_SHALLOW_SKIP_FAST as an opt-in switch. When set to 1, the fetcher skips the fast shallow path from initialization and proceeds with the regular clone/update flow, while still allowing shallow behavior and tarball fallback.

Default behavior is preserved when the variable is unset.

Signed-off-by: Marcio Henriques <marcio.henriques@ctw.bmwgroup.com>
---
 .../bitbake-user-manual-ref-variables.rst | 19 +++++++++++++++++++
 lib/bb/fetch2/git.py | 4 +++-
 2 files changed, 22 insertions(+), 1 deletion(-)
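The opt-in semantics described above follow from the `== "1"` comparison in the git.py hunk, and can be sketched in isolation. This is only an illustration: `FakeDataStore` and `shallow_flags` are hypothetical helpers, not BitBake APIs, and the sketch assumes that `getVar()` returns `None` for an unset variable.

```python
class FakeDataStore:
    """Hypothetical stand-in for the BitBake datastore; only getVar() is modelled."""

    def __init__(self, values=None):
        self._values = dict(values or {})

    def getVar(self, name):
        # Assumption: an unset variable yields None.
        return self._values.get(name)


def shallow_flags(d):
    """Evaluate the same two comparisons that appear in the patched hunk."""
    return {
        "shallow": d.getVar("BB_GIT_SHALLOW") == "1",
        "shallow_skip_fast": d.getVar("BB_GIT_SHALLOW_SKIP_FAST") == "1",
    }


# Variable unset: the comparison is False, so existing behaviour is unchanged.
unset = shallow_flags(FakeDataStore({"BB_GIT_SHALLOW": "1"}))

# Variable set to "1": the fast shallow path is skipped.
enabled = shallow_flags(FakeDataStore({
    "BB_GIT_SHALLOW": "1",
    "BB_GIT_SHALLOW_SKIP_FAST": "1",
}))

# Any other value (for example "yes") does not enable the switch.
other = shallow_flags(FakeDataStore({
    "BB_GIT_SHALLOW": "1",
    "BB_GIT_SHALLOW_SKIP_FAST": "yes",
}))
```

Because the check is a strict string comparison, only the exact value `"1"` enables the switch, in the same way as the existing `BB_GIT_SHALLOW` check on the following line of the hunk.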