
[1/1] bitbake: add BB_GIT_SHALLOW_SKIP_FAST to disable fast shallow mode

Message ID 20260430123847.25046-2-marcio.henriques@ctw.bmwgroup.com
State New
Series Subject: [RFC PATCH] bitbake: fetch2/git: add switch to disable fast shallow path

Commit Message

Marcio Henriques April 30, 2026, 12:38 p.m. UTC
In some CI environments, shallow tarballs are preferred as fallback artifacts,
not as the primary source path.

Today, enabling shallow mode also enables an initial fast shallow path that may
avoid creating/using the regular local clone flow in DL_DIR. This can reduce
the effectiveness of incremental fetches and lead to larger re-downloads.

Add BB_GIT_SHALLOW_SKIP_FAST as an opt-in switch. When set to 1, the fetcher
skips the fast shallow path from initialization and proceeds with the regular
clone/update flow, while still allowing shallow behavior and tarball fallback.

Default behavior is preserved when the variable is unset.

Signed-off-by: Marcio Henriques <marcio.henriques@ctw.bmwgroup.com>
---
 .../bitbake-user-manual-ref-variables.rst     | 19 +++++++++++++++++++
 lib/bb/fetch2/git.py                          |  4 +++-
 2 files changed, 22 insertions(+), 1 deletion(-)
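The switch described in the commit message can be modelled roughly as follows. This is a simplified, hypothetical sketch, not the code in lib/bb/fetch2/git.py: `FakeDatastore` stands in for BitBake's datastore, and `first_fetch_path()` ignores existing clones, mirror tarballs and the fallbacks the real fetcher performs. Only the flag computation copies the patched line, which compares `getVar()` with the string "1".

```python
class FakeDatastore:
    """Minimal stand-in for BitBake's datastore (only getVar is modelled)."""

    def __init__(self, values):
        self._values = dict(values)

    def getVar(self, name):
        return self._values.get(name)  # None when the variable is unset


def shallow_skip_fast(d) -> bool:
    # Same comparison as the patched line in git.py: only "1" enables it.
    return d.getVar("BB_GIT_SHALLOW_SKIP_FAST") == "1"


def first_fetch_path(d) -> str:
    """Which path a first fetch takes in this simplified (hypothetical) model."""
    shallow = d.getVar("BB_GIT_SHALLOW") == "1"
    if shallow and not shallow_skip_fast(d):
        # Default with shallow enabled: the fast shallow path is tried first.
        return "fast-shallow"
    # Switch set (or shallow disabled): regular clone/update flow in DL_DIR;
    # shallow behaviour and tarball fallback remain available.
    return "clone-in-DL_DIR"


print(first_fetch_path(FakeDatastore({"BB_GIT_SHALLOW": "1"})))
print(first_fetch_path(FakeDatastore({"BB_GIT_SHALLOW": "1",
                                      "BB_GIT_SHALLOW_SKIP_FAST": "1"})))
```

With this comparison only the literal string "1" enables the switch, so a value such as "yes" leaves the fast path active, which matches how `BB_GIT_SHALLOW` itself is read in the patched code.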

Comments

Antonin Godard April 30, 2026, 3:22 p.m. UTC | #1
Hi,

On Thu Apr 30, 2026 at 2:38 PM CEST, Marcio Henriques via lists.openembedded.org wrote:
[...]
> diff --git a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> index 8d8e8b8b..1def16e7 100644
> --- a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> +++ b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> @@ -390,6 +390,25 @@ overview of their function and contents.
>           # This defaults to enabled if both BB_GIT_SHALLOW and
>           # BB_GENERATE_MIRROR_TARBALLS are enabled
>           BB_GENERATE_SHALLOW_TARBALLS ?= "1"
> +   

There is some trailing whitespace here that you can remove.

> +      :term:`BB_GIT_SHALLOW_SKIP_FAST`
> +      When :term:`BB_GIT_SHALLOW` is enabled, BitBake will by default attempt a
> +      fast initial shallow clone directly from the upstream repository, bypassing
> +      the creation of a local full clone in :term:`DL_DIR`. Setting this variable
> +      to ``"1"`` disables that fast path so the fetcher always creates and
> +      maintains a local clone in :term:`DL_DIR`, allowing subsequent builds to
> +      fetch only the delta of changes rather than re-downloading the full shallow
> +      history.
> +
> +      This is useful in CI environments where shallow tarballs are treated as
> +      fallback artifacts and incremental network efficiency is preferred.
> +
> +      Example usage::
> +
> +         BB_GIT_SHALLOW ?= "1"
> +         BB_GIT_SHALLOW_SKIP_FAST ?= "1"
> +
> +      See also :term:`BB_GIT_SHALLOW` and :term:`BB_GIT_SHALLOW_DEPTH`.

This paragraph should be indented with 3 spaces below
:term:`BB_GIT_SHALLOW_SKIP_FAST`.

Antonin
Richard Purdie May 1, 2026, 9:29 a.m. UTC | #2
On Thu, 2026-04-30 at 13:38 +0100, Marcio Henriques via lists.openembedded.org wrote:
> In some CI environments, shallow tarballs are preferred as fallback artifacts,
> not as the primary source path.
> 
> Today, enabling shallow mode also enables an initial fast shallow path that may
> avoid creating/using the regular local clone flow in DL_DIR. This can reduce
> the effectiveness of incremental fetches and lead to larger re-downloads.
> 
> Add BB_GIT_SHALLOW_SKIP_FAST as an opt-in switch. When set to 1, the fetcher
> skips the fast shallow path from initialization and proceeds with the regular
> clone/update flow, while still allowing shallow behavior and tarball fallback.
> 
> Default behavior is preserved when the variable is unset.
> 
> Signed-off-by: Marcio Henriques <marcio.henriques@ctw.bmwgroup.com>
> ---
>  .../bitbake-user-manual-ref-variables.rst     | 19 +++++++++++++++++++
>  lib/bb/fetch2/git.py                          |  4 +++-
>  2 files changed, 22 insertions(+), 1 deletion(-)

I'm wondering why you wouldn't just turn off shallow clones in this
scenario?

I'm also conscious of not wanting to add too many options to the
fetcher, it makes the code more of a labyrinth and harder to
test/maintain. Would it make sense to use BB_GENERATE_MIRROR_TARBALLS
here to change behaviour?

That option is usually set for people wanting to build source
mirrors...


Cheers,

Richard
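One possible reading of the suggestion above, sketched under assumptions: instead of adding a new variable, the fast path would be skipped whenever mirror tarball generation is enabled. The function name is hypothetical, `FakeDatastore` stands in for BitBake's datastore, and the sketch assumes that an unset value or "0" means `BB_GENERATE_MIRROR_TARBALLS` is disabled and any other value means enabled; the exact rule should be checked in git.py.

```python
class FakeDatastore:
    """Minimal stand-in for BitBake's datastore (only getVar is modelled)."""

    def __init__(self, values):
        self._values = dict(values)

    def getVar(self, name):
        return self._values.get(name)  # None when the variable is unset


def skip_fast_from_mirror_tarballs(d) -> bool:
    # Hypothetical alternative to BB_GIT_SHALLOW_SKIP_FAST: reuse the switch
    # that people building source mirrors already set. Assumes unset or "0"
    # means disabled and any other value means enabled.
    return (d.getVar("BB_GENERATE_MIRROR_TARBALLS") or "0") != "0"


print(skip_fast_from_mirror_tarballs(FakeDatastore({})))
print(skip_fast_from_mirror_tarballs(
    FakeDatastore({"BB_GENERATE_MIRROR_TARBALLS": "1"})))
```

The trade-off is that every mirror-building setup would change behaviour, whereas the patch as posted keeps the new behaviour opt-in.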
Marcio Henriques May 5, 2026, 8:08 a.m. UTC | #3
Hello,

We want to create a download mirror to store all sources needed to build a
specific project.

The idea is to use Git shallow tarballs because each tarball has a unique name.
This prevents files from being overwritten in the mirror. With full Git
tarballs, the name is always the same, so the mirror file gets replaced each
time. If something is removed from the upstream repository, we might upload a
new tarball that no longer contains a reference used by a recipe that was
already integrated.

With shallow tarballs, we can safely clean DL_DIR since it's not the source of
truth. If something is removed from upstream, we fall back to the mirror and
use the shallow tarball.

For most of the builds, we want to use the bare clone in DL_DIR/git2 and just
update the repo when bumping a SRCREV. The DL_DIR/git2 repositories are
available on the machine that triggers the build.

The shallow tarball is a fallback for cases where something was removed from
the repository (for example, after a force push) and is not present in
DL_DIR/git2 or upstream, but was merged at some point. This is why we use Git
shallow and set shallow_skip_fast to True.

Best regards,
Marcio
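The overwrite problem described above can be illustrated with a toy mirror. The mirror is a plain dict, and the file names are hypothetical: they only capture the property discussed in the thread (one fixed name per repository versus one name per revision) and do not follow BitBake's real naming scheme. The sketch assumes the full tarball is regenerated from a clone that no longer contains the old revision, for example after DL_DIR was cleaned and upstream was force-pushed.

```python
from typing import Callable, Dict, FrozenSet


def full_tarball_name(repo: str, rev: str) -> str:
    # Hypothetical name: fixed per repository, so 'rev' is ignored and every
    # upload for this repository replaces the previous archive.
    return f"git2_{repo}.tar.gz"


def shallow_tarball_name(repo: str, rev: str) -> str:
    # Hypothetical name: includes the revision, so each revision gets its own file.
    return f"gitshallow_{repo}_{rev[:7]}.tar.gz"


def simulate(name_for: Callable[[str, str], str]) -> bool:
    """Return True if the old revision is still in the mirror after a force push."""
    mirror: Dict[str, FrozenSet[str]] = {}
    repo = "example.com.project"  # hypothetical repository
    old_rev = "a" * 40            # revision used by an already integrated recipe
    new_rev = "b" * 40            # revision after upstream was force-pushed

    # First upload: the archive still contains old_rev.
    mirror[name_for(repo, old_rev)] = frozenset({old_rev})
    # Second upload: regenerated from a clone that no longer has old_rev.
    mirror[name_for(repo, new_rev)] = frozenset({new_rev})

    return any(old_rev in revs for revs in mirror.values())


print(simulate(full_tarball_name))     # False: the fixed-name archive was replaced
print(simulate(shallow_tarball_name))  # True: both per-revision archives remain
```

As the next reply points out, the git fetcher normally keeps old references in its clone, so the full-tarball case only loses the revision under the assumption stated above.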
Richard Purdie May 5, 2026, 9:41 a.m. UTC | #4
On Tue, 2026-05-05 at 01:08 -0700, Marcio Henriques via lists.openembedded.org wrote:
> We want to create a download mirror to store all sources needed to build a
> specific project.
>
> The idea is to use Git shallow tarballs because each tarball has a unique name.
> This prevents files from being overwritten in the mirror. With full Git
> tarballs, the name is always the same, so the mirror file gets replaced each
> time. If something is removed from the upstream repository, we might upload a
> new tarball that no longer contains a reference used by a recipe that was
> already integrated.

FWIW the git fetcher is specifically coded to avoid removing
references. It shouldn't remove old revisions or obsolete heads (e.g.
for repos that changed master -> main). If it is doing that, that is
something we should fix.

> With shallow tarballs, we can safely clean DL_DIR since it's not the source of
> truth. If something is removed from upstream, we fall back to the mirror and
> use the shallow tarball.
>
> For most of the builds, we want to use the bare clone in DL_DIR/git2 and just
> update the repo when bumping a SRCREV. The DL_DIR/git2 repositories are
> available on the machine that triggers the build.
>
> The shallow tarball is a fallback for cases where something was removed from
> the repository (for example, after a force push) and is not present in
> DL_DIR/git2 or upstream, but was merged at some point. This is why we use Git
> shallow and set shallow_skip_fast to True.

I can see why you might decide to do that but it will be a pretty
inefficient use of space with an archive for every revision and means
turning everything into shallow clones, which has its own challenges.

Cheers,

Richard
Marcio Henriques May 11, 2026, 4:14 p.m. UTC | #5
On Tue, May 5, 2026 at 10:41 AM, Richard Purdie wrote:

> 
> FWIW the git fetcher is specifically coded to avoid removing
> references. It shouldn't remove old revisions or obsolete heads (e.g.
> for repos that changed master -> main). If it is doing that, that is
> something we should fix.

This is also about dealing with force pushes that may remove something that
was previously merged and is no longer available in DL_DIR/git2 or upstream.

> 
> I can see why you might decide to do that but it will be a pretty
> inefficient use of space with an archive for every revision and means
> turning everything into shallow clones, which has its own challenges.

It can actually also be more efficient as the git fetcher fetches all branches and this may
require more space than just archiving the source code for the revisions used in the build.
Richard Purdie May 12, 2026, 8:15 a.m. UTC | #6
On Mon, 2026-05-11 at 09:14 -0700, Marcio Henriques via lists.openembedded.org wrote:
> On Tue, May 5, 2026 at 10:41 AM, Richard Purdie wrote:
> 
> 
> > FWIW the git fetcher is specifically coded to avoid removing
> > references. It shouldn't remove old revisions or obsolete heads (e.g.
> > for repos that changed master -> main). If it is doing that, that is
> > something we should fix.
> This is also about dealing with force pushes that may remove something that
> was previously merged and is no longer available in DL_DIR/git2 or upstream.

The fetcher should not be removing those things. If it is, we should
fix that.

> > I can see why you might decide to do that but it will be a pretty
> > inefficient use of space with an archive for every revision and means
> > turning everything into shallow clones, which has its own challenges.
> 
> It can actually also be more efficient as the git fetcher fetches all branches and this may
> require more space than just archiving the source code for the revisions used in the build.

Yes and no, in that you have a copy for each revision.

Cheers,

Richard
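The space trade-off discussed in the last two messages can be made concrete with simple arithmetic. The sizes below are made-up example values, not measurements, and the model ignores compression differences and the growth of the full clone over time.

```python
def total_shallow_mb(per_revision_mb: int, revisions: int) -> int:
    """Space used when one shallow tarball is kept for every built revision."""
    return per_revision_mb * revisions


def breakeven_revisions(full_clone_mb: int, per_revision_mb: int) -> int:
    """Smallest number of per-revision archives whose total exceeds one full clone."""
    return full_clone_mb // per_revision_mb + 1


if __name__ == "__main__":
    FULL_CLONE_MB = 900      # hypothetical: bare clone with all branches
    SHALLOW_TARBALL_MB = 40  # hypothetical: one shallow tarball per revision
    n = breakeven_revisions(FULL_CLONE_MB, SHALLOW_TARBALL_MB)
    print(n, total_shallow_mb(SHALLOW_TARBALL_MB, n))  # prints: 23 920
```

With these example sizes, 22 per-revision archives (880 MB) still fit under one 900 MB clone, but 23 (920 MB) do not, so which approach is smaller depends on how many revisions end up archived.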

Patch

diff --git a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
index 8d8e8b8b..1def16e7 100644
--- a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
+++ b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
@@ -390,6 +390,25 @@ overview of their function and contents.
          # This defaults to enabled if both BB_GIT_SHALLOW and
          # BB_GENERATE_MIRROR_TARBALLS are enabled
          BB_GENERATE_SHALLOW_TARBALLS ?= "1"
+   
+      :term:`BB_GIT_SHALLOW_SKIP_FAST`
+      When :term:`BB_GIT_SHALLOW` is enabled, BitBake will by default attempt a
+      fast initial shallow clone directly from the upstream repository, bypassing
+      the creation of a local full clone in :term:`DL_DIR`. Setting this variable
+      to ``"1"`` disables that fast path so the fetcher always creates and
+      maintains a local clone in :term:`DL_DIR`, allowing subsequent builds to
+      fetch only the delta of changes rather than re-downloading the full shallow
+      history.
+
+      This is useful in CI environments where shallow tarballs are treated as
+      fallback artifacts and incremental network efficiency is preferred.
+
+      Example usage::
+
+         BB_GIT_SHALLOW ?= "1"
+         BB_GIT_SHALLOW_SKIP_FAST ?= "1"
+
+      See also :term:`BB_GIT_SHALLOW` and :term:`BB_GIT_SHALLOW_DEPTH`.
 
    :term:`BB_GIT_SHALLOW_DEPTH`
       When used with :term:`BB_GENERATE_SHALLOW_TARBALLS`, this variable sets
diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
index 21f5f28a..f72173b9 100644
--- a/lib/bb/fetch2/git.py
+++ b/lib/bb/fetch2/git.py
@@ -196,7 +196,9 @@ class Git(FetchMethod):
         if ud.bareclone:
             ud.cloneflags += " --mirror"
 
-        ud.shallow_skip_fast = False
+        # Allow disabling the fast shallow path so DL_DIR keeps a local clone
+        # that can be incrementally updated in CI.
+        ud.shallow_skip_fast = d.getVar("BB_GIT_SHALLOW_SKIP_FAST") == "1"
         ud.shallow = d.getVar("BB_GIT_SHALLOW") == "1"
         ud.shallow_extra_refs = (d.getVar("BB_GIT_SHALLOW_EXTRA_REFS") or "").split()
         if 'tag' in ud.parm: