diff mbox series

[v3,2/4] bitbake-user-manual: Update documentation for fast `BB_GIT_SHALLOW`

Message ID 20250220172706.3850722-2-stefan-koch@siemens.com
State New
Series [v3,1/4] fetch2/git: Add support for fast initial shallow fetch

Commit Message

Stefan Koch Feb. 20, 2025, 5:27 p.m. UTC
Signed-off-by: Stefan Koch <stefan-koch@siemens.com>
---
 .../bitbake-user-manual-ref-variables.rst        | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

Comments

Quentin Schulz Feb. 21, 2025, 10:52 a.m. UTC | #1
Hi Stefan,

Please Cc the docs mailing list as well

+Cc docs

On 2/20/25 6:27 PM, Koch, Stefan via lists.openembedded.org wrote:
> Signed-off-by: Stefan Koch <stefan-koch@siemens.com>
> ---
>   .../bitbake-user-manual-ref-variables.rst        | 16 +++++++++++-----
>   1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> index ad219b531..f781c004e 100644
> --- a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> +++ b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> @@ -315,11 +315,17 @@ overview of their function and contents.
>         mirror tarball. If the shallow mirror tarball cannot be fetched, it will
>         try to fetch the full mirror tarball and use that.
>   
> -      When a mirror tarball is not available, a full git clone will be performed
> -      regardless of whether this variable is set or not. Support for shallow
> -      clones is not currently implemented as git does not directly support
> -      shallow cloning a particular git commit hash (it only supports cloning
> -      from a tag or branch reference).
> +      This setting causes an initial shallow clone instead of an initial full bare clone.
> +      The amount of data transferred during the initial clone will be significantly reduced.
> +
> +      For updates, when keeping the cache within the download directory,

I guess by "updates" you mean pointing to a different commit hash to 
fetch from the remote?

Is there a possible setup where the cache is not within the download 
directory? I'm not sure I understand the implications here.

Cheers,
Quentin
Stefan Koch Feb. 21, 2025, 4:23 p.m. UTC | #2
On Fri, 2025-02-21 at 11:52 +0100, Quentin Schulz wrote:
> Hi Stefan,
> 
> Please Cc the docs mailing list as well
> 
> +Cc docs
Thanks. I'll send updated docs in v4 patch series.
> 
> On 2/20/25 6:27 PM, Koch, Stefan via lists.openembedded.org wrote:
> > Signed-off-by: Stefan Koch <stefan-koch@siemens.com>
> > ---
> >   .../bitbake-user-manual-ref-variables.rst        | 16
> > +++++++++++-----
> >   1 file changed, 11 insertions(+), 5 deletions(-)
> > 
> > diff --git a/doc/bitbake-user-manual/bitbake-user-manual-ref-
> > variables.rst b/doc/bitbake-user-manual/bitbake-user-manual-ref-
> > variables.rst
> > index ad219b531..f781c004e 100644
> > --- a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> > +++ b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> > @@ -315,11 +315,17 @@ overview of their function and contents.
> >         mirror tarball. If the shallow mirror tarball cannot be
> > fetched, it will
> >         try to fetch the full mirror tarball and use that.
> >   
> > -      When a mirror tarball is not available, a full git clone
> > will be performed
> > -      regardless of whether this variable is set or not. Support
> > for shallow
> > -      clones is not currently implemented as git does not directly
> > support
> > -      shallow cloning a particular git commit hash (it only
> > supports cloning
> > -      from a tag or branch reference).
> > +      This setting causes an initial shallow clone instead of an
> > initial full bare clone.
> > +      The amount of data transferred during the initial clone will
> > be significantly reduced.
> > +
> > +      For updates, when keeping the cache within the download
> > directory,
> 
> I guess by "updates" you mean pointing to a different commit hash to 
> fetch from the remote?
Every time the SRCREV changes.
> 
> Is there a possible setup where the cache is not within the download 
> directory? Not sure to understand the implications here?
With shallow clones, the effect is the same regardless of whether the
cache is cleaned. Without shallow clones and with an existing cache, the
delta downloads for a SRCREV change are smaller.
> 
> Cheers,
> Quentin

-- 
Stefan Koch
Siemens AG
www.siemens.com
Quentin Schulz Feb. 21, 2025, 4:31 p.m. UTC | #3
Hi Stefan,

On 2/21/25 5:23 PM, Koch, Stefan wrote:
> On Fri, 2025-02-21 at 11:52 +0100, Quentin Schulz wrote:
>> Hi Stefan,
>>
>> Please Cc the docs mailing list as well
>>
>> +Cc docs
> Thanks. I'll send updated docs in v4 patch series.
>>
>> On 2/20/25 6:27 PM, Koch, Stefan via lists.openembedded.org wrote:
>>> Signed-off-by: Stefan Koch <stefan-koch@siemens.com>
>>> ---
>>>    .../bitbake-user-manual-ref-variables.rst        | 16
>>> +++++++++++-----
>>>    1 file changed, 11 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/doc/bitbake-user-manual/bitbake-user-manual-ref-
>>> variables.rst b/doc/bitbake-user-manual/bitbake-user-manual-ref-
>>> variables.rst
>>> index ad219b531..f781c004e 100644
>>> --- a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
>>> +++ b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
>>> @@ -315,11 +315,17 @@ overview of their function and contents.
>>>          mirror tarball. If the shallow mirror tarball cannot be
>>> fetched, it will
>>>          try to fetch the full mirror tarball and use that.
>>>    
>>> -      When a mirror tarball is not available, a full git clone
>>> will be performed
>>> -      regardless of whether this variable is set or not. Support
>>> for shallow
>>> -      clones is not currently implemented as git does not directly
>>> support
>>> -      shallow cloning a particular git commit hash (it only
>>> supports cloning
>>> -      from a tag or branch reference).
>>> +      This setting causes an initial shallow clone instead of an
>>> initial full bare clone.
>>> +      The amount of data transferred during the initial clone will
>>> be significantly reduced.
>>> +
>>> +      For updates, when keeping the cache within the download
>>> directory,
>>
>> I guess by "updates" you mean pointing to a different commit hash to
>> fetch from the remote?
> Every time when the SRCREV changes

I can then suggest:

When keeping the cache within the download directory, a change to 
:term:`SRCREV` may induce a significantly higher data transfer because 
entirely new shallow clones are required.

>>
>> Is there a possible setup where the cache is not within the download
>> directory? Not sure to understand the implications here?
> Regardless of cleaning the cache the effect is the same.
> When not using shallow and have a cache, the delta downloads for SRCREV
> change are smaller.

OK, I think I see what you mean here. Does "cache" for you mean the git 
repo for the recipe in DL_DIR?

I think the point you're trying to make is:

If DL_DIR is persistent between builds, using shallow clones may induce 
significantly higher data transfer than when using bare clones, because 
entirely new shallow clones are required whereas bare clones may simply 
be updated.

Did I get that right? I'm being careful about the use of "cache" here 
because we already have sstate-cache, hashserv that does operate like 
some kind of caching, bitbake parsing cache, etc...

Cheers,
Quentin
Stefan Koch Feb. 21, 2025, 4:35 p.m. UTC | #4
On Fri, 2025-02-21 at 17:31 +0100, Quentin Schulz wrote:
> Hi Stefan,
> 
> On 2/21/25 5:23 PM, Koch, Stefan wrote:
> > On Fri, 2025-02-21 at 11:52 +0100, Quentin Schulz wrote:
> > > Hi Stefan,
> > > 
> > > Please Cc the docs mailing list as well
> > > 
> > > +Cc docs
> > Thanks. I'll send updated docs in v4 patch series.
> > > 
> > > On 2/20/25 6:27 PM, Koch, Stefan via lists.openembedded.org
> > > wrote:
> > > > Signed-off-by: Stefan Koch <stefan-koch@siemens.com>
> > > > ---
> > > >    .../bitbake-user-manual-ref-variables.rst        | 16
> > > > +++++++++++-----
> > > >    1 file changed, 11 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/doc/bitbake-user-manual/bitbake-user-manual-ref-
> > > > variables.rst b/doc/bitbake-user-manual/bitbake-user-manual-
> > > > ref-
> > > > variables.rst
> > > > index ad219b531..f781c004e 100644
> > > > --- a/doc/bitbake-user-manual/bitbake-user-manual-ref-
> > > > variables.rst
> > > > +++ b/doc/bitbake-user-manual/bitbake-user-manual-ref-
> > > > variables.rst
> > > > @@ -315,11 +315,17 @@ overview of their function and contents.
> > > >          mirror tarball. If the shallow mirror tarball cannot
> > > > be
> > > > fetched, it will
> > > >          try to fetch the full mirror tarball and use that.
> > > >    
> > > > -      When a mirror tarball is not available, a full git clone
> > > > will be performed
> > > > -      regardless of whether this variable is set or not.
> > > > Support
> > > > for shallow
> > > > -      clones is not currently implemented as git does not
> > > > directly
> > > > support
> > > > -      shallow cloning a particular git commit hash (it only
> > > > supports cloning
> > > > -      from a tag or branch reference).
> > > > +      This setting causes an initial shallow clone instead of
> > > > an
> > > > initial full bare clone.
> > > > +      The amount of data transferred during the initial clone
> > > > will
> > > > be significantly reduced.
> > > > +
> > > > +      For updates, when keeping the cache within the download
> > > > directory,
> > > 
> > > I guess by "updates" you mean pointing to a different commit hash
> > > to
> > > fetch from the remote?
> > Every time when the SRCREV changes
> 
> I can then suggest:
> 
> When keeping the cache within the download directory, a change to 
> :term:`SRCREV` may induce a significantly higher data transfer
> because 
> entirely new shallow clones are required.
> 
> > > 
> > > Is there a possible setup where the cache is not within the
> > > download
> > > directory? Not sure to understand the implications here?
> > Regardless of cleaning the cache the effect is the same.
> > When not using shallow and have a cache, the delta downloads for
> > SRCREV
> > change are smaller.
> 
> OK, I think I see what you mean here. Does "cache" for you mean the
> git 
> repo for the recipe in DL_DIR?
> 
> I think the point you're trying to make is:
> 
> If DL_DIR is persistent between builds, using shallow clones may
> induce 
> significantly higher data transfer than when using bare clones,
> because 
> entirely new shallow clones are required whereas bare clones may
> simply 
> be updated.
> 
> Did I get that right? I'm being careful about the use of "cache" here
> because we already have sstate-cache, hashserv that does operate like
> some kind of caching, bitbake parsing cache, etc...

This is what I have prepared for now:

This setting causes an initial shallow clone instead of an initial full
bare clone.
The amount of data transferred during the initial clone will be
significantly reduced.

However, every time the source revision changes, regardless of
whether the cache within the download directory is cleaned up or not,
the data transfer may be significantly higher because entirely new
shallow clones are required for each source revision change.
Over time, numerous shallow clones may cumulatively transfer
the same amount of data as an initial full bare clone.
This is especially the case with very large repositories.

Existing initial full bare clones, created without this setting,
will still be utilized.

Thanks for the examples. I'll look at how to integrate them.

Within DL_DIR, there is a "git" subdirectory that contains the bare
mirror clones. The shallow tarballs sit directly within DL_DIR.
> 
> Cheers,
> Quentin

-- 
Stefan Koch
Siemens AG
www.siemens.com
Quentin Schulz Feb. 21, 2025, 4:47 p.m. UTC | #5
On 2/21/25 5:35 PM, Koch, Stefan wrote:
> On Fri, 2025-02-21 at 17:31 +0100, Quentin Schulz wrote:
>> Hi Stefan,
>>
>> On 2/21/25 5:23 PM, Koch, Stefan wrote:
>>> On Fri, 2025-02-21 at 11:52 +0100, Quentin Schulz wrote:
>>>> Hi Stefan,
>>>>
>>>> Please Cc the docs mailing list as well
>>>>
>>>> +Cc docs
>>> Thanks. I'll send updated docs in v4 patch series.
>>>>
>>>> On 2/20/25 6:27 PM, Koch, Stefan via lists.openembedded.org
>>>> wrote:
>>>>> Signed-off-by: Stefan Koch <stefan-koch@siemens.com>
>>>>> ---
>>>>>     .../bitbake-user-manual-ref-variables.rst        | 16
>>>>> +++++++++++-----
>>>>>     1 file changed, 11 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/doc/bitbake-user-manual/bitbake-user-manual-ref-
>>>>> variables.rst b/doc/bitbake-user-manual/bitbake-user-manual-
>>>>> ref-
>>>>> variables.rst
>>>>> index ad219b531..f781c004e 100644
>>>>> --- a/doc/bitbake-user-manual/bitbake-user-manual-ref-
>>>>> variables.rst
>>>>> +++ b/doc/bitbake-user-manual/bitbake-user-manual-ref-
>>>>> variables.rst
>>>>> @@ -315,11 +315,17 @@ overview of their function and contents.
>>>>>           mirror tarball. If the shallow mirror tarball cannot
>>>>> be
>>>>> fetched, it will
>>>>>           try to fetch the full mirror tarball and use that.
>>>>>     
>>>>> -      When a mirror tarball is not available, a full git clone
>>>>> will be performed
>>>>> -      regardless of whether this variable is set or not.
>>>>> Support
>>>>> for shallow
>>>>> -      clones is not currently implemented as git does not
>>>>> directly
>>>>> support
>>>>> -      shallow cloning a particular git commit hash (it only
>>>>> supports cloning
>>>>> -      from a tag or branch reference).
>>>>> +      This setting causes an initial shallow clone instead of
>>>>> an
>>>>> initial full bare clone.
>>>>> +      The amount of data transferred during the initial clone
>>>>> will
>>>>> be significantly reduced.
>>>>> +
>>>>> +      For updates, when keeping the cache within the download
>>>>> directory,
>>>>
>>>> I guess by "updates" you mean pointing to a different commit hash
>>>> to
>>>> fetch from the remote?
>>> Every time when the SRCREV changes
>>
>> I can then suggest:
>>
>> When keeping the cache within the download directory, a change to
>> :term:`SRCREV` may induce a significantly higher data transfer
>> because
>> entirely new shallow clones are required.
>>
>>>>
>>>> Is there a possible setup where the cache is not within the
>>>> download
>>>> directory? Not sure to understand the implications here?
>>> Regardless of cleaning the cache the effect is the same.
>>> When not using shallow and have a cache, the delta downloads for
>>> SRCREV
>>> change are smaller.
>>
>> OK, I think I see what you mean here. Does "cache" for you mean the
>> git
>> repo for the recipe in DL_DIR?
>>
>> I think the point you're trying to make is:
>>
>> If DL_DIR is persistent between builds, using shallow clones may
>> induce
>> significantly higher data transfer than when using bare clones,
>> because
>> entirely new shallow clones are required whereas bare clones may
>> simply
>> be updated.
>>
>> Did I get that right? I'm being careful about the use of "cache" here
>> because we already have sstate-cache, hashserv that does operate like
>> some kind of caching, bitbake parsing cache, etc...
> 
> That was what I have prepared for now:
> 
> This setting causes an initial shallow clone instead of an initial full
> bare clone.
> The amount of data transferred during the initial clone will be
> significantly reduced.
> 
> However, every time the source revision changes, regardless of
> whether the cache within the download directory is cleaned up or not,

Would suggest:

regardless of the cache within the download directory having been 
cleaned up or not

But nothing blocking :)

> the data transfer may be significantly higher because entirely new
> shallow clones are required for each source revision change.
> Over time, numerous shallow clones may cumulatively transfer
> the same amount of data as an initial full bare clone.
> This is especially the case with very large repositories.
> 
> Existing initial full bare clones, created without this setting,
> will still be utilized.
> 

This new wording is clear to me, thanks!

Cheers,
Quentin

Patch

diff --git a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
index ad219b531..f781c004e 100644
--- a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
+++ b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
@@ -315,11 +315,17 @@  overview of their function and contents.
       mirror tarball. If the shallow mirror tarball cannot be fetched, it will
       try to fetch the full mirror tarball and use that.
 
-      When a mirror tarball is not available, a full git clone will be performed
-      regardless of whether this variable is set or not. Support for shallow
-      clones is not currently implemented as git does not directly support
-      shallow cloning a particular git commit hash (it only supports cloning
-      from a tag or branch reference).
+      This setting causes an initial shallow clone instead of an initial full bare clone.
+      The amount of data transferred during the initial clone will be significantly reduced.
+
+      For updates, when keeping the cache within the download directory,
+      the data transfer may be significantly higher because entirely new shallow clones are required.
+      Over time, numerous shallow clones may cumulatively
+      transfer the same amount of data as an initial full bare clone.
+      This is especially the case with very large repositories.
+
+      Existing initial full bare clones, created without this setting,
+      will still be utilized.
 
       See also :term:`BB_GIT_SHALLOW_DEPTH` and
       :term:`BB_GENERATE_SHALLOW_TARBALLS`.
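
For reference, a minimal configuration sketch of the variables named in this hunk, as they might appear in a local.conf. The values are illustrative assumptions rather than recommendations from this series, and whether a persistent DL_DIR makes shallow clones worthwhile depends on how often SRCREV changes, as discussed in the thread above.

```
# Illustrative local.conf fragment (assumed values, not part of the patch).

# Enable shallow handling for git sources.
BB_GIT_SHALLOW = "1"

# History depth to keep; "1" keeps only the requested revision.
BB_GIT_SHALLOW_DEPTH = "1"

# Also generate shallow mirror tarballs in DL_DIR.
BB_GENERATE_SHALLOW_TARBALLS = "1"
```

With a setup like this, each SRCREV change may trigger a fresh shallow clone, so the transfer savings apply mainly to the initial fetch.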