Message ID | 20250220172706.3850722-2-stefan-koch@siemens.com |
---|---|
State | New |
Series | [v3,1/4] fetch2/git: Add support for fast initial shallow fetch |
Hi Stefan,

Please Cc the docs mailing list as well

+Cc docs

On 2/20/25 6:27 PM, Koch, Stefan via lists.openembedded.org wrote:
> Signed-off-by: Stefan Koch <stefan-koch@siemens.com>
> ---
>  .../bitbake-user-manual-ref-variables.rst | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> index ad219b531..f781c004e 100644
> --- a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> +++ b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> @@ -315,11 +315,17 @@ overview of their function and contents.
>        mirror tarball. If the shallow mirror tarball cannot be fetched, it will
>        try to fetch the full mirror tarball and use that.
>
> -      When a mirror tarball is not available, a full git clone will be performed
> -      regardless of whether this variable is set or not. Support for shallow
> -      clones is not currently implemented as git does not directly support
> -      shallow cloning a particular git commit hash (it only supports cloning
> -      from a tag or branch reference).
> +      This setting causes an initial shallow clone instead of an initial full bare clone.
> +      The amount of data transferred during the initial clone will be significantly reduced.
> +
> +      For updates, when keeping the cache within the download directory,

I guess by "updates" you mean pointing to a different commit hash to fetch from the remote?

Is there a possible setup where the cache is not within the download directory? I'm not sure I understand the implications here.

Cheers,
Quentin
On Fri, 2025-02-21 at 11:52 +0100, Quentin Schulz wrote:
> Hi Stefan,
>
> Please Cc the docs mailing list as well
>
> +Cc docs

Thanks. I'll send updated docs in the v4 patch series.

> On 2/20/25 6:27 PM, Koch, Stefan via lists.openembedded.org wrote:
> > [...]
> > +      For updates, when keeping the cache within the download directory,
>
> I guess by "updates" you mean pointing to a different commit hash to
> fetch from the remote?

Every time the SRCREV changes.

> Is there a possible setup where the cache is not within the download
> directory? I'm not sure I understand the implications here.

Regardless of cleaning the cache, the effect is the same.
When not using shallow clones and a cache exists, the delta downloads
for a SRCREV change are smaller.

> Cheers,
> Quentin

--
Stefan Koch
Siemens AG
www.siemens.com
Hi Stefan,

On 2/21/25 5:23 PM, Koch, Stefan wrote:
> On Fri, 2025-02-21 at 11:52 +0100, Quentin Schulz wrote:
>> On 2/20/25 6:27 PM, Koch, Stefan via lists.openembedded.org wrote:
>>> [...]
>>> +      For updates, when keeping the cache within the download directory,
>>
>> I guess by "updates" you mean pointing to a different commit hash to
>> fetch from the remote?
> Every time the SRCREV changes.

I can then suggest:

When keeping the cache within the download directory, a change to
:term:`SRCREV` may induce a significantly higher data transfer because
entirely new shallow clones are required.

>> Is there a possible setup where the cache is not within the download
>> directory? I'm not sure I understand the implications here.
> Regardless of cleaning the cache, the effect is the same.
> When not using shallow clones and a cache exists, the delta downloads
> for a SRCREV change are smaller.

OK, I think I see what you mean here. Does "cache" for you mean the git repo for the recipe in DL_DIR?

I think the point you're trying to make is:

If DL_DIR is persistent between builds, using shallow clones may induce
significantly higher data transfer than when using bare clones, because
entirely new shallow clones are required whereas bare clones may simply
be updated.

Did I get that right? I'm being careful about the use of "cache" here because we already have sstate-cache, hashserv (which operates like some kind of cache), the bitbake parsing cache, etc.

Cheers,
Quentin
On Fri, 2025-02-21 at 17:31 +0100, Quentin Schulz wrote:
> Hi Stefan,
>
> On 2/21/25 5:23 PM, Koch, Stefan wrote:
> > [...]
> > Every time the SRCREV changes.
>
> I can then suggest:
>
> When keeping the cache within the download directory, a change to
> :term:`SRCREV` may induce a significantly higher data transfer because
> entirely new shallow clones are required.
>
> > Regardless of cleaning the cache, the effect is the same.
> > When not using shallow clones and a cache exists, the delta downloads
> > for a SRCREV change are smaller.
>
> OK, I think I see what you mean here. Does "cache" for you mean the git
> repo for the recipe in DL_DIR?
>
> I think the point you're trying to make is:
>
> If DL_DIR is persistent between builds, using shallow clones may induce
> significantly higher data transfer than when using bare clones, because
> entirely new shallow clones are required whereas bare clones may simply
> be updated.
>
> Did I get that right? I'm being careful about the use of "cache" here
> because we already have sstate-cache, hashserv (which operates like
> some kind of cache), the bitbake parsing cache, etc.

This is what I have prepared for now:

This setting causes an initial shallow clone instead of an initial full
bare clone.
The amount of data transferred during the initial clone will be
significantly reduced.

However, every time the source revision changes, regardless of
whether the cache within the download directory is cleaned up or not,
the data transfer may be significantly higher because entirely new
shallow clones are required for each source revision change.
Over time, numerous shallow clones may cumulatively transfer
the same amount of data as an initial full bare clone.
This is especially the case with very large repositories.

Existing initial full bare clones, created without this setting,
will still be utilized.

Thanks for the examples. I'll look into how to integrate them.

Within DL_DIR, there is a subdirectory "git" that contains the bare mirror clones. The shallow tarballs are directly within DL_DIR.

> Cheers,
> Quentin

--
Stefan Koch
Siemens AG
www.siemens.com
On 2/21/25 5:35 PM, Koch, Stefan wrote:
> On Fri, 2025-02-21 at 17:31 +0100, Quentin Schulz wrote:
>> [...]
>
> This is what I have prepared for now:
>
> This setting causes an initial shallow clone instead of an initial full
> bare clone.
> The amount of data transferred during the initial clone will be
> significantly reduced.
>
> However, every time the source revision changes, regardless of
> whether the cache within the download directory is cleaned up or not,

Would suggest:

regardless of the cache within the download directory having been cleaned up or not

But nothing blocking :)

> the data transfer may be significantly higher because entirely new
> shallow clones are required for each source revision change.
> Over time, numerous shallow clones may cumulatively transfer
> the same amount of data as an initial full bare clone.
> This is especially the case with very large repositories.
>
> Existing initial full bare clones, created without this setting,
> will still be utilized.
>

This new wording is clear to me, thanks!

Cheers,
Quentin
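The trade-off discussed in this thread (small initial shallow clones, but an entirely new shallow clone per source revision change, versus one large bare clone that is then only updated) can be illustrated with a rough cost model. The sketch below is illustrative only: the function names and all sizes are hypothetical, not measurements of any repository or of the fetcher itself. It assumes every source revision change under shallow cloning costs one full shallow clone, while a persistent bare clone only costs a smaller delta fetch.

```python
from typing import Optional


def cumulative_transfer_shallow(n_revisions: int, shallow_mb: float) -> float:
    """Total MB transferred if every source revision needs a fresh shallow clone."""
    return max(n_revisions, 0) * shallow_mb


def cumulative_transfer_bare(n_revisions: int, full_mb: float, delta_mb: float) -> float:
    """Total MB for one initial full bare clone plus one delta fetch per later revision."""
    if n_revisions <= 0:
        return 0.0
    return full_mb + (n_revisions - 1) * delta_mb


def break_even_revisions(full_mb: float, shallow_mb: float, delta_mb: float,
                         limit: int = 10_000) -> Optional[int]:
    """Smallest revision count at which shallow cloning has transferred at least
    as much as the bare-clone approach, or None if that never happens within limit."""
    for n in range(1, limit + 1):
        shallow = cumulative_transfer_shallow(n, shallow_mb)
        bare = cumulative_transfer_bare(n, full_mb, delta_mb)
        if shallow >= bare:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical sizes in MB, chosen only to show the shape of the trade-off.
    FULL_MB, SHALLOW_MB, DELTA_MB = 2000.0, 200.0, 5.0
    n = break_even_revisions(FULL_MB, SHALLOW_MB, DELTA_MB)
    print(f"Shallow cloning catches up with the bare clone after {n} revisions")
```

Under these made-up numbers the first fetch is ten times cheaper with a shallow clone, but the advantage is gone after about a dozen source revision changes. With a larger repository or a deeper shallow history the crossover point moves accordingly.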
diff --git a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
index ad219b531..f781c004e 100644
--- a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
+++ b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
@@ -315,11 +315,17 @@ overview of their function and contents.
       mirror tarball. If the shallow mirror tarball cannot be fetched, it will
       try to fetch the full mirror tarball and use that.
 
-      When a mirror tarball is not available, a full git clone will be performed
-      regardless of whether this variable is set or not. Support for shallow
-      clones is not currently implemented as git does not directly support
-      shallow cloning a particular git commit hash (it only supports cloning
-      from a tag or branch reference).
+      This setting causes an initial shallow clone instead of an initial full bare clone.
+      The amount of data transferred during the initial clone will be significantly reduced.
+
+      For updates, when keeping the cache within the download directory,
+      the data transfer may be significantly higher because entirely new shallow clones are required.
+      Over time, numerous shallow clones may cumulatively
+      transfer the same amount of data as an initial full bare clone.
+      This is especially the case with very large repositories.
+
+      Existing initial full bare clones, created without this setting,
+      will still be utilized.
 
       See also :term:`BB_GIT_SHALLOW_DEPTH` and :term:`BB_GENERATE_SHALLOW_TARBALLS`.
Signed-off-by: Stefan Koch <stefan-koch@siemens.com>
---
 .../bitbake-user-manual-ref-variables.rst | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)