[RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata

Message ID 20220819165455.270130-1-marex@denx.de
State Accepted, archived
Commit d32e5b0ec2ab85ffad7e56ac5b3160860b732556

Commit Message

Marek Vasut Aug. 19, 2022, 4:54 p.m. UTC
The bitbake git fetcher currently fetches 'refs/*:refs/*', i.e. every
single ref in the remote repository, along with all the objects those
refs reach. This works poorly with gitlab and github, which use the
remote git repository to track their metadata, such as merge requests
and CI pipelines.

Specifically, gitlab generates refs/merge-requests/*, refs/pipelines/*
and refs/keep-around/*, all of which contain massive amounts of data
that are useless for bitbake builds. The amount of useless data can in
fact be so massive (e.g. with the FDO mesa.git repository) that some
proxies may outright terminate the 'git fetch' connection, making it
appear as if bitbake got stuck on 'git fetch' with no output.

To avoid fetching all this useless metadata, tweak the git fetcher such
that it only fetches refs/heads/* and refs/tags/*. Avoid using negative
refspecs, as those are only available in newer git versions.

Signed-off-by: Marek Vasut <marex@denx.de>
---
Cc: Martin Jansa <Martin.Jansa@gmail.com>
Cc: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
Cc: Richard Purdie <richard.purdie@linuxfoundation.org>
---
 lib/bb/fetch2/git.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Peter Kjellerstedt Aug. 20, 2022, 12:06 p.m. UTC | #1
> -----Original Message-----
> From: Marek Vasut <marex@denx.de>
> Sent: den 19 augusti 2022 18:55
> To: bitbake-devel@lists.openembedded.org
> Cc: Marek Vasut <marex@denx.de>; Martin Jansa <Martin.Jansa@gmail.com>; Peter Kjellerstedt <peter.kjellerstedt@axis.com>; Richard Purdie <richard.purdie@linuxfoundation.org>
> Subject: [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
> 
> The bitbake git fetcher currently fetches 'refs/*:refs/*', i.e. every
> single object in the remote repository. This works poorly with gitlab
> and github, which use the remote git repository to track its metadata
> like merge requests, CI pipelines and such.
> 
> Specifically, gitlab generates refs/merge-requests/*, refs/pipelines/*
> and refs/keep-around/* and they all contain massive amount of data that
> are useless for the bitbake build purposes. The amount of useless data
> can in fact be so massive (e.g. with FDO mesa.git repository) that some
> proxies may outright terminate the 'git fetch' connection, and make it
> appear as if bitbake got stuck on 'git fetch' with no output.
> 
> To avoid fetching all these useless metadata, tweak the git fetcher such
> that it only fetches refs/heads/* and refs/tags/* . Avoid using negative
> refspecs as those are only available in new git versions.
> 
> Signed-off-by: Marek Vasut <marex@denx.de>
> ---
> Cc: Martin Jansa <Martin.Jansa@gmail.com>
> Cc: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
> Cc: Richard Purdie <richard.purdie@linuxfoundation.org>
> ---
>  lib/bb/fetch2/git.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
> index 4534bd75..b5fc0a51 100644
> --- a/lib/bb/fetch2/git.py
> +++ b/lib/bb/fetch2/git.py
> @@ -382,7 +382,7 @@ class Git(FetchMethod):
>                runfetchcmd("%s remote rm origin" % ud.basecmd, d, workdir=ud.clonedir)
> 
>              runfetchcmd("%s remote add --mirror=fetch origin %s" % (ud.basecmd, shlex.quote(repourl)), d, workdir=ud.clonedir)
> -            fetch_cmd = "LANG=C %s fetch -f --progress %s refs/*:refs/*" % (ud.basecmd, shlex.quote(repourl))
> +            fetch_cmd = "LANG=C %s fetch -f --progress %s refs/heads/*:refs/heads/* refs/tags/*:refs/tags/*" % (ud.basecmd, shlex.quote(repourl))
>              if ud.proto.lower() != 'file':
>                  bb.fetch2.check_network_access(d, fetch_cmd, ud.url)
>              progresshandler = GitProgressHandler(d)
> --
> 2.35.1

Seems like the right thing to do. We use Gerrit, which also keeps its
metadata in special refs/ namespaces. One repository I tested with grew
from 3 MB to 35 MB when I fetched using refs/*, while another grew
from 20 MB to 120 MB, so there is definitely space and time to be
saved by only fetching the refs/heads and refs/tags namespaces.

Reviewed-by: Peter Kjellerstedt <peter.kjellerstedt@axis.com>

//Peter
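
Editorial sketch: one way to see which refs/ namespaces a server exposes before deciding what to fetch is to group the output of `git ls-remote` by namespace. The helper name and the sample refs below are hypothetical and only illustrate the idea; the number of refs is at best a rough proxy for how much data a namespace pulls in.

```python
from collections import Counter


def count_refs_by_namespace(ls_remote_output):
    """Count refs per top-level namespace in `git ls-remote` style output."""
    counts = Counter()
    for line in ls_remote_output.splitlines():
        # Each line is "<object id><TAB><ref name>".
        parts = line.split("\t", 1)
        if len(parts) != 2 or not parts[1].startswith("refs/"):
            continue  # skip HEAD and anything that is not under refs/
        namespace = "/".join(parts[1].split("/")[:2])  # e.g. "refs/changes"
        counts[namespace] += 1
    return counts


# Hypothetical sample; real input would come from `git ls-remote <url>`.
FAKE_ID = "0" * 40
SAMPLE = "\n".join(
    "%s\t%s" % (FAKE_ID, ref)
    for ref in [
        "HEAD",
        "refs/heads/master",
        "refs/heads/dunfell",
        "refs/tags/v1.0",
        "refs/changes/34/1234/1",
        "refs/changes/34/1234/2",
        "refs/changes/35/1235/1",
    ]
)

for namespace, count in sorted(count_refs_by_namespace(SAMPLE).items()):
    print(namespace, count)
```

On a Gerrit server the refs/changes namespace would be expected to dominate such a listing, which matches the size growth reported above.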
Mikko Rapeli Aug. 22, 2022, 5:19 a.m. UTC | #2
Hi,

On Sat, Aug 20, 2022 at 12:06:55PM +0000, Peter Kjellerstedt wrote:
> > -----Original Message-----
> > From: Marek Vasut <marex@denx.de>
> > Sent: den 19 augusti 2022 18:55
> > To: bitbake-devel@lists.openembedded.org
> > Cc: Marek Vasut <marex@denx.de>; Martin Jansa <Martin.Jansa@gmail.com>; Peter Kjellerstedt <peter.kjellerstedt@axis.com>; Richard Purdie <richard.purdie@linuxfoundation.org>
> > Subject: [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
> > 
> > The bitbake git fetcher currently fetches 'refs/*:refs/*', i.e. every
> > single object in the remote repository. This works poorly with gitlab
> > and github, which use the remote git repository to track its metadata
> > like merge requests, CI pipelines and such.
> > 
> > Specifically, gitlab generates refs/merge-requests/*, refs/pipelines/*
> > and refs/keep-around/* and they all contain massive amount of data that
> > are useless for the bitbake build purposes. The amount of useless data
> > can in fact be so massive (e.g. with FDO mesa.git repository) that some
> > proxies may outright terminate the 'git fetch' connection, and make it
> > appear as if bitbake got stuck on 'git fetch' with no output.
> > 
> > To avoid fetching all these useless metadata, tweak the git fetcher such
> > that it only fetches refs/heads/* and refs/tags/* . Avoid using negative
> > refspecs as those are only available in new git versions.
> > 
> > Signed-off-by: Marek Vasut <marex@denx.de>
> > ---
> > Cc: Martin Jansa <Martin.Jansa@gmail.com>
> > Cc: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
> > Cc: Richard Purdie <richard.purdie@linuxfoundation.org>
> > ---
> >  lib/bb/fetch2/git.py | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
> > index 4534bd75..b5fc0a51 100644
> > --- a/lib/bb/fetch2/git.py
> > +++ b/lib/bb/fetch2/git.py
> > @@ -382,7 +382,7 @@ class Git(FetchMethod):
> >                runfetchcmd("%s remote rm origin" % ud.basecmd, d, workdir=ud.clonedir)
> > 
> >              runfetchcmd("%s remote add --mirror=fetch origin %s" % (ud.basecmd, shlex.quote(repourl)), d, workdir=ud.clonedir)
> > -            fetch_cmd = "LANG=C %s fetch -f --progress %s refs/*:refs/*" % (ud.basecmd, shlex.quote(repourl))
> > +            fetch_cmd = "LANG=C %s fetch -f --progress %s refs/heads/*:refs/heads/* refs/tags/*:refs/tags/*" % (ud.basecmd, shlex.quote(repourl))
> >              if ud.proto.lower() != 'file':
> >                  bb.fetch2.check_network_access(d, fetch_cmd, ud.url)
> >              progresshandler = GitProgressHandler(d)
> > --
> > 2.35.1
> 
> Seems like the right thing to do. We use Gerrit, which also has its 
> metadata in special refs/ spaces. One repository I tested with grew 
> from 3 MB to 35 MB when I fetched using refs/* while another grew 
> from 20 MB to 120 MB, so there is definitely space and time to be 
> saved by only fetching the refs/heads and refs/tags spaces....

As a user of Gerrit, I fear this will cause problems. In my case developers
are used to creating test topics and using git hashes in recipes which
are not yet released, i.e. not yet in release branches or tags. This can of
course create problems when such changes end up in real releases.

A workaround is for developers to create throwaway testing branches
and refer to them in recipes.

On the one hand this is an improvement, since there is less data in caches,
but on the other hand it adds extra work for developers who want to test
changes to their recipes. I can't decide which one is more important though :/

Cheers,

-Mikko
Alexander Kanavin Aug. 22, 2022, 6:57 a.m. UTC | #3
Can be solved with a parameter to a fetcher perhaps?

Alex

On Mon 22. Aug 2022 at 7.20, Mikko Rapeli <mikko.rapeli@bmw.de> wrote:

> Hi,
>
> On Sat, Aug 20, 2022 at 12:06:55PM +0000, Peter Kjellerstedt wrote:
> > > -----Original Message-----
> > > From: Marek Vasut <marex@denx.de>
> > > Sent: den 19 augusti 2022 18:55
> > > To: bitbake-devel@lists.openembedded.org
> > > > Cc: Marek Vasut <marex@denx.de>; Martin Jansa <Martin.Jansa@gmail.com>; Peter Kjellerstedt <peter.kjellerstedt@axis.com>; Richard Purdie <richard.purdie@linuxfoundation.org>
> > > > Subject: [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
> > >
> > > The bitbake git fetcher currently fetches 'refs/*:refs/*', i.e. every
> > > single object in the remote repository. This works poorly with gitlab
> > > and github, which use the remote git repository to track its metadata
> > > like merge requests, CI pipelines and such.
> > >
> > > Specifically, gitlab generates refs/merge-requests/*, refs/pipelines/*
> > > and refs/keep-around/* and they all contain massive amount of data that
> > > are useless for the bitbake build purposes. The amount of useless data
> > > can in fact be so massive (e.g. with FDO mesa.git repository) that some
> > > proxies may outright terminate the 'git fetch' connection, and make it
> > > appear as if bitbake got stuck on 'git fetch' with no output.
> > >
> > > > To avoid fetching all these useless metadata, tweak the git fetcher such
> > > > that it only fetches refs/heads/* and refs/tags/* . Avoid using negative
> > > > refspecs as those are only available in new git versions.
> > >
> > > Signed-off-by: Marek Vasut <marex@denx.de>
> > > ---
> > > Cc: Martin Jansa <Martin.Jansa@gmail.com>
> > > Cc: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
> > > Cc: Richard Purdie <richard.purdie@linuxfoundation.org>
> > > ---
> > >  lib/bb/fetch2/git.py | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
> > > index 4534bd75..b5fc0a51 100644
> > > --- a/lib/bb/fetch2/git.py
> > > +++ b/lib/bb/fetch2/git.py
> > > @@ -382,7 +382,7 @@ class Git(FetchMethod):
> > >                runfetchcmd("%s remote rm origin" % ud.basecmd, d,
> workdir=ud.clonedir)
> > >
> > >              runfetchcmd("%s remote add --mirror=fetch origin %s" %
> (ud.basecmd, shlex.quote(repourl)), d, workdir=ud.clonedir)
> > > -            fetch_cmd = "LANG=C %s fetch -f --progress %s
> refs/*:refs/*" % (ud.basecmd, shlex.quote(repourl))
> > > +            fetch_cmd = "LANG=C %s fetch -f --progress %s
> refs/heads/*:refs/heads/* refs/tags/*:refs/tags/*" % (ud.basecmd,
> shlex.quote(repourl))
> > >              if ud.proto.lower() != 'file':
> > >                  bb.fetch2.check_network_access(d, fetch_cmd, ud.url)
> > >              progresshandler = GitProgressHandler(d)
> > > --
> > > 2.35.1
> >
> > Seems like the right thing to do. We use Gerrit, which also has its
> > metadata in special refs/ spaces. One repository I tested with grew
> > from 3 MB to 35 MB when I fetched using refs/* while another grew
> > from 20 MB to 120 MB, so there is definitely space and time to be
> > saved by only fetching the refs/heads and refs/tags spaces....
>
> As user of Gerrit, I fear this will cause problems. In my case developers
> are used to creating test topics and using git hashes in recipes which
> are not yet released, e.g. not yet in release branches or tags. This can of
> course create problems when such changes end up in real releases.
>
> Workaround is that developers can create throw away testing branches
> and refer to them in recipes.
>
> From one side this is an improvement to have less data in caches, but on
> the other side this adds extra actions to developers who want to test
> changes to their recipes. Can't decide which one is more important though :/
>
> Cheers,
>
> -Mikko
Mikko Rapeli Aug. 22, 2022, 7:38 a.m. UTC | #4
On Mon, Aug 22, 2022 at 08:57:11AM +0200, Alexander Kanavin wrote:
> Can be solved with a parameter to a fetcher perhaps?

Frequently developers know to change the git URL in recipes from
"branch=master" to "nobranch=1" for their test commits.

This could be used for fetching the changes too, to limit the scope.

Cheers,

-Mikko
Marek Vasut Aug. 22, 2022, 8:29 a.m. UTC | #5
On 8/22/22 09:38, Mikko.Rapeli@bmw.de wrote:
> On Mon, Aug 22, 2022 at 08:57:11AM +0200, Alexander Kanavin wrote:
>> Can be solved with a parameter to a fetcher perhaps?
> 
> Frequently developers know to change the git URL in recipes from
> "branch=master" to "nobranch=1" for their test commits.
> 
> This could be used for fetching the changes too, to limit the scope.

So maybe the easy way out is: if nobranch=1, fetch everything, else
just heads and tags?
Marek Vasut Aug. 22, 2022, 8:37 a.m. UTC | #6
On 8/22/22 10:29, Marek Vasut wrote:
> On 8/22/22 09:38, Mikko.Rapeli@bmw.de wrote:
>> On Mon, Aug 22, 2022 at 08:57:11AM +0200, Alexander Kanavin wrote:
>>> Can be solved with a parameter to a fetcher perhaps?
>>
>> Frequently developers know to change the git URL in recipes from
>> "branch=master" to "nobranch=1" for their test commits.
>>
>> This could be used for fetching the changes too, to limit the scope.
> 
> So maybe the easy way out is, if nobranch=1 then fetch everything, else 
> just heads and tags ?

No, this won't do, nobranch expects the commit to be in a tag.
Marek Vasut Aug. 22, 2022, 8:41 a.m. UTC | #7
On 8/22/22 10:37, Marek Vasut wrote:
> On 8/22/22 10:29, Marek Vasut wrote:
>> On 8/22/22 09:38, Mikko.Rapeli@bmw.de wrote:
>>> On Mon, Aug 22, 2022 at 08:57:11AM +0200, Alexander Kanavin wrote:
>>>> Can be solved with a parameter to a fetcher perhaps?
>>>
>>> Frequently developers know to change the git URL in recipes from
>>> "branch=master" to "nobranch=1" for their test commits.
>>>
>>> This could be used for fetching the changes too, to limit the scope.
>>
>> So maybe the easy way out is, if nobranch=1 then fetch everything, 
>> else just heads and tags ?
> 
> No, this won't do, nobranch expects the commit to be in a tag.

But then, if Gerrit works with nobranch=1, Gerrit must be
generating tags which contain the commits you test?

And since this patch fetches refs/tags/, this patch won't break
the Gerrit setup?
Alexander Kanavin Aug. 22, 2022, 8:41 a.m. UTC | #8
On Mon, 22 Aug 2022 at 10:37, Marek Vasut <marex@denx.de> wrote:

> > So maybe the easy way out is, if nobranch=1 then fetch everything, else
> > just heads and tags ?
>
> No, this won't do, nobranch expects the commit to be in a tag.

I don't think it expects that.

Alex
Mikko Rapeli Aug. 22, 2022, 9:09 a.m. UTC | #9
Hi,

On Mon, Aug 22, 2022 at 10:41:23AM +0200, Marek Vasut wrote:
> On 8/22/22 10:37, Marek Vasut wrote:
> > On 8/22/22 10:29, Marek Vasut wrote:
> > > On 8/22/22 09:38, Mikko.Rapeli@bmw.de wrote:
> > > > On Mon, Aug 22, 2022 at 08:57:11AM +0200, Alexander Kanavin wrote:
> > > > > Can be solved with a parameter to a fetcher perhaps?
> > > > 
> > > > Frequently developers know to change the git URL in recipes from
> > > > "branch=master" to "nobranch=1" for their test commits.
> > > > 
> > > > This could be used for fetching the changes too, to limit the scope.
> > > 
> > > So maybe the easy way out is, if nobranch=1 then fetch everything,
> > > else just heads and tags ?

To me this would be the way to go.

> > No, this won't do, nobranch expects the commit to be in a tag.
> 
> But then, if gerrit works with nobranch=1, then gerrit must be generating
> tags which contain the commits you test ?
> 
> And since this patch fetches refs/tags/ , then this patch won't break the
> gerrit setup ?

nobranch=1 works with any branch, tag or open Gerrit review commit ID,
at least with sumo and dunfell.

Cheers,

-Mikko
Marek Vasut Aug. 22, 2022, 10:35 a.m. UTC | #10
On 8/22/22 10:41, Alexander Kanavin wrote:
> On Mon, 22 Aug 2022 at 10:37, Marek Vasut <marex@denx.de> wrote:
> 
>>> So maybe the easy way out is, if nobranch=1 then fetch everything, else
>>> just heads and tags ?
>>
>> No, this won't do, nobranch expects the commit to be in a tag.
> 
> I don't think it expects that.

Documentation says it does:

https://git.openembedded.org/bitbake/tree/lib/bb/fetch2/git.py#n45
"
- nobranch
    Don't check the SHA validation for branch. set this option for the recipe
    referring to commit which is valid in tag instead of branch.
    The default is "0", set nobranch=1 if needed.
"
Mikko Rapeli Aug. 22, 2022, 10:51 a.m. UTC | #11
On Mon, Aug 22, 2022 at 12:35:08PM +0200, Marek Vasut wrote:
> On 8/22/22 10:41, Alexander Kanavin wrote:
> > On Mon, 22 Aug 2022 at 10:37, Marek Vasut <marex@denx.de> wrote:
> > 
> > > > So maybe the easy way out is, if nobranch=1 then fetch everything, else
> > > > just heads and tags ?
> > > 
> > > No, this won't do, nobranch expects the commit to be in a tag.
> > 
> > I don't think it expects that.
> 
> Documentation says it does:
> 
> https://git.openembedded.org/bitbake/tree/lib/bb/fetch2/git.py#n45
> "
> - nobranch
>    Don't check the SHA validation for branch. set this option for the recipe
>    referring to commit which is valid in tag instead of branch.
>    The default is "0", set nobranch=1 if needed.
> "

Only the first sentence is enforced. The change can still be in a branch, in
a tag, or in some other namespace, as long as the commit is found at checkout
time.

Cheers,

-Mikko
Quentin Schulz Aug. 22, 2022, 10:57 a.m. UTC | #12
Hi Marek,

On 8/22/22 12:35, Marek Vasut wrote:
> On 8/22/22 10:41, Alexander Kanavin wrote:
>> On Mon, 22 Aug 2022 at 10:37, Marek Vasut <marex@denx.de> wrote:
>>
>>>> So maybe the easy way out is, if nobranch=1 then fetch everything, else
>>>> just heads and tags ?
>>>
>>> No, this won't do, nobranch expects the commit to be in a tag.
>>
>> I don't think it expects that.
> 
> Documentation says it does:
> 
> https://git.openembedded.org/bitbake/tree/lib/bb/fetch2/git.py#n45
> "
> - nobranch
>     Don't check the SHA validation for branch. set this option for the recipe
>     referring to commit which is valid in tag instead of branch.

I assume this was meant to give the example of tags which aren't
necessarily in a branch: annotated tags, or tags of commits that no longer
belong to any branch (after a force-push or a branch deletion, for example).

The git fetcher does a git log --pretty=oneline -n 1 <hash> when 
nobranch is set, otherwise git branch --contains <hash> --list <branch> 
to check whether a commit exists and can be used by bitbake.

Considering this check, I assume nobranch=1 is working for any commit 
that was fetched by the git fetcher?

(We need to update the docs to reflect that in that case).
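
Editorial sketch of the check described above, assuming only the two git invocations quoted in this message. The function name and signature are hypothetical and are not bitbake's actual code; the sketch only builds the command string and does not run git.

```python
import shlex


def presence_check_cmd(basecmd, revision, branch=None, nobranch=False):
    """Build the git command that decides whether `revision` is usable.

    With nobranch set, any commit present in the local clone is accepted.
    Otherwise the commit must be contained in the named branch.
    """
    if nobranch:
        return "%s log --pretty=oneline -n 1 %s" % (basecmd, shlex.quote(revision))
    if branch is None:
        raise ValueError("a branch name is required unless nobranch is set")
    return "%s branch --contains %s --list %s" % (
        basecmd,
        shlex.quote(revision),
        shlex.quote(branch),
    )


print(presence_check_cmd("git", "abc123", nobranch=True))
print(presence_check_cmd("git", "abc123", branch="master"))
```

Since the nobranch variant only asks whether the object exists locally, it would pass for any commit the fetcher downloaded, whichever refs/ namespace it arrived through.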

Cheers,
Quentin

>     The default is "0", set nobranch=1 if needed.
> "
Marek Vasut Aug. 22, 2022, 11:55 a.m. UTC | #13
On 8/22/22 12:57, Quentin Schulz wrote:
> Hi Marek,
> 
> On 8/22/22 12:35, Marek Vasut wrote:
>> On 8/22/22 10:41, Alexander Kanavin wrote:
>>> On Mon, 22 Aug 2022 at 10:37, Marek Vasut <marex@denx.de> wrote:
>>>
>>>>> So maybe the easy way out is, if nobranch=1 then fetch everything, 
>>>>> else
>>>>> just heads and tags ?
>>>>
>>>> No, this won't do, nobranch expects the commit to be in a tag.
>>>
>>> I don't think it expects that.
>>
>> Documentation says it does:
>>
>> https://git.openembedded.org/bitbake/tree/lib/bb/fetch2/git.py#n45
>> "
>> - nobranch
>>     Don't check the SHA validation for branch. set this option for the recipe
>>     referring to commit which is valid in tag instead of branch.
> 
> I assume this was meant to give the example of tags which aren't 
> necessarily in a branch (annotated tags or tags of commits not belong to 
> any branch anymore (force-push for example, or branch deletion).
> 
> The git fetcher does a git log --pretty=oneline -n 1 <hash> when 
> nobranch is set, otherwise git branch --contains <hash> --list <branch> 
> to check whether a commit exists and can be used by bitbake.
> 
> Considering this check, I assume nobranch=1 is working for any commit 
> that was fetched by the git fetcher?
> 
> (We need to update the docs to reflect that in that case).

In that case, 'git fetch refs/*' in case nobranch is set and 'refs/heads
refs/tags' otherwise?
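
Editorial sketch of the proposal as worded here, for illustration only. It reflects this stage of the discussion and is not necessarily what was eventually committed; the function name is hypothetical.

```python
def select_refspecs(nobranch):
    """Pick fetch refspecs according to the proposal in this message.

    nobranch set   -> mirror every ref, as the fetcher did before the patch
    nobranch unset -> restrict the fetch to branches and tags
    """
    if nobranch:
        return ["refs/*:refs/*"]
    return ["refs/heads/*:refs/heads/*", "refs/tags/*:refs/tags/*"]


print(" ".join(select_refspecs(nobranch=False)))
print(" ".join(select_refspecs(nobranch=True)))
```

The trade-off is that two recipes pointing at the same repository could end up with differently populated clones depending on their URL parameters.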
Richard Purdie Aug. 22, 2022, 2:17 p.m. UTC | #14
On Mon, 2022-08-22 at 13:55 +0200, Marek Vasut wrote:
> On 8/22/22 12:57, Quentin Schulz wrote:
> > Hi Marek,
> > 
> > On 8/22/22 12:35, Marek Vasut wrote:
> > > On 8/22/22 10:41, Alexander Kanavin wrote:
> > > > On Mon, 22 Aug 2022 at 10:37, Marek Vasut <marex@denx.de> wrote:
> > > > 
> > > > > > So maybe the easy way out is, if nobranch=1 then fetch everything, 
> > > > > > else
> > > > > > just heads and tags ?
> > > > > 
> > > > > No, this won't do, nobranch expects the commit to be in a tag.
> > > > 
> > > > I don't think it expects that.
> > > 
> > > Documentation says it does:
> > > 
> > > https://git.openembedded.org/bitbake/tree/lib/bb/fetch2/git.py#n45
> > > "
> > > - nobranch
> > >     Don't check the SHA validation for branch. set this option for the recipe
> > >     referring to commit which is valid in tag instead of branch.
> > 
> > I assume this was meant to give the example of tags which aren't 
> > necessarily in a branch (annotated tags or tags of commits not belong to 
> > any branch anymore (force-push for example, or branch deletion).
> > 
> > The git fetcher does a git log --pretty=oneline -n 1 <hash> when 
> > nobranch is set, otherwise git branch --contains <hash> --list <branch> 
> > to check whether a commit exists and can be used by bitbake.
> > 
> > Considering this check, I assume nobranch=1 is working for any commit 
> > that was fetched by the git fetcher?
> > 
> > (We need to update the docs to reflect that in that case).
> 
> In that case, 'git fetch refs/*' in case nobranch is set and 'refs/head 
> refs/tags' otherwise ?

This does get a bit more complex though, since you now need two
different mirror tarballs, one for each option. The code can do that if
set up correctly, but we do need to cover that issue.

Cheers,

Richard
Peter Kjellerstedt Aug. 22, 2022, 3:21 p.m. UTC | #15
> -----Original Message-----
> From: Richard Purdie <richard.purdie@linuxfoundation.org>
> Sent: den 22 augusti 2022 16:17
> To: Marek Vasut <marex@denx.de>; Quentin Schulz <quentin.schulz@theobroma-systems.com>; Alexander Kanavin <alex.kanavin@gmail.com>
> Cc: Mikko Rapeli <Mikko.Rapeli@bmw.de>; Martin Jansa <Martin.Jansa@gmail.com>; bitbake-devel <bitbake-devel@lists.openembedded.org>; Peter Kjellerstedt <peter.kjellerstedt@axis.com>
> Subject: Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
> 
> On Mon, 2022-08-22 at 13:55 +0200, Marek Vasut wrote:
> > On 8/22/22 12:57, Quentin Schulz wrote:
> > > Hi Marek,
> > >
> > > On 8/22/22 12:35, Marek Vasut wrote:
> > > > On 8/22/22 10:41, Alexander Kanavin wrote:
> > > > > On Mon, 22 Aug 2022 at 10:37, Marek Vasut <marex@denx.de> wrote:
> > > > >
> > > > > > > > So maybe the easy way out is, if nobranch=1 then fetch everything,
> > > > > > > > else
> > > > > > > > just heads and tags ?
> > > > > >
> > > > > > No, this won't do, nobranch expects the commit to be in a tag.
> > > > >
> > > > > I don't think it expects that.
> > > >
> > > > Documentation says it does:
> > > >
> > > > https://git.openembedded.org/bitbake/tree/lib/bb/fetch2/git.py#n45
> > > > "
> > > > - nobranch
> > > >     Don't check the SHA validation for branch. set this option for the recipe
> > > >     referring to commit which is valid in tag instead of branch.
> > >
> > > I assume this was meant to give the example of tags which aren't
> > > necessarily in a branch (annotated tags or tags of commits not belong to
> > > any branch anymore (force-push for example, or branch deletion).
> > >
> > > The git fetcher does a git log --pretty=oneline -n 1 <hash> when
> > > nobranch is set, otherwise git branch --contains <hash> --list <branch>
> > > to check whether a commit exists and can be used by bitbake.
> > >
> > > Considering this check, I assume nobranch=1 is working for any commit
> > > that was fetched by the git fetcher?
> > >
> > > (We need to update the docs to reflect that in that case).
> >
> > In that case, 'git fetch refs/*' in case nobranch is set and 'refs/head
> > refs/tags' otherwise ?
> 
> This does get a bit more complex though since you now need two
> different mirror tarballs, one for each option. The code can do that if
> setup correctly but we do need to cover that issue.
> 
> Cheers,
> 
> Richard

I did some testing, and for Gerrit to continue to work it would be
enough to use:

            fetch_cmd = "LANG=C %s fetch -f --progress %s refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* refs/changes/*:refs/changes/*" % (ud.basecmd, shlex.quote(repourl))

This should not affect other Git servers and should avoid using 
different fetch commands depending on the URL. The drawback is of 
course that for Gerrit, there would be only marginal benefits to 
this change since the majority of its metadata is in the 
refs/changes space.

However, I wonder if the suggested change actually has any significant 
effect, given that the initial clone is done using --mirror, which means 
all refs/ spaces are fetched. If I remove the --mirror option from the 
clone command the change works as expected, but I have no idea if that 
has any other significant impact...

//Peter
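
Editorial sketch: the command above can be generalised so that the list of refs/ namespaces is a parameter. The function name, the default namespace tuple and the example URL are hypothetical; the format string follows the fetch_cmd line quoted in this thread.

```python
import shlex


def build_fetch_cmd(basecmd, repourl, namespaces=("heads", "tags", "changes")):
    """Build a fetch command that mirrors only the given refs/ namespaces."""
    # Each namespace maps onto itself, e.g. refs/heads/*:refs/heads/*.
    refspecs = ["refs/%s/*:refs/%s/*" % (ns, ns) for ns in namespaces]
    return "LANG=C %s fetch -f --progress %s %s" % (
        basecmd,
        shlex.quote(repourl),
        " ".join(refspecs),
    )


# Hypothetical repository URL, used only to show the resulting command.
print(build_fetch_cmd("git", "https://example.com/repo.git"))
print(build_fetch_cmd("git", "https://example.com/repo.git", namespaces=("heads", "tags")))
```

On a server that has no refs/changes namespace the extra refspec would be expected to match nothing, which is why a single command could serve both Gerrit and other servers.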
Luca Ceresoli Aug. 22, 2022, 4:02 p.m. UTC | #16
Hi Marek,

On Fri, 19 Aug 2022 18:54:55 +0200
"Marek Vasut" <marex@denx.de> wrote:

> The bitbake git fetcher currently fetches 'refs/*:refs/*', i.e. every
> single object in the remote repository. This works poorly with gitlab
> and github, which use the remote git repository to track its metadata
> like merge requests, CI pipelines and such.
> 
> Specifically, gitlab generates refs/merge-requests/*, refs/pipelines/*
> and refs/keep-around/* and they all contain massive amount of data that
> are useless for the bitbake build purposes. The amount of useless data
> can in fact be so massive (e.g. with FDO mesa.git repository) that some
> proxies may outright terminate the 'git fetch' connection, and make it
> appear as if bitbake got stuck on 'git fetch' with no output.
> 
> To avoid fetching all these useless metadata, tweak the git fetcher such
> that it only fetches refs/heads/* and refs/tags/* . Avoid using negative
> refspecs as those are only available in new git versions.
> 
> Signed-off-by: Marek Vasut <marex@denx.de>

Of course this might become irrelevant with whatever implementation
will be in v2, however when testing with this patch applied I got the
following warning and wonder whether it is related:

WARNING: mesa-2_22.1.6-r0 do_package_qa: QA Issue: File /usr/lib/dri/nouveau_dri.so in package mesa-megadriver contains reference to TMPDIR

Full log:
https://autobuilder.yoctoproject.org/typhoon/#/builders/72/builds/5753/steps/32/logs/stdio
Peter Kjellerstedt Aug. 22, 2022, 4:06 p.m. UTC | #17
> -----Original Message-----
> From: Luca Ceresoli <luca.ceresoli@bootlin.com>
> Sent: den 22 augusti 2022 18:03
> To: Marek Vasut <marex@denx.de>
> Cc: bitbake-devel@lists.openembedded.org; Martin Jansa <Martin.Jansa@gmail.com>; Peter Kjellerstedt <peter.kjellerstedt@axis.com>; Richard Purdie <richard.purdie@linuxfoundation.org>
> Subject: Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
> 
> Hi Marek,
> 
> On Fri, 19 Aug 2022 18:54:55 +0200
> "Marek Vasut" <marex@denx.de> wrote:
> 
> > The bitbake git fetcher currently fetches 'refs/*:refs/*', i.e. every
> > single object in the remote repository. This works poorly with gitlab
> > and github, which use the remote git repository to track its metadata
> > like merge requests, CI pipelines and such.
> >
> > Specifically, gitlab generates refs/merge-requests/*, refs/pipelines/*
> > and refs/keep-around/* and they all contain massive amount of data that
> > are useless for the bitbake build purposes. The amount of useless data
> > can in fact be so massive (e.g. with FDO mesa.git repository) that some
> > proxies may outright terminate the 'git fetch' connection, and make it
> > appear as if bitbake got stuck on 'git fetch' with no output.
> >
> > To avoid fetching all these useless metadata, tweak the git fetcher such
> > that it only fetches refs/heads/* and refs/tags/* . Avoid using negative
> > refspecs as those are only available in new git versions.
> >
> > Signed-off-by: Marek Vasut <marex@denx.de>
> 
> Of course this might become irrelevant with whatever implementation
> will be in v2, however when testing with this patch applied I got the
> following warning and wonder whether they are related:
> 
> WARNING: mesa-2_22.1.6-r0 do_package_qa: QA Issue: File /usr/lib/dri/nouveau_dri.so in package mesa-megadriver contains reference to TMPDIR

I cannot see how they could be related.

> 
> Full log:
> https://autobuilder.yoctoproject.org/typhoon/#/builders/72/builds/5753/steps/32/logs/stdio
> 
> --
> Luca Ceresoli, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com

//Peter
Richard Purdie Aug. 22, 2022, 4:07 p.m. UTC | #18
On Mon, 2022-08-22 at 18:02 +0200, Luca Ceresoli wrote:
> Hi Marek,
> 
> On Fri, 19 Aug 2022 18:54:55 +0200
> "Marek Vasut" <marex@denx.de> wrote:
> 
> > The bitbake git fetcher currently fetches 'refs/*:refs/*', i.e. every
> > single object in the remote repository. This works poorly with gitlab
> > and github, which use the remote git repository to track its metadata
> > like merge requests, CI pipelines and such.
> > 
> > Specifically, gitlab generates refs/merge-requests/*, refs/pipelines/*
> > and refs/keep-around/* and they all contain massive amount of data that
> > are useless for the bitbake build purposes. The amount of useless data
> > can in fact be so massive (e.g. with FDO mesa.git repository) that some
> > proxies may outright terminate the 'git fetch' connection, and make it
> > appear as if bitbake got stuck on 'git fetch' with no output.
> > 
> > To avoid fetching all these useless metadata, tweak the git fetcher such
> > that it only fetches refs/heads/* and refs/tags/* . Avoid using negative
> > refspecs as those are only available in new git versions.
> > 
> > Signed-off-by: Marek Vasut <marex@denx.de>
> 
> Of course this might become irrelevant with whatever implementation
> will be in v2, however when testing with this patch applied I got the
> following warning and wonder whether they are related:
> 
> WARNING: mesa-2_22.1.6-r0 do_package_qa: QA Issue: File /usr/lib/dri/nouveau_dri.so in package mesa-megadriver contains reference to TMPDIR
> 
> Full log:
> https://autobuilder.yoctoproject.org/typhoon/#/builders/72/builds/5753/steps/32/logs/stdio
> 

This is a known issue with an open bug assigned to me (unfortunately),
it isn't related. It is intermittent as it is llvm-native related and
we don't commonly rebuild this codepath.

Cheers,

Richard
Marek Vasut Aug. 22, 2022, 4:39 p.m. UTC | #19
On 8/22/22 17:21, Peter Kjellerstedt wrote:
>> -----Original Message-----
>> From: Richard Purdie <richard.purdie@linuxfoundation.org>
>> Sent: den 22 augusti 2022 16:17
>> To: Marek Vasut <marex@denx.de>; Quentin Schulz <quentin.schulz@theobroma-
>> systems.com>; Alexander Kanavin <alex.kanavin@gmail.com>
>> Cc: Mikko Rapeli <Mikko.Rapeli@bmw.de>; Martin Jansa
>> <Martin.Jansa@gmail.com>; bitbake-devel <bitbake-
>> devel@lists.openembedded.org>; Peter Kjellerstedt
>> <peter.kjellerstedt@axis.com>
>> Subject: Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher
>> from fetching gitlab repository metadata
>>
>> On Mon, 2022-08-22 at 13:55 +0200, Marek Vasut wrote:
>>> On 8/22/22 12:57, Quentin Schulz wrote:
>>>> Hi Marek,
>>>>
>>>> On 8/22/22 12:35, Marek Vasut wrote:
>>>>> On 8/22/22 10:41, Alexander Kanavin wrote:
>>>>>> On Mon, 22 Aug 2022 at 10:37, Marek Vasut <marex@denx.de> wrote:
>>>>>>
>>>>>>>> So maybe the easy way out is, if nobranch=1 then fetch everything,
>>>>>>>> else just heads and tags ?
>>>>>>>
>>>>>>> No, this won't do, nobranch expects the commit to be in a tag.
>>>>>>
>>>>>> I don't think it expects that.
>>>>>
>>>>> Documentation says it does:
>>>>>
>>>>> https://git.openembedded.org/bitbake/tree/lib/bb/fetch2/git.py#n45 "
>>>>> - nobranch
>>>>>      Don't check the SHA validation for branch. set this option for the
>>>>>      recipe referring to commit which is valid in tag instead of branch.
>>>>
>>>> I assume this was meant to give the example of tags which aren't
>>>> necessarily in a branch (annotated tags or tags of commits not belong to
>>>> any branch anymore (force-push for example, or branch deletion).
>>>>
>>>> The git fetcher does a git log --pretty=oneline -n 1 <hash> when
>>>> nobranch is set, otherwise git branch --contains <hash> --list <branch>
>>>> to check whether a commit exists and can be used by bitbake.
>>>>
>>>> Considering this check, I assume nobranch=1 is working for any commit
>>>> that was fetched by the git fetcher?
>>>>
>>>> (We need to update the docs to reflect that in that case).
>>>
>>> In that case, 'git fetch refs/*' in case nobranch is set and 'refs/head
>>> refs/tags' otherwise ?
>>
>> This does get a bit more complex though since you now need two
>> different mirror tarballs, one for each option. The code can do that if
>> setup correctly but we do need to cover that issue.
>>
>> Cheers,
>>
>> Richard
> 
> I made some testing, and for Gerrit to continue to work it would be
> enough to use:
> 
>              fetch_cmd = "LANG=C %s fetch -f --progress %s refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* refs/changes/*:refs/changes/*" % (ud.basecmd, shlex.quote(repourl))
> 
> This should not affect other Git servers and should avoid using
> different fetch commands depending on the URL. The drawback is of
> course that for Gerrit, there would be only marginal benefits to
> this change since the majority of its metadata is in the
> refs/changes space.
> 
> However, I wonder if the suggested change actually has any significant
> effect, given that the initial clone is done using --mirror, which means
> all refs/ spaces are fetched. If I remove the --mirror option from the
> clone command the change works as expected, but I have no idea if that
> has any other significant impact...

With this change, I am actually able to fetch mesa from 
gitlab.freedesktop.org without the local CI proxy terminating the 
connection in the process. So yes, it does have an effect.
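
For reference, a minimal sketch of how the fetch command line discussed above could be assembled from a list of refspecs. It is not the code in lib/bb/fetch2/git.py: the function name build_fetch_cmd and the three list constants are made up for illustration, and quoting each refspec with shlex.quote is an extra precaution that the patch itself does not take.

```python
import shlex

# Refspec sets discussed in this thread. The constant names are illustrative only.
ALL_REFS = ["refs/*:refs/*"]
HEADS_AND_TAGS = ["refs/heads/*:refs/heads/*", "refs/tags/*:refs/tags/*"]
GERRIT_CHANGES = ["refs/changes/*:refs/changes/*"]


def build_fetch_cmd(basecmd, repourl, refspecs):
    """Return a 'git fetch' command line limited to the given refspecs.

    Hypothetical helper: basecmd stands in for ud.basecmd and repourl for
    the repository URL used by the fetcher.
    """
    # Quote every refspec so the shell never tries to expand the '*'.
    quoted_refspecs = " ".join(shlex.quote(r) for r in refspecs)
    return "LANG=C %s fetch -f --progress %s %s" % (
        basecmd, shlex.quote(repourl), quoted_refspecs)


if __name__ == "__main__":
    url = "https://gitlab.freedesktop.org/mesa/mesa.git"
    # Restricted fetch, as in the patch.
    print(build_fetch_cmd("git", url, HEADS_AND_TAGS))
    # Variant that also keeps Gerrit change refs.
    print(build_fetch_cmd("git", url, HEADS_AND_TAGS + GERRIT_CHANGES))
```

Keeping the refspecs in a list makes it straightforward to pick a different set per URL later without duplicating the command string.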
Luca Ceresoli Aug. 23, 2022, 2:34 p.m. UTC | #20
Hi Richard, Peter,

On Mon, 22 Aug 2022 17:07:50 +0100
"Richard Purdie" <richard.purdie@linuxfoundation.org> wrote:

> On Mon, 2022-08-22 at 18:02 +0200, Luca Ceresoli wrote:
> > Hi Marek,
> > 
> > On Fri, 19 Aug 2022 18:54:55 +0200
> > "Marek Vasut" <marex@denx.de> wrote:
> >   
> > > The bitbake git fetcher currently fetches 'refs/*:refs/*', i.e. every
> > > single object in the remote repository. This works poorly with gitlab
> > > and github, which use the remote git repository to track its metadata
> > > like merge requests, CI pipelines and such.
> > > 
> > > Specifically, gitlab generates refs/merge-requests/*, refs/pipelines/*
> > > and refs/keep-around/* and they all contain massive amount of data that
> > > are useless for the bitbake build purposes. The amount of useless data
> > > can in fact be so massive (e.g. with FDO mesa.git repository) that some
> > > proxies may outright terminate the 'git fetch' connection, and make it
> > > appear as if bitbake got stuck on 'git fetch' with no output.
> > > 
> > > To avoid fetching all these useless metadata, tweak the git fetcher such
> > > that it only fetches refs/heads/* and refs/tags/* . Avoid using negative
> > > refspecs as those are only available in new git versions.
> > > 
> > > Signed-off-by: Marek Vasut <marex@denx.de>  
> > 
> > Of course this might become irrelevant with whatever implementation
> > will be in v2, however when testing with this patch applied I got the
> > following warning and wonder whether they are related:
> > 
> > WARNING: mesa-2_22.1.6-r0 do_package_qa: QA Issue: File /usr/lib/dri/nouveau_dri.so in package mesa-megadriver contains reference to TMPDIR
> > 
> > Full log:
> > https://autobuilder.yoctoproject.org/typhoon/#/builders/72/builds/5753/steps/32/logs/stdio
> >   
> 
> This is a known issue with an open bug assigned to me (unfortunately),
> it isn't related. It is intermittent as it is llvm-native related and
> we don't commonly rebuild this codepath.

Indeed, added to https://bugzilla.yoctoproject.org/show_bug.cgi?id=14897

Thanks for the hint and apologies for the noise.
Marek Vasut Sept. 1, 2022, 5:50 p.m. UTC | #21
On 8/22/22 18:39, Marek Vasut wrote:

Hi,

[...]

>> I made some testing, and for Gerrit to continue to work it would be
>> enough to use:
>>
>>              fetch_cmd = "LANG=C %s fetch -f --progress %s 
>> refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* 
>> refs/changes/*:refs/changes/*" % (ud.basecmd, shlex.quote(repourl))
>>
>> This should not affect other Git servers and should avoid using
>> different fetch commands depending on the URL. The drawback is of
>> course that for Gerrit, there would be only marginal benefits to
>> this change since the majority of its metadata is in the
>> refs/changes space.
>>
>> However, I wonder if the suggested change actually has any significant
>> effect, given that the initial clone is done using --mirror, which means
>> all refs/ spaces are fetched. If I remove the --mirror option from the
>> clone command the change works as expected, but I have no idea if that
>> has any other significant impact...
> 
> With this change, I am able to actually fetch mesa from 
> gitlab.freedesktop.org without local CI proxy terminating the connection 
> in the process. So yes, it does have effect.

I keep running into this problem with mesa. How can we proceed to fix it 
upstream ?
Richard Purdie Sept. 2, 2022, 3:54 p.m. UTC | #22
On Thu, 2022-09-01 at 19:50 +0200, Marek Vasut wrote:
> On 8/22/22 18:39, Marek Vasut wrote:
> 
> Hi,
> 
> [...]
> 
> > > I made some testing, and for Gerrit to continue to work it would be
> > > enough to use:
> > > 
> > >              fetch_cmd = "LANG=C %s fetch -f --progress %s 
> > > refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* 
> > > refs/changes/*:refs/changes/*" % (ud.basecmd, shlex.quote(repourl))
> > > 
> > > This should not affect other Git servers and should avoid using
> > > different fetch commands depending on the URL. The drawback is of
> > > course that for Gerrit, there would be only marginal benefits to
> > > this change since the majority of its metadata is in the
> > > refs/changes space.
> > > 
> > > However, I wonder if the suggested change actually has any significant
> > > effect, given that the initial clone is done using --mirror, which means
> > > all refs/ spaces are fetched. If I remove the --mirror option from the
> > > clone command the change works as expected, but I have no idea if that
> > > has any other significant impact...
> > 
> > With this change, I am able to actually fetch mesa from 
> > gitlab.freedesktop.org without local CI proxy terminating the connection 
> > in the process. So yes, it does have effect.
> 
> I keep running into this problem with mesa, how can we proceed to fix it 
> upstream ?

We probably need a version of the patch which restricts by default but
allows the restriction to be turned off on a per url basis with a
parameter.

That restriction needs to be reflected in the mirror tarball name too.

Cheers,

Richard
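
One possible shape for that, as a rough sketch only. The URL parameter name "allrefs", both function names, and the tarball naming pattern are assumptions made for illustration; none of them is taken from the bitbake source or from a later version of this patch.

```python
# Default refspecs: branches and tags only, as in the RFC patch.
RESTRICTED_REFSPECS = ["refs/heads/*:refs/heads/*", "refs/tags/*:refs/tags/*"]
# Previous behaviour: fetch every ref the server advertises.
UNRESTRICTED_REFSPECS = ["refs/*:refs/*"]


def wants_all_refs(parm):
    """True if the (hypothetical) 'allrefs=1' URL parameter is set."""
    return parm.get("allrefs", "0") == "1"


def fetch_refspecs(parm):
    """Pick the refspecs for one SRC_URI entry from its URL parameters."""
    if wants_all_refs(parm):
        return UNRESTRICTED_REFSPECS
    return RESTRICTED_REFSPECS


def mirror_tarball_name(srcname, parm):
    """Return a mirror tarball name that encodes the restriction.

    The naming pattern is illustrative. The point is only that a restricted
    and an unrestricted mirror of the same repository get different names,
    so one can never be mistaken for the other.
    """
    suffix = "_allrefs" if wants_all_refs(parm) else ""
    return "git2_%s%s.tar.gz" % (srcname, suffix)


if __name__ == "__main__":
    # parm plays the role of the dict of URL parameters parsed from SRC_URI.
    for parm in ({}, {"allrefs": "1"}):
        print(fetch_refspecs(parm), mirror_tarball_name("example.org.repo", parm))
```

With the default restricted, existing recipes would get the smaller fetch automatically, and only recipes that need refs outside heads and tags (Gerrit changes, for example) would have to opt out.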

Patch

diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
index 4534bd75..b5fc0a51 100644
--- a/lib/bb/fetch2/git.py
+++ b/lib/bb/fetch2/git.py
@@ -382,7 +382,7 @@  class Git(FetchMethod):
               runfetchcmd("%s remote rm origin" % ud.basecmd, d, workdir=ud.clonedir)
 
             runfetchcmd("%s remote add --mirror=fetch origin %s" % (ud.basecmd, shlex.quote(repourl)), d, workdir=ud.clonedir)
-            fetch_cmd = "LANG=C %s fetch -f --progress %s refs/*:refs/*" % (ud.basecmd, shlex.quote(repourl))
+            fetch_cmd = "LANG=C %s fetch -f --progress %s refs/heads/*:refs/heads/* refs/tags/*:refs/tags/*" % (ud.basecmd, shlex.quote(repourl))
             if ud.proto.lower() != 'file':
                 bb.fetch2.check_network_access(d, fetch_cmd, ud.url)
             progresshandler = GitProgressHandler(d)