diff mbox series

fetch2/git: Switch branch to correct hash in case of two premirrors

Message ID 20220810084401.3885-1-tomasz.dziendzielski@gmail.com
State New
Headers show
Series fetch2/git: Switch branch to correct hash in case of two premirrors | expand

Commit Message

Tomasz Dziendzielski Aug. 10, 2022, 8:44 a.m. UTC
If we use two premirrors and tarball from the first one contains older
commit then after fetching from tmpdir master ref still points to older
commit, so _contains_ref can't find the proper revision on a branch.

Switch master ref to correct hash so _contains ref can find it. It needs
to be done in for loop because we don't know which branch is used for
specific repo in case of multiple repositories.

We need to use two try blocks because both commands might fail and it's
not the reason to break the execution. There are two specific cases I
want to cover. First is when we have multiple git:// entries in SRC_URI
and one uses "master" and second "main". In this code we don't know
which branch is assigned for which repository so we want to add both
branches - so first we need to remove both branches and one might not
exist. We could check if the branch exists and create it only then but
this would break the second case. Now the second case is when for the
first premirror we remove the branch but we can't create it later
because the hash does not exist on the branch since tarball doesn't
contain the newer commits - then for the second premirror the removal of
the branch would fail.

Signed-off-by: Tomasz Dziendzielski <tomasz.dziendzielski@gmail.com>
Signed-off-by: Mateusz Marciniec <mateuszmar2@gmail.com>
Signed-off-by: Jan Brzezanski <jan.brzezanski@gmail.com>
---
 lib/bb/fetch2/git.py | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Pavel Zhukov Aug. 10, 2022, 1:36 p.m. UTC | #1
Some basic test would be useful here. 
There're few tests to cover premirror functionality in the testsuite so adding new one should be relatively easy. 

-- Pavel

On Wed, Aug 10, 2022, at 04:44, Tomasz Dziendzielski wrote:
> If we use two premirrors and tarball from the first one contains older
> commit then after fetching from tmpdir master ref still points to older
> commit, so _contains_ref can't find the proper revision on a branch.
> 
> Switch master ref to correct hash so _contains ref can find it. It needs
> to be done in for loop because we don't know which branch is used for
> specific repo in case of multiple repositories.
> 
> We need to use two try blocks because both commands might fail and it's
> not the reason to break the execution. There are two specific cases I
> want to cover. First is when we have multiple git:// entries in SRC_URI
> and one uses "master" and second "main". In this code we don't know
> which branch is assigned for which repository so we want to add both
> branches - so first we need to remove both branches and one might not
> exist. We could check if the branch exists and create it only then but
> this would break the second case. Now the second case is when for the
> first premirror we remove the branch but we can't create it later
> because the hash does not exist on the branch since tarball doesn't
> contain the newer commits - then for the second premirror the removal of
> the branch would fail.
> 
> Signed-off-by: Tomasz Dziendzielski <tomasz.dziendzielski@gmail.com>
> Signed-off-by: Mateusz Marciniec <mateuszmar2@gmail.com>
> Signed-off-by: Jan Brzezanski <jan.brzezanski@gmail.com>
> ---
> lib/bb/fetch2/git.py | 10 ++++++++++
> 1 file changed, 10 insertions(+)
> 
> diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
> index 4534bd758..ea47df3cf 100644
> --- a/lib/bb/fetch2/git.py
> +++ b/lib/bb/fetch2/git.py
> @@ -362,6 +362,16 @@ class Git(FetchMethod):
>                  runfetchcmd("tar -xzf %s" % ud.fullmirror, d, workdir=tmpdir)
>                  fetch_cmd = "LANG=C %s fetch -f --progress %s " % (ud.basecmd, shlex.quote(tmpdir))
>                  runfetchcmd(fetch_cmd, d, workdir=ud.clonedir)
> +                # Switch branch ref to a correct hash
> +                for name in ud.names:
> +                    try:
> +                        runfetchcmd("LANG=C %s branch -d %s" % (ud.basecmd, ud.parm.get('branch', ud.branches[name])), d, workdir=ud.clonedir)
> +                    except:
> +                        pass
> +                    try:
> +                        runfetchcmd("LANG=C %s branch %s %s" % (ud.basecmd, ud.parm.get('branch', ud.branches[name]), ud.revisions[name]), d, workdir=ud.clonedir)
> +                    except:
> +                        pass
>          repourl = self._get_repo_url(ud)
>  
>          # If the repo still doesn't exist, fallback to cloning it
> -- 
> 2.36.1
> 
> 
> 
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#13878): https://lists.openembedded.org/g/bitbake-devel/message/13878
> Mute This Topic: https://lists.openembedded.org/mt/92932619/6390638
> Group Owner: bitbake-devel+owner@lists.openembedded.org <mailto:bitbake-devel%2Bowner@lists.openembedded.org>
> Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [pavel@zhukoff.net]
> -=-=-=-=-=-=-=-=-=-=-=-
> 
>
Richard Purdie Aug. 10, 2022, 2:05 p.m. UTC | #2
On Wed, 2022-08-10 at 10:44 +0200, Tomasz Dziendzielski wrote:
> If we use two premirrors and tarball from the first one contains older
> commit then after fetching from tmpdir master ref still points to older
> commit, so _contains_ref can't find the proper revision on a branch.
> 
> Switch master ref to correct hash so _contains ref can find it. It needs
> to be done in for loop because we don't know which branch is used for
> specific repo in case of multiple repositories.
> 
> We need to use two try blocks because both commands might fail and it's
> not the reason to break the execution. There are two specific cases I
> want to cover. First is when we have multiple git:// entries in SRC_URI
> and one uses "master" and second "main". In this code we don't know
> which branch is assigned for which repository so we want to add both
> branches - so first we need to remove both branches and one might not
> exist. We could check if the branch exists and create it only then but
> this would break the second case. Now the second case is when for the
> first premirror we remove the branch but we can't create it later
> because the hash does not exist on the branch since tarball doesn't
> contain the newer commits - then for the second premirror the removal of
> the branch would fail.
> 
> Signed-off-by: Tomasz Dziendzielski <tomasz.dziendzielski@gmail.com>
> Signed-off-by: Mateusz Marciniec <mateuszmar2@gmail.com>
> Signed-off-by: Jan Brzezanski <jan.brzezanski@gmail.com>
> ---
>  lib/bb/fetch2/git.py | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
> index 4534bd758..ea47df3cf 100644
> --- a/lib/bb/fetch2/git.py
> +++ b/lib/bb/fetch2/git.py
> @@ -362,6 +362,16 @@ class Git(FetchMethod):
>                  runfetchcmd("tar -xzf %s" % ud.fullmirror, d, workdir=tmpdir)
>                  fetch_cmd = "LANG=C %s fetch -f --progress %s " % (ud.basecmd, shlex.quote(tmpdir))
>                  runfetchcmd(fetch_cmd, d, workdir=ud.clonedir)
> +                # Switch branch ref to a correct hash
> +                for name in ud.names:
> +                    try:
> +                        runfetchcmd("LANG=C %s branch -d %s" % (ud.basecmd, ud.parm.get('branch', ud.branches[name])), d, workdir=ud.clonedir)
> +                    except:
> +                        pass
> +                    try:
> +                        runfetchcmd("LANG=C %s branch %s %s" % (ud.basecmd, ud.parm.get('branch', ud.branches[name]), ud.revisions[name]), d, workdir=ud.clonedir)
> +                    except:
> +                        pass
>          repourl = self._get_repo_url(ud)
>  
>          # If the repo still doesn't exist, fallback to cloning it


If I understand what that code is doing, even if the upstream branch
didn't have (and never had) the revision in question, it would force
that revision onto the branch?

I don't think we want to do that.

Cheers,

Richard
Tomasz Dziendzielski Aug. 10, 2022, 2:33 p.m. UTC | #3
>If I understand what that code is doing, even if the upstream branch
>didn't have (and never had) the revision in question, it would force
>that revision onto the branch?
>
>I don't think we want to do that.
Actually yes, since I couldn't find a way to determine the specific branch
for specific git:// entry at this point. It's still better than not being
able to checkout the hash on a branch even though it's there. Maybe the
better solution would be to get rid of 'tmpdir' and just clean the
clonedir and unpack the tarball to clonedir? I wasn't sure if there was
some reason for using tmpdir, so I didn't want to change that.


>Some basic test would be useful here.
I will try to work on some test once we find some solution.

Best regards,
Tomasz Dziendzielski
Richard Purdie Aug. 10, 2022, 2:48 p.m. UTC | #4
On Wed, 2022-08-10 at 16:33 +0200, Tomasz Dziendzielski wrote:
> > If I understand what that code is doing, even if the upstream
> > branch
> > didn't have (and never had) the revision in question, it would
> > force
> > that revision onto the branch?
> > 
> > I don't think we want to do that.
> Actually yes, since I couldn't find a way to determine the specific
> branch for specific git:// entry at this point. It's still better
> than not being able to checkout the hash on a branch even though it's
> there. Maybe the better solution would be to get rid of 'tmpdir' and
> just clean the clonedir and unpack the tarball to clonedir? I wasn't
> sure if there was some reason for using tmpdir, so I didn't want to
> change that.

Looking at the git history for the file:

https://git.yoctoproject.org/poky/log/bitbake/lib/bb/fetch2/git.py

you'll see it was added recently:

https://git.yoctoproject.org/poky/commit/bitbake/lib/bb/fetch2/git.py?id=53ed421226228f60a89a1868819b6c3ed6d45026

and addressed a number of long standing bugs. In particular the fetcher
was wiping out data and causing long refetches when it was
misconfigured. We did add test cases for those issues.
> 

> > Some basic test would be useful here. 
> I will try to work on some test once we find some solution.

Test cases can also be useful to clarify exactly what the problem being
solved is too. We'll need a test case regardless of how we solve it.

Cheers,

Richard
diff mbox series

Patch

diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
index 4534bd758..ea47df3cf 100644
--- a/lib/bb/fetch2/git.py
+++ b/lib/bb/fetch2/git.py
@@ -362,6 +362,16 @@  class Git(FetchMethod):
                 runfetchcmd("tar -xzf %s" % ud.fullmirror, d, workdir=tmpdir)
                 fetch_cmd = "LANG=C %s fetch -f --progress %s " % (ud.basecmd, shlex.quote(tmpdir))
                 runfetchcmd(fetch_cmd, d, workdir=ud.clonedir)
+                # Switch branch ref to a correct hash
+                for name in ud.names:
+                    try:
+                        runfetchcmd("LANG=C %s branch -d %s" % (ud.basecmd, ud.parm.get('branch', ud.branches[name])), d, workdir=ud.clonedir)
+                    except:
+                        pass
+                    try:
+                        runfetchcmd("LANG=C %s branch %s %s" % (ud.basecmd, ud.parm.get('branch', ud.branches[name]), ud.revisions[name]), d, workdir=ud.clonedir)
+                    except:
+                        pass
         repourl = self._get_repo_url(ud)
 
         # If the repo still doesn't exist, fallback to cloning it