fetch2/git: Check srcrev exists in checkstatus()

Message ID 20220705174630.13360-1-t-maiaxiao@linux.microsoft.com
State New
Headers show
Series fetch2/git: Check srcrev exists in checkstatus() | expand

Commit Message

Maia Xiao July 5, 2022, 5:46 p.m. UTC
Currently we only check that we can list the remote, but do not validate
if the srcrev actually exists. Now we ensure that the rev exists by
attempting to shallow clone it from the remote.

Fixes [YOCTO #11199].

Signed-off-by: Maia Xiao <t-maiaxiao@linux.microsoft.com>
---
 lib/bb/fetch2/git.py  | 28 +++++++++++++++++++++++-----
 lib/bb/tests/fetch.py | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+), 5 deletions(-)

Comments

Richard Purdie July 5, 2022, 9:27 p.m. UTC | #1
On Tue, 2022-07-05 at 17:46 +0000, Maia Xiao wrote:
> Currently we only check that we can list the remote, but do not validate
> if the srcrev actually exists. Now we ensure that the rev exists by
> attempting to shallow clone it from the remote.
> 
> Fixes [YOCTO #11199].
> 
> Signed-off-by: Maia Xiao <t-maiaxiao@linux.microsoft.com>
> ---
>  lib/bb/fetch2/git.py  | 28 +++++++++++++++++++++++-----
>  lib/bb/tests/fetch.py | 34 ++++++++++++++++++++++++++++++++++
>  2 files changed, 57 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
> index 07b3d9a7..99780de8 100644
> --- a/lib/bb/fetch2/git.py
> +++ b/lib/bb/fetch2/git.py
> @@ -832,8 +832,26 @@ class Git(FetchMethod):
>              return True, str(rev)
>  
>      def checkstatus(self, fetch, ud, d):
> -        try:
> -            self._lsremote(ud, d, "")
> -            return True
> -        except bb.fetch2.FetchError:
> -            return False
> +        repourl = self._get_repo_url(ud)
> +        for name in ud.names:
> +            tmpdir = tempfile.mkdtemp()
> +            revision = ud.revisions[name]
> +
> +            try:
> +                # Initialise an empty git repo for shallow fetching
> +                runfetchcmd("%s init" % ud.basecmd, d, workdir=tmpdir)
> +                runfetchcmd("%s remote add origin %s" %
> +                            (ud.basecmd, shlex.quote(repourl)),
> +                            d,
> +                            workdir=tmpdir)
> +
> +                # Try to fetch only the specified srcrev
> +                runfetchcmd("%s fetch -f --progress --depth 1 origin %s" %
> +                            (ud.basecmd, revision),
> +                            d,
> +                            workdir=tmpdir)
> +            except bb.fetch2.FetchError:
> +                return False
> +            finally:
> +                bb.utils.remove(tmpdir, recurse=True)
> +        return True

This makes the git backend do something quite different from the
design. For instance the wget backend will only check that an http file
exists, it won't attempt to actually download it.

I'm therefore not convinced this is the right thing to make
checkstatus() do...

Cheers,

Richard
Paul Eggleton July 5, 2022, 10:02 p.m. UTC | #2
On Wednesday, 6 July 2022 09:27:38 NZST Richard Purdie wrote:
> On Tue, 2022-07-05 at 17:46 +0000, Maia Xiao wrote:
> > Currently we only check that we can list the remote, but do not validate
> > if the srcrev actually exists. Now we ensure that the rev exists by
> > attempting to shallow clone it from the remote.
> > 
> > Fixes [YOCTO #11199].
> > 
> > Signed-off-by: Maia Xiao <t-maiaxiao@linux.microsoft.com>
> > ---
> > 
> >  lib/bb/fetch2/git.py  | 28 +++++++++++++++++++++++-----
> >  lib/bb/tests/fetch.py | 34 ++++++++++++++++++++++++++++++++++
> >  2 files changed, 57 insertions(+), 5 deletions(-)
> > 
> > diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
> > index 07b3d9a7..99780de8 100644
> > --- a/lib/bb/fetch2/git.py
> > +++ b/lib/bb/fetch2/git.py
> > 
> > @@ -832,8 +832,26 @@ class Git(FetchMethod):
> >              return True, str(rev)
> >      
> >      def checkstatus(self, fetch, ud, d):
> > -        try:
> > -            self._lsremote(ud, d, "")
> > -            return True
> > -        except bb.fetch2.FetchError:
> > -            return False
> > +        repourl = self._get_repo_url(ud)
> > +        for name in ud.names:
> > +            tmpdir = tempfile.mkdtemp()
> > +            revision = ud.revisions[name]
> > +
> > +            try:
> > +                # Initialise an empty git repo for shallow fetching
> > +                runfetchcmd("%s init" % ud.basecmd, d, workdir=tmpdir)
> > +                runfetchcmd("%s remote add origin %s" %
> > +                            (ud.basecmd, shlex.quote(repourl)),
> > +                            d,
> > +                            workdir=tmpdir)
> > +
> > +                # Try to fetch only the specified srcrev
> > +                runfetchcmd("%s fetch -f --progress --depth 1 origin %s"
> > %
> > +                            (ud.basecmd, revision),
> > +                            d,
> > +                            workdir=tmpdir)
> > +            except bb.fetch2.FetchError:
> > +                return False
> > +            finally:
> > +                bb.utils.remove(tmpdir, recurse=True)
> > +        return True
> 
> This makes the git backend do something quite different from the
> design. For instance the wget backend will only check that an http file
> exists, it won't attempt to actually download it.
> 
> I'm therefore not convinced this is the right thing to make
> checkstatus() do...

I'm not sure there is a better way to do it currently - git ls-remote lists 
only certain refs (corresponding to tags and branches), so checking the output 
of that would be a potential shortcut and would work in a lot of cases, but 
this would still have to be a fallback. There is support for fetching a 
particular revision in recent versions of git (2.5+), but apparently the 
server has to support and enable it [1] and that would still count as 
downloading.

I guess it just depends on how much we need/want checkstatus to check the 
revision. Adding Ross as the original reporter of the bug.

Cheers
Paul

[1] https://stackoverflow.com/questions/14872486/retrieve-specific-commit-from-a-remote-git-repository/30701724#30701724
Bruce Ashfield July 6, 2022, 3:12 p.m. UTC | #3
On Tue, Jul 5, 2022 at 1:46 PM Maia Xiao <t-maiaxiao@linux.microsoft.com> wrote:
>
> Currently we only check that we can list the remote, but do not validate
> if the srcrev actually exists. Now we ensure that the rev exists by
> attempting to shallow clone it from the remote.
>
> Fixes [YOCTO #11199].
>
> Signed-off-by: Maia Xiao <t-maiaxiao@linux.microsoft.com>
> ---
>  lib/bb/fetch2/git.py  | 28 +++++++++++++++++++++++-----
>  lib/bb/tests/fetch.py | 34 ++++++++++++++++++++++++++++++++++
>  2 files changed, 57 insertions(+), 5 deletions(-)
>
> diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
> index 07b3d9a7..99780de8 100644
> --- a/lib/bb/fetch2/git.py
> +++ b/lib/bb/fetch2/git.py
> @@ -832,8 +832,26 @@ class Git(FetchMethod):
>              return True, str(rev)
>
>      def checkstatus(self, fetch, ud, d):
> -        try:
> -            self._lsremote(ud, d, "")
> -            return True
> -        except bb.fetch2.FetchError:
> -            return False
> +        repourl = self._get_repo_url(ud)
> +        for name in ud.names:
> +            tmpdir = tempfile.mkdtemp()
> +            revision = ud.revisions[name]
> +
> +            try:
> +                # Initialise an empty git repo for shallow fetching
> +                runfetchcmd("%s init" % ud.basecmd, d, workdir=tmpdir)
> +                runfetchcmd("%s remote add origin %s" %
> +                            (ud.basecmd, shlex.quote(repourl)),
> +                            d,
> +                            workdir=tmpdir)
> +
> +                # Try to fetch only the specified srcrev
> +                runfetchcmd("%s fetch -f --progress --depth 1 origin %s" %

I think this is something that has to be optional, and not always enabled.
I'm sure Richard can advise better than I can on the options for that.

I've written some python modules that do something similar to this, and
there's no guarantee that the git hash you are validating can be reached
with a depth 1 clone. In that case, we need to decide if it's a warning, an
error, or something else.

Bruce

> +                            (ud.basecmd, revision),
> +                            d,
> +                            workdir=tmpdir)
> +            except bb.fetch2.FetchError:
> +                return False
> +            finally:
> +                bb.utils.remove(tmpdir, recurse=True)
> +        return True
> diff --git a/lib/bb/tests/fetch.py b/lib/bb/tests/fetch.py
> index ee41bff4..00def1bd 100644
> --- a/lib/bb/tests/fetch.py
> +++ b/lib/bb/tests/fetch.py
> @@ -1478,6 +1478,40 @@ class FetchCheckStatusTest(FetcherTest):
>
>          connection_cache.close_connections()
>
> +    @skipIfNoNetwork()
> +    def test_git_checkstatus(self):
> +        CHECKSTATUS_GOOD_REPO = "git://git.yoctoproject.org/yocto-docs;branch=master"
> +        CHECKSTATUS_GOOD_HASH = "2bd14d3810a68082ce01b459cf2799d796a32e63"
> +
> +        def do_git_checkstatus(url, srcrev):
> +            self.d.setVar("SRCREV", srcrev)
> +            fetch = bb.fetch2.Fetch([url], self.d)
> +            ud = fetch.ud[url]
> +            m = ud.method
> +            return m.checkstatus(fetch, ud, self.d)
> +
> +        # Valid URI + SRCREV returns true
> +        ret = do_git_checkstatus(CHECKSTATUS_GOOD_REPO, CHECKSTATUS_GOOD_HASH)
> +        self.assertTrue(ret,
> +                        msg="URI %s, can't check status" %
> +                        (CHECKSTATUS_GOOD_REPO))
> +
> +        # Unavailable server returns false
> +        invalid_git_uri = 'git://git.test.invalid/yocto-docs;branch=master'
> +        ret = do_git_checkstatus(invalid_git_uri, CHECKSTATUS_GOOD_HASH)
> +        self.assertFalse(
> +            ret,
> +            msg="URI %s, checkstatus() returns true for invalid Git server" %
> +            (invalid_git_uri))
> +
> +        # Invalid/disappeared SRCREV returns false
> +        rev_bad = "f" * 40
> +        ret = do_git_checkstatus(CHECKSTATUS_GOOD_REPO, rev_bad)
> +        self.assertFalse(
> +            ret,
> +            msg="URI %s, REF %s, checkstatus() returns true for invalid ref" %
> +            (CHECKSTATUS_GOOD_REPO, rev_bad))
> +
>
>  class GitMakeShallowTest(FetcherTest):
>      def setUp(self):
> --
> 2.17.1
>
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#13799): https://lists.openembedded.org/g/bitbake-devel/message/13799
> Mute This Topic: https://lists.openembedded.org/mt/92190251/1050810
> Group Owner: bitbake-devel+owner@lists.openembedded.org
> Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [bruce.ashfield@gmail.com]
> -=-=-=-=-=-=-=-=-=-=-=-
>
Bruce Ashfield July 6, 2022, 3:13 p.m. UTC | #4
On Wed, Jul 6, 2022 at 11:12 AM Bruce Ashfield <bruce.ashfield@gmail.com> wrote:
>
> On Tue, Jul 5, 2022 at 1:46 PM Maia Xiao <t-maiaxiao@linux.microsoft.com> wrote:
> >
> > Currently we only check that we can list the remote, but do not validate
> > if the srcrev actually exists. Now we ensure that the rev exists by
> > attempting to shallow clone it from the remote.
> >
> > Fixes [YOCTO #11199].
> >
> > Signed-off-by: Maia Xiao <t-maiaxiao@linux.microsoft.com>
> > ---
> >  lib/bb/fetch2/git.py  | 28 +++++++++++++++++++++++-----
> >  lib/bb/tests/fetch.py | 34 ++++++++++++++++++++++++++++++++++
> >  2 files changed, 57 insertions(+), 5 deletions(-)
> >
> > diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
> > index 07b3d9a7..99780de8 100644
> > --- a/lib/bb/fetch2/git.py
> > +++ b/lib/bb/fetch2/git.py
> > @@ -832,8 +832,26 @@ class Git(FetchMethod):
> >              return True, str(rev)
> >
> >      def checkstatus(self, fetch, ud, d):
> > -        try:
> > -            self._lsremote(ud, d, "")
> > -            return True
> > -        except bb.fetch2.FetchError:
> > -            return False
> > +        repourl = self._get_repo_url(ud)
> > +        for name in ud.names:
> > +            tmpdir = tempfile.mkdtemp()
> > +            revision = ud.revisions[name]
> > +
> > +            try:
> > +                # Initialise an empty git repo for shallow fetching
> > +                runfetchcmd("%s init" % ud.basecmd, d, workdir=tmpdir)
> > +                runfetchcmd("%s remote add origin %s" %
> > +                            (ud.basecmd, shlex.quote(repourl)),
> > +                            d,
> > +                            workdir=tmpdir)
> > +
> > +                # Try to fetch only the specified srcrev
> > +                runfetchcmd("%s fetch -f --progress --depth 1 origin %s" %
>
> I think this is something that has to be optional, and not always enabled.
> I'm sure Richard can advise better than I can on the options for that.
>

Actually never mind on this first point, assuming that this isn't run on all
fetch paths (I didn't check that).

But my second comment about the depth is something to consider.

Bruce

> I've written some python modules that do something similar to this, and
> there's no guarantee that the git hash you are validating can be reached
> with a depth 1 clone. In that case, we need to decide if it's a warning, an
> error, or something else.
>
> Bruce
>
> > +                            (ud.basecmd, revision),
> > +                            d,
> > +                            workdir=tmpdir)
> > +            except bb.fetch2.FetchError:
> > +                return False
> > +            finally:
> > +                bb.utils.remove(tmpdir, recurse=True)
> > +        return True
> > diff --git a/lib/bb/tests/fetch.py b/lib/bb/tests/fetch.py
> > index ee41bff4..00def1bd 100644
> > --- a/lib/bb/tests/fetch.py
> > +++ b/lib/bb/tests/fetch.py
> > @@ -1478,6 +1478,40 @@ class FetchCheckStatusTest(FetcherTest):
> >
> >          connection_cache.close_connections()
> >
> > +    @skipIfNoNetwork()
> > +    def test_git_checkstatus(self):
> > +        CHECKSTATUS_GOOD_REPO = "git://git.yoctoproject.org/yocto-docs;branch=master"
> > +        CHECKSTATUS_GOOD_HASH = "2bd14d3810a68082ce01b459cf2799d796a32e63"
> > +
> > +        def do_git_checkstatus(url, srcrev):
> > +            self.d.setVar("SRCREV", srcrev)
> > +            fetch = bb.fetch2.Fetch([url], self.d)
> > +            ud = fetch.ud[url]
> > +            m = ud.method
> > +            return m.checkstatus(fetch, ud, self.d)
> > +
> > +        # Valid URI + SRCREV returns true
> > +        ret = do_git_checkstatus(CHECKSTATUS_GOOD_REPO, CHECKSTATUS_GOOD_HASH)
> > +        self.assertTrue(ret,
> > +                        msg="URI %s, can't check status" %
> > +                        (CHECKSTATUS_GOOD_REPO))
> > +
> > +        # Unavailable server returns false
> > +        invalid_git_uri = 'git://git.test.invalid/yocto-docs;branch=master'
> > +        ret = do_git_checkstatus(invalid_git_uri, CHECKSTATUS_GOOD_HASH)
> > +        self.assertFalse(
> > +            ret,
> > +            msg="URI %s, checkstatus() returns true for invalid Git server" %
> > +            (invalid_git_uri))
> > +
> > +        # Invalid/disappeared SRCREV returns false
> > +        rev_bad = "f" * 40
> > +        ret = do_git_checkstatus(CHECKSTATUS_GOOD_REPO, rev_bad)
> > +        self.assertFalse(
> > +            ret,
> > +            msg="URI %s, REF %s, checkstatus() returns true for invalid ref" %
> > +            (CHECKSTATUS_GOOD_REPO, rev_bad))
> > +
> >
> >  class GitMakeShallowTest(FetcherTest):
> >      def setUp(self):
> > --
> > 2.17.1
> >
> >
> > -=-=-=-=-=-=-=-=-=-=-=-
> > Links: You receive all messages sent to this group.
> > View/Reply Online (#13799): https://lists.openembedded.org/g/bitbake-devel/message/13799
> > Mute This Topic: https://lists.openembedded.org/mt/92190251/1050810
> > Group Owner: bitbake-devel+owner@lists.openembedded.org
> > Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [bruce.ashfield@gmail.com]
> > -=-=-=-=-=-=-=-=-=-=-=-
> >
>
>
> --
> - Thou shalt not follow the NULL pointer, for chaos and madness await
> thee at its end
> - "Use the force Harry" - Gandalf, Star Trek II
Richard Purdie July 6, 2022, 4:19 p.m. UTC | #5
On Wed, 2022-07-06 at 11:13 -0400, Bruce Ashfield wrote:
> On Wed, Jul 6, 2022 at 11:12 AM Bruce Ashfield <bruce.ashfield@gmail.com> wrote:
> > 
> > On Tue, Jul 5, 2022 at 1:46 PM Maia Xiao <t-maiaxiao@linux.microsoft.com> wrote:
> > > 
> > > Currently we only check that we can list the remote, but do not validate
> > > if the srcrev actually exists. Now we ensure that the rev exists by
> > > attempting to shallow clone it from the remote.
> > > 
> > > Fixes [YOCTO #11199].
> > > 
> > > Signed-off-by: Maia Xiao <t-maiaxiao@linux.microsoft.com>
> > > ---
> > >  lib/bb/fetch2/git.py  | 28 +++++++++++++++++++++++-----
> > >  lib/bb/tests/fetch.py | 34 ++++++++++++++++++++++++++++++++++
> > >  2 files changed, 57 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
> > > index 07b3d9a7..99780de8 100644
> > > --- a/lib/bb/fetch2/git.py
> > > +++ b/lib/bb/fetch2/git.py
> > > @@ -832,8 +832,26 @@ class Git(FetchMethod):
> > >              return True, str(rev)
> > > 
> > >      def checkstatus(self, fetch, ud, d):
> > > -        try:
> > > -            self._lsremote(ud, d, "")
> > > -            return True
> > > -        except bb.fetch2.FetchError:
> > > -            return False
> > > +        repourl = self._get_repo_url(ud)
> > > +        for name in ud.names:
> > > +            tmpdir = tempfile.mkdtemp()
> > > +            revision = ud.revisions[name]
> > > +
> > > +            try:
> > > +                # Initialise an empty git repo for shallow fetching
> > > +                runfetchcmd("%s init" % ud.basecmd, d, workdir=tmpdir)
> > > +                runfetchcmd("%s remote add origin %s" %
> > > +                            (ud.basecmd, shlex.quote(repourl)),
> > > +                            d,
> > > +                            workdir=tmpdir)
> > > +
> > > +                # Try to fetch only the specified srcrev
> > > +                runfetchcmd("%s fetch -f --progress --depth 1 origin %s" %
> > 
> > I think this is something that has to be optional, and not always enabled.
> > I'm sure Richard can advise better than I can on the options for that.
> > 
> 
> Actually never mind on this first point, assuming that this isn't run on all
> fetch paths (I didn't check that).
> 
> But my second comment about the depth is something to consider.

Yes, I'm also remembering comments someone made to me recently that
many git servers don't support limited depth clones because of the load
they put on the server.

I appreciate this was moved to a newcomer bug as I think we assumed
there was a straightforward way to remotely ask "is revision X in the
repo" but it is looking like that might not be that easy :(

Cheers,

Richard
Maia Xiao July 6, 2022, 8:14 p.m. UTC | #6
>> Actually never mind on this first point, assuming that this isn't run on
>> all fetch paths (I didn't check that).
>>
>> But my second comment about the depth is something to consider.
>
> Yes, I'm also remembering comments someone made to me recently that
> many git servers don't support limited depth clones because of the load
> they put on the server.
>
> I appreciate this was moved to a newcomer bug as I think we assumed
> there was a straightforward way to remotely ask "is revision X in the
> repo" but it is looking like that might not be that easy :(

Going off what Bruce said, a potentially "softer" way to do this is to
keep self._lsremote(..) as the True condition, but we additionally try
to fetch the specified srcrev and just log a warning if that didn't
work.

If we feel reluctant about the additional fetch, another thing that we
could still do is to log a warning if the srcrev isn't a tag / at the
top of some branch (i.e., not in the ls-remote). I'm not sure if it is
(un)common enough for that check to make sense, though.

Happy to offer a v2 patch if there is interest in either. Thanks for the
discussion, folks!

~maia
Bruce Ashfield July 7, 2022, 4:55 p.m. UTC | #7
On Wed, Jul 6, 2022 at 4:14 PM Maia Xiao <t-maiaxiao@linux.microsoft.com> wrote:
>
> >> Actually never mind on this first point, assuming that this isn't run on
> >> all fetch paths (I didn't check that).
> >>
> >> But my second comment about the depth is something to consider.
> >
> > Yes, I'm also remembering comments someone made to me recently that
> > many git servers don't support limited depth clones because of the load
> > they put on the server.
> >
> > I appreciate this was moved to a newcomer bug as I think we assumed
> > there was a straightforward way to remotely ask "is revision X in the
> > repo" but it is looking like that might not be that easy :(
>
> Going off what Bruce said, a potentially "softer" way to do this is to
> keep self._lsremote(..) as the True condition, but we additionally try
> to fetch the specified srcrev and just log a warning if that didn't
> work.
>
> If we feel reluctant about the additional fetch, another thing that we
> could still do is to log a warning if the srcrev isn't a tag / at the
> top of some branch (i.e., not in the ls-remote). I'm not sure if it is
> (un)common enough for that check to make sense, though.

Either of those options makes sense to me. As I mentioned in my
follow up, I wasn't sure if the revision checking was in the hot path
(i.e. run on every git fetcher opportunity) .. if it is in that path, then
we have to be more careful.

I thought the 2nd option you mentioned sounded interesting, but
it is quite common for branches to move on quickly, while we build
an older tag that is a release point. So it could potentially generate
a lot of warnings.

Bruce

>
> Happy to offer a v2 patch if there is interest in either. Thanks for the
> discussion, folks!
>
> ~maia
>
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#13806): https://lists.openembedded.org/g/bitbake-devel/message/13806
> Mute This Topic: https://lists.openembedded.org/mt/92190251/1050810
> Group Owner: bitbake-devel+owner@lists.openembedded.org
> Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [bruce.ashfield@gmail.com]
> -=-=-=-=-=-=-=-=-=-=-=-
>
Richard Purdie July 7, 2022, 8:50 p.m. UTC | #8
On Thu, 2022-07-07 at 12:55 -0400, Bruce Ashfield wrote:
> On Wed, Jul 6, 2022 at 4:14 PM Maia Xiao <t-maiaxiao@linux.microsoft.com> wrote:
> > 
> > > > Actually never mind on this first point, assuming that this isn't run on
> > > > all fetch paths (I didn't check that).
> > > > 
> > > > But my second comment about the depth is something to consider.
> > > 
> > > Yes, I'm also remembering comments someone made to me recently that
> > > many git servers don't support limited depth clones because of the load
> > > they put on the server.
> > > 
> > > I appreciate this was moved to a newcomer bug as I think we assumed
> > > there was a straightforward way to remotely ask "is revision X in the
> > > repo" but it is looking like that might not be that easy :(
> > 
> > Going off what Bruce said, a potentially "softer" way to do this is to
> > keep self._lsremote(..) as the True condition, but we additionally try
> > to fetch the specified srcrev and just log a warning if that didn't
> > work.
> > 
> > If we feel reluctant about the additional fetch, another thing that we
> > could still do is to log a warning if the srcrev isn't a tag / at the
> > top of some branch (i.e., not in the ls-remote). I'm not sure if it is
> > (un)common enough for that check to make sense, though.
> 
> Either of those options makes sense to me. As I mentioned in my
> follow up, I wasn't sure if the revision checking was in the hot path
> (i.e. run on every git fetcher opportunity) .. if it is in that path, then
> we have to be more careful.
> 
> I thought the 2nd option you mentioned sounded interesting, but
> it is quite common for branches to move on quickly, while we build
> an older tag that is a release point. So it could potentially generate
> a lot of warnings.

This shouldn't be in the hotpath for most code. The wget checkstatus()
*is* as it is used heavily by the sstate code but people don't put
sstate into git (I hope!).

I think keeping the lsremote and then warning if the revision check
fails might be a good compromise. I'd be interested to know how a world
build running checkstatus for each recipe looks for something like oe-
core or poky.

Cheers,

Richard

Patch

diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
index 07b3d9a7..99780de8 100644
--- a/lib/bb/fetch2/git.py
+++ b/lib/bb/fetch2/git.py
@@ -832,8 +832,26 @@  class Git(FetchMethod):
             return True, str(rev)
 
     def checkstatus(self, fetch, ud, d):
-        try:
-            self._lsremote(ud, d, "")
-            return True
-        except bb.fetch2.FetchError:
-            return False
+        repourl = self._get_repo_url(ud)
+        for name in ud.names:
+            tmpdir = tempfile.mkdtemp()
+            revision = ud.revisions[name]
+
+            try:
+                # Initialise an empty git repo for shallow fetching
+                runfetchcmd("%s init" % ud.basecmd, d, workdir=tmpdir)
+                runfetchcmd("%s remote add origin %s" %
+                            (ud.basecmd, shlex.quote(repourl)),
+                            d,
+                            workdir=tmpdir)
+
+                # Try to fetch only the specified srcrev
+                runfetchcmd("%s fetch -f --progress --depth 1 origin %s" %
+                            (ud.basecmd, revision),
+                            d,
+                            workdir=tmpdir)
+            except bb.fetch2.FetchError:
+                return False
+            finally:
+                bb.utils.remove(tmpdir, recurse=True)
+        return True
diff --git a/lib/bb/tests/fetch.py b/lib/bb/tests/fetch.py
index ee41bff4..00def1bd 100644
--- a/lib/bb/tests/fetch.py
+++ b/lib/bb/tests/fetch.py
@@ -1478,6 +1478,40 @@  class FetchCheckStatusTest(FetcherTest):
 
         connection_cache.close_connections()
 
+    @skipIfNoNetwork()
+    def test_git_checkstatus(self):
+        CHECKSTATUS_GOOD_REPO = "git://git.yoctoproject.org/yocto-docs;branch=master"
+        CHECKSTATUS_GOOD_HASH = "2bd14d3810a68082ce01b459cf2799d796a32e63"
+
+        def do_git_checkstatus(url, srcrev):
+            self.d.setVar("SRCREV", srcrev)
+            fetch = bb.fetch2.Fetch([url], self.d)
+            ud = fetch.ud[url]
+            m = ud.method
+            return m.checkstatus(fetch, ud, self.d)
+
+        # Valid URI + SRCREV returns true
+        ret = do_git_checkstatus(CHECKSTATUS_GOOD_REPO, CHECKSTATUS_GOOD_HASH)
+        self.assertTrue(ret,
+                        msg="URI %s, can't check status" %
+                        (CHECKSTATUS_GOOD_REPO))
+
+        # Unavailable server returns false
+        invalid_git_uri = 'git://git.test.invalid/yocto-docs;branch=master'
+        ret = do_git_checkstatus(invalid_git_uri, CHECKSTATUS_GOOD_HASH)
+        self.assertFalse(
+            ret,
+            msg="URI %s, checkstatus() returns true for invalid Git server" %
+            (invalid_git_uri))
+
+        # Invalid/disappeared SRCREV returns false
+        rev_bad = "f" * 40
+        ret = do_git_checkstatus(CHECKSTATUS_GOOD_REPO, rev_bad)
+        self.assertFalse(
+            ret,
+            msg="URI %s, REF %s, checkstatus() returns true for invalid ref" %
+            (CHECKSTATUS_GOOD_REPO, rev_bad))
+
 
 class GitMakeShallowTest(FetcherTest):
     def setUp(self):