diff mbox series

bitbake: fetch2: git: revert path checks

Message ID 20230913084344.29928-1-mikko.rapeli@linaro.org
State New
Headers show
Series bitbake: fetch2: git: revert path checks | expand

Commit Message

Mikko Rapeli Sept. 13, 2023, 8:43 a.m. UTC
Commit 10f38157a069cb6938323cadd5e523037a29ace9
triggers re-clones of uptodate repositories when their
access path changes. This is wrong. It is enough that
needed commits are found from the git repo. For example
when same DL_DIR is accessed using two different paths
due to a symlink, then the resulting absolute paths will
differ but the content is the same. Trust the content.

Avoids extra downloads of large repositories like linux-yocto
for kernel when the local cache was already uptodate.

Signed-off-by: Mikko Rapeli <mikko.rapeli@linaro.org>
---
 bitbake/lib/bb/fetch2/git.py | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

Comments

Richard Purdie Sept. 13, 2023, 8:51 a.m. UTC | #1
On Wed, 2023-09-13 at 11:43 +0300, Mikko Rapeli wrote:
> Commit 10f38157a069cb6938323cadd5e523037a29ace9
> triggers re-clones of uptodate repositories when their
> access path changes. This is wrong. It is enough that
> needed commits are found from the git repo. For example
> when same DL_DIR is accessed using two different paths
> due to a symlink, then the resulting absolute paths will
> differ but the content is the same. Trust the content.
> 
> Avoids extra downloads of large repositories like linux-yocto
> for kernel when the local cache was already uptodate.
> 
> Signed-off-by: Mikko Rapeli <mikko.rapeli@linaro.org>
> ---
>  bitbake/lib/bb/fetch2/git.py | 14 +-------------
>  1 file changed, 1 insertion(+), 13 deletions(-)
> 
> diff --git a/bitbake/lib/bb/fetch2/git.py b/bitbake/lib/bb/fetch2/git.py
> index e11271b757..0c547ea303 100644
> --- a/bitbake/lib/bb/fetch2/git.py
> +++ b/bitbake/lib/bb/fetch2/git.py
> @@ -371,19 +371,7 @@ class Git(FetchMethod):
>              # The directory may exist, but not be the top level of a bare git
>              # repository in which case it needs to be deleted and re-cloned.
>              try:
> -                # Since clones can be bare, use --absolute-git-dir instead of --show-toplevel
> -                output = runfetchcmd("LANG=C %s rev-parse --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
> -
> -                toplevel = os.path.abspath(output.rstrip())
> -                abs_clonedir = os.path.abspath(ud.clonedir).rstrip('/')
> -                # The top level Git directory must either be the clone directory
> -                # or a child of the clone directory. Any ancestor directory of
> -                # the clone directory is not valid as the Git directory (and
> -                # probably belongs to some other unrelated repository), so a
> -                # clone is required
> -                if os.path.commonprefix([abs_clonedir, toplevel]) != abs_clonedir:
> -                    logger.warning("Top level directory '%s' doesn't match expected '%s'. Re-cloning", toplevel, ud.clonedir)
> -                    needs_clone = True
> +                runfetchcmd("LANG=C %s rev-parse --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
>              except bb.fetch2.FetchError as e:
>                  logger.warning("Unable to get top level for %s (not a git directory?): %s", ud.clonedir, e)
>                  needs_clone = True

Doesn't this give rise to some of the original issues (it could find a
parent git repo instead of the one being cloned)?

Cheers,

Richard
Mikko Rapeli Sept. 13, 2023, 9:04 a.m. UTC | #2
Hi,

On Wed, Sep 13, 2023 at 09:51:39AM +0100, Richard Purdie wrote:
> On Wed, 2023-09-13 at 11:43 +0300, Mikko Rapeli wrote:
> > Commit 10f38157a069cb6938323cadd5e523037a29ace9
> > triggers re-clones of uptodate repositories when their
> > access path changes. This is wrong. It is enough that
> > needed commits are found from the git repo. For example
> > when same DL_DIR is accessed using two different paths
> > due to a symlink, then the resulting absolute paths will
> > differ but the content is the same. Trust the content.
> > 
> > Avoids extra downloads of large repositories like linux-yocto
> > for kernel when the local cache was already uptodate.
> > 
> > Signed-off-by: Mikko Rapeli <mikko.rapeli@linaro.org>
> > ---
> >  bitbake/lib/bb/fetch2/git.py | 14 +-------------
> >  1 file changed, 1 insertion(+), 13 deletions(-)
> > 
> > diff --git a/bitbake/lib/bb/fetch2/git.py b/bitbake/lib/bb/fetch2/git.py
> > index e11271b757..0c547ea303 100644
> > --- a/bitbake/lib/bb/fetch2/git.py
> > +++ b/bitbake/lib/bb/fetch2/git.py
> > @@ -371,19 +371,7 @@ class Git(FetchMethod):
> >              # The directory may exist, but not be the top level of a bare git
> >              # repository in which case it needs to be deleted and re-cloned.
> >              try:
> > -                # Since clones can be bare, use --absolute-git-dir instead of --show-toplevel
> > -                output = runfetchcmd("LANG=C %s rev-parse --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
> > -
> > -                toplevel = os.path.abspath(output.rstrip())
> > -                abs_clonedir = os.path.abspath(ud.clonedir).rstrip('/')
> > -                # The top level Git directory must either be the clone directory
> > -                # or a child of the clone directory. Any ancestor directory of
> > -                # the clone directory is not valid as the Git directory (and
> > -                # probably belongs to some other unrelated repository), so a
> > -                # clone is required
> > -                if os.path.commonprefix([abs_clonedir, toplevel]) != abs_clonedir:
> > -                    logger.warning("Top level directory '%s' doesn't match expected '%s'. Re-cloning", toplevel, ud.clonedir)
> > -                    needs_clone = True
> > +                runfetchcmd("LANG=C %s rev-parse --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
> >              except bb.fetch2.FetchError as e:
> >                  logger.warning("Unable to get top level for %s (not a git directory?): %s", ud.clonedir, e)
> >                  needs_clone = True
> 
> Doesn't this give rise to some of the original issues (it could find a
> parent git repo instead of the one being cloned)?

Yes, but this has not been a problem for me. Re-cloning existing fully cached
repositories is the effect I see.

I think the git commit id and branch should decide if fetch or re-clone steps are needed.
The path of the repository should not matter. The git remotes also not. Both of those can
and do change.

Cheers,

-Mikko
Richard Purdie Sept. 13, 2023, 9:22 a.m. UTC | #3
On Wed, 2023-09-13 at 12:04 +0300, Mikko Rapeli wrote:
> Hi,
> 
> On Wed, Sep 13, 2023 at 09:51:39AM +0100, Richard Purdie wrote:
> > On Wed, 2023-09-13 at 11:43 +0300, Mikko Rapeli wrote:
> > > Commit 10f38157a069cb6938323cadd5e523037a29ace9
> > > triggers re-clones of uptodate repositories when their
> > > access path changes. This is wrong. It is enough that
> > > needed commits are found from the git repo. For example
> > > when same DL_DIR is accessed using two different paths
> > > due to a symlink, then the resulting absolute paths will
> > > differ but the content is the same. Trust the content.
> > > 
> > > Avoids extra downloads of large repositories like linux-yocto
> > > for kernel when the local cache was already uptodate.
> > > 
> > > Signed-off-by: Mikko Rapeli <mikko.rapeli@linaro.org>
> > > ---
> > >  bitbake/lib/bb/fetch2/git.py | 14 +-------------
> > >  1 file changed, 1 insertion(+), 13 deletions(-)
> > > 
> > > diff --git a/bitbake/lib/bb/fetch2/git.py b/bitbake/lib/bb/fetch2/git.py
> > > index e11271b757..0c547ea303 100644
> > > --- a/bitbake/lib/bb/fetch2/git.py
> > > +++ b/bitbake/lib/bb/fetch2/git.py
> > > @@ -371,19 +371,7 @@ class Git(FetchMethod):
> > >              # The directory may exist, but not be the top level of a bare git
> > >              # repository in which case it needs to be deleted and re-cloned.
> > >              try:
> > > -                # Since clones can be bare, use --absolute-git-dir instead of --show-toplevel
> > > -                output = runfetchcmd("LANG=C %s rev-parse --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
> > > -
> > > -                toplevel = os.path.abspath(output.rstrip())
> > > -                abs_clonedir = os.path.abspath(ud.clonedir).rstrip('/')
> > > -                # The top level Git directory must either be the clone directory
> > > -                # or a child of the clone directory. Any ancestor directory of
> > > -                # the clone directory is not valid as the Git directory (and
> > > -                # probably belongs to some other unrelated repository), so a
> > > -                # clone is required
> > > -                if os.path.commonprefix([abs_clonedir, toplevel]) != abs_clonedir:
> > > -                    logger.warning("Top level directory '%s' doesn't match expected '%s'. Re-cloning", toplevel, ud.clonedir)
> > > -                    needs_clone = True
> > > +                runfetchcmd("LANG=C %s rev-parse --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
> > >              except bb.fetch2.FetchError as e:
> > >                  logger.warning("Unable to get top level for %s (not a git directory?): %s", ud.clonedir, e)
> > >                  needs_clone = True
> > 
> > Doesn't this give rise to some of the original issues (it could find a
> > parent git repo instead of the one being cloned)?
> 
> Yes, but this has not been a problem for me. Re-cloning existing fully cached
> repositories is the effect I see.
> 
> I think the git commit id and branch should decide if fetch or re-clone steps are needed.
> The path of the repository should not matter. The git remotes also not. Both of those can
> and do change.

In that case you're effectively sending a revert and we should probably
be clear about that...

We really need to find a way to resolve things for everyone.

Cheers,

Richard
Martin Jansa Sept. 13, 2023, 9:42 a.m. UTC | #4
FWIW: I'm also seeing warnings like:

WARNING: Top level directory
'/data001/work/downloads/git2/git.libssh.org.projects.libssh.git' doesn't
match expected
'/mnt/mirror-write/downloads/git2/git.libssh.org.projects.libssh.git'.
Re-cloning

Where /mnt/mirror-write/downloads/ is configured DL_DIR and
/data001/work/downloads/ was probably the DL_DIR of the jenkins jobs which
originally fetched the libssh and created the tarball with it for premirror.

Cheers,

On Wed, Sep 13, 2023 at 11:22 AM Richard Purdie <
richard.purdie@linuxfoundation.org> wrote:

> On Wed, 2023-09-13 at 12:04 +0300, Mikko Rapeli wrote:
> > Hi,
> >
> > On Wed, Sep 13, 2023 at 09:51:39AM +0100, Richard Purdie wrote:
> > > On Wed, 2023-09-13 at 11:43 +0300, Mikko Rapeli wrote:
> > > > Commit 10f38157a069cb6938323cadd5e523037a29ace9
> > > > triggers re-clones of uptodate repositories when their
> > > > access path changes. This is wrong. It is enough that
> > > > needed commits are found from the git repo. For example
> > > > when same DL_DIR is accessed using two different paths
> > > > due to a symlink, then the resulting absolute paths will
> > > > differ but the content is the same. Trust the content.
> > > >
> > > > Avoids extra downloads of large repositories like linux-yocto
> > > > for kernel when the local cache was already uptodate.
> > > >
> > > > Signed-off-by: Mikko Rapeli <mikko.rapeli@linaro.org>
> > > > ---
> > > >  bitbake/lib/bb/fetch2/git.py | 14 +-------------
> > > >  1 file changed, 1 insertion(+), 13 deletions(-)
> > > >
> > > > diff --git a/bitbake/lib/bb/fetch2/git.py
> b/bitbake/lib/bb/fetch2/git.py
> > > > index e11271b757..0c547ea303 100644
> > > > --- a/bitbake/lib/bb/fetch2/git.py
> > > > +++ b/bitbake/lib/bb/fetch2/git.py
> > > > @@ -371,19 +371,7 @@ class Git(FetchMethod):
> > > >              # The directory may exist, but not be the top level of
> a bare git
> > > >              # repository in which case it needs to be deleted and
> re-cloned.
> > > >              try:
> > > > -                # Since clones can be bare, use --absolute-git-dir
> instead of --show-toplevel
> > > > -                output = runfetchcmd("LANG=C %s rev-parse
> --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
> > > > -
> > > > -                toplevel = os.path.abspath(output.rstrip())
> > > > -                abs_clonedir =
> os.path.abspath(ud.clonedir).rstrip('/')
> > > > -                # The top level Git directory must either be the
> clone directory
> > > > -                # or a child of the clone directory. Any ancestor
> directory of
> > > > -                # the clone directory is not valid as the Git
> directory (and
> > > > -                # probably belongs to some other unrelated
> repository), so a
> > > > -                # clone is required
> > > > -                if os.path.commonprefix([abs_clonedir, toplevel])
> != abs_clonedir:
> > > > -                    logger.warning("Top level directory '%s'
> doesn't match expected '%s'. Re-cloning", toplevel, ud.clonedir)
> > > > -                    needs_clone = True
> > > > +                runfetchcmd("LANG=C %s rev-parse
> --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
> > > >              except bb.fetch2.FetchError as e:
> > > >                  logger.warning("Unable to get top level for %s (not
> a git directory?): %s", ud.clonedir, e)
> > > >                  needs_clone = True
> > >
> > > Doesn't this give rise to some of the original issues (it could find a
> > > parent git repo instead of the one being cloned)?
> >
> > Yes, but this has not been a problem for me. Re-cloning existing fully
> cached
> > repositories is the effect I see.
> >
> > I think the git commit id and branch should decide if fetch or re-clone
> steps are needed.
> > The path of the repository should not matter. The git remotes also not.
> Both of those can
> > and do change.
>
> In that case you're effectively sending a revert and we should probably
> be clear about that...
>
> We really need to find a way to resolve things for everyone.
>
> Cheers,
>
> Richard
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#15045):
> https://lists.openembedded.org/g/bitbake-devel/message/15045
> Mute This Topic: https://lists.openembedded.org/mt/101333074/3617156
> Group Owner: bitbake-devel+owner@lists.openembedded.org
> Unsubscribe:
> https://lists.openembedded.org/g/bitbake-devel/leave/8021546/3617156/1661570722/xyzzy
> [martin.jansa@gmail.com]
> -=-=-=-=-=-=-=-=-=-=-=-
>
>
Joshua Watt Sept. 13, 2023, 2:09 p.m. UTC | #5
On Wed, Sep 13, 2023 at 3:42 AM Martin Jansa <martin.jansa@gmail.com> wrote:
>
> FWIW: I'm also seeing warnings like:
>
> WARNING: Top level directory '/data001/work/downloads/git2/git.libssh.org.projects.libssh.git' doesn't match expected '/mnt/mirror-write/downloads/git2/git.libssh.org.projects.libssh.git'. Re-cloning
>
> Where /mnt/mirror-write/downloads/ is configured DL_DIR and /data001/work/downloads/ was probably the DL_DIR of the jenkins jobs which originally fetched the libssh and created the tarball with it for premirror.

That's strange.... I'll take a look real quick and see whats going on

>
> Cheers,
>
> On Wed, Sep 13, 2023 at 11:22 AM Richard Purdie <richard.purdie@linuxfoundation.org> wrote:
>>
>> On Wed, 2023-09-13 at 12:04 +0300, Mikko Rapeli wrote:
>> > Hi,
>> >
>> > On Wed, Sep 13, 2023 at 09:51:39AM +0100, Richard Purdie wrote:
>> > > On Wed, 2023-09-13 at 11:43 +0300, Mikko Rapeli wrote:
>> > > > Commit 10f38157a069cb6938323cadd5e523037a29ace9
>> > > > triggers re-clones of uptodate repositories when their
>> > > > access path changes. This is wrong. It is enough that
>> > > > needed commits are found from the git repo. For example
>> > > > when same DL_DIR is accessed using two different paths
>> > > > due to a symlink, then the resulting absolute paths will
>> > > > differ but the content is the same. Trust the content.
>> > > >
>> > > > Avoids extra downloads of large repositories like linux-yocto
>> > > > for kernel when the local cache was already uptodate.
>> > > >
>> > > > Signed-off-by: Mikko Rapeli <mikko.rapeli@linaro.org>
>> > > > ---
>> > > >  bitbake/lib/bb/fetch2/git.py | 14 +-------------
>> > > >  1 file changed, 1 insertion(+), 13 deletions(-)
>> > > >
>> > > > diff --git a/bitbake/lib/bb/fetch2/git.py b/bitbake/lib/bb/fetch2/git.py
>> > > > index e11271b757..0c547ea303 100644
>> > > > --- a/bitbake/lib/bb/fetch2/git.py
>> > > > +++ b/bitbake/lib/bb/fetch2/git.py
>> > > > @@ -371,19 +371,7 @@ class Git(FetchMethod):
>> > > >              # The directory may exist, but not be the top level of a bare git
>> > > >              # repository in which case it needs to be deleted and re-cloned.
>> > > >              try:
>> > > > -                # Since clones can be bare, use --absolute-git-dir instead of --show-toplevel
>> > > > -                output = runfetchcmd("LANG=C %s rev-parse --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
>> > > > -
>> > > > -                toplevel = os.path.abspath(output.rstrip())
>> > > > -                abs_clonedir = os.path.abspath(ud.clonedir).rstrip('/')
>> > > > -                # The top level Git directory must either be the clone directory
>> > > > -                # or a child of the clone directory. Any ancestor directory of
>> > > > -                # the clone directory is not valid as the Git directory (and
>> > > > -                # probably belongs to some other unrelated repository), so a
>> > > > -                # clone is required
>> > > > -                if os.path.commonprefix([abs_clonedir, toplevel]) != abs_clonedir:
>> > > > -                    logger.warning("Top level directory '%s' doesn't match expected '%s'. Re-cloning", toplevel, ud.clonedir)
>> > > > -                    needs_clone = True
>> > > > +                runfetchcmd("LANG=C %s rev-parse --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
>> > > >              except bb.fetch2.FetchError as e:
>> > > >                  logger.warning("Unable to get top level for %s (not a git directory?): %s", ud.clonedir, e)
>> > > >                  needs_clone = True
>> > >
>> > > Doesn't this give rise to some of the original issues (it could find a
>> > > parent git repo instead of the one being cloned)?
>> >
>> > Yes, but this has not been a problem for me. Re-cloning existing fully cached
>> > repositories is the effect I see.
>> >
>> > I think the git commit id and branch should decide if fetch or re-clone steps are needed.
>> > The path of the repository should not matter. The git remotes also not. Both of those can
>> > and do change.
>>
>> In that case you're effectively sending a revert and we should probably
>> be clear about that...
>>
>> We really need to find a way to resolve things for everyone.
>>
>> Cheers,
>>
>> Richard
>>
>> -=-=-=-=-=-=-=-=-=-=-=-
>> Links: You receive all messages sent to this group.
>> View/Reply Online (#15045): https://lists.openembedded.org/g/bitbake-devel/message/15045
>> Mute This Topic: https://lists.openembedded.org/mt/101333074/3617156
>> Group Owner: bitbake-devel+owner@lists.openembedded.org
>> Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/leave/8021546/3617156/1661570722/xyzzy [martin.jansa@gmail.com]
>> -=-=-=-=-=-=-=-=-=-=-=-
>>
Joshua Watt Sept. 13, 2023, 2:45 p.m. UTC | #6
On Wed, Sep 13, 2023 at 3:42 AM Martin Jansa <martin.jansa@gmail.com> wrote:
>
> FWIW: I'm also seeing warnings like:
>
> WARNING: Top level directory '/data001/work/downloads/git2/git.libssh.org.projects.libssh.git' doesn't match expected '/mnt/mirror-write/downloads/git2/git.libssh.org.projects.libssh.git'. Re-cloning
>
> Where /mnt/mirror-write/downloads/ is configured DL_DIR and /data001/work/downloads/ was probably the DL_DIR of the jenkins jobs which originally fetched the libssh and created the tarball with it for premirror.

Ya, that /data001 path is _really_ weird. That path comes from running
`git rev-parse --absolute-git-dir` in the extracted directory
(ud.clonedir, '/mnt/mirror-write/downloads/git2/git.libssh.org.projects.libssh.git').
The should be no absolute paths encoded in the mirror tarballs, so
this seems like it's some sort of misconfiguration(?), and not related
to Mikkos problem.


I think I have a fix for Mikkos symlink problem though

>
> Cheers,
>
> On Wed, Sep 13, 2023 at 11:22 AM Richard Purdie <richard.purdie@linuxfoundation.org> wrote:
>>
>> On Wed, 2023-09-13 at 12:04 +0300, Mikko Rapeli wrote:
>> > Hi,
>> >
>> > On Wed, Sep 13, 2023 at 09:51:39AM +0100, Richard Purdie wrote:
>> > > On Wed, 2023-09-13 at 11:43 +0300, Mikko Rapeli wrote:
>> > > > Commit 10f38157a069cb6938323cadd5e523037a29ace9
>> > > > triggers re-clones of uptodate repositories when their
>> > > > access path changes. This is wrong. It is enough that
>> > > > needed commits are found from the git repo. For example
>> > > > when same DL_DIR is accessed using two different paths
>> > > > due to a symlink, then the resulting absolute paths will
>> > > > differ but the content is the same. Trust the content.
>> > > >
>> > > > Avoids extra downloads of large repositories like linux-yocto
>> > > > for kernel when the local cache was already uptodate.
>> > > >
>> > > > Signed-off-by: Mikko Rapeli <mikko.rapeli@linaro.org>
>> > > > ---
>> > > >  bitbake/lib/bb/fetch2/git.py | 14 +-------------
>> > > >  1 file changed, 1 insertion(+), 13 deletions(-)
>> > > >
>> > > > diff --git a/bitbake/lib/bb/fetch2/git.py b/bitbake/lib/bb/fetch2/git.py
>> > > > index e11271b757..0c547ea303 100644
>> > > > --- a/bitbake/lib/bb/fetch2/git.py
>> > > > +++ b/bitbake/lib/bb/fetch2/git.py
>> > > > @@ -371,19 +371,7 @@ class Git(FetchMethod):
>> > > >              # The directory may exist, but not be the top level of a bare git
>> > > >              # repository in which case it needs to be deleted and re-cloned.
>> > > >              try:
>> > > > -                # Since clones can be bare, use --absolute-git-dir instead of --show-toplevel
>> > > > -                output = runfetchcmd("LANG=C %s rev-parse --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
>> > > > -
>> > > > -                toplevel = os.path.abspath(output.rstrip())
>> > > > -                abs_clonedir = os.path.abspath(ud.clonedir).rstrip('/')
>> > > > -                # The top level Git directory must either be the clone directory
>> > > > -                # or a child of the clone directory. Any ancestor directory of
>> > > > -                # the clone directory is not valid as the Git directory (and
>> > > > -                # probably belongs to some other unrelated repository), so a
>> > > > -                # clone is required
>> > > > -                if os.path.commonprefix([abs_clonedir, toplevel]) != abs_clonedir:
>> > > > -                    logger.warning("Top level directory '%s' doesn't match expected '%s'. Re-cloning", toplevel, ud.clonedir)
>> > > > -                    needs_clone = True
>> > > > +                runfetchcmd("LANG=C %s rev-parse --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
>> > > >              except bb.fetch2.FetchError as e:
>> > > >                  logger.warning("Unable to get top level for %s (not a git directory?): %s", ud.clonedir, e)
>> > > >                  needs_clone = True
>> > >
>> > > Doesn't this give rise to some of the original issues (it could find a
>> > > parent git repo instead of the one being cloned)?
>> >
>> > Yes, but this has not been a problem for me. Re-cloning existing fully cached
>> > repositories is the effect I see.
>> >
>> > I think the git commit id and branch should decide if fetch or re-clone steps are needed.
>> > The path of the repository should not matter. The git remotes also not. Both of those can
>> > and do change.
>>
>> In that case you're effectively sending a revert and we should probably
>> be clear about that...
>>
>> We really need to find a way to resolve things for everyone.
>>
>> Cheers,
>>
>> Richard
>>
>> -=-=-=-=-=-=-=-=-=-=-=-
>> Links: You receive all messages sent to this group.
>> View/Reply Online (#15045): https://lists.openembedded.org/g/bitbake-devel/message/15045
>> Mute This Topic: https://lists.openembedded.org/mt/101333074/3617156
>> Group Owner: bitbake-devel+owner@lists.openembedded.org
>> Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/leave/8021546/3617156/1661570722/xyzzy [martin.jansa@gmail.com]
>> -=-=-=-=-=-=-=-=-=-=-=-
>>
Martin Jansa Sept. 13, 2023, 2:53 p.m. UTC | #7
> not related to Mikkos problem.
> I think I have a fix for Mikkos symlink problem though

Maybe it is related in the end, to my surprise /mnt/mirror-write/downloads
is on some servers a symlink to /data001/work/downloads and then this
/mnt/mirror-write/download
is used as DL_DIR there (not /data001/work/downloads directly).

On Wed, Sep 13, 2023 at 4:45 PM Joshua Watt <jpewhacker@gmail.com> wrote:

> On Wed, Sep 13, 2023 at 3:42 AM Martin Jansa <martin.jansa@gmail.com>
> wrote:
> >
> > FWIW: I'm also seeing warnings like:
> >
> > WARNING: Top level directory
> '/data001/work/downloads/git2/git.libssh.org.projects.libssh.git' doesn't
> match expected
> '/mnt/mirror-write/downloads/git2/git.libssh.org.projects.libssh.git'.
> Re-cloning
> >
> > Where /mnt/mirror-write/downloads/ is configured DL_DIR and
> /data001/work/downloads/ was probably the DL_DIR of the jenkins jobs which
> originally fetched the libssh and created the tarball with it for premirror.
>
> Ya, that /data001 path is _really_ weird. That path comes from running
> `git rev-parse --absolute-git-dir` in the extracted directory
> (ud.clonedir,
> '/mnt/mirror-write/downloads/git2/git.libssh.org.projects.libssh.git').
> The should be no absolute paths encoded in the mirror tarballs, so
> this seems like it's some sort of misconfiguration(?), and not related
> to Mikkos problem.
>
>
> I think I have a fix for Mikkos symlink problem though
>
> >
> > Cheers,
> >
> > On Wed, Sep 13, 2023 at 11:22 AM Richard Purdie <
> richard.purdie@linuxfoundation.org> wrote:
> >>
> >> On Wed, 2023-09-13 at 12:04 +0300, Mikko Rapeli wrote:
> >> > Hi,
> >> >
> >> > On Wed, Sep 13, 2023 at 09:51:39AM +0100, Richard Purdie wrote:
> >> > > On Wed, 2023-09-13 at 11:43 +0300, Mikko Rapeli wrote:
> >> > > > Commit 10f38157a069cb6938323cadd5e523037a29ace9
> >> > > > triggers re-clones of uptodate repositories when their
> >> > > > access path changes. This is wrong. It is enough that
> >> > > > needed commits are found from the git repo. For example
> >> > > > when same DL_DIR is accessed using two different paths
> >> > > > due to a symlink, then the resulting absolute paths will
> >> > > > differ but the content is the same. Trust the content.
> >> > > >
> >> > > > Avoids extra downloads of large repositories like linux-yocto
> >> > > > for kernel when the local cache was already uptodate.
> >> > > >
> >> > > > Signed-off-by: Mikko Rapeli <mikko.rapeli@linaro.org>
> >> > > > ---
> >> > > >  bitbake/lib/bb/fetch2/git.py | 14 +-------------
> >> > > >  1 file changed, 1 insertion(+), 13 deletions(-)
> >> > > >
> >> > > > diff --git a/bitbake/lib/bb/fetch2/git.py
> b/bitbake/lib/bb/fetch2/git.py
> >> > > > index e11271b757..0c547ea303 100644
> >> > > > --- a/bitbake/lib/bb/fetch2/git.py
> >> > > > +++ b/bitbake/lib/bb/fetch2/git.py
> >> > > > @@ -371,19 +371,7 @@ class Git(FetchMethod):
> >> > > >              # The directory may exist, but not be the top level
> of a bare git
> >> > > >              # repository in which case it needs to be deleted
> and re-cloned.
> >> > > >              try:
> >> > > > -                # Since clones can be bare, use
> --absolute-git-dir instead of --show-toplevel
> >> > > > -                output = runfetchcmd("LANG=C %s rev-parse
> --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
> >> > > > -
> >> > > > -                toplevel = os.path.abspath(output.rstrip())
> >> > > > -                abs_clonedir =
> os.path.abspath(ud.clonedir).rstrip('/')
> >> > > > -                # The top level Git directory must either be the
> clone directory
> >> > > > -                # or a child of the clone directory. Any
> ancestor directory of
> >> > > > -                # the clone directory is not valid as the Git
> directory (and
> >> > > > -                # probably belongs to some other unrelated
> repository), so a
> >> > > > -                # clone is required
> >> > > > -                if os.path.commonprefix([abs_clonedir,
> toplevel]) != abs_clonedir:
> >> > > > -                    logger.warning("Top level directory '%s'
> doesn't match expected '%s'. Re-cloning", toplevel, ud.clonedir)
> >> > > > -                    needs_clone = True
> >> > > > +                runfetchcmd("LANG=C %s rev-parse
> --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
> >> > > >              except bb.fetch2.FetchError as e:
> >> > > >                  logger.warning("Unable to get top level for %s
> (not a git directory?): %s", ud.clonedir, e)
> >> > > >                  needs_clone = True
> >> > >
> >> > > Doesn't this give rise to some of the original issues (it could
> find a
> >> > > parent git repo instead of the one being cloned)?
> >> >
> >> > Yes, but this has not been a problem for me. Re-cloning existing
> fully cached
> >> > repositories is the effect I see.
> >> >
> >> > I think the git commit id and branch should decide if fetch or
> re-clone steps are needed.
> >> > The path of the repository should not matter. The git remotes also
> not. Both of those can
> >> > and do change.
> >>
> >> In that case you're effectively sending a revert and we should probably
> >> be clear about that...
> >>
> >> We really need to find a way to resolve things for everyone.
> >>
> >> Cheers,
> >>
> >> Richard
> >>
> >> -=-=-=-=-=-=-=-=-=-=-=-
> >> Links: You receive all messages sent to this group.
> >> View/Reply Online (#15045):
> https://lists.openembedded.org/g/bitbake-devel/message/15045
> >> Mute This Topic: https://lists.openembedded.org/mt/101333074/3617156
> >> Group Owner: bitbake-devel+owner@lists.openembedded.org
> >> Unsubscribe:
> https://lists.openembedded.org/g/bitbake-devel/leave/8021546/3617156/1661570722/xyzzy
> [martin.jansa@gmail.com]
> >> -=-=-=-=-=-=-=-=-=-=-=-
> >>
>
Joshua Watt Sept. 13, 2023, 3 p.m. UTC | #8
On Wed, Sep 13, 2023 at 8:53 AM Martin Jansa <martin.jansa@gmail.com> wrote:
>
> > not related to Mikkos problem.
> > I think I have a fix for Mikkos symlink problem though
>
> Maybe it is related in the end, to my surprise /mnt/mirror-write/downloads is on some servers a symlink to /data001/work/downloads and then this /mnt/mirror-write/download is used as DL_DIR there (not /data001/work/downloads directly).

Ah, OK. That is the same problem as Mikko. I was assuming you were
seeing that locally where the two aren't symlinked together. That is a
much simpler explanation :)

>
> On Wed, Sep 13, 2023 at 4:45 PM Joshua Watt <jpewhacker@gmail.com> wrote:
>>
>> On Wed, Sep 13, 2023 at 3:42 AM Martin Jansa <martin.jansa@gmail.com> wrote:
>> >
>> > FWIW: I'm also seeing warnings like:
>> >
>> > WARNING: Top level directory '/data001/work/downloads/git2/git.libssh.org.projects.libssh.git' doesn't match expected '/mnt/mirror-write/downloads/git2/git.libssh.org.projects.libssh.git'. Re-cloning
>> >
>> > Where /mnt/mirror-write/downloads/ is configured DL_DIR and /data001/work/downloads/ was probably the DL_DIR of the jenkins jobs which originally fetched the libssh and created the tarball with it for premirror.
>>
>> Ya, that /data001 path is _really_ weird. That path comes from running
>> `git rev-parse --absolute-git-dir` in the extracted directory
>> (ud.clonedir, '/mnt/mirror-write/downloads/git2/git.libssh.org.projects.libssh.git').
>> The should be no absolute paths encoded in the mirror tarballs, so
>> this seems like it's some sort of misconfiguration(?), and not related
>> to Mikkos problem.
>>
>>
>> I think I have a fix for Mikkos symlink problem though
>>
>> >
>> > Cheers,
>> >
>> > On Wed, Sep 13, 2023 at 11:22 AM Richard Purdie <richard.purdie@linuxfoundation.org> wrote:
>> >>
>> >> On Wed, 2023-09-13 at 12:04 +0300, Mikko Rapeli wrote:
>> >> > Hi,
>> >> >
>> >> > On Wed, Sep 13, 2023 at 09:51:39AM +0100, Richard Purdie wrote:
>> >> > > On Wed, 2023-09-13 at 11:43 +0300, Mikko Rapeli wrote:
>> >> > > > Commit 10f38157a069cb6938323cadd5e523037a29ace9
>> >> > > > triggers re-clones of uptodate repositories when their
>> >> > > > access path changes. This is wrong. It is enough that
>> >> > > > needed commits are found from the git repo. For example
>> >> > > > when same DL_DIR is accessed using two different paths
>> >> > > > due to a symlink, then the resulting absolute paths will
>> >> > > > differ but the content is the same. Trust the content.
>> >> > > >
>> >> > > > Avoids extra downloads of large repositories like linux-yocto
>> >> > > > for kernel when the local cache was already uptodate.
>> >> > > >
>> >> > > > Signed-off-by: Mikko Rapeli <mikko.rapeli@linaro.org>
>> >> > > > ---
>> >> > > >  bitbake/lib/bb/fetch2/git.py | 14 +-------------
>> >> > > >  1 file changed, 1 insertion(+), 13 deletions(-)
>> >> > > >
>> >> > > > diff --git a/bitbake/lib/bb/fetch2/git.py b/bitbake/lib/bb/fetch2/git.py
>> >> > > > index e11271b757..0c547ea303 100644
>> >> > > > --- a/bitbake/lib/bb/fetch2/git.py
>> >> > > > +++ b/bitbake/lib/bb/fetch2/git.py
>> >> > > > @@ -371,19 +371,7 @@ class Git(FetchMethod):
>> >> > > >              # The directory may exist, but not be the top level of a bare git
>> >> > > >              # repository in which case it needs to be deleted and re-cloned.
>> >> > > >              try:
>> >> > > > -                # Since clones can be bare, use --absolute-git-dir instead of --show-toplevel
>> >> > > > -                output = runfetchcmd("LANG=C %s rev-parse --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
>> >> > > > -
>> >> > > > -                toplevel = os.path.abspath(output.rstrip())
>> >> > > > -                abs_clonedir = os.path.abspath(ud.clonedir).rstrip('/')
>> >> > > > -                # The top level Git directory must either be the clone directory
>> >> > > > -                # or a child of the clone directory. Any ancestor directory of
>> >> > > > -                # the clone directory is not valid as the Git directory (and
>> >> > > > -                # probably belongs to some other unrelated repository), so a
>> >> > > > -                # clone is required
>> >> > > > -                if os.path.commonprefix([abs_clonedir, toplevel]) != abs_clonedir:
>> >> > > > -                    logger.warning("Top level directory '%s' doesn't match expected '%s'. Re-cloning", toplevel, ud.clonedir)
>> >> > > > -                    needs_clone = True
>> >> > > > +                runfetchcmd("LANG=C %s rev-parse --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
>> >> > > >              except bb.fetch2.FetchError as e:
>> >> > > >                  logger.warning("Unable to get top level for %s (not a git directory?): %s", ud.clonedir, e)
>> >> > > >                  needs_clone = True
>> >> > >
>> >> > > Doesn't this give rise to some of the original issues (it could find a
>> >> > > parent git repo instead of the one being cloned)?
>> >> >
>> >> > Yes, but this has not been a problem for me. Re-cloning existing fully cached
>> >> > repositories is the effect I see.
>> >> >
>> >> > I think the git commit id and branch should decide if fetch or re-clone steps are needed.
>> >> > The path of the repository should not matter. The git remotes also not. Both of those can
>> >> > and do change.
>> >>
>> >> In that case you're effectively sending a revert and we should probably
>> >> be clear about that...
>> >>
>> >> We really need to find a way to resolve things for everyone.
>> >>
>> >> Cheers,
>> >>
>> >> Richard
>> >>
>> >> -=-=-=-=-=-=-=-=-=-=-=-
>> >> Links: You receive all messages sent to this group.
>> >> View/Reply Online (#15045): https://lists.openembedded.org/g/bitbake-devel/message/15045
>> >> Mute This Topic: https://lists.openembedded.org/mt/101333074/3617156
>> >> Group Owner: bitbake-devel+owner@lists.openembedded.org
>> >> Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/leave/8021546/3617156/1661570722/xyzzy [martin.jansa@gmail.com]
>> >> -=-=-=-=-=-=-=-=-=-=-=-
>> >>
diff mbox series

Patch

diff --git a/bitbake/lib/bb/fetch2/git.py b/bitbake/lib/bb/fetch2/git.py
index e11271b757..0c547ea303 100644
--- a/bitbake/lib/bb/fetch2/git.py
+++ b/bitbake/lib/bb/fetch2/git.py
@@ -371,19 +371,7 @@  class Git(FetchMethod):
             # The directory may exist, but not be the top level of a bare git
             # repository in which case it needs to be deleted and re-cloned.
             try:
-                # Since clones can be bare, use --absolute-git-dir instead of --show-toplevel
-                output = runfetchcmd("LANG=C %s rev-parse --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
-
-                toplevel = os.path.abspath(output.rstrip())
-                abs_clonedir = os.path.abspath(ud.clonedir).rstrip('/')
-                # The top level Git directory must either be the clone directory
-                # or a child of the clone directory. Any ancestor directory of
-                # the clone directory is not valid as the Git directory (and
-                # probably belongs to some other unrelated repository), so a
-                # clone is required
-                if os.path.commonprefix([abs_clonedir, toplevel]) != abs_clonedir:
-                    logger.warning("Top level directory '%s' doesn't match expected '%s'. Re-cloning", toplevel, ud.clonedir)
-                    needs_clone = True
+                runfetchcmd("LANG=C %s rev-parse --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
             except bb.fetch2.FetchError as e:
                 logger.warning("Unable to get top level for %s (not a git directory?): %s", ud.clonedir, e)
                 needs_clone = True