diff mbox series

[1/2] fetch2: local files only in DL_DIR becomes fatal error

Message ID 20220708205407.1680137-1-ptsneves@gmail.com
State New
Headers show
Series [1/2] fetch2: local files only in DL_DIR becomes fatal error | expand

Commit Message

Paulo Neves July 8, 2022, 8:54 p.m. UTC
When trying to checksum local files, if a given file is not found
anywhere else than the DL_DIR then this means that the the build is
inconsistent, and unreproducible.

This also means that if the DL_DIR is removed or not available the
build does not know how to fetch the file and will fail. With this
commit we fail earlier and consistently on this condition.

Signed-off-by: Paulo Neves <ptsneves@gmail.com>
---
 lib/bb/fetch2/__init__.py | 4 +++-
 lib/bb/tests/fetch.py     | 7 +++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

Comments

Richard Purdie July 9, 2022, 6:52 a.m. UTC | #1
On Fri, 2022-07-08 at 22:54 +0200, Paulo Neves wrote:
> When trying to checksum local files, if a given file is not found
> anywhere else than the DL_DIR then this means that the the build is
> inconsistent, and unreproducible.
> 
> This also means that if the DL_DIR is removed or not available the
> build does not know how to fetch the file and will fail. With this
> commit we fail earlier and consistently on this condition.
> 
> Signed-off-by: Paulo Neves <ptsneves@gmail.com>
> ---
>  lib/bb/fetch2/__init__.py | 4 +++-
>  lib/bb/tests/fetch.py     | 7 +++++++
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
> index ac557176..5f05278a 100644
> --- a/lib/bb/fetch2/__init__.py
> +++ b/lib/bb/fetch2/__init__.py
> @@ -1233,7 +1233,9 @@ def get_checksum_file_list(d):
>                  if f.startswith(dl_dir):
>                      # The local fetcher's behaviour is to return a path under DL_DIR if it couldn't find the file anywhere else
>                      if os.path.exists(f):
> -                        bb.warn("Getting checksum for %s SRC_URI entry %s: file not found except in DL_DIR" % (d.getVar('PN'), os.path.basename(f)))
> +                        bb.fatal(("Getting checksum for %s SRC_URI entry %s: file not found except in DL_DIR."
> +                            " This means there is no way to get the file except for an orphaned copy"
> +                            " in DL_DIR.") % (d.getVar('PN'), os.path.basename(f)))
>                      else:
>                          bb.warn("Unable to get checksum for %s SRC_URI entry %s: file could not be found" % (d.getVar('PN'), os.path.basename(f)))
>                  filelist.append(f + ":" + str(os.path.exists(f)))

Did you manage to trigger that error in a real world use case?

I've just been looking at this code and it is horribly old and
outdated. I can't help wonder if we shouldn't do something a bit more
invasive to clean it up a bit more. I suspect it does some of these
things for old/obsolete reasons...

I ask since I'm wondering if anyone ever hits these code paths in a
valid usecase.

Thanks for working on it, we do need to improve some of these things.

Cheers,

Richard
Paulo Neves July 9, 2022, 7:19 a.m. UTC | #2
On Sat, Jul 9, 2022, 08:52 Richard Purdie <
richard.purdie@linuxfoundation.org> wrote:

> On Fri, 2022-07-08 at 22:54 +0200, Paulo Neves wrote:
> > When trying to checksum local files, if a given file is not found
> > anywhere else than the DL_DIR then this means that the the build is
> > inconsistent, and unreproducible.
> >
> > This also means that if the DL_DIR is removed or not available the
> > build does not know how to fetch the file and will fail. With this
> > commit we fail earlier and consistently on this condition.
> >
> > Signed-off-by: Paulo Neves <ptsneves@gmail.com>
> > ---
> >  lib/bb/fetch2/__init__.py | 4 +++-
> >  lib/bb/tests/fetch.py     | 7 +++++++
> >  2 files changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
> > index ac557176..5f05278a 100644
> > --- a/lib/bb/fetch2/__init__.py
> > +++ b/lib/bb/fetch2/__init__.py
> > @@ -1233,7 +1233,9 @@ def get_checksum_file_list(d):
> >                  if f.startswith(dl_dir):
> >                      # The local fetcher's behaviour is to return a path
> under DL_DIR if it couldn't find the file anywhere else
> >                      if os.path.exists(f):
> > -                        bb.warn("Getting checksum for %s SRC_URI entry
> %s: file not found except in DL_DIR" % (d.getVar('PN'),
> os.path.basename(f)))
> > +                        bb.fatal(("Getting checksum for %s SRC_URI
> entry %s: file not found except in DL_DIR."
> > +                            " This means there is no way to get the
> file except for an orphaned copy"
> > +                            " in DL_DIR.") % (d.getVar('PN'),
> os.path.basename(f)))
> >                      else:
> >                          bb.warn("Unable to get checksum for %s SRC_URI
> entry %s: file could not be found" % (d.getVar('PN'), os.path.basename(f)))
> >                  filelist.append(f + ":" + str(os.path.exists(f)))
>
> Did you manage to trigger that error in a real world use case?
>
> I've just been looking at this code and it is horribly old and
> outdated. I can't help wonder if we shouldn't do something a bit more
> invasive to clean it up a bit more. I suspect it does some of these
> things for old/obsolete reasons...
>
> I ask since I'm wondering if anyone ever hits these code paths in a
> valid usecase.
>
> Thanks for working on it, we do need to improve some of these things.
>
> Cheers,
>
> Richard
>

Hey Richard,

I think these codepaths are hit on mostly on bugs(bad uri parsing) or bad
files(bad permissions).

The use case i added on the tests is also clearly bad, but I have seen
builds going on for very long in the state of warning, which becomes fatal
with the patch, because the dl_dir was stable and used by all users. So
expect this patch to increase the reports of latent issues people just
ignored.

Yesterday I also hit this warning organically exactly on the situation of
the test: a directory mentioned on the SRC_URI which did not exist in any
FILESPATHS but somehow existed on the dl_dir. So it is real.

I will investigate the whole default to dl_dir if not found and remove it
if I cannot find a good reason.

Paulo Neves

>
Richard Purdie July 9, 2022, 1:20 p.m. UTC | #3
On Sat, 2022-07-09 at 13:12 +0200, Paulo Neves wrote:
>  
>  
> On 7/9/22 09:19, Paulo Neves wrote:
>  
> >  
> > 
> >  
> > On Sat, Jul 9, 2022, 08:52 Richard Purdie
> > <richard.purdie@linuxfoundation.org> wrote:
> >  
> > > On Fri, 2022-07-08 at 22:54 +0200, Paulo Neves wrote:
> > >  > When trying to checksum local files, if a given file is not
> > > found
> > >  > anywhere else than the DL_DIR then this means that the the
> > > build is
> > >  > inconsistent, and unreproducible.
> > >  > 
> > >  > This also means that if the DL_DIR is removed or not available
> > > the
> > >  > build does not know how to fetch the file and will fail. With
> > > this
> > >  > commit we fail earlier and consistently on this condition.
> > >  > 
> > >  > Signed-off-by: Paulo Neves <ptsneves@gmail.com>
> > >  > ---
> > >  >  lib/bb/fetch2/__init__.py | 4 +++-
> > >  >  lib/bb/tests/fetch.py     | 7 +++++++
> > >  >  2 files changed, 10 insertions(+), 1 deletion(-)
> > >  > 
> > >  > diff --git a/lib/bb/fetch2/__init__.py
> > > b/lib/bb/fetch2/__init__.py
> > >  > index ac557176..5f05278a 100644
> > >  > --- a/lib/bb/fetch2/__init__.py
> > >  > +++ b/lib/bb/fetch2/__init__.py
> > >  > @@ -1233,7 +1233,9 @@ def get_checksum_file_list(d):
> > >  >                  if f.startswith(dl_dir):
> > >  >                      # The local fetcher's behaviour is to
> > > return a path under DL_DIR if it couldn't find the file anywhere
> > > else
> > >  >                      if os.path.exists(f):
> > >  > -                        bb.warn("Getting checksum for %s
> > > SRC_URI entry %s: file not found except in DL_DIR" %
> > > (d.getVar('PN'), os.path.basename(f)))
> > >  > +                        bb.fatal(("Getting checksum for %s
> > > SRC_URI entry %s: file not found except in DL_DIR."
> > >  > +                            " This means there is no way to
> > > get the file except for an orphaned copy"
> > >  > +                            " in DL_DIR.") % (d.getVar('PN'),
> > > os.path.basename(f)))
> > >  >                      else:
> > >  >                          bb.warn("Unable to get checksum for
> > > %s SRC_URI entry %s: file could not be found" % (d.getVar('PN'),
> > > os.path.basename(f)))
> > >  >                  filelist.append(f + ":" +
> > > str(os.path.exists(f)))
> > >  
> > >  Did you manage to trigger that error in a real world use case?
> > >  
> > >  I've just been looking at this code and it is horribly old and
> > >  outdated. I can't help wonder if we shouldn't do something a bit
> > > more
> > >  invasive to clean it up a bit more. I suspect it does some of
> > > these
> > >  things for old/obsolete reasons...
> > >  
> > >  I ask since I'm wondering if anyone ever hits these code paths
> > > in a
> > >  valid usecase.
> > >  
> > >  Thanks for working on it, we do need to improve some of these
> > > things.
> > >  
> > >  Cheers,
> > >  
> > >  Richard
> > > 
> > 
> > Hey Richard, 
> > 
> > I think these codepaths are hit on mostly on bugs(bad uri parsing)
> > or bad files(bad permissions).
> > 
> > The use case i added on the tests is also clearly bad, but I have
> > seen builds going on for very long in the state of warning, which
> > becomes fatal with the patch, because the dl_dir was stable and
> > used by all users. So expect this patch to increase the reports of
> > latent issues people just ignored. 
> > 
> > Yesterday I also hit this warning organically exactly on the
> > situation of the test: a directory mentioned on the SRC_URI which
> > did not exist in any FILESPATHS but somehow existed on the dl_dir.
> > So it is real. 
> > 
> > I will investigate the whole default to dl_dir if not found and
> > remove it if I cannot find a good reason.
> > 
> > Paulo Neves 
>  
>  I did some archeology trying to find why DL_DIR has anything to do
> with the local fetcher and the reason probably comes from this[1] and
> this[2]. The commit is about sstate cache and i think it considers
> DL_DIR as valid mirror directory in agreement with
> lib/bb/fetch2/README[2]. 
>  
>  The problem is that [3] from the same README "spec", is either
> contradictory with with [2], senseless, or accurate depending on the
> usage of the local file:// fetcher. For a SRC_URI file, what it is
> senseless to search DL_DIR if we have the FILEPATHS? If it is not at
> hand and only in the DL_DIR then it is irreproducible and contradicts
> [3] (the case this patch argues for).

sstate.bbclass sets DL_DIR to SSTATE_DIR in the context of sstate
downloads. It also puts SSTATE_DIR in FILESPATH which I suspect it did
not back in 2011. I therefore suspect we now have a better fix for the
issue as FILESPATH is set and the change from 2011 is now obsolete for
sstate. Or I hope so anyway!

Cheers,

Richard
diff mbox series

Patch

diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
index ac557176..5f05278a 100644
--- a/lib/bb/fetch2/__init__.py
+++ b/lib/bb/fetch2/__init__.py
@@ -1233,7 +1233,9 @@  def get_checksum_file_list(d):
                 if f.startswith(dl_dir):
                     # The local fetcher's behaviour is to return a path under DL_DIR if it couldn't find the file anywhere else
                     if os.path.exists(f):
-                        bb.warn("Getting checksum for %s SRC_URI entry %s: file not found except in DL_DIR" % (d.getVar('PN'), os.path.basename(f)))
+                        bb.fatal(("Getting checksum for %s SRC_URI entry %s: file not found except in DL_DIR."
+                            " This means there is no way to get the file except for an orphaned copy"
+                            " in DL_DIR.") % (d.getVar('PN'), os.path.basename(f)))
                     else:
                         bb.warn("Unable to get checksum for %s SRC_URI entry %s: file could not be found" % (d.getVar('PN'), os.path.basename(f)))
                 filelist.append(f + ":" + str(os.path.exists(f)))
diff --git a/lib/bb/tests/fetch.py b/lib/bb/tests/fetch.py
index ee41bff4..3ebd9fd7 100644
--- a/lib/bb/tests/fetch.py
+++ b/lib/bb/tests/fetch.py
@@ -693,6 +693,13 @@  class FetcherLocalTest(FetcherTest):
         flst.sort()
         return flst
 
+    def test_local_checksum_fails_if_only_in_dldir(self):
+        with open(os.path.join(self.dldir, "on_dl_dir"), "wb"):
+            pass
+        self.d.setVar("SRC_URI", "file://on_dl_dir")
+        with self.assertRaises(bb.BBHandledException):
+            bb.fetch.get_checksum_file_list(self.d)
+
     def test_local(self):
         tree = self.fetchUnpack(['file://a', 'file://dir/c'])
         self.assertEqual(tree, ['a', 'dir/c'])