diff mbox series

[v5,1/3] package: change location of debugsources to PKGDESTWORK

Message ID 20250521134400.1733473-2-daniel.turull@ericsson.com
State Superseded
Headers show
Series Check compiled files to filter kernel CVEs | expand

Commit Message

Daniel Turull May 21, 2025, 1:43 p.m. UTC
From: Daniel Turull <daniel.turull@ericsson.com>

Storing file generated doing packaging in WORKDIR,
doesn't get cached in sstate and task depending on them fail
to find them.

Store it instead in pkgdata/debugsources.

Signed-off-by: Daniel Turull <daniel.turull@ericsson.com>
---
 meta/lib/oe/package.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Richard Purdie May 27, 2025, 1:10 p.m. UTC | #1
On Wed, 2025-05-21 at 15:43 +0200, Daniel Turull via lists.openembedded.org wrote:
> From: Daniel Turull <daniel.turull@ericsson.com>
> 
> Storing file generated doing packaging in WORKDIR,
> doesn't get cached in sstate and task depending on them fail
> to find them.
> 
> Store it instead in pkgdata/debugsources.
> 
> Signed-off-by: Daniel Turull <daniel.turull@ericsson.com>
> ---
>  meta/lib/oe/package.py | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/meta/lib/oe/package.py b/meta/lib/oe/package.py
> index 0bcc04ea54..6fcd001db1 100644
> --- a/meta/lib/oe/package.py
> +++ b/meta/lib/oe/package.py
> @@ -972,7 +972,10 @@ def copydebugsources(debugsrcdir, sources, d):
>      cpath = oe.cachedpath.CachedPath()
>  
>      if debugsrcdir and sources:
> -        sourcefile = d.expand("${WORKDIR}/debugsources.list")
> +        sourcefile = d.expand("${PKGDESTWORK}/debugsources/${PN}-debugsources.list")
> +        debugdir = os.path.dirname(sourcefile)
> +        if not os.path.isdir(debugdir):
> +            bb.utils.mkdirhier(debugdir)
>          bb.utils.remove(sourcefile)
>  
>          # filenames are null-separated - this is an artefact of the previous use

This needs a little bit more explanation/justification in the commit
message as this change is a fairly big deal. I can guess why you're
doing that based on the other patches but the commit should stand alone
with information.

Effectively by moving the file, you're taking it from being internal
API to an external API which can be used outside of do_package. When we
do that, we need to be really clear about it and also be 100% sure that
what we're exposing is the right information in the right form.

There are other implications such as increasing the size of the sstate
objects too.

This file is a null separated list, is that a good format to export it
in for wider use? It is certainly different to any other file in
pkgdata.

Also, the file just contains a list of all sources referenced by all
files in the output. Do we need to include more information, such as
which files have which dependencies?

I'm asking these questions as I'm worried about future implications for
this data. Changing the format later is hard.

Cheers,

Richard
Daniel Turull May 27, 2025, 3:06 p.m. UTC | #2
Thanks Richard,
Yes, I'm exposing the file to be able to extract the source code used in the other patches in the series. Since the file was there, I though it will be minimal changes on the current code and less overhead creating new tasks to extract the information that is already available.

Will it make more sense to create a new file under "${PKGDESTWORK}/debugsources/ with the correct format and the first filtering done, more aligned with existing files, since the current file is use a temporal placeholder? Or something in the line of:
binary_file: src1 src2 src3 ...

I can also create a better explanation in thee commit message, in the lines of:
The information provided in debugsource can be use to have more detailed information on the files used during the compilation and improve quality on the SPDX.

I'll try to measure the increase on the sstate.

Cheers,

Daniel

> -----Original Message-----
> From: Richard Purdie <richard.purdie@linuxfoundation.org>
> Sent: Tuesday, 27 May 2025 15:11
> To: Daniel Turull <daniel.turull@ericsson.com>; openembedded-
> core@lists.openembedded.org
> Subject: Re: [OE-core] [PATCH v5 1/3] package: change location of debugsources
> to PKGDESTWORK
> 
> On Wed, 2025-05-21 at 15:43 +0200, Daniel Turull via lists.openembedded.org
> wrote:
> > From: Daniel Turull <daniel.turull@ericsson.com>
> >
> > Storing file generated doing packaging in WORKDIR, doesn't get cached
> > in sstate and task depending on them fail to find them.
> >
> > Store it instead in pkgdata/debugsources.
> >
> > Signed-off-by: Daniel Turull <daniel.turull@ericsson.com>
> > ---
> >  meta/lib/oe/package.py | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/meta/lib/oe/package.py b/meta/lib/oe/package.py index
> > 0bcc04ea54..6fcd001db1 100644
> > --- a/meta/lib/oe/package.py
> > +++ b/meta/lib/oe/package.py
> > @@ -972,7 +972,10 @@ def copydebugsources(debugsrcdir, sources, d):
> >      cpath = oe.cachedpath.CachedPath()
> >
> >      if debugsrcdir and sources:
> > -        sourcefile = d.expand("${WORKDIR}/debugsources.list")
> > +        sourcefile =
> > +d.expand("${PKGDESTWORK}/debugsources/${PN}-debugsources.list")
> > +        debugdir = os.path.dirname(sourcefile)
> > +        if not os.path.isdir(debugdir):
> > +            bb.utils.mkdirhier(debugdir)
> >          bb.utils.remove(sourcefile)
> >
> >          # filenames are null-separated - this is an artefact of the
> > previous use
> 
> This needs a little bit more explanation/justification in the commit message as this
> change is a fairly big deal. I can guess why you're doing that based on the other
> patches but the commit should stand alone with information.
> 
> Effectively by moving the file, you're taking it from being internal API to an
> external API which can be used outside of do_package. When we do that, we
> need to be really clear about it and also be 100% sure that what we're exposing is
> the right information in the right form.
> 
> There are other implications such as increasing the size of the sstate objects too.
> 
> This file is a null separated list, is that a good format to export it in for wider use?
> It is certainly different to any other file in pkgdata.
> 
> Also, the file just contains a list of all sources referenced by all files in the output.
> Do we need to include more information, such as which files have which
> dependencies?
> 
> I'm asking these questions as I'm worried about future implications for this data.
> Changing the format later is hard.
> 
> Cheers,
> 
> Richard
Richard Purdie May 27, 2025, 4:16 p.m. UTC | #3
On Tue, 2025-05-27 at 15:06 +0000, Daniel Turull via lists.openembedded.org wrote:
> Thanks Richard,
> Yes, I'm exposing the file to be able to extract the source code used
> in the other patches in the series. Since the file was there, I
> though it will be minimal changes on the current code and less
> overhead creating new tasks to extract the information that is
> already available.

Right, I totally understand why you've done this and that is fine. In
order to properly review and get this merged, we just need to consider
the other issues and properly document it. It does need to be a
conscious decision to put the data into pkgdata (and make sure we get
the format right).

> Will it make more sense to create a new file under
> "${PKGDESTWORK}/debugsources/ with the correct format and the first
> filtering done, more aligned with existing files, since the current
> file is use a temporal placeholder? Or something in the line of:
> binary_file: src1 src2 src3 ...

It is null delimited as some filenames can have spaces in. I know there
has been talk of changing our existing file format to be something less
problematic like json, the extended data is already in json. I'd
probably suggest using json for this given it is natively supported by
python.

The above format (per binary file), in json would probably make sense
though.

> I can also create a better explanation in thee commit message, in the lines of:
> The information provided in debugsource can be use to have more
> detailed information on the files used during the compilation and
> improve quality on the SPDX.
> 
> I'll try to measure the increase on the sstate.

I don't expect this to be a huge issue but some kind of number would be
useful. I'd note we did compress the extended data using zstd, just to
improve the IO impact of the files. We could consider that here too and
probably reuse that code.

Cheers,

Richard
diff mbox series

Patch

diff --git a/meta/lib/oe/package.py b/meta/lib/oe/package.py
index 0bcc04ea54..6fcd001db1 100644
--- a/meta/lib/oe/package.py
+++ b/meta/lib/oe/package.py
@@ -972,7 +972,10 @@  def copydebugsources(debugsrcdir, sources, d):
     cpath = oe.cachedpath.CachedPath()
 
     if debugsrcdir and sources:
-        sourcefile = d.expand("${WORKDIR}/debugsources.list")
+        sourcefile = d.expand("${PKGDESTWORK}/debugsources/${PN}-debugsources.list")
+        debugdir = os.path.dirname(sourcefile)
+        if not os.path.isdir(debugdir):
+            bb.utils.mkdirhier(debugdir)
         bb.utils.remove(sourcefile)
 
         # filenames are null-separated - this is an artefact of the previous use