
[RFC,0/1] spdx: Add software file externalRef support

Message ID 20251110171337.754568-1-fbberton@gmail.com
Series spdx: Add software file externalRef support

Message

Fabio Berton Nov. 10, 2025, 5:13 p.m. UTC
Hi all,

When starting to test SPDX 3.0 in our projects, we noticed that it would
be necessary to have more information for files fetched via the
'file://' protocol, such as the full path of the file or a URL with git
information.

Our first idea was to use 'downloadLocation', but as I understand it,
this is a package property, and files fetched from the layer are of type
'software_File'. Looking at the SPDX spec, it appears we could use
'ExternalRef' for this purpose.

The idea is to have two options to add this information: one to add the
full path of the file, and another to add Git information in the form
'git+https://host/repo@commit#path/to/file'. The information is added as
an 'externalRef', and its 'externalRefType' can be configured using these
types:
https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Vocabularies/ExternalRefType/
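The 'git' locator format above can be assembled from three facts about the layer: the remote URL, the checked-out commit, and the file's path relative to the repository root. A minimal sketch, assuming the remote URL is already an https URL; the helper name `git_file_locator` is hypothetical and not taken from the patch. With the values below it reproduces the locator shown in the 'git' example in this message.

```python
import posixpath


def git_file_locator(remote_url: str, commit: str, repo_root: str, file_path: str) -> str:
    """Build a 'git+<remote>@<commit>#<path>' locator for a file in a Git checkout.

    Hypothetical helper: remote_url, commit and repo_root would come from the
    layer's Git metadata; file_path is the absolute path of the file:// source.
    """
    relpath = posixpath.relpath(file_path, repo_root)
    # Refuse files that are not inside the repository checkout.
    if relpath == ".." or relpath.startswith("../"):
        raise ValueError(f"{file_path} is not inside {repo_root}")
    return f"git+{remote_url}@{commit}#{relpath}"


print(git_file_locator(
    "https://git.openembedded.org/openembedded-core",
    "ac5d9579a0db63b54bbebb5015de2ae860a462bf",
    "/home/user/src/openembedded-core",
    "/home/user/src/openembedded-core/meta/recipes-core/busybox/files/syslog",
))
```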

When using the 'path' option, something like this is added:
```
"externalRef": [
  {
    "type": "ExternalRef",
    "externalRefType": "sourceArtifact",
    "locator": [
      "/home/user/src/openembedded-core/meta/recipes-core/busybox/files/syslog"
    ]
  }
],
```
This option is not reproducible: if the build path changes, the SPDX
output will be different.

And with the 'git' option:
```
"externalRef": [
  {
    "type": "ExternalRef",
    "externalRefType": "sourceArtifact",
    "locator": [
      "git+https://git.openembedded.org/openembedded-core@ac5d9579a0db63b54bbebb5015de2ae860a462bf#meta/recipes-core/busybox/files/syslog"
    ]
  }
],
```
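For comparison, both excerpts correspond to plain dictionaries like the ones built below. This is only a sketch: it does not use the SPDX classes that meta/lib/oe/sbom30.py builds on, the helper names are hypothetical, and the mode strings simply mirror the 'path' and 'git' options described in this message.

```python
import json


def source_artifact_ref(locator: str) -> dict:
    """Return an ExternalRef entry shaped like the JSON excerpts above."""
    return {
        "type": "ExternalRef",
        "externalRefType": "sourceArtifact",
        "locator": [locator],
    }


def file_external_refs(mode: str, file_path: str, git_locator: str) -> list:
    """Pick the locator according to a 'path' or 'git' mode value."""
    if mode == "path":
        # Absolute build-host path: simple, but not reproducible.
        return [source_artifact_ref(file_path)]
    if mode == "git":
        return [source_artifact_ref(git_locator)]
    raise ValueError(f"unknown mode {mode!r}")


refs = file_external_refs(
    "path",
    "/home/user/src/openembedded-core/meta/recipes-core/busybox/files/syslog",
    "",
)
print(json.dumps({"externalRef": refs}, indent=2))
```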

The implementation is not completely finished, but since there is
already a thread on this subject,
https://lists.openembedded.org/g/openembedded-core/topic/thoughts_on_spdx_for_files/116135395,
I wanted to share my work and get opinions on how to improve this
implementation.

My questions are:

Is 'externalRef' the right way to add this information to the SPDX
file?

I'm using the 'choices' type, but this only works when inheriting
typecheck.bbclass, and this bbclass is not inherited when using
OE-Core with 'nodistro'. Can this 'choices' type be used here?
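If depending on typecheck.bbclass is a problem, one option is to validate the value explicitly in the task code. A sketch, assuming the variable is SPDX_FILE_LOCATION with the values 'path' and 'git'; the helper `get_choice`, the default value, and the `getvar` callable (standing in for `d.getVar`) are all hypothetical.

```python
from typing import Callable, Iterable, Optional


def get_choice(getvar: Callable[[str], Optional[str]], varname: str,
               choices: Iterable[str], default: str) -> str:
    """Read varname and check it against an allowed set of values.

    Works without typecheck.bbclass: the check happens here instead of at
    parse time, so a bad value is only reported when the task runs.
    """
    allowed = set(choices)
    value = (getvar(varname) or default).strip()
    if value not in allowed:
        raise ValueError(
            f"{varname} = {value!r} is invalid; expected one of {sorted(allowed)}")
    return value


config = {"SPDX_FILE_LOCATION": "git"}
# "path" as the default is an assumption for illustration only.
print(get_choice(config.get, "SPDX_FILE_LOCATION", ("path", "git"), "path"))
```

The trade-off is that an invalid value surfaces later, at task time, rather than when the recipe is parsed.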

I still need to find a way to cache the layers' Git information to avoid
calling the 'oe.buildcfg' functions every time. Maybe something like
https://git.openembedded.org/openembedded-core/tree/meta/classes/metadata_scm.bbclass
could be used to get the information at parse time. However, this
information is only needed when SPDX_FILE_LOCATION uses the git option,
and in that case it is needed for all layers. Any ideas here?
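A minimal in-process memoisation sketch, assuming the layer revision is read with `git rev-parse HEAD`; the class and function names are hypothetical, and the probe is injectable so it can be replaced (for example with a helper built on 'oe.buildcfg'). Note that BitBake runs tasks in separate worker processes, so a cache like this only avoids repeated lookups within one process; sharing the result across recipes would still need the parse-time approach mentioned above.

```python
import subprocess
from typing import Callable, Dict


def git_head(repo_path: str) -> str:
    """Return the commit checked out in repo_path (needs the git CLI)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


class LayerGitCache:
    """Query each layer's Git revision at most once per process."""

    def __init__(self, probe: Callable[[str], str] = git_head):
        self._probe = probe  # injectable, e.g. faked in tests
        self._revisions: Dict[str, str] = {}

    def revision(self, layer_path: str) -> str:
        # Only call the (slow) probe the first time a layer is seen.
        if layer_path not in self._revisions:
            self._revisions[layer_path] = self._probe(layer_path)
        return self._revisions[layer_path]
```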

For the git option, we need to get a git remote, but there can be more
than one remote per layer, so we need a way to configure these remotes.
In this first implementation, I'm assuming that all layers use the same
remote, and the remote name can be configured, which fits our current
use case.

Should I add a variable like 'SPDX_FILE_LOCATION_GIT_REMOTE_<layername>
= "remote_name"' to set a specific remote for each layer? Would setting
the git remote be sufficient to cover most cases?
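If such per-layer variables were added, the lookup could fall back in a fixed order. A sketch, assuming a per-layer SPDX_FILE_LOCATION_GIT_REMOTE_<layername> override, a hypothetical global SPDX_FILE_LOCATION_GIT_REMOTE setting, and "origin" as the final default; `getvar` stands in for `d.getVar`.

```python
from typing import Callable, Optional


def git_remote_for_layer(getvar: Callable[[str], Optional[str]],
                         layername: str, default: str = "origin") -> str:
    """Resolve which Git remote to use for a layer.

    Order: per-layer override, then a global setting, then a fixed default.
    Both variable names are hypothetical.
    """
    for name in (f"SPDX_FILE_LOCATION_GIT_REMOTE_{layername}",
                 "SPDX_FILE_LOCATION_GIT_REMOTE"):
        value = getvar(name)
        if value:
            return value.strip()
    return default


cfg = {"SPDX_FILE_LOCATION_GIT_REMOTE_core": "upstream"}
print(git_remote_for_layer(cfg.get, "core"))
```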

Any feedback or suggestions would be appreciated.

Best regards,
Fabio

Fabio Berton (1):
  spdx: Add software file externalRef support

 meta/classes/create-spdx-3.0.bbclass | 24 +++++++
 meta/lib/oe/sbom30.py                | 14 ++++-
 meta/lib/oe/spdx30_tasks.py          | 93 ++++++++++++++++++++++++++++
 3 files changed, 130 insertions(+), 1 deletion(-)

Comments

Daniel Wagenknecht Nov. 12, 2025, 4:59 p.m. UTC | #1
Hello Fabio,

thanks for your comments and patch!

On Mon, 2025-11-10 at 17:13 +0000, Fabio Berton wrote:
> Our first idea was to use 'downloadLocation', but what I understand is
> that this is a package property, and files fetched from the layer are
> 'software_File' type. Looking at the SPDX spec, it appears we could use
> the 'ExternalRef' for this purpose.

I'm not too familiar with the SPDX spec yet, but adding individual file
entries as `ExternalRef` instead of `downloadLocation` to a recipe's
SPDX sounds reasonable.

I think in the long term, adding a `SPDXRef-Layer-xyz` entry per layer
with a `downloadLocation` pointing to the subpath of the layer inside a
Git repo would be the better approach. I'm not quite sure whether it
would be possible to express a dependency on a file contained within a
different SPDXRef, e.g.
```
SPDXRef-Layer-xyz:recipes-core/base-files/base-files/fstab
```
or whether we'd have to create an SPDXRef item for each file within a
layer in order to reference it properly. That would make it even more
verbose.

Having each layer as an independent SPDXRef would mean that the Git
revision etc. for that layer is looked up only once per build, not once
per `file://` entry in SRC_URI.
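With one entry per layer, each file:// source would have to be attributed to the layer that contains it. A sketch of that mapping, assuming the layer names and root directories are known (for example from BBLAYERS); the `SPDXRef-Layer-<name>:<path>` output only mirrors the notation above for illustration, and whether SPDX can express such a reference is exactly the open question.

```python
import posixpath
from typing import Dict


def layer_relative_ref(file_path: str, layers: Dict[str, str]) -> str:
    """Map an absolute file path to 'SPDXRef-Layer-<name>:<path-in-layer>'.

    layers maps a layer name to its root directory. The '/' boundary check
    keeps 'meta' from matching 'meta-poky', and the longest matching root
    wins when layers are nested.
    """
    best = None
    for name, root in layers.items():
        root = root.rstrip("/")
        if file_path.startswith(root + "/"):
            if best is None or len(root) > len(best[1]):
                best = (name, root)
    if best is None:
        raise ValueError(f"{file_path} does not belong to any known layer")
    name, root = best
    return f"SPDXRef-Layer-{name}:{posixpath.relpath(file_path, root)}"


layers = {"core": "/src/poky/meta", "yocto": "/src/poky/meta-poky"}
print(layer_relative_ref(
    "/src/poky/meta/recipes-core/base-files/base-files/fstab", layers))
```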
> 
> The idea is to have two options to add this information: one to add the
> full path of a file, and another to add the git information

IMO the full path to the file is unnecessary information; if the file is
only available locally, `NOASSERTION` would be appropriate.
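This suggestion could look like the following sketch: emit a Git locator only when the remote, commit and in-repository path are all known, and fall back to `NOASSERTION` otherwise (for example for layers unpacked from a tarball). The function name is hypothetical, and the sketch does not check whether SPDX 3.0 accepts `NOASSERTION` as a locator value.

```python
from typing import Optional

# Placeholder from the discussion; its validity in an SPDX 3.0 locator is
# not checked here.
NOASSERTION = "NOASSERTION"


def locator_or_noassertion(remote_url: Optional[str], commit: Optional[str],
                           repo_relpath: Optional[str]) -> str:
    """Return a git+ locator when all Git facts are known, else NOASSERTION.

    Unlike the 'path' option, this never records a build-host path, so the
    output does not change when the build directory moves.
    """
    if remote_url and commit and repo_relpath:
        return f"git+{remote_url}@{commit}#{repo_relpath}"
    return NOASSERTION


print(locator_or_noassertion(None, None, "meta/recipes-core/busybox/files/syslog"))
```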

> 
> Should I add a variable like 'SPDX_FILE_LOCATION_GIT_REMOTE_<layername>
> = "remote_name"' to set a specific remote for each layer? Would setting
> the git remote be sufficient to cover most cases?
In my experiments I removed the per-layer setting again, because
tracking the `vardeps` of `do_create_spdx` gets more complicated
with per-layer variables.
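To make the vardeps issue concrete: with per-layer variables, `do_create_spdx` would depend on one variable per layer in the build rather than on a single fixed name. A hypothetical helper that derives those names from a BBFILE_COLLECTIONS-style, whitespace-separated list is sketched below; how its result would be wired into the task's `vardeps` flag is left out, and that wiring is where the extra complexity would live.

```python
def per_layer_vardeps(collections: str,
                      prefix: str = "SPDX_FILE_LOCATION_GIT_REMOTE_") -> str:
    """Return one per-layer variable name for each collection name.

    Both the prefix and the helper are hypothetical.
    """
    return " ".join(prefix + name for name in collections.split())


print(per_layer_vardeps("core yocto"))
```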
> 
Sincerely
Daniel Wagenknecht
Fabio Berton Nov. 17, 2025, 10:57 a.m. UTC | #2
On 11/12/25 16:59, Daniel Wagenknecht wrote:
> On Mon, 2025-11-10 at 17:13 +0000, Fabio Berton wrote:
>> Our first idea was to use 'downloadLocation', but what I understand is
>> that this is a package property, and files fetched from the layer are
>> 'software_File' type. Looking at the SPDX spec, it appears we could use
>> the 'ExternalRef' for this purpose.
> 
> I'm not too familiar with the SPDX spec yet, but adding individual file
> entries as `ExternalRef` instead of `downloadLocation` to a recipe's
> SPDX sounds reasonable.
> 
> I think in the long term, adding a `SPDXRef-Layer-xyz` entry per layer
> with a `downloadLocation` pointing to the subpath of the layer inside a
> Git repo would be the better approach. I'm not quite sure whether it
> would be possible to express a dependency on a file contained within a
> different SPDXRef, e.g.
> ```
> SPDXRef-Layer-xyz:recipes-core/base-files/base-files/fstab
> ```
> or whether we'd have to create an SPDXRef item for each file within a
> layer in order to reference it properly. That would make it even more
> verbose.

Hi Daniel,

Yes, we should have a way to get the Git information at parse time to avoid running the lookup for every `file://` entry. But I don't know exactly how to do this: if it requires adding a variable to every layer, which of course we can't do, we still need a fallback for when the variable doesn't exist. In my case, we don't use OE-Core from https://git.openembedded.org/; we host all layers on internal infrastructure, so we would need to change all those variables to point to our forks.

My idea in using the functions from 'oe.buildcfg' is to get the Git information from the layer itself rather than from variables, so it doesn't matter whether the layer is a fork or not. But I didn't cover the case where different remotes are used. I know that when using `repo` to manage Git repositories, it's common to use different remotes, e.g., https://github.com/Freescale/fsl-community-bsp-platform/blob/scarthgap/default.xml.

Honestly, I don't know whether adding a variable to set the "downloadLocation" would be better or not.
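For the multi-remote case (as in `repo`-managed trees), one possible policy is sketched below: parse the output of `git remote -v`, use a configured remote name when that remote exists, fall back to the only remote when there is just one, and otherwise give up rather than guess. The helper names and the policy are assumptions for illustration; how the output is obtained is left out.

```python
from typing import Dict, Optional


def fetch_urls(remote_v_output: str) -> Dict[str, str]:
    """Parse `git remote -v` output into {remote_name: fetch_url}."""
    urls: Dict[str, str] = {}
    for line in remote_v_output.splitlines():
        fields = line.split()
        # Expected shape: "<name> <url> (fetch)", possibly followed by extra
        # annotations; push lines and blank lines are skipped.
        if len(fields) >= 3 and fields[2] == "(fetch)":
            urls.setdefault(fields[0], fields[1])
    return urls


def pick_remote_url(urls: Dict[str, str], preferred: str) -> Optional[str]:
    """Use the preferred remote if present, or the only remote if there is
    exactly one; otherwise the choice is ambiguous and None is returned."""
    if preferred in urls:
        return urls[preferred]
    if len(urls) == 1:
        return next(iter(urls.values()))
    return None
```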

> 
> Having each layer as an independent SPDXRef would mean that the Git
> revision etc. for that layer is looked up only once per build, not once
> per `file://` entry in SRC_URI.
>>
>> The idea is to have two options to add this information: one to add the
>> full path of a file, and another to add the git information
> 
> IMO the full path to the file is unnecessary information; if the file is
> only available locally, `NOASSERTION` would be appropriate.

The 'path' option is for cases where Git information is not available, e.g., when the layers come from a tarball instead of a Git repository. In that case the 'locator' will be '/home/user/src/openembedded-core/meta/recipes-core/busybox/files/syslog' instead of 'git+https://git.openembedded.org/openembedded-core@ac5d9579a0db63b54bbebb5015de2ae860a462bf#meta/recipes-core/busybox/files/syslog'.

> 
>>
>> Should I add a variable like 'SPDX_FILE_LOCATION_GIT_REMOTE_<layername>
>> = "remote_name"' to set a specific remote for each layer? Would setting
>> the git remote be sufficient to cover most cases?
> In my experiments I removed the per-layer setting again, because
> tracking the `vardeps` of `do_create_spdx` gets more complicated
> with per-layer variables.

Hmm, good point; I hadn't thought about `vardeps`.
