[RFC,0/2] sbom-cve-check: Download CVE DB using BitBake fetcher

Message ID: 20260309-add-sbom-cve-check-p2b-v1-0-09165cddfcf1@bootlin.com
From: Benjamin Robin <benjamin.robin@bootlin.com>
Subject: [PATCH RFC 0/2] sbom-cve-check: Download CVE DB using BitBake fetcher
Date: Mon, 09 Mar 2026 12:57:09 +0100
To: openembedded-core@lists.openembedded.org
Cc: ross.burton@arm.com, peter.marko@siemens.com, jpewhacker@gmail.com, olivier.benjamin@bootlin.com, antonin.godard@bootlin.com, mathieu.dubois-briand@bootlin.com, thomas.petazzoni@bootlin.com, Benjamin Robin <benjamin.robin@bootlin.com>
Series: sbom-cve-check: Download CVE DB using BitBake fetcher
  [RFC,0/2] sbom-cve-check: Download CVE DB using BitBake fetcher
  [RFC,1/2] sbom-cve-check: Download CVE DB using BitBake fetcher
  [RFC,2/2] sbom-cve-check: VEX class is no longer mandatory

Benjamin Robin March 9, 2026, 11:57 a.m. UTC

This series is an RFC and a follow-up to patch 6/6 ("Add class for
post-build CVE analysis"), which was previously discussed [1].
I have prepared two RFC series, this one and another, each exploring
different approaches to handling the download of CVE databases.

I explored using BitBake's internal fetcher instead of direct Git calls
for fetching CVE databases. However, I encountered two major issues:

- No proper shallow clone support: I wanted to clone the repository
  without downloading the entire history (which is very large). While
  `BB_GIT_SHALLOW` exists, it creates multiple tarballs in the download
  directory, which is inefficient for updates.

  In this series, we are going to do a full clone of the git repository,
  so this point is not going to be fixed.

- Performance overhead for CVE database deployment: The recipes
  downloading CVE databases must copy them to the sysroot or to the
  deploy directory. This requires copying the extracted databases
  multiple times, even with hard links, which is slow due to the
  combined size (~6 GB, ~672,000 small files).

  In this series, we use a custom deploy task that copies the git
  repository with rsync directly into the final deploy directory,
  bypassing all the BitBake logic.
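A minimal sketch of such a deploy task, assuming `${S}` points at the unpacked git checkout. The task name `do_deploy_cve_db` and the variable `CVE_DB_DEPLOY_DIR` are hypothetical and not taken from the patch:

```bitbake
# Hypothetical destination for the deployed database checkout.
CVE_DB_DEPLOY_DIR = "${DEPLOY_DIR}/sbom-cve-check/${BPN}"

do_deploy_cve_db() {
    mkdir -p ${CVE_DB_DEPLOY_DIR}
    # -a preserves permissions and timestamps, --delete removes files that
    # no longer exist in the checkout, and the trailing slash on ${S}/
    # copies its contents (including .git) rather than the directory itself.
    rsync -a --delete ${S}/ ${CVE_DB_DEPLOY_DIR}/
}
addtask deploy_cve_db after do_unpack before do_build
```

Because a task like this is not an sstate task, its output is neither saved to nor restored from the shared state cache; it is only produced by running the task.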

Additionally, there's no built-in way to control the interval between
CVE database fetches. In this series, we use AUTOREV, which implies
querying the git repositories on every build to check for a new git
revision.
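As an illustration, a recipe following this approach might declare its source as below. The `branch=main` parameter is an assumption about the upstream default branch, not a detail from the patches:

```bitbake
SRC_URI = "git://github.com/CVEProject/cvelistV5.git;protocol=https;branch=main"
# With AUTOREV, BitBake asks the remote for the current head of the branch
# when parsing, so a new upstream commit changes the task signatures and
# the database is fetched again.
SRCREV = "${AUTOREV}"
```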

Moreover, this series ensures that the CVE analysis runs only when
the original SBOM changes or when the CVE databases are updated.
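In BitBake terms, this kind of behaviour follows from task signatures: if the analysis task depends on the tasks that produce the SBOM and the databases, it only reruns when one of their signatures changes. A sketch with hypothetical task names (`do_sbom_cve_check`, `do_deploy_cve_db`) and an assumed SBOM-producing task (`do_create_image_sbom_spdx`), not the code from the series:

```bitbake
# Rerun the analysis when either database deploy task changes, for
# example because a new upstream revision was selected.
do_sbom_cve_check[depends] += " \
    sbom-cve-check-update-nvd-native:do_deploy_cve_db \
    sbom-cve-check-update-cvelist-native:do_deploy_cve_db \
"
# Order the analysis after the image SBOM is written, so a change in the
# SBOM task's signature also invalidates this task.
addtask sbom_cve_check after do_create_image_sbom_spdx before do_build
```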

Upon revisiting the class and its associated recipes, I identified
several areas for improvement, which were fixed in the first commit.
This series also includes a second commit making the VEX class optional
rather than mandatory.

[1] https://lore.kernel.org/all/20260226-add-sbom-cve-check-v3-0-2e60423f4d35@bootlin.com/

Signed-off-by: Benjamin Robin <benjamin.robin@bootlin.com>
---
Benjamin Robin (2):
      sbom-cve-check: Download CVE DB using BitBake fetcher
      sbom-cve-check: VEX class is no longer mandatory

 .../sbom-cve-check-update-db.bbclass               | 87 ----------------------
 meta/classes-recipe/sbom-cve-check.bbclass         | 63 ++++++++++------
 meta/recipes-core/meta/sbom-cve-check-config.inc   |  4 +
 .../meta/sbom-cve-check-update-cvelist-native.bb   | 11 ++-
 .../recipes-core/meta/sbom-cve-check-update-db.inc | 28 +++++++
 .../meta/sbom-cve-check-update-nvd-native.bb       | 11 ++-
 6 files changed, 89 insertions(+), 115 deletions(-)
---
base-commit: ac13c78c0b1a73aa3f21a506a8709ecebfd98faf
change-id: 20260308-add-sbom-cve-check-p2b-f3d30694d3a5

Best regards,

Richard Purdie March 18, 2026, 5:45 p.m. UTC | #1

Hi,

On Mon, 2026-03-09 at 12:57 +0100, Benjamin Robin via lists.openembedded.org wrote:
> [...]

I've just been trying to work out where we're at with this coming up to
release and we need to get this resolved.

I feel quite strongly that we need to use the fetcher for obtaining
this data. "fetching" isn't trivial and is full of
license/auditing/sbom issues. Making any exception to that, even for
cve data, tends to become problematic later.

The existing approach was only done as it was a sqlite database and we
didn't have fetcher support for such a thing. If we need to improve the
git fetcher in some way to better support this use case (e.g. shallow
clone update efficiency), that is something we can work on.

As such, I was wondering if you had newer versions of these patches?

I'd note that we can't set AUTOREV by default, we'll need to specify a
revision, and document how the user can enable AUTOREV in their config
(maybe even a config fragment?). Whilst it is annoying to do that, it
is a requirement that the system doesn't touch the network outside
mirrors unless configured to.
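A minimal sketch of that split, using a recipe name from this series; the revision string is a placeholder, not a real commit:

```bitbake
# In the recipe: a fixed default revision. Because no remote lookup is
# needed to resolve it, the source can come from DL_DIR or a mirror.
SRCREV = "<pinned-commit-sha>"

# In local.conf (or a config fragment): explicit opt-in to always
# tracking the latest upstream revision.
SRCREV:pn-sbom-cve-check-update-cvelist-native = "${AUTOREV}"
```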

Cheers,

Richard

Marta Rybczynska March 19, 2026, 7:29 a.m. UTC | #2

On Wed, Mar 18, 2026 at 6:45 PM Richard Purdie via lists.openembedded.org
<richard.purdie=linuxfoundation.org@lists.openembedded.org> wrote:

> Hi,
>
> On Mon, 2026-03-09 at 12:57 +0100, Benjamin Robin via lists.openembedded.org wrote:
> > [...]
>
> I've just been trying to work out where we're at with this coming up to
> release and we need to get this resolved.
>
> I feel quite strongly that we need to use the fetcher for obtaining
> this data. "fetching" isn't trivial and is full of
> license/auditing/sbom issues. Making any exception to that, even for
> cve data tends to become problematic later.
>
> The existing approach was only done as it was a sqlite database and we
> didn't have fetcher support for such a thing. If we need to improve the
> git fetcher in some way to better support this use case (e.g. shallow
> clone update efficiency), that is something we can work on.
>
> As such, I was wondering if you had newer versions of these patches?
>
> I'd note that we can't set AUTOREV by default, we'll need to specify a
> revision, and document how the user can enable AUTOREV in their config
> (maybe even a config fragment?). Whilst it is annoying to do that, it
> is a requirement that the system doesn't touch the network outside
> mirrors unless configured to.
>
>
Fetching the complete git repos has a number of problems. Why not use
release tarballs like those in https://github.com/CVEProject/cvelistV5/releases ?
FKIE feeds also have them:
https://github.com/fkie-cad/nvd-json-data-feeds/releases

CVE versions of those repositories are good for manual analysis, but a
simple check does not need all of that.

Also, I'm worried about the size explosion with the additional databases
that will be needed in the next 1-2 years. I also wouldn't assume all of
them will have git mirrors.

For analysis, I think it would be better to integrate the sources into a
database, but not a relational one (as was done with sqlite). An object
database corresponds better to what the data contains.

Kind regards,
Marta

Richard Purdie March 19, 2026, 7:52 a.m. UTC | #3

On Thu, 2026-03-19 at 08:29 +0100, Marta Rybczynska wrote:
> [...]
>
> Fetching the complete git repos has a number of problems. Why not use release
> tarballs like those in  https://github.com/CVEProject/cvelistV5/releases ?
> Fkie feeds also have them https://github.com/fkie-cad/nvd-json-data-feeds/releases

FWIW we can shallow clone git repos, it just isn't optimal in how
updates are handled, which was Benjamin's concern, as the shallow clones
end up more like tarballs.
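For illustration, shallow fetching of one of the database recipes could be requested with the standard shallow-clone variables. The `pn-` override targets a recipe name from this series; this is a sketch, not something the patches do:

```bitbake
# Fetch only the most recent history of the cvelistV5 repository.
BB_GIT_SHALLOW:pn-sbom-cve-check-update-cvelist-native = "1"
# Depth 1 keeps just the tip commit (1 is also the default).
BB_GIT_SHALLOW_DEPTH:pn-sbom-cve-check-update-cvelist-native = "1"
```

Each newly fetched revision then yields a separate shallow tarball in `DL_DIR`, which is the update inefficiency raised in the cover letter.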

If we use the bitbake fetcher, it also makes it much easier to actually
use tarballs directly too, since the fetcher also supports those and it
just becomes a simple SRC_URI change.
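As a sketch of what that SRC_URI change could look like for a release asset downloaded over HTTPS: `CVELIST_RELEASE` and `CVELIST_ASSET` are hypothetical placeholders for a real release tag and file name, and the checksum has to be filled in:

```bitbake
# Placeholders: set these to an actual cvelistV5 release tag and asset.
CVELIST_RELEASE = "<release-tag>"
CVELIST_ASSET = "<asset-file>.zip"

SRC_URI = "https://github.com/CVEProject/cvelistV5/releases/download/${CVELIST_RELEASE}/${CVELIST_ASSET}"
# Files fetched over HTTP(S) are checked against a declared checksum.
SRC_URI[sha256sum] = "<sha256-of-the-asset>"
```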

Cheers,

Richard

Benjamin Robin March 19, 2026, 8:45 a.m. UTC | #4

Hello Richard,

On Wednesday, March 18, 2026 at 6:45 PM, Richard Purdie wrote:
> I've just been trying to work out where we're at with this coming up to
> release and we need to get this resolved.
> 
> I feel quite strongly that we need to use the fetcher for obtaining
> this data. "fetching" isn't trivial and is full of
> license/auditing/sbom issues. Making any exception to that, even for
> cve data tends to become problematic later.

I have one small implementation "detail" if we use the BitBake
fetcher: which license should we use for the sources, and how should we
declare it in the recipes?

Here is the license situation of the repositories:
 - https://github.com/CVEProject/cvelistV5 : there is none
 - https://github.com/fkie-cad/nvd-json-data-feeds/tree/main/LICENSES
   It looks like a custom license.

cve-update-db-native.bb is specifying MIT but this is kind of a lie.
I have done the same on my recipes for now...

> The existing approach was only done as it was a sqlite database and we
> didn't have fetcher support for such a thing.

The recipes that download the CVE databases for the cve-check class
download tarballs. Yes, these recipes then create a sqlite database from
them, but they implement their own fetcher just to download a tarball.
That is why I thought I could implement my own fetcher, which is much
simpler than update_db_file() in cve-update-db-native.bb, which is
quite complex.

> If we need to improve the
> git fetcher in some way to better support this use case (e.g. shallow
> clone update efficiency), that is something we can work on.

Well, that was my plan, but the timeline for the LTS release was too
short. So in the meantime I preferred to use a custom fetcher, which
was implemented in the other RFC (or in the v4 of the original series).

> As such, I was wondering if you had newer versions of these patches?

I sent 2 RFCs, one using my own fetcher, and one using the internal
fetcher (this series). And I sent a v4 of the original series.

> I'd note that we can't set AUTOREV by default, we'll need to specify a
> revision, and document how the user can enable AUTOREV in their config
> (maybe even a config fragment?). Whilst it is annoying to do that, it
> is a requirement that the system doesn't touch the network outside
> mirrors unless configured to.

If we can't use AUTOREV by default, which I understand, a config fragment
is the way to go (I think), with sane defaults to enable sbom-cve-check.
If users want a specific configuration, they can create their own
configuration. The config fragment would set:
 - IMAGE_CLASSES += "sbom-cve-check"
 - SRCREV:pn-sbom-cve-check-update-nvd-native = "${AUTOREV}"
 - SRCREV:pn-sbom-cve-check-update-cvelist-native = "${AUTOREV}"
 - SPDX_INCLUDE_VEX = "all"
 - SPDX_INCLUDE_COMPILED_SOURCES:pn-linux-yocto = "1"
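Such a fragment might look like the following, assuming OE-core's configuration fragment mechanism (plain `.conf` files that declare a summary and a description and are enabled with `bitbake-config-build enable-fragment`). The file location and name are hypothetical:

```bitbake
# Hypothetical file: meta/conf/fragments/yocto/sbom-cve-check.conf
BB_CONF_FRAGMENT_SUMMARY = "Post-build CVE analysis with sbom-cve-check"
BB_CONF_FRAGMENT_DESCRIPTION = "Enables the sbom-cve-check image class and tracks the latest CVE database revisions (requires network access)."

IMAGE_CLASSES += "sbom-cve-check"
SRCREV:pn-sbom-cve-check-update-nvd-native = "${AUTOREV}"
SRCREV:pn-sbom-cve-check-update-cvelist-native = "${AUTOREV}"
SPDX_INCLUDE_VEX = "all"
SPDX_INCLUDE_COMPILED_SOURCES:pn-linux-yocto = "1"
```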

Also, what do you think about the deployment of the CVE databases
done using rsync? Do you have a better idea?

Marta Rybczynska March 19, 2026, 8:58 a.m. UTC | #5

On Thu, Mar 19, 2026 at 9:45 AM Benjamin Robin via lists.openembedded.org
<benjamin.robin=bootlin.com@lists.openembedded.org> wrote:

> Hello Richard,
>
> On Wednesday, March 18, 2026 at 6:45 PM, Richard Purdie wrote:
> > I've just been trying to work out where we're at with this coming up to
> > release and we need to get this resolved.
> >
> > I feel quite strongly that we need to use the fetcher for obtaining
> > this data. "fetching" isn't trivial and is full of
> > license/auditing/sbom issues. Making any exception to that, even for
> > cve data tends to become problematic later.
>
> I have just a slight implementation "detail" if we are using BitBake
> fetcher. What is the license that we should use for the sources?
> How to declare that in the recipes?
>
> Because the license of the repositories:
>  - https://github.com/CVEProject/cvelistV5 : there is none
>  - https://github.com/fkie-cad/nvd-json-data-feeds/tree/main/LICENSES
>    It looks like custom license.
>

The CVE project repo does not have a licence included, but it is covered by
https://www.cve.org/legal/termsofuse (the usage part). It is basically MIT.

NVD has its own specific licence, the one that is in the repo. Be aware
that it requires a disclosure sentence in all documentation.

>
> cve-update-db-native.bb is specifying MIT but this is kind of a lie.
> I have done the same on my recipes for now...
>
> > The existing approach was only done as it was a sqlite database and we
> > didn't have fetcher support for such a thing.
>
> The recipes used to download the CVE databases for the cve-check class
> are downloading tarballs. Yes these recipes are going to create a sqlite
> database from that. But these recipes implement their own fetcher to
> simply download a tarball.
> That is why I thought I could implement my own fetcher, which is way
> simpler than the update_db_file() in cve-update-db-native.bb which is
> quite complex.
>

They implement the fetcher to feed into sqlite, which in my opinion was
the wrong choice.


>
> > If we need to improve the
> > git fetcher in some way to better support this use case (e.g. shallow
> > clone update efficiency), that is something we can work on.
>
> Well that was my plan, but for the LTS release this was going to be too
> short. So in the meantime I preferred to use a custom fetcher which
> was implemented in the other RFC (or in the v4 of the original series).
>
> > As such, I was wondering if you had newer versions of these patches?
>
> I sent 2 RFCs, one using my own fetcher, and one using the internal
> fetcher (this series). And I sent a v4 of the original series.
>
> > I'd note that we can't set AUTOREV by default, we'll need to specify a
> > revision, and document how the user can enable AUTOREV in their config
> > (maybe even a config fragment?). Whilst it is annoying to do that, it
> > is a requirement that the system doesn't touch the network outside
> > mirrors unless configured to.
>
> If we can't use AUTOREV by default, which I understand, a config fragment
> is the way to go (I think), with sane defaults to enable sbom-cve-check.
> If users want a specific configuration, they can create their own
> configuration. The config fragment would set:
>  - IMAGE_CLASSES += "sbom-cve-check"
>  - SRCREV:pn-sbom-cve-check-update-nvd-native = "${AUTOREV}"
>  - SRCREV:pn-sbom-cve-check-update-cvelist-native = "${AUTOREV}"
>  - SPDX_INCLUDE_VEX = "all"
>  - SPDX_INCLUDE_COMPILED_SOURCES:pn-linux-yocto = "1"
>
> Also, what do you think about the deployment of the CVE databases
> done using rsync? Do you have a better idea?
>

AUTOREV isn't great here because it will re-fetch for each build. So if
you're building multiple images or platforms (in CI or so), you will
potentially get different results. cve-check has a set of variables to
handle such use cases: you pin to one specific release and do all the
checking with one single common version.
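One way to get a single common version across a CI run is to resolve each database revision once and pass it to every build. A sketch assuming the CI job exports two hypothetical variables, `CVE_NVD_SRCREV` and `CVE_CVELIST_SRCREV`, and lets them into the BitBake datastore with `BB_ENV_PASSTHROUGH_ADDITIONS`:

```bitbake
# Done once by the CI job, before starting any build (shell):
#   export CVE_NVD_SRCREV=$(git ls-remote https://github.com/fkie-cad/nvd-json-data-feeds.git HEAD | cut -f1)
#   export CVE_CVELIST_SRCREV=$(git ls-remote https://github.com/CVEProject/cvelistV5.git HEAD | cut -f1)
#   export BB_ENV_PASSTHROUGH_ADDITIONS="$BB_ENV_PASSTHROUGH_ADDITIONS CVE_NVD_SRCREV CVE_CVELIST_SRCREV"

# Shared configuration used by every build in the job: all images and
# machines are checked against the same database revisions.
SRCREV:pn-sbom-cve-check-update-nvd-native = "${CVE_NVD_SRCREV}"
SRCREV:pn-sbom-cve-check-update-cvelist-native = "${CVE_CVELIST_SRCREV}"
```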

Regards,
Marta

Benjamin Robin March 19, 2026, 9:07 a.m. UTC | #6

Hello Marta and Richard,

On Thursday, March 19, 2026 at 8:52 AM, Richard Purdie wrote:
> On Thu, 2026-03-19 at 08:29 +0100, Marta Rybczynska wrote:

> > Fetching the complete git repos has a number of problems. Why not use release
> > tarballs like those in  https://github.com/CVEProject/cvelistV5/releases ?
> > Fkie feeds also have them https://github.com/fkie-cad/nvd-json-data-feeds/releases

Here are the reasons:
 - Fetching the tarballs is quite complex to implement. This was done
   in cve-update-db-native.bb. To do that we must use a custom fetcher,
   because we cannot expect the user to manually update the URL each
   time a new CVE analysis needs to be done.
 - Also, sbom-cve-check expects a git repository. It does not support
   a simple extraction of the CVE database.
 - sbom-cve-check also expects one JSON file per CVE, which is not
   the case with the FKIE release tarball: it is a single compressed
   JSON file.

> FWIW we can shallow clone git repos, it just isn't optimal in how
> updates are handled which was Benjamin's concern as the shallow clones
> end up more like tarballs.
> 
> If we use the bitbake fetcher, it also makes it much easier to actually
> use tarballs directly too, since the fetcher also supports those and it
> just becomes a simple SRC_URI change.

If we use the BitBake fetcher with tarballs, the download directory is
going to fill up with many versions of the CVE databases. This is really
inefficient.

For cvelistV5, the release zip file is roughly the same size as the git
shallow clone.

For https://github.com/fkie-cad/nvd-json-data-feeds/releases, using the
tarball is not even an option, since sbom-cve-check is not compatible
with this format.

Benjamin Robin March 19, 2026, 9:48 a.m. UTC | #7

Hello Marta,

On Thursday, March 19, 2026 at 9:58 AM, Marta Rybczynska wrote:
> On Thu, Mar 19, 2026 at 9:45 AM Benjamin Robin via lists.openembedded.org
> <benjamin.robin=bootlin.com@lists.openembedded.org> wrote:

> > I have just a slight implementation "detail" if we are using BitBake
> > fetcher. What is the license that we should use for the sources?
> > How to declare that in the recipes?
> >
> > Because the license of the repositories:
> >  - https://github.com/CVEProject/cvelistV5 : there is none
> >  - https://github.com/fkie-cad/nvd-json-data-feeds/tree/main/LICENSES
> >    It looks like custom license.

> The CVE project repo does not have a licence included, but it is covered by
> https://www.cve.org/legal/termsofuse (the usage part). It is basically MIT.
> 
> NVD has the specific,  licence, the one that is in the repo. A warning on
> the needed disclosure sentence in all documentation.

So for you, it is fine to declare that the CVE databases are MIT?

> > cve-update-db-native.bb is specifying MIT but this is kind of a lie.
> > I have done the same on my recipes for now...
> >
> > > The existing approach was only done as it was a sqlite database and we
> > > didn't have fetcher support for such a thing.
> >
> > The recipes used to download the CVE databases for the cve-check class
> > are downloading tarballs. Yes these recipes are going to create a sqlite
> > database from that. But these recipes implement their own fetcher to
> > simply download a tarball.
> > That is why I thought I could implement my own fetcher, which is way
> > simpler than the update_db_file() in cve-update-db-native.bb which is
> > quite complex.
> >
> 
> They implement the fetcher to feed into sqlite. Which was an error to use,
> in my opinion.

Well, I understand why they did that; it makes a lot of sense. But it has
a lot of limitations, which is why we developed sbom-cve-check.


> AUTOREV isn't great here because it will re-fetch for each build. So if
> you're building multiple images or platforms (in CI or so), you will get
> potentially different results. cve-check has a set of variable to handle
> such use cases. You pin to one specific release and do the whole checking
> with one single common version.

Yes, that is why I initially pushed to use my custom fetcher, which does
a git pull / shallow clone. With this fetcher I have full control over
the update period.

But if we want to use the BitBake fetcher, a user could pin to a specific
version instead of using AUTOREV. But the user needs to do that manually.

Benjamin Robin March 19, 2026, 9:57 a.m. UTC | #8

Hello Marta,

On Thursday, March 19, 2026 at 8:29 AM, Marta Rybczynska wrote:
> Fetching the complete git repos has a number of problems. Why not use release
> tarballs like those in  https://github.com/CVEProject/cvelistV5/releases ?
> Fkie feeds also have them https://github.com/fkie-cad/nvd-json-data-feeds/releases

sbom-cve-check is not compatible with the FKIE release tarball: the CVE
database is not in the same format.
For cvelistV5, the shallow git clone is roughly the same speed and size
as the release zip file.

Why would fetching the git repository be a problem? I only see
advantages. The update is quick, and we can easily know which version
the analysis was done with: it is the git revision.

> CVE versions of those repositories are good for manual analysis, but a simple
> check does not need all of that.

I don't understand your point.

> Also, I'm worried about the size explosion with additional databases that will be
> needed in the 1-2 years time period. I also wouldn't assume all of them will have
> git mirrors.

A shallow clone of the git repository is the same size as the tarball,
which is logical. I don't understand your point.

> For an analysis I think it would be better to integrate sources in a database,
> but not a relational one (like it was done with sqlite). An object database corresponds
> better to what the data contains.

sbom-cve-check was not designed like that. We did not want to take this
approach, which introduces a lot of limitations.

Marta Rybczynska March 19, 2026, noon UTC | #9

On Thu, Mar 19, 2026 at 10:48 AM Benjamin Robin <benjamin.robin@bootlin.com>
wrote:

> Hello Marta,
>
> On Thursday, March 19, 2026 at 9:58 AM, Marta Rybczynska wrote:
> > On Thu, Mar 19, 2026 at 9:45 AM Benjamin Robin via lists.openembedded.org
> > <benjamin.robin=bootlin.com@lists.openembedded.org> wrote:
>
> > > I have just a slight implementation "detail" if we are using BitBake
> > > fetcher. What is the license that we should use for the sources?
> > > How to declare that in the recipes?
> > >
> > > Because the license of the repositories:
> > >  - https://github.com/CVEProject/cvelistV5 : there is none
> > >  - https://github.com/fkie-cad/nvd-json-data-feeds/tree/main/LICENSES
> > >    It looks like custom license.
>
> > The CVE project repo does not have a licence included, but it is covered by
> > https://www.cve.org/legal/termsofuse (the usage part). It is basically MIT.
> >
> > NVD has the specific,  licence, the one that is in the repo. A warning on
> > the needed disclosure sentence in all documentation.
>
> So for you, it is fine to declare that the CVE databases are MIT?
>

CVE database is MIT
NVD (so also FKIE) is custom


>
>
> > AUTOREV isn't great here because it will re-fetch for each build. So if
> > you're building multiple images or platforms (in CI or so), you will get
> > potentially different results. cve-check has a set of variable to handle
> > such use cases. You pin to one specific release and do the whole checking
> > with one single common version.
>
> Yes, that is why I initially pushed to use my custom fetcher that is
> doing a git pull / shallow clone. With this fetcher I have a full control
> on the update period.
>
> But if we want to use BitBake fetcher, a user could pin to a specific
> version instead of using AUTOREV. But the user needs to do that manually.
>
>
I agree with Richard that using a git fetcher (or another existing
fetcher) is a better idea than developing a custom fetcher.

Regards,
Marta

Benjamin Robin March 19, 2026, 12:03 p.m. UTC | #10

On Thursday, March 19, 2026 at 1:00 PM, Marta Rybczynska wrote:
> On Thu, Mar 19, 2026 at 10:48 AM Benjamin Robin <benjamin.robin@bootlin.com>
> wrote:
> 
> > [...]
> > So for you, it is fine to declare that the CVE databases are MIT?
> >
> 
> CVE database is MIT
> NVD (so also FKIE) is custom

The NVD license is apparently "cve-tou", which is available in Yocto.

> > [...]
>
> I agree with Richard that using a git fetcher (or other existing fetcher)
> is a better idea than developing a custom fetcher.

I am preparing a v5 of the patch series based on this RFC series, which is
going to use the BitBake fetcher.
