| Message ID | 20241220112613.22647-1-stefan.herbrechtsmeier-oss@weidmueller.com |
|---|---|
| Series | Concept for tightly coupled package manager (Node.js, Go, Rust) |
On Fri, 2024-12-20 at 12:25 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote: > From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com> > > The patch series improves the fetcher support for tightly coupled > package manager (npm, go and cargo). It adds support for embedded > dependency fetcher via a common dependency mixin. The patch series > reworks the npm-shrinkwrap.json (package-lock.json) support and adds a > fetcher for go.sum and cargo.lock files. The dependency mixin contains > two stages. The first stage locates a local specification file or > fetches an archive or git repository with a specification file. The > second stage resolves the dependency URLs from the specification file > and fetches the dependencies. > > SRC_URI = "<type>://npm-shrinkwrap.json" > SRC_URI = "<type>+http://example.com/ npm-shrinkwrap.json" > SRC_URI = "<type>+http://example.com/${BP}.tar.gz;striplevel=1;subdir=${BP}" > SRC_URI = "<type>+git://example.com/${BPN}.git;protocol=https" > > Additionally, the patch series reworks the npm fetcher to work without a > npm binary and external package repository. It adds support for a common > dependency name and version schema to integrate the dependencies into > the SBOM. This certainly sounds promising, thanks for working on it. It will take me a bit of time to digest the changes. A while back I was asked to document the constraints the fetchers operate within and I documented this here: https://git.yoctoproject.org/poky/tree/bitbake/lib/bb/fetch2/README Would you be able to check if this work meets the criteria set out there and if not, what the differences are? Thanks, Richard
On Mon, 23 Dec 2024 at 11:03, Richard Purdie via lists.openembedded.org <richard.purdie=linuxfoundation.org@lists.openembedded.org> wrote: > Would you be able to check if this work meets the criteria set out > there and if not, what the differences are? I'd also add that this would benefit from a demonstration with one of the real go/rust recipes in oe-core: basically it would be good to push a branch of poky somewhere public, and provide instructions on how to see the new fetchers in action, and observe their benefits. Alex
Am 23.12.2024 um 11:03 schrieb Richard Purdie via lists.openembedded.org: > On Fri, 2024-12-20 at 12:25 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote: >> From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com> >> >> The patch series improves the fetcher support for tightly coupled >> package manager (npm, go and cargo). It adds support for embedded >> dependency fetcher via a common dependency mixin. The patch series >> reworks the npm-shrinkwrap.json (package-lock.json) support and adds a >> fetcher for go.sum and cargo.lock files. The dependency mixin contains >> two stages. The first stage locates a local specification file or >> fetches an archive or git repository with a specification file. The >> second stage resolves the dependency URLs from the specification file >> and fetches the dependencies. >> >> SRC_URI = "<type>://npm-shrinkwrap.json" >> SRC_URI = "<type>+http://example.com/ npm-shrinkwrap.json" >> SRC_URI = "<type>+http://example.com/${BP}.tar.gz;striplevel=1;subdir=${BP}" >> SRC_URI = "<type>+git://example.com/${BPN}.git;protocol=https" >> >> Additionally, the patch series reworks the npm fetcher to work without a >> npm binary and external package repository. It adds support for a common >> dependency name and version schema to integrate the dependencies into >> the SBOM. > This certainly sounds promising, thanks for working on it. It will take > me a bit of time to digest the changes. > > A while back I was asked to document the constraints the fetchers > operate within and I documented this here: > > https://git.yoctoproject.org/poky/tree/bitbake/lib/bb/fetch2/README > > Would you be able to check if this work meets the criteria set out > there and if not, what the differences are? The fetchers inherit existing fetchers and reuse their functionality. The npm fetcher inherits the wget fetcher and only overrides the urldata_init and latest_versionstring functions. 
The reworked urldata_init function preprocesses the URL and works without internet access. The dependency mixin is inspired by the gitsm fetcher. The cargolock, gosum and npmsw fetchers inherit the local, wget and git fetchers. They forward the function calls to the parent class, process the dependency specification file and handle the dependencies if needed. Thereby the content of the specification file is translated into source URLs for existing fetchers and saved inside a proxy object. The user has to call the download function to download the main source with the specification file and all dependencies. The dependencies are downloaded via existing fetchers. Because of the reuse of existing fetchers, all criteria should either be satisfied by the new fetchers or need to be fixed inside the existing fetchers.
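As an aside for readers following along, the two-stage mixin pattern described in this reply could be sketched roughly as below. This is a minimal illustration, not the actual patch-series code: the class names (WgetFetcher, DependencyMixin, NpmswFetcher), the spec structure and the proxy attribute are all assumed stand-ins.

```python
class WgetFetcher:
    """Stand-in for an existing bitbake fetcher (wget, git, local)."""
    def __init__(self):
        self.fetched = []

    def download(self, url):
        # A real fetcher would download into DL_DIR; we just record it.
        self.fetched.append(url)

class DependencyMixin:
    """Stage 1: fetch the source carrying the specification file.
    Stage 2: translate its entries into source URLs and fetch them
    through the parent fetcher."""
    def parse_spec(self, spec):
        # Translate lock-file entries into URLs for existing fetchers.
        return [dep["url"] for dep in spec["dependencies"]]

    def download(self, url, spec):
        super().download(url)               # stage 1: main source
        self.proxy = self.parse_spec(spec)  # translated dependency URLs
        for dep_url in self.proxy:          # stage 2: dependencies
            super().download(dep_url)

class NpmswFetcher(DependencyMixin, WgetFetcher):
    """npmsw-style fetcher: the mixin layered over the wget fetcher."""
    pass

# Trivial spec resembling one npm-shrinkwrap.json dependency entry.
spec = {"dependencies": [
    {"name": "glob", "version": "10.3.15",
     "url": "https://registry.npmjs.org/glob/-/glob-10.3.15.tgz"},
]}
f = NpmswFetcher()
f.download("https://example.com/pkg-1.0.tar.gz", spec)
```

The point of the mixin layering is that every call ultimately lands in an existing fetcher, which is the basis for the claim that the fetch2 README criteria are inherited from the parent classes.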
On Thu, 2025-01-02 at 09:55 +0100, Stefan Herbrechtsmeier wrote: > Am 23.12.2024 um 11:03 schrieb Richard Purdie via lists.openembedded.org: > > On Fri, 2024-12-20 at 12:25 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote: > > > From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com> > > > > > > The patch series improves the fetcher support for tightly coupled > > > package manager (npm, go and cargo). It adds support for embedded > > > dependency fetcher via a common dependency mixin. The patch series > > > reworks the npm-shrinkwrap.json (package-lock.json) support and adds a > > > fetcher for go.sum and cargo.lock files. The dependency mixin contains > > > two stages. The first stage locates a local specification file or > > > fetches an archive or git repository with a specification file. The > > > second stage resolves the dependency URLs from the specification file > > > and fetches the dependencies. > > > > > > SRC_URI = "<type>://npm-shrinkwrap.json" > > > SRC_URI = "<type>+http://example.com/ npm-shrinkwrap.json" > > > SRC_URI = "<type>+http://example.com/${BP}.tar.gz;striplevel=1;subdir=${BP}" > > > SRC_URI = "<type>+git://example.com/${BPN}.git;protocol=https" > > > > > > Additionally, the patch series reworks the npm fetcher to work without a > > > npm binary and external package repository. It adds support for a common > > > dependency name and version schema to integrate the dependencies into > > > the SBOM. > > This certainly sounds promising, thanks for working on it. It will take > > me a bit of time to digest the changes. > > > > A while back I was asked to document the constraints the fetchers > > operate within and I documented this here: > > > > https://git.yoctoproject.org/poky/tree/bitbake/lib/bb/fetch2/README > > > > Would you be able to check if this work meets the criteria set out > > there and if not, what the differences are? > > The fetchers inheritance existing fetchers and reuse existent > functionality. 
The npm fetcher inheritance the wget fetcher and only > override the urldata_init and latest_versionstring function. The > reworked urldata_init function preprocess the url and works without > internet access. The dependency mixin is inspired by the gitsm fetcher. > The cargolock, gosum and npmsw fetcher inherit the local, wget and git > fetcher. They forward the function calls to the parent class, process > the dependency specification file and handle the dependencies if needed. > Thereby the content of the specification file is translated into source > urls for existing fetchers and saved inside a proxy object. The user has > to call the download function to download the main source with > specification file and all dependencies. The dependencies are downloaded > via existing fetchers. > > Because of the reuse of existing fetchers all criteria should be > satisfied by the new fetcher or need to be fixed inside the existing > fetchers. Even if you forward everything to the parent API, there are ways you could use it such that the parent class meets the criteria but the derived one does not. I'm trying to aid the review process by asking those questions; it will just take longer if I have to work this out myself. The other question I'm wondering about is compatibility and how we change the way URLs are working. Do these changes need a flag day where recipes need to be updated to match? If so, how do we best handle that? Is the user going to get errors they can easily fix, or how is that going to work? Cheers, Richard
Am 02.01.2025 um 10:32 schrieb Richard Purdie: > On Thu, 2025-01-02 at 09:55 +0100, Stefan Herbrechtsmeier wrote: >> Am 23.12.2024 um 11:03 schrieb Richard Purdie via lists.openembedded.org: >>> On Fri, 2024-12-20 at 12:25 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote: >>>> From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com> >>>> >>>> The patch series improves the fetcher support for tightly coupled >>>> package manager (npm, go and cargo). It adds support for embedded >>>> dependency fetcher via a common dependency mixin. The patch series >>>> reworks the npm-shrinkwrap.json (package-lock.json) support and adds a >>>> fetcher for go.sum and cargo.lock files. The dependency mixin contains >>>> two stages. The first stage locates a local specification file or >>>> fetches an archive or git repository with a specification file. The >>>> second stage resolves the dependency URLs from the specification file >>>> and fetches the dependencies. >>>> >>>> SRC_URI = "<type>://npm-shrinkwrap.json" >>>> SRC_URI = "<type>+http://example.com/ npm-shrinkwrap.json" >>>> SRC_URI = "<type>+http://example.com/${BP}.tar.gz;striplevel=1;subdir=${BP}" >>>> SRC_URI = "<type>+git://example.com/${BPN}.git;protocol=https" >>>> >>>> Additionally, the patch series reworks the npm fetcher to work without a >>>> npm binary and external package repository. It adds support for a common >>>> dependency name and version schema to integrate the dependencies into >>>> the SBOM. >>> This certainly sounds promising, thanks for working on it. It will take >>> me a bit of time to digest the changes. >>> >>> A while back I was asked to document the constraints the fetchers >>> operate within and I documented this here: >>> >>> https://git.yoctoproject.org/poky/tree/bitbake/lib/bb/fetch2/README >>> >>> Would you be able to check if this work meets the criteria set out >>> there and if not, what the differences are? 
>> The fetchers inheritance existing fetchers and reuse existent >> functionality. The npm fetcher inheritance the wget fetcher and only >> override the urldata_init and latest_versionstring function. The >> reworked urldata_init function preprocess the url and works without >> internet access. The dependency mixin is inspired by the gitsm fetcher. >> The cargolock, gosum and npmsw fetcher inherit the local, wget and git >> fetcher. They forward the function calls to the parent class, process >> the dependency specification file and handle the dependencies if needed. >> Thereby the content of the specification file is translated into source >> urls for existing fetchers and saved inside a proxy object. The user has >> to call the download function to download the main source with >> specification file and all dependencies. The dependencies are downloaded >> via existing fetchers. >> >> Because of the reuse of existing fetchers all criteria should be >> satisfied by the new fetcher or need to be fixed inside the existing >> fetchers. > Even if you forward everything to the parent API, there are ways you > could use it such that the parent class meets the criteria but the > dervived one does not. I'm trying to aid the review process by asking > those questions, it will just take longer if I have to work this out > myself. I don't see a reason why the fetchers shouldn't meet the constraints; they were designed to fulfill them. > The other question I'm wondering about is compatibility and how we > change the way urls are working. Do these changes need a flag day where > recipes need to be updated to match? If so, how do we best handle that? > Is the user going to get errors they can easily fix or how is that > going to work? The fetchers should be backward compatible for recipes. I have added warnings that propose the desired changes: Parameter 'package' in '<url>' is deprecated. Please use 'dn' parameter instead. 
Parameter 'version' in '<url>' is deprecated. Please use 'dv' parameter instead. If we have an agreement about a common schema for repository host, package name and package version, I will update the crate and gomod fetchers in a backward compatible way too. Hopefully most users will switch to the new fetchers instead of updating their own tools to generate recipes and include files. The only desired incompatible changes are the removal of the old npm-shrinkwrap.json format and of the support for a "latest" version in the npm fetcher. All upstream supported npm versions support the new format, and the user can update the package lock file via npm. Like AUTOREV, the "latest" version for npm leads to many problems, and its usefulness should be very low. I will leave the functions which are used by oe-core in the npm and npmsw fetchers. They should be moved into oe-core and afterwards removed from the fetchers.
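For illustration, the backward-compatible parameter renaming described here ('package' becoming 'dn', 'version' becoming 'dv') could look roughly like this. The function name normalize_params and the RENAMED table are assumptions for the sketch; only the warning text mirrors the messages quoted in the thread.

```python
import warnings

# Old SRC_URI parameter names mapped to the proposed common schema.
RENAMED = {"package": "dn", "version": "dv"}

def normalize_params(url, params):
    """Accept deprecated parameter names but warn with the proposed fix."""
    out = {}
    for key, value in params.items():
        new = RENAMED.get(key, key)
        if new != key:
            warnings.warn(
                f"Parameter '{key}' in '{url}' is deprecated. "
                f"Please use '{new}' parameter instead.")
        out[new] = value
    return out

# Old-style parameters are accepted and translated, with a warning.
params = normalize_params("npm://registry.npmjs.org/",
                          {"package": "glob", "version": "10.3.15"})
```

Existing recipes keep working while the warning nudges users toward the new dn/dv names, which is the flag-day-avoiding behaviour described above.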
Am 02.01.2025 um 10:32 schrieb Richard Purdie: > On Thu, 2025-01-02 at 09:55 +0100, Stefan Herbrechtsmeier wrote: >> Am 23.12.2024 um 11:03 schrieb Richard Purdie via lists.openembedded.org: >>> On Fri, 2024-12-20 at 12:25 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote: >>>> From: Stefan Herbrechtsmeier<stefan.herbrechtsmeier@weidmueller.com> >>>> >>>> The patch series improves the fetcher support for tightly coupled >>>> package manager (npm, go and cargo). It adds support for embedded >>>> dependency fetcher via a common dependency mixin. The patch series >>>> reworks the npm-shrinkwrap.json (package-lock.json) support and adds a >>>> fetcher for go.sum and cargo.lock files. The dependency mixin contains >>>> two stages. The first stage locates a local specification file or >>>> fetches an archive or git repository with a specification file. The >>>> second stage resolves the dependency URLs from the specification file >>>> and fetches the dependencies. >>>> >>>> SRC_URI = "<type>://npm-shrinkwrap.json" >>>> SRC_URI = "<type>+http://example.com/ npm-shrinkwrap.json" >>>> SRC_URI = "<type>+http://example.com/${BP}.tar.gz;striplevel=1;subdir=${BP}" >>>> SRC_URI = "<type>+git://example.com/${BPN}.git;protocol=https" >>>> >>>> Additionally, the patch series reworks the npm fetcher to work without a >>>> npm binary and external package repository. It adds support for a common >>>> dependency name and version schema to integrate the dependencies into >>>> the SBOM. [SNIP] > I'm trying to aid the review process by asking > those questions, it will just take longer if I have to work this out > myself. Maybe we are able to discuss some design decisions without code to simplify the review: The dependency fetcher needs to know the path of the dependency specification file. In the case of the local fetcher, the path is the URI. In the case of the git fetcher, the path depends on the subdir and destsuffix parameters. In the case of the wget fetcher, the path is unknown. 
This series requires the parameters striplevel=1 and subdir=${BP} to work. Additionally, it doesn't support specification files inside subdirectories. Therefore I plan to add a srcdir parameter. Should this parameter be mandatory for the wget fetcher, or should the fetcher use the PN or S variable to determine a default value?
On Thu, 2025-01-02 at 14:50 +0100, Stefan Herbrechtsmeier wrote: > > > > I'm trying to aid the review process by asking > > those questions, it will just take longer if I have to work this > > out > > myself. > > > Maybe we are able to discuss some design decision without code to > simplify the review: > > The dependency fetcher need to know the path of the dependency > specification file. In case of the local fetcher the path is the uri. > In case of the git fetcher the path depends on the subdir and > destsuffix parameter. In case of the wget the path is unknown. This > series requires the parameters striplevel=1 and subdir=${BP} to work. > Additionally it doesn't support specification files inside sub > directories. Therefore I plan to add a srcdir parameter. Should this > parameter be mandatory for the wget fetcher or should the fetcher use > the PN or S variable to determine a default value? do_fetch never touches ${S}, it would only touch ${DL_DIR}. For that reason, ${S} is passed as a parameter to do_unpack and is only referenced at that time. wget shouldn't need more information to have a default it already has in the current code so something isn't adding up. I'm still not sure why you'd need both a subdir and srcdir but I think I need to think about this more deeply, FWIW I'm technically still on vacation. Cheers, Richard
Am 02.01.2025 um 15:07 schrieb Richard Purdie: > On Thu, 2025-01-02 at 14:50 +0100, Stefan Herbrechtsmeier wrote: >>> I'm trying to aid the review process by asking >>> those questions, it will just take longer if I have to work this >>> out >>> myself. >>> >> Maybe we are able to discuss some design decision without code to >> simplify the review: >> >> The dependency fetcher need to know the path of the dependency >> specification file. In case of the local fetcher the path is the uri. >> In case of the git fetcher the path depends on the subdir and >> destsuffix parameter. In case of the wget the path is unknown. This >> series requires the parameters striplevel=1 and subdir=${BP} to work. >> Additionally it doesn't support specification files inside sub >> directories. Therefore I plan to add a srcdir parameter. Should this >> parameter be mandatory for the wget fetcher or should the fetcher use >> the PN or S variable to determine a default value? > do_fetch never touches ${S}, it would only touch ${DL_DIR}. For that > reason, ${S} is passed as a parameter to do_unpack and is only > referenced at that time. do_unpack uses ${UNPACKDIR} and not ${S}. The ${S} variable points to the main folder of the source and in most cases contains the main folder of the archive. > wget shouldn't need more information to have a default it already has > in the current code so something isn't adding up. The additional information is needed by the dependency resolution, not by wget. The dependency resolution needs to temporarily unpack the archive to read the dependency specification file inside the archive. > I'm still not sure why you'd need both a subdir and srcdir but I think > I need to think about this more deeply, FWIW I'm technically still on > vacation. The subdir parameter is used to place the archive content inside an arbitrary folder. This is only needed if you need to place one source into another source. 
The srcdir parameter is needed to know the path of the specification file inside the archive. The main folder inside an archive is archive specific. Therefore the S variable uses a common default (${WORKDIR}/${BP}), and it is common to override the variable because the archive uses another folder name. Additionally, the specification file could be located inside a subfolder of the archive. librsvg: SRC_URI = "cargolock+${GNOME_MIRROR}/${GNOMEBN}/${@gnome_verdir("${PV}")}/${GNOMEBN}-${PV}.tar.${GNOME_COMPRESS_TYPE};name=archive;srcdir=${BP}" --> librsvg-2.58.2/Cargo.lock python3-bcrypt: SRC_URI = "cargolock+${@pypi_src_uri(d)};srcdir=${PYPI_PACKAGE}-${PV}/${CARGO_SRC_DIR}" S = "${WORKDIR}/${PYPI_PACKAGE}-${PV}" CARGO_SRC_DIR = "src/_bcrypt" --> bcrypt-4.2.1/src/_bcrypt/Cargo.lock Maybe srcdir should be named specdir, or specsuffix if its default value is ${BN}.
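The path resolution behind the proposed srcdir parameter can be illustrated with a small sketch matching the librsvg and python3-bcrypt examples above. The function spec_path and its fallback to a ${BP}-style default are assumptions for illustration, not code from the series.

```python
import os

def spec_path(unpackdir, spec_name, srcdir=None, bp="pkg-1.0"):
    """Return the path of the dependency specification file inside a
    temporarily unpacked archive.

    If no srcdir parameter is given, fall back to the common ${BP}
    top-level folder that most release archives use."""
    subdir = srcdir if srcdir is not None else bp
    return os.path.join(unpackdir, subdir, spec_name)

# librsvg-style: spec file in the archive's top-level folder.
p1 = spec_path("/tmp/unpack", "Cargo.lock", srcdir="librsvg-2.58.2")

# python3-bcrypt-style: spec file inside a subfolder of the archive.
p2 = spec_path("/tmp/unpack", "Cargo.lock",
               srcdir="bcrypt-4.2.1/src/_bcrypt")
```

The open design question in the thread is exactly which default (PN, S, or none at all) should stand in for the `bp` fallback here.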
On Fri, 2024-12-20 at 12:25 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote: > From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com> > > The patch series improves the fetcher support for tightly coupled > package manager (npm, go and cargo). It adds support for embedded > dependency fetcher via a common dependency mixin. The patch series > reworks the npm-shrinkwrap.json (package-lock.json) support and adds a > fetcher for go.sum and cargo.lock files. The dependency mixin contains > two stages. The first stage locates a local specification file or > fetches an archive or git repository with a specification file. The > second stage resolves the dependency URLs from the specification file > and fetches the dependencies. > > SRC_URI = "<type>://npm-shrinkwrap.json" > SRC_URI = "<type>+http://example.com/ npm-shrinkwrap.json" > SRC_URI = "<type>+http://example.com/${BP}.tar.gz;striplevel=1;subdir=${BP}" > SRC_URI = "<type>+git://example.com/${BPN}.git;protocol=https" > > Additionally, the patch series reworks the npm fetcher to work without a > npm binary and external package repository. It adds support for a common > dependency name and version schema to integrate the dependencies into > the SBOM. > > = Background > Bitbake has diverse concepts and drawbacks for different tightly coupled > package manager. The Python support uses a recipe per dependency and > generates common fetcher URLs via a python function. The other languages > embed the dependencies inside the recipe. The Node.js support offers a > npmsw fetcher which uses a lock file beside the recipe to generates > multiple common fetcher URLs on the fly and thereby hides the real > download sources. This leads to a single source in the SBOM for example. > The Go support contains two parallel implementations. A vendor-based > solution with a common fetcher and a go-mod-based solution with a gomod > fetcher. 
The vendor-based solution includes the individual dependencies > into the SRC_URI of the recipe and uses a python function to generate > common fetcher URLs which additional information for the vendor task.The > gomod fetcher uses a proprietary gomod URL. It translates the URL into a > common URL and prepares meta data during unpack. The Rust support > includes the individual dependencies in the SRC_URI of the recipe and > uses proprietary crate URLs. The crate fetcher translates a proprietary > URL into a common fetcher URL and prepares meta data during unpack. The > recipetool does not support the crate and the gomod fetcher. This leads > to missing licenses of the dependencies in the recipe for example > librsvg. > > The steps needed to fetch dependencies for Node.js, Go and Rust are > similar: > 1. Extract the dependencies from a specification file (name, version, > checksum and URL) > 2. Generate proprietary fetcher URIs > a. npm://registry.npmjs.org/;package=glob;version= 10.3.15 > b. gomod://golang.org/x/net;version=v0.9.0 > gomodgit://golang.org/x/net;version=v0.9.0;repo=go.googlesource.com/net > c. crate://crates.io/glob/0.3.1 > 3. Generate wget or git fetcher URIs > a. https://registry.npmjs.org/glob/-/glob-10.3.15.tgz;downloadfilename=… > b. https://proxy.golang.org/golang.org/x/net/@v/v0.9.0.zip;downloadfilename=… > git://go.googlesource.com/net;protocol=https; subdir=… > c. https://crates.io/api/v1/crates/glob/0.3.1/download;downloadfilename=… > 4. Unpack > 5. Create meta files > a. Update lockfile and create tar.gz archives > b. Create go.mod file > Create info, go.mod file and zip archives > c. Create .cargo-checksum.json files > > It looks like the recipetool is not widely used and therefore this patch > series integrates the dependency resolving into the fetcher. After an > agreement on a concept the fetcher could be extended. 
The fetcher could > download the license information per package and a new build task could > run the license cruncher from the recipetool. I've spent a bit more time thinking about this and looking at the code and I've mixed feelings on it. I can certainly see why you've implemented it this way and it does have a lot of potential, but there are also potential risks. My comments (on various elements): With a npm-shrinkwrap.json/package-lock.json/go.sum file, are dependencies always recorded as specific entities with checksums? I'm a little bit worried about how easily you could sneak a "floating" version into this and make the fetcher non-deterministic. Does (or could?) the code detect and error on that? Put another way, could one of these SRC_URIs map to multiple different combinations of underlying component versions? Our existing method effectively hardcodes/expands the lock file into extended SRC_URI entries which makes the specific versions and components really clear. This change abstracts that away into the fetcher and makes it opaque to the user, and much harder for code like the archiver/license/spdx code to find/handle. I noticed that any fetcher operation has to first expand the lock file using a temporary directory. You're using DL_DIR for that which I suspect isn't a great idea for tmp files. In many cases that will work fine but it is a bit of a performance overhead. I did start wondering if we should cache the lock files in a subdir of DL_DIR to help performance and also give some extra assurance about changing content. The url scheme is clever but also has a potential risk in that you can't really pass parameters to both the top level fetcher and the underlying one. I'm worried that is going to bite us further down the line. > = Open questions > > * Where should we download dependencies? > ** Should we use a folder per fetcher (ex. git and npm)? > ** Should we use the main folder (ex. crate)? > ** Should we translate the name into folder (ex. gomod)? 
> ** Should we integrate the name into the filename (ex. git)? DL_DIR is meant to be a complete cache of the source so it would need to be downloaded there. Given it maps to the other fetchers, the existing cache mechanisms likely work for these just fine; the open question is whether the lock/spec files should be cached after extraction. > * Where should we unpack the dependencies? > ** Should we use a folder inside the parent folder (ex. node_modules)? > ** Should we use a fixed folder inside unpackdir > (ex. go/pkg/mod/cache/download and cargo_home/bitbake)? This likely depends on the fetcher as the different mechanisms will have different expectations about how they should be extracted (as npm/etc. would). > * How should we treat archives for package manager caches? > ** Should we unpack the archives to support patching (ex. npm)? > ** Should we copy the packed archive to avoid unpacking and packaging > (ex. gomod)? If there are archives left after do_unpack, which task is going to unpack those? Are we expecting the build process in configure/compile to decompress them? Would those management tools accept things if they were extracted earlier? "unpack" would be the correct time to do it but I can see this getting into conflict with the package manager :/. > This patch series depends on patch series > 20241209103158.20833-1-stefan.herbrechtsmeier-oss@weidmueller.com > ("[1/4] tests: fetch: adapt npmsw tests to fixed unpack behavior"). Those merged, thanks. I did wonder if patches 1-5 of this series could be merged separately too, as they look reasonable regardless of the rest of the series? Cheers, Richard
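One possible shape for the lock/spec file caching Richard wonders about above is a DL_DIR subdirectory keyed by content checksum, so repeated fetches reuse the extracted file and changed content is detected by a different key. The layout, directory name and function below are purely illustrative assumptions, not anything from the series.

```python
import hashlib
import os
import tempfile

def cache_spec(dl_dir, data: bytes, name="npm-shrinkwrap.json"):
    """Store the spec file once per content hash and return its path.

    A changed spec file hashes to a different directory, so stale
    cached copies can never be confused with new content."""
    digest = hashlib.sha256(data).hexdigest()
    cache = os.path.join(dl_dir, "spec-cache", digest)
    os.makedirs(cache, exist_ok=True)
    path = os.path.join(cache, name)
    if not os.path.exists(path):  # reuse the cached copy if present
        with open(path, "wb") as f:
            f.write(data)
    return path

# Two fetches of identical content hit the same cached file.
with tempfile.TemporaryDirectory() as dl:
    p = cache_spec(dl, b'{"dependencies": {}}')
    same = cache_spec(dl, b'{"dependencies": {}}')
```

This avoids re-extracting the archive on every fetcher operation while giving the "extra assurance about changing content" mentioned above.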
Am 06.01.2025 um 12:04 schrieb Richard Purdie: > On Fri, 2024-12-20 at 12:25 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote: >> From: Stefan Herbrechtsmeier<stefan.herbrechtsmeier@weidmueller.com> >> >> The patch series improves the fetcher support for tightly coupled >> package manager (npm, go and cargo). It adds support for embedded >> dependency fetcher via a common dependency mixin. The patch series >> reworks the npm-shrinkwrap.json (package-lock.json) support and adds a >> fetcher for go.sum and cargo.lock files. The dependency mixin contains >> two stages. The first stage locates a local specification file or >> fetches an archive or git repository with a specification file. The >> second stage resolves the dependency URLs from the specification file >> and fetches the dependencies. >> >> SRC_URI = "<type>://npm-shrinkwrap.json" >> SRC_URI = "<type>+http://example.com/ npm-shrinkwrap.json" >> SRC_URI = "<type>+http://example.com/${BP}.tar.gz;striplevel=1;subdir=${BP}" >> SRC_URI = "<type>+git://example.com/${BPN}.git;protocol=https" >> >> Additionally, the patch series reworks the npm fetcher to work without a >> npm binary and external package repository. It adds support for a common >> dependency name and version schema to integrate the dependencies into >> the SBOM. >> >> = Background >> Bitbake has diverse concepts and drawbacks for different tightly coupled >> package manager. The Python support uses a recipe per dependency and >> generates common fetcher URLs via a python function. The other languages >> embed the dependencies inside the recipe. The Node.js support offers a >> npmsw fetcher which uses a lock file beside the recipe to generates >> multiple common fetcher URLs on the fly and thereby hides the real >> download sources. This leads to a single source in the SBOM for example. >> The Go support contains two parallel implementations. 
A vendor-based >> solution with a common fetcher and a go-mod-based solution with a gomod >> fetcher. The vendor-based solution includes the individual dependencies >> into the SRC_URI of the recipe and uses a python function to generate >> common fetcher URLs which additional information for the vendor task.The >> gomod fetcher uses a proprietary gomod URL. It translates the URL into a >> common URL and prepares meta data during unpack. The Rust support >> includes the individual dependencies in the SRC_URI of the recipe and >> uses proprietary crate URLs. The crate fetcher translates a proprietary >> URL into a common fetcher URL and prepares meta data during unpack. The >> recipetool does not support the crate and the gomod fetcher. This leads >> to missing licenses of the dependencies in the recipe for example >> librsvg. >> >> The steps needed to fetch dependencies for Node.js, Go and Rust are >> similar: >> 1. Extract the dependencies from a specification file (name, version, >> checksum and URL) >> 2. Generate proprietary fetcher URIs >> a. npm://registry.npmjs.org/;package=glob;version= 10.3.15 >> b. gomod://golang.org/x/net;version=v0.9.0 >> gomodgit://golang.org/x/net;version=v0.9.0;repo=go.googlesource.com/net >> c. crate://crates.io/glob/0.3.1 >> 3. Generate wget or git fetcher URIs >> a.https://registry.npmjs.org/glob/-/glob-10.3.15.tgz;downloadfilename=… >> b.https://proxy.golang.org/golang.org/x/net/@v/v0.9.0.zip;downloadfilename=… >> git://go.googlesource.com/net;protocol=https; subdir=… >> c.https://crates.io/api/v1/crates/glob/0.3.1/download;downloadfilename=… >> 4. Unpack >> 5. Create meta files >> a. Update lockfile and create tar.gz archives >> b. Create go.mod file >> Create info, go.mod file and zip archives >> c. Create .cargo-checksum.json files >> >> It looks like the recipetool is not widely used and therefore this patch >> series integrates the dependency resolving into the fetcher. 
After an >> agreement on a concept the fetcher could be extended. The fetcher could >> download the license information per package and a new build task could >> run the license cruncher from the recipetool. > I've spent a bit more time thinking about this and looking at the code > and I've mixed feelings on it.I can certainly see why you've > implemented it this way and it does have a lot of potential but there > are also potential risks. Thank you very much for your feedback. > My comments (on various elements): > > With a npm-shrinkwrap.json/package-lock.json/go.sum file, are > dependencies always recorded as specific entities with checksums? Yes, every dependency contains a fixed version and a checksum. The purpose of the file is integrity and reproducibility. > I'm a > little bit worried about how easily you could sneak a "floating" > version into this and make the fetcher non-deterministic. Does (or > could?) the code detect and error on that? We could raise an error if a checksum is missing in the dependency specification file, or make the checksum mandatory for the dependency fetcher. Furthermore, we could inspect the dependency URLs to detect a misuse of the file, like a "latest" string for the version. > Put another way, could one of these SRC_URIs map to multiple different > combinations of underlying component versions? If you mean the extracted SRC_URI for a single dependency from the dependency specification file (ex. npm-shrinkwrap.json), it could use special URLs to map to the latest version. But this is a misuse of the dependency specification file and could be detected. The tools always generate files with fixed versions, because a floating version with a fixed checksum makes no sense. > Our existing method effectively hardcodes/expands the lock file into > extended SRC_URI entries which makes the specific versions and > components really clear. 
> This change abstracts that away into the fetcher and makes it opaque to the user, and much harder for code like the archiver/license/spdx code to find/handle.

Really? Let's use the crate fetcher as an example. At the moment the cargo-update-recipe-crates class extracts the URI and checksum from the Cargo.lock. The class ignores the licenses and this leads to missing licenses in the recipe. The spdx files contain only bitbake-specific fetcher URLs, which are unknown outside of bitbake.

I also thought it would make sense to generate recipes from the dependency specification files and therefore previously worked on the recipetool. But it looks like the tool isn't really used and I'm afraid nobody will use the recipe to fix dependencies. In most cases it is easy to update a dependency in the native tooling and only provide an updated dependency specification file.

I have a WIP to integrate the dependencies into the spdx. This uses the expanded_urldata / implicit_urldata function to add the dependencies to the process list of the archiver and spdx.

https://github.com/weidmueller/poky/tree/feature/dependency-fetcher

Regarding the license, we could migrate the functionality from recipetool into a class and detect the licenses at build time. Theoretically the fetcher could fetch the license from the package manager repository, but we have to trust the repository because we have no checksum to detect changes. Maybe we could integrate tools like Syft or ScanCode to detect the licenses at build time. At the moment the best solution is to make sure that the SBOM contains the name and version of the dependencies and let other tools handle the license via the SBOM for now. Therefore I propose a common scheme to define the dependency name (dn) and version (dv) in the SRC_URI.

> I noticed that any fetcher operation has to first expand the lock file using a temporary directory.

I followed gitsm and am open to suggestions. The expand happens only once per fetcher object.
The sub fetcher object is saved in the proxy variable.

> You're using DL_DIR for that which I suspect isn't a great idea for tmp files.

This was taken over from gitsm.

> In many cases that will work fine but it is a bit of a performance overhead.
>
> I did start wondering if we should cache the lock files in a subdir of DL_DIR to help performance and also give some extra assurance about changing content.

This would be possible. I assume the best would be another sub SRC_URI to avoid code duplication for the locking and change detection.

> The url scheme is clever but also has a potential risk in that you can't really pass parameters to both the top level fetcher and the underlying one. I'm worried that is going to bite us further down the line.

At the moment I don't see a real problem, but maybe you are right. The existing language-specific fetchers use fixed paths for their downloads.

What do you propose? Should the fetcher skip the unpack of the source or should we introduce a sub fetcher which uses the download from another SRC_URI entry? The two entries could be linked via the name parameter. This approach could be combined with your suggestion above. The new fetcher would unpack a lock file from another (default) download.

>> = Open questions
>>
>> * Where should we download dependencies?
>> ** Should we use a folder per fetcher (ex. git and npm)?
>> ** Should we use the main folder (ex. crate)?
>> ** Should we translate the name into a folder (ex. gomod)?
>> ** Should we integrate the name into the filename (ex. git)?

> DL_DIR is meant to be a complete cache of the source so it would need to be downloaded there. Given it maps to the other fetchers, the existing cache mechanisms likely work for these just fine, the open question is on whether the lock/spec files should be cached after extraction.

You misunderstood the question. It's about the downloadfilename parameter.
At the moment some fetchers use a sub folder inside DL_DIR and others use the main folder. It looks like every fetcher has its own concept to handle file collisions between different fetchers. The git and npm fetchers use their own folders, the crate fetcher uses its own .crate file prefix, the gomod fetcher translates the URL into multiple folders and the git fetcher translates the URL into a single folder name.

>> * Where should we unpack the dependencies?
>> ** Should we use a folder inside the parent folder (ex. node_modules)?
>> ** Should we use a fixed folder inside unpackdir (ex. go/pkg/mod/cache/download and cargo_home/bitbake)?

> This likely depends on the fetcher as the different mechanisms will have different expectations about how they should be extracted (as npm/etc. would).

It depends on the fetcher, but the fetchers could use the same approach. At the moment every fetcher uses a different approach. The crate fetcher uses a fixed value. The gomod fetcher uses a variable (GO_MOD_CACHE_DIR) and the npm fetcher uses a parameter (destsuffix). Furthermore, the gomod fetcher overrides the common subdir parameter.

>> * How should we treat archives for package manager caches?
>> ** Should we unpack the archives to support patching (ex. npm)?
>> ** Should we copy the packed archive to avoid unpacking and packaging (ex. gomod)?

> If there are archives left after do_unpack, which task is going to unpack those? Are we expecting the build process in configure/compile to decompress them? Would those management tools accept things if they were extracted earlier? "unpack" would be the correct time to do it but I can see this getting into conflict with the package manager :/.

Most package managers expect archives. In the npm case the archive is unpacked by the fetcher and repacked by the npm.bbclass to support patching. The gomod fetcher doesn't unpack the downloaded archive and the gomodgit fetcher creates archives from git folders during unpack.
It would be possible to always keep the archives, or to always extract the archives and recreate them during the build. It is a trade-off between performance and patchability.

At the moment it is complicated to work with the different fetchers because every fetcher uses a different concept and it is unclear what the desired approach is.

>> This patch series depends on patch series 20241209103158.20833-1-stefan.herbrechtsmeier-oss@weidmueller.com ("[1/4] tests: fetch: adapt npmsw tests to fixed unpack behavior").

> Those merged thanks.

Thanks.

> I did wonder if patches 1-5 of this series could be merged separately too as they look reasonable regardless of the rest of the series?

Sure. Should I resend the patches as a separate series?

Regards
Stefan
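The floating-version check discussed in the exchange above (erroring out when a lock-file entry has no checksum or a non-pinned version) could look roughly like the following sketch. All names and the regex are hypothetical illustrations, not code from the patch series:

```python
import re

# Hypothetical sketch: reject lock-file entries that are not fully pinned,
# as discussed in the thread. Not the series' actual code.
FLOATING = re.compile(r"^(latest$|\*|[~^<>=])")

def check_locked_dependency(name, version, checksum):
    """Raise if a dependency entry lacks a checksum or uses a floating version."""
    if not checksum:
        raise ValueError(f"{name}: missing checksum in lock file")
    if not version or FLOATING.match(version):
        raise ValueError(f"{name}: floating version '{version}' is not allowed")

check_locked_dependency("glob", "10.3.15", "sha512-...")  # pinned entry passes
```

A check of this shape could live in the dependency mixin so that every spec-file-based fetcher (npmsw, go.sum, Cargo.lock) shares the same determinism guarantee.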
On 25.12.2024 at 16:17, Alexander Kanavin wrote:
> On Mon, 23 Dec 2024 at 11:03, Richard Purdie via lists.openembedded.org <richard.purdie=linuxfoundation.org@lists.openembedded.org> wrote:
>> Would you be able to check if this work meets the criteria set out there and if not, what the differences are?
> I'd also add that this would benefit from a demonstration with one of the real go/rust recipes in oe-core: basically it would be good to push a branch of poky somewhere public, and provide instructions on how to see the new fetchers in action, and observe their benefits.

https://github.com/yoctoproject/poky/compare/master...weidmueller:poky:feature/dependency-fetcher

I have migrated the crate recipes to the new fetcher and improved the spdx 2.2 class to include the name and version of the crate dependencies. You have to inherit the create-spdx-2.2 class and build the librsvg recipe to test the new fetcher.
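The crate URL mapping that underlies this migration (step 2c to 3c in the cover letter) can be sketched as follows. This is illustrative only, a simplified stand-in for the crate fetcher's actual translation logic:

```python
from urllib.parse import urlparse

# Illustrative sketch: translate a proprietary crate:// URL into the plain
# https download URL plus a collision-free download file name, following
# the scheme listed in the cover letter. Not the fetcher's actual code.
def crate_to_https(url):
    parsed = urlparse(url)
    assert parsed.scheme == "crate"
    name, version = parsed.path.strip("/").split("/")
    download = f"https://{parsed.netloc}/api/v1/crates/{name}/{version}/download"
    filename = f"{name}-{version}.crate"
    return f"{download};downloadfilename={filename}"

print(crate_to_https("crate://crates.io/glob/0.3.1"))
```

The `.crate` suffix in the download file name is what avoids collisions with other fetchers sharing the main DL_DIR folder, as discussed later in the thread.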
On Mon, 2025-01-06 at 15:35 +0100, Stefan Herbrechtsmeier wrote:
> On 06.01.2025 at 12:04, Richard Purdie wrote:
>
> > My comments (on various elements):
> >
> > With a npm-shrinkwrap.json/package-lock.json/go.sum file, are dependencies always recorded as specific entities with checksums?
>
> Yes, every dependency contains a fixed version and a checksum. The purpose of the file is integrity and reproducibility.

Thanks for confirming, that hasn't always been the case for some of these package management systems!

> > I'm a little bit worried about how easily you could sneak a "floating" version into this and make the fetcher non-deterministic. Does (or could?) the code detect and error on that?
>
> We could raise an error if a checksum is missing in the dependency specification file or make the checksum mandatory for the dependency fetcher. Furthermore, we could inspect the dependency URLs to detect a misuse of the file, such as a "latest" string for the version.

I think adding such an error would be a requirement for merging this.

> > Put another way, could one of these SRC_URIs map to multiple different combinations of underlying component versions?
>
> If you mean the extracted SRC_URI for a single dependency from the dependency specification file (ex. npm-shrinkwrap.json), it could use special URLs to map to the latest version. But this is a misuse of the dependency specification file and could be detected. The tools always generate files with fixed versions because a floating version with a fixed checksum makes no sense.

Even if it shouldn't happen, we need to detect and error for this case as it would become very problematic for us.

> > Our existing method effectively hardcodes/expands the lock file into extended SRC_URI entries which makes the specific versions and components really clear.
> > This change abstracts that away into the fetcher and makes it opaque to the user, and much harder for code like the archiver/license/spdx code to find/handle.
>
> Really? Let's use the crate fetcher as an example. At the moment the cargo-update-recipe-crates class extracts the URI and checksum from the Cargo.lock. The class ignores the licenses and this leads to missing licenses in the recipe. The spdx files contain only bitbake-specific fetcher URLs, which are unknown outside of bitbake.

I guess what I'm trying to say is that people generally easily understand the explicit expanded urls. Whilst that class does ignore license handling, the hope was that it would get added; it is certainly possible to fix that.

> I also thought it would make sense to generate recipes from the dependency specification files and therefore previously worked on the recipetool. But it looks like the tool isn't really used and I'm afraid nobody will use the recipe to fix dependencies. In most cases it is easy to update a dependency in the native tooling and only provide an updated dependency specification file.

I think people have wanted a single simple command to translate the specification file into our recipe format to update the recipe. For various reasons people didn't seem to find the recipetool approach was working and created the task workflow based one. There are pros and cons to both and I don't have a strong preference. I would like to see something which makes it clear to users what is going on though and is simple to use.

People do intuitively understand a .inc file with a list of urls in it. There are challenges in updating it.

This other approach is not as intuitive as everything is abstracted out of sight.

One thing for example which worries me is how are the license fields in the recipe going to be updated?

Currently, if we teach the class, it can set LICENSE variables appropriately.
With the new approach, you don't know the licenses until after unpack has run. Yes it can write it into the SPDX, but it won't work for something like the layer index or forms of analysis which don't build things.

This does also extend to vulnerability analysis since we can't know what is in a given recipe without actually unpacking it. For example we could know crate XXX at version YYY has a CVE but we can't tell if a recipe uses that crate until after do_unpack, or at least not without expandurl.

> I have a WIP to integrate the dependencies into the spdx. This uses the expanded_urldata / implicit_urldata function to add the dependencies to the process list of archiver and spdx.
>
> https://github.com/weidmueller/poky/tree/feature/dependency-fetcher
>
> Regarding the license, we could migrate the functionality from recipetool into a class and detect the licenses at build time. Theoretically the fetcher could fetch the license from the package manager repository, but we have to trust the repository because we have no checksum to detect changes. Maybe we could integrate tools like Syft or ScanCode to detect the licenses at build time. At the moment the best solution is to make sure that the SBOM contains the name and version of the dependencies and let other tools handle the license via the SBOM for now. Therefore I propose a common scheme to define the dependency name (dn) and version (dv) in the SRC_URI.

We could compare what licenses the package manager is showing us with what is in the recipe and error if different. There would then need to be a command to update the licenses in the recipe (in much the way urls currently get updated).

> > I noticed that any fetcher operation has to first expand the lock file using a temporary directory.
>
> I followed gitsm and am open to suggestions. The expand happens only once per fetcher object. The sub fetcher object is saved in the proxy variable.
That fetcher object has to be recreated in every task or task context using the fetcher.

> > You're using DL_DIR for that which I suspect isn't a great idea for tmp files.
>
> This was taken over from gitsm.

Probably not the best fetcher and I'd say gitsm should be fixed.

> > In many cases that will work fine but it is a bit of a performance overhead.
> >
> > I did start wondering if we should cache the lock files in a subdir of DL_DIR to help performance and also give some extra assurance about changing content.
>
> This would be possible. I assume the best would be another sub SRC_URI to avoid code duplication for the locking and change detection.

Probably, I did wonder if the mixin could cover that abstraction/caching.

> > The url scheme is clever but also has a potential risk in that you can't really pass parameters to both the top level fetcher and the underlying one. I'm worried that is going to bite us further down the line.
>
> At the moment I don't see a real problem, but maybe you are right. The existing language-specific fetchers use fixed paths for their downloads.
>
> What do you propose? Should the fetcher skip the unpack of the source or should we introduce a sub fetcher which uses the download from another SRC_URI entry? The two entries could be linked via the name parameter. This approach could be combined with your suggestion above. The new fetcher would unpack a lock file from another (default) download.

I'm not really sure what is best right now. I'm trying to spell out the pros/cons of what is going on here in the hope it encourages others to give feedback as well. I agree there isn't a problem right now but I worry there soon will be by mixing two things together like this. The way we handle the git protocol does cause us friction with other url schemes already.

> > > = Open questions
> > >
> > > * Where should we download dependencies?
> > > ** Should we use a folder per fetcher (ex. git and npm)?
> > > ** Should we use the main folder (ex. crate)?
> > > ** Should we translate the name into a folder (ex. gomod)?
> > > ** Should we integrate the name into the filename (ex. git)?
> >
> > DL_DIR is meant to be a complete cache of the source so it would need to be downloaded there. Given it maps to the other fetchers, the existing cache mechanisms likely work for these just fine, the open question is on whether the lock/spec files should be cached after extraction.
>
> You misunderstood the question. It's about the downloadfilename parameter. At the moment some fetchers use a sub folder inside DL_DIR and others use the main folder. It looks like every fetcher has its own concept to handle file collisions between different fetchers. The git and npm fetchers use their own folders, the crate fetcher uses its own .crate file prefix, the gomod fetcher translates the URL into multiple folders and the git fetcher translates the URL into a single folder name.

That makes more sense. The layout is partially legacy. The wget and local fetchers were first and hence go directly into DL_DIR. git/svn were separated out into their own directories with a plan to have a directory per fetcher. That didn't always work out with each newer fetcher. Each fetcher does have to handle a unique naming of its urls as only the specific fetcher can know all the url parameters and which ones affect the output vs which ones don't.

> > > * Where should we unpack the dependencies?
> > > ** Should we use a folder inside the parent folder (ex. node_modules)?
> > > ** Should we use a fixed folder inside unpackdir (ex. go/pkg/mod/cache/download and cargo_home/bitbake)?
> >
> > This likely depends on the fetcher as the different mechanisms will have different expectations about how they should be extracted (as npm/etc. would).
> It depends on the fetcher, but the fetchers could use the same approach. At the moment every fetcher uses a different approach. The crate fetcher uses a fixed value. The gomod fetcher uses a variable (GO_MOD_CACHE_DIR) and the npm fetcher uses a parameter (destsuffix). Furthermore, the gomod fetcher overrides the common subdir parameter.

I think we really need to standardise that if we can. Each new fetcher has claimed a certain approach is effectively required by the package manager.

> > > * How should we treat archives for package manager caches?
> > > ** Should we unpack the archives to support patching (ex. npm)?
> > > ** Should we copy the packed archive to avoid unpacking and packaging (ex. gomod)?
> >
> > If there are archives left after do_unpack, which task is going to unpack those? Are we expecting the build process in configure/compile to decompress them? Would those management tools accept things if they were extracted earlier? "unpack" would be the correct time to do it but I can see this getting into conflict with the package manager :/.
>
> Most package managers expect archives. In the npm case the archive is unpacked by the fetcher and repacked by the npm.bbclass to support patching. The gomod fetcher doesn't unpack the downloaded archive and the gomodgit fetcher creates archives from git folders during unpack. It would be possible to always keep the archives or always extract the archives and recreate archives during build. It is a decision between performance and patchability.
>
> At the moment it is complicated to work with the different fetchers because every fetcher uses a different concept and it is unclear what the desired approach is.

This is a challenge. Can we handle the unpacking with the package manager as a specific step or does it have to be combined with other steps like configure/compile?
> > I did wonder if patches 1-5 of this series could be merged separately too as they look reasonable regardless of the rest of the series?
>
> Sure. Should I resend the patches as a separate series?

Yes please, that would then let us remove the bits we can easily review/sort and focus on this other part.

Cheers,
Richard
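The two-stage flow that this exchange keeps circling back to (stage 1: obtain the specification file; stage 2: resolve it into plain fetcher URLs) can be sketched in a few lines. Class and method names are illustrative stand-ins, not the series' actual mixin code, and the spec content is stubbed:

```python
# Rough, self-contained sketch of the two-stage dependency mixin described
# in the cover letter. All names are hypothetical, not the series' code.
class DependencyMixin:
    def fetch_spec(self, src_uri):
        # Stage 1: locate a local spec file or fetch the archive/git repo
        # containing it (stubbed here with a single npm-style entry).
        return {"glob": "10.3.15"}

    def resolve(self, spec):
        # Stage 2: map each pinned entry onto an ordinary download URL
        # that the plain wget fetcher can handle.
        return [f"https://registry.npmjs.org/{n}/-/{n}-{v}.tgz"
                for n, v in spec.items()]

    def download(self, src_uri):
        return self.resolve(self.fetch_spec(src_uri))

print(DependencyMixin().download("npmsw://npm-shrinkwrap.json"))
```

The point of the abstraction is that npmsw, go.sum and Cargo.lock support only differ in stage 2's URL mapping; the locking, caching and change-detection concerns Richard raises would live once in the shared mixin.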
On 06.01.2025 at 16:30, Richard Purdie wrote:
> On Mon, 2025-01-06 at 15:35 +0100, Stefan Herbrechtsmeier wrote:
>> On 06.01.2025 at 12:04, Richard Purdie wrote:
>>
>>> My comments (on various elements):
>>>
>>> With a npm-shrinkwrap.json/package-lock.json/go.sum file, are dependencies always recorded as specific entities with checksums?
>>>
>> Yes, every dependency contains a fixed version and a checksum. The purpose of the file is integrity and reproducibility.
> Thanks for confirming, that hasn't always been the case for some of these package management systems!
>
>>> I'm a little bit worried about how easily you could sneak a "floating" version into this and make the fetcher non-deterministic. Does (or could?) the code detect and error on that?
>> We could raise an error if a checksum is missing in the dependency specification file or make the checksum mandatory for the dependency fetcher. Furthermore, we could inspect the dependency URLs to detect a misuse of the file, such as a "latest" string for the version.
>
> I think adding such an error would be a requirement for merging this.

Should the dependency fetcher (ex. npmsw) or the language-specific fetcher (ex. npm) fail if the version points to a latest version?

>>> Put another way, could one of these SRC_URIs map to multiple different combinations of underlying component versions?
>> If you mean the extracted SRC_URI for a single dependency from the dependency specification file (ex. npm-shrinkwrap.json), it could use special URLs to map to the latest version. But this is a misuse of the dependency specification file and could be detected. The tools always generate files with fixed versions because a floating version with a fixed checksum makes no sense.
> Even if it shouldn't happen, we need to detect and error for this case as it would become very problematic for us.

Okay.
Should we disallow a dynamic version for package manager downloads generally or do you see a reasonable use case?

>>> Our existing method effectively hardcodes/expands the lock file into extended SRC_URI entries which makes the specific versions and components really clear. This change abstracts that away into the fetcher and makes it opaque to the user, and much harder for code like the archiver/license/spdx code to find/handle.
>> Really? Let's use the crate fetcher as an example. At the moment the cargo-update-recipe-crates class extracts the URI and checksum from the Cargo.lock. The class ignores the licenses and this leads to missing licenses in the recipe. The spdx files contain only bitbake-specific fetcher URLs, which are unknown outside of bitbake.
> I guess what I'm trying to say is that people generally easily understand the explicit expanded urls. Whilst that class does ignore license handling, the hope was that it would get added; it is certainly possible to fix that.

I will check how complicated it is to extract the license information from the package registries.

>> I also thought it would make sense to generate recipes from the dependency specification files and therefore previously worked on the recipetool. But it looks like the tool isn't really used and I'm afraid nobody will use the recipe to fix dependencies. In most cases it is easy to update a dependency in the native tooling and only provide an updated dependency specification file.
> I think people have wanted a single simple command to translate the specification file into our recipe format to update the recipe. For various reasons people didn't seem to find the recipetool approach was working and created the task workflow based one. There are pros and cons to both and I don't have a strong preference. I would like to see something which makes it clear to users what is going on though and is simple to use.
> People do intuitively understand a .inc file with a list of urls in it. There are challenges in updating it.
>
> This other approach is not as intuitive as everything is abstracted out of sight.
>
> One thing for example which worries me is how are the license fields in the recipe going to be updated?
>
> Currently, if we teach the class, it can set LICENSE variables appropriately. With the new approach, you don't know the licenses until after unpack has run. Yes it can write it into the SPDX, but it won't work for something like the layer index or forms of analysis which don't build things.
>
> This does also extend to vulnerability analysis since we can't know what is in a given recipe without actually unpacking it. For example we could know crate XXX at version YYY has a CVE but we can't tell if a recipe uses that crate until after do_unpack, or at least not without expandurl.

The main question is whether the metadata should contain all information. If yes, we shouldn't allow any fetcher which requires an external source. This should include the gitsm fetcher, and we should replace the single SRC_URI with multiple git SRC_URIs. We can go even further and forbid specific package manager fetchers and use plain https or git SRC_URIs. The python and go-vendor fetchers use this approach. Alternatively, we could allow dependency fetchers and require that the metadata always be used via bitbake. In this case we could extend the metadata via the fetcher.

In both cases it is possible to produce the same metadata. It doesn't matter if we use recipetool, devtool, bbclasses or fetchers. In any case we could resolve the SRC_URIs, checksums or srcrev from a file. The license information could be fetched from the package repositories without integrity checks or could be extracted from the individual package description file inside the downloaded sources (ex. npm).
We should skip the license detection from license files for now because it generates manual work and could be discussed later.

The recipe approach has the advantage that it uses fixed licenses and that license changes could (theoretically) be reviewed during a recipe update. In contrast, the fetcher approach reduces the update procedure to a simple file rename or SRCREV update (ex. gitsm). Furthermore, the user could simply place a file beside the recipe to update the dependencies. Could we realize the same via devtool integration and a patch?

We have different solutions between the languages (ex. npmsw vs crate vs pypi) and even inside the languages (ex. go-vendor vs gomod). I would like to unify the dependency support. It doesn't matter if we decide to use the bitbake fetcher or a bitbake / devtool command for the dependency and license resolution.

>> I have a WIP to integrate the dependencies into the spdx. This uses the expanded_urldata / implicit_urldata function to add the dependencies to the process list of archiver and spdx.
>>
>> https://github.com/weidmueller/poky/tree/feature/dependency-fetcher
>>
>> Regarding the license, we could migrate the functionality from recipetool into a class and detect the licenses at build time. Theoretically the fetcher could fetch the license from the package manager repository, but we have to trust the repository because we have no checksum to detect changes. Maybe we could integrate tools like Syft or ScanCode to detect the licenses at build time. At the moment the best solution is to make sure that the SBOM contains the name and version of the dependencies and let other tools handle the license via the SBOM for now. Therefore I propose a common scheme to define the dependency name (dn) and version (dv) in the SRC_URI.
> We could compare what licenses the package manager is showing us with what is in the recipe and error if different.
> There would then need to be a command to update the licenses in the recipe (in much the way urls currently get updated).

Either we request the licenses from the package manager during the package update or during fetch. I wouldn't do both. Instead I would analyze the license file during the build and compare the detected license with the recipe or fetcher generated licenses. But the license detection from files is another topic and I would like to postpone it for now.

>>> I noticed that any fetcher operation has to first expand the lock file using a temporary directory.
>>>
>> I followed gitsm and am open to suggestions. The expand happens only once per fetcher object. The sub fetcher object is saved in the proxy variable.
> That fetcher object has to be recreated in every task or task context using the fetcher.

Okay. In this case it makes sense to cache the resolved URIs.

>>> You're using DL_DIR for that which I suspect isn't a great idea for tmp files.
>>>
>> This was taken over from gitsm.
> Probably not the best fetcher and I'd say gitsm should be fixed.

I don't see a reason why the gitsm fetcher shouldn't be handled like the other dependency fetchers. We could update the handler after we have a decision for the dependency fetchers.

>>> In many cases that will work fine but it is a bit of a performance overhead.
>>>
>>> I did start wondering if we should cache the lock files in a subdir of DL_DIR to help performance and also give some extra assurance about changing content.
>>>
>> This would be possible. I assume the best would be another sub SRC_URI to avoid code duplication for the locking and change detection.
> Probably, I did wonder if the mixin could cover that abstraction/caching.

That should be possible.

>>> The url scheme is clever but also has a potential risk in that you can't really pass parameters to both the top level fetcher and the underlying one.
>>> I'm worried that is going to bite us further down the line.
>>>
>> At the moment I don't see a real problem, but maybe you are right. The existing language-specific fetchers use fixed paths for their downloads.
>>
>> What do you propose? Should the fetcher skip the unpack of the source or should we introduce a sub fetcher which uses the download from another SRC_URI entry? The two entries could be linked via the name parameter. This approach could be combined with your suggestion above. The new fetcher would unpack a lock file from another (default) download.
>>
> I'm not really sure what is best right now. I'm trying to spell out the pros/cons of what is going on here in the hope it encourages others to give feedback as well. I agree there isn't a problem right now but I worry there soon will be by mixing two things together like this. The way we handle the git protocol does cause us friction with other url schemes already.

The dependency fetcher could simply skip the unpack. In this case the user needs to use a variable to pass the same URL to the git and dependency fetchers, or we could provide a python function to generate two SRC_URIs with the same base URL.

>>>> = Open questions
>>>>
>>>> * Where should we download dependencies?
>>>> ** Should we use a folder per fetcher (ex. git and npm)?
>>>> ** Should we use the main folder (ex. crate)?
>>>> ** Should we translate the name into a folder (ex. gomod)?
>>>> ** Should we integrate the name into the filename (ex. git)?
>>>>
>>> DL_DIR is meant to be a complete cache of the source so it would need to be downloaded there. Given it maps to the other fetchers, the existing cache mechanisms likely work for these just fine, the open question is on whether the lock/spec files should be cached after extraction.
>>>
>> You misunderstood the question. It's about the downloadfilename parameter.
>> At the moment some fetchers use a sub folder inside DL_DIR and others use the main folder. It looks like every fetcher has its own concept to handle file collisions between different fetchers. The git and npm fetchers use their own folders, the crate fetcher uses its own .crate file prefix, the gomod fetcher translates the URL into multiple folders and the git fetcher translates the URL into a single folder name.
> That makes more sense. The layout is partially legacy. The wget and local fetchers were first and hence go directly into DL_DIR. git/svn were separated out into their own directories with a plan to have a directory per fetcher. That didn't always work out with each newer fetcher. Each fetcher does have to handle a unique naming of its urls as only the specific fetcher can know all the url parameters and which ones affect the output vs which ones don't.

This doesn't explain why the npm but not the gomod and crate fetcher uses a sub folder. All fetchers are based on the wget fetcher.

>>>> * Where should we unpack the dependencies?
>>>> ** Should we use a folder inside the parent folder (ex. node_modules)?
>>>> ** Should we use a fixed folder inside unpackdir (ex. go/pkg/mod/cache/download and cargo_home/bitbake)?
>>>>
>>> This likely depends on the fetcher as the different mechanisms will have different expectations about how they should be extracted (as npm/etc. would).
>>>
>> It depends on the fetcher, but the fetchers could use the same approach. At the moment every fetcher uses a different approach. The crate fetcher uses a fixed value. The gomod fetcher uses a variable (GO_MOD_CACHE_DIR) and the npm fetcher uses a parameter (destsuffix). Furthermore, the gomod fetcher overrides the common subdir parameter.
> I think we really need to standardise that if we can. Each new fetcher has claimed a certain approach is effectively required by the package manager.

What would be your desired solution?
Is the variable okay or do you prefer a self-contained SRC_URI?

>>>> * How should we treat archives for package manager caches?
>>>> ** Should we unpack the archives to support patching (ex. npm)?
>>>> ** Should we copy the packed archive to avoid unpacking and packaging (ex. gomod)?
>>>
>>> If there are archives left after do_unpack, which task is going to unpack those? Are we expecting the build process in configure/compile to decompress them? Would those management tools accept things if they were extracted earlier? "unpack" would be the correct time to do it but I can see this getting into conflict with the package manager :/.
>>
>> Most package managers expect archives. In the npm case the archive is unpacked by the fetcher and repacked by the npm.bbclass to support patching. The gomod fetcher doesn't unpack the downloaded archive and the gomodgit fetcher creates archives from git folders during unpack. It would be possible to always keep the archives, or to always extract the archives and recreate them during build. It is a decision between performance and patchability.
>>
>> At the moment it is complicated to work with the different fetchers because every fetcher uses a different concept and it is unclear what the desired approach is.

> This is a challenge. Can we handle the unpacking with the package manager as a specific step or does it have to be combined with other steps like configure/compile?

It looks like this is possible:

cargo fetch
go mod vendor
npm install

I suspect you're thinking about using the package manager in do_unpack to unpack the archives and patch the unpacked archives afterwards?

>>> I did wonder if patches 1-5 of this series could be merged separately too as they look reasonable regardless of the rest of the series?
>>
>> Sure. Should I resend the patches as separate series?
> Yes please, that would then let us remove the bits we can easily review/sort and focus on this other part.

Done.

I will also resend the go h1 checksum commit separately because it could be useful for the gomod fetcher.

Should I also move the dn / dv parameter patches to a separate series, because they could be useful without the dependency fetcher? I could add the parameters to the fetchers in a backward-compatible way.

Regards
Stefan
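To make the two-SRC_URI idea above concrete, a helper along these lines could derive both entries from one base URL. This is only a sketch: the function name, the `srcname` parameter and the reuse of the `npmsw` scheme are assumptions for illustration, not existing BitBake API.

```python
# Hypothetical helper: derive a source SRC_URI entry and a matching
# dependency-fetcher entry from a single base URL, linked via the name
# parameter so the dependency fetcher can reuse the source download
# instead of fetching the repository a second time.

def src_uri_pair(base_url, lockfile, dep_scheme, name="source"):
    """Return [source entry, dependency entry] sharing one base URL."""
    source = "git://%s;protocol=https;name=%s" % (base_url, name)
    deps = "%s://%s;srcname=%s" % (dep_scheme, lockfile, name)
    return [source, deps]

entries = src_uri_pair("example.com/app.git", "npm-shrinkwrap.json", "npmsw")
print(" ".join(entries))
```

A recipe could then build its SRC_URI from the returned list, so the source checkout and the lock-file resolution always stay in sync.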
On Tue, 2025-01-07 at 10:47 +0100, Stefan Herbrechtsmeier wrote:
> Am 06.01.2025 um 16:30 schrieb Richard Purdie:
> > On Mon, 2025-01-06 at 15:35 +0100, Stefan Herbrechtsmeier wrote:
> > > > I'm a little bit worried about how easily you could sneak a "floating" version into this and make the fetcher non-deterministic. Does (or could?) the code detect and error on that?
> > >
> > > We could raise an error if a checksum is missing in the dependency specification file or make the checksum mandatory for the dependency fetcher. Furthermore we could inspect the dependency URLs to detect a misuse of the file, like a latest string for the version.
> >
> > I think adding such an error would be a requirement for merging this.
>
> Should the dependency fetcher (ex. npmsw) or the language-specific fetcher (ex. npm) fail if the version points to a latest version?

I think right now it has to error to try and reduce complexity. It is possible to support such things but you have to pass that version information back up the stack so that PV represents the different versions, and that is a new level of complexity.

I guess we should consider how you could theoretically support it as that might influence the design. With multiple git repos in SRC_URI for example, we end up adding multiple shortened shas to construct a PV so that if any of them change, PV changes. We also have to add an incrementing integer so that opkg/dpkg/rpm operations work and versions sort.

> > > > Put another way, could one of these SRC_URIs map to multiple different combinations of underlying component versions?
> > >
> > > If you mean the extracted SRC_URI for a single dependency from the dependency specification file (ex. npm-shrinkwrap.json), it could use special URLs to map to the latest version. But this is a misuse of the dependency specification file and could be detected.
> > > The tools always generate files with fixed versions because a floating version with a fixed checksum makes no sense.
> >
> > Even if it shouldn't happen, we need to detect and error for this case as it would become very problematic for us.
>
> Okay. Should we disallow a dynamic version for package manager downloads generally, or do you see a reasonable use case?

See above.

> > > I also thought it would make sense to generate recipes from the dependency specification files and therefore worked on the recipetool previously. But it looks like the tool isn't really used and I'm afraid nobody will use the recipe to fix dependencies. In most cases it is easy to update a dependency in the native tooling and only provide an updated dependency specification file.
> >
> > I think people have wanted a single simple command to translate the specification file into our recipe format to update the recipe. For various reasons people didn't seem to find the recipetool approach was working and created the task workflow based one. There are pros and cons to both and I don't have a strong preference. I would like to see something which makes it clear to users what is going on though and is simple to use.
> >
> > People do intuitively understand a .inc file with a list of urls in it. There are challenges in updating it.
> >
> > This other approach is not as intuitive as everything is abstracted out of sight.
> >
> > One thing for example which worries me is how are the license fields in the recipe going to be updated?
> >
> > Currently, if we teach the class, it can set LICENSE variables appropriately. With the new approach, you don't know the licenses until after unpack has run. Yes it can write it into the SPDX, but it won't work for something like the layer index or forms of analysis which don't build things.
> >
> > This does also extend to vulnerability analysis since we can't know what is in a given recipe without actually unpacking it. For example we could know crate XXX at version YYY has a CVE but we can't tell if a recipe uses that crate until after do_unpack, or at least not without expandurl.
>
> The main question is if the metadata should contain all information. If yes, we shouldn't allow any fetcher which requires an external source. This should include the gitsm fetcher and we should replace the single SRC_URI with multiple git SRC_URIs.

If we had tooling that supported that well we could certainly consider it. It isn't straightforward as you can have a git repo containing submodules which then themselves contain submodules which can then contain more levels of submodules. There are therefore multiple levels of expansion possible.

> We can go even further and forbid specific package manager fetchers and use plain https or git SRC_URIs. The python and go-vendor fetchers use this approach.
>
> Alternatively we allow dependency fetchers and require that the metadata always be used via bitbake. In this case we could extend the metadata via the fetcher.
>
> In both cases it is possible to produce the same metadata. It doesn't matter if we use recipetool, devtool, bbclasses or fetchers. In any case we could resolve the SRC_URIs, checksums or srcrev from a file. The license information could be fetched from the package repositories without integrity checks or could be extracted from the individual package description file inside the downloaded sources (ex. npm). We should skip the license detection from license files for now because it generates manual work and could be discussed later.

That was the reason the current task based approach doesn't use them, yet! I mention it just to highlight that it can be solved either way; the approach doesn't really change what we need to do.
The bigger concern is having information available in the metadata, which I think we need to do to some level regardless of which approach we choose.

> The recipe approach has the advantage that it uses fixed licenses and that license changes could (theoretically) be reviewed during recipe update.

FWIW that is an important use case and one of our general strengths. We can only do that as the license information is written in recipes and can be compared at update time.

> In contrast the fetcher approach reduces the update procedure to a simple file rename or SRCREV update (ex. gitsm). Furthermore, the user could simply place a file beside the recipe to update the dependencies. Could we realize the same via devtool integration and a patch?

This is effectively what the task based approach is aiming for currently. I think the idea was that we could have devtool/recipetool integration around that update task; a task was just a convenient way to capture the code to do it and get things working without needing the tool to be finished.

> We have different solutions between the languages (ex. npmsw vs crate vs pypi) and even inside the languages (ex. go-vendor vs gomod). I would like to unify the dependency support. It doesn't matter if we decide to use the bitbake fetcher or a bitbake / devtool command for the dependency and license resolution.

I do very much prefer having one good way of doing things rather than multiple ways of doing things, each with a potential drawback. I'm therefore broadly in favour of doing that as long as we don't upset too much existing mindshare along the way.

> > > I have a WIP to integrate the dependencies into the SPDX. This uses the expanded_urldata / implicit_urldata function to add the dependencies to the process list of archiver and spdx.
> > >
> > > https://github.com/weidmueller/poky/tree/feature/dependency-fetcher
> > >
> > > Regarding the license we could migrate the functionality from recipetool into a class and detect the licenses at build time. Theoretically the fetcher could fetch the license from the package manager repository, but we have to trust the repository because we have no checksum to detect changes. Maybe we could integrate tools like Syft or ScanCode to detect the licenses at build time. At the moment the best solution is to make sure that the SBOM contains the name and version of the dependencies and let other tools handle the license via SBOM for now. Therefore I propose a common scheme to define the dependency name (dn) and version (dv) in the SRC_URI.
> >
> > We could compare what licenses the package manager is showing us with what is in the recipe and error if different. There would then need to be a command to update the licenses in the recipe (in much the way urls currently get updated).
>
> Either we request the licenses from the package manager during package update or during fetch. I wouldn't do both. Instead I would analyze the license file during build and compare the detected license with the recipe or fetcher generated licenses. But the license detection from files is another topic and I would like to postpone it for now.

Agreed, I mention it just to highlight that supporting them does have impact on the design, so any solution needs to ultimately be able to support it.

> > > > You're using DL_DIR for that which I suspect isn't a great idea for tmp files.
> > >
> > > Take over from gitsm.
> >
> > Probably not the best fetcher and I'd say gitsm should be fixed.
>
> I don't see a reason why the gitsm fetcher shouldn't be handled like the other dependency fetchers.
> We could update the handler after we have a decision for the dependency fetchers.

In principle perhaps, but as mentioned above, gitsm has its own challenges.

> > > > The url scheme is clever but also has a potential risk in that you can't really pass parameters to both the top level fetcher and the underlying one. I'm worried that is going to bite us further down the line.
> > >
> > > At the moment I don't see a real problem but maybe you are right. The existing language-specific fetchers use fixed paths for their downloads.
> > >
> > > What do you propose? Should the fetcher skip the unpack of the source, or should we introduce a sub-fetcher which uses the download from another SRC_URI entry? The two entries could be linked via the name parameter. This approach could be combined with your suggestion above. The new fetcher would unpack a lock file from another (default) download.
> >
> > I'm not really sure what is best right now. I'm trying to spell out the pros/cons of what is going on here in the hope it encourages others to give feedback as well. I agree there isn't a problem right now but I worry there soon will be by mixing two things together like this. The way we handle the git protocol does cause us friction with other URL schemes already.
>
> The dependency fetcher could simply skip the unpack. In this case the user needs to use a variable to pass the same URL to the git and dependency fetchers, or we could provide a python function to generate two SRC_URI entries with the same base URL.

I'm starting to wonder about a slightly different approach, basically an optional generated file alongside a recipe which contains "expanded" information which is effectively expensive to generate (in computation or resource like network access/process terms). We could teach bitbake a new phase of parsing where it generated them if missing.
There are some other pieces of information which we know during the build process which it would be helpful to know earlier (e.g. which packages a recipe generates). I've wondered about this for a long time and the fetcher issues remind me of it again. It would be a big change with advantages and drawbacks. I think it would put more pressure on a layer maintainer as they'd have to computationally keep this up to date and it would complicate the patch workflow (who should send/regen the files?). I'm putting the idea there, I'm not saying I think we should do it, I'm just considering options.

> > > > > = Open questions
> > > > >
> > > > > * Where should we download dependencies?
> > > > > ** Should we use a folder per fetcher (ex. git and npm)?
> > > > > ** Should we use the main folder (ex. crate)?
> > > > > ** Should we translate the name into a folder (ex. gomod)?
> > > > > ** Should we integrate the name into the filename (ex. git)?
> > > >
> > > > DL_DIR is meant to be a complete cache of the source so it would need to be downloaded there. Given it maps to the other fetchers, the existing cache mechanisms likely work for these just fine; the open question is whether the lock/spec files should be cached after extraction.
> > >
> > > You misunderstand the question. It's about the downloadfilename parameter. At the moment some fetchers use a subfolder inside DL_DIR and others use the main folder. It looks like every fetcher has its own concept to handle file collisions between different fetchers. The git and npm fetchers use their own folders, the crate fetcher uses its own .crate file prefix, the gomod fetcher translates the URL into multiple folders, and the git fetcher translates the URL into a single folder name.
> >
> > That makes more sense. The layout is partially legacy. The wget and local fetchers were first and hence go directly into DL_DIR.
> > git/svn were separated out into their own directories with a plan to have a directory per fetcher. That didn't always work out with each newer fetcher. Each fetcher does have to handle a unique naming of its URLs as only the specific fetcher can know all the URL parameters and which ones affect the output vs which ones don't.
>
> This doesn't explain why the npm fetcher, but not the gomod and crate fetchers, uses a subfolder. All fetchers are based on the wget fetcher.

That is probably "my fault". Put yourself in my position. You get a ton of different patches, all touching very varied aspects of the system. When reviewing them you have to try and remember the original design decisions, the future directions, the ways things broke in the past, a desire to try and have clean consistent APIs and so on. I have tried very hard to move things in a direction where things incrementally improve, without unnecessarily blocking new features. It means that things that merge often aren't perfect. We've tried a few different approaches with the newer programming languages and each approach has had pros and cons. The inconsistency is probably because I missed something in review. Sorry :(.

I only have finite time. There are few people who seem to want to dive in and help with review of patches like these. I did ask some people yesterday; one told me they simply couldn't understand these patches. I'm doing my best to ask the right questions, try and help others understand them, and ensure my own concerns I can identify are resolved. I don't want to de-motivate you on this work either; I think the idea of improving this is great and I'd love to see it. Equally, I'm also the first person everyone will complain to if we change something and it causes problems for people.

So the explanation is probably that I just missed something in review at some point. The intent was to separate out the fetcher output going forward (unless it makes sense to be shared).
FWIW there are multiple things which bother me about the existing fetcher storage layout but that is a different discussion.

> > > > > * Where should we unpack the dependencies?
> > > > > ** Should we use a folder inside the parent folder (ex. node_modules)?
> > > > > ** Should we use a fixed folder inside unpackdir (ex. go/pkg/mod/cache/download and cargo_home/bitbake)?
> > > >
> > > > This likely depends on the fetcher as the different mechanisms will have different expectations about how they should be extracted (as npm/etc. would).
> > >
> > > It depends on the fetcher, but the fetchers could use the same approach. At the moment every fetcher uses a different approach. The crate fetcher uses a fixed value. The gomod fetcher uses a variable (GO_MOD_CACHE_DIR) and the npm fetcher uses a parameter (destsuffix). Furthermore the gomod fetcher overrides the common subdir parameter.
> >
> > I think we really need to standardise that if we can. Each new fetcher has claimed a certain approach is effectively required by the package manager.
>
> What would be your desired solution? Is the variable okay or do you prefer a self-contained SRC_URI?

I suspect we need a default via a variable and then the option to change the default via parameters. The default value should be a bitbake fetcher namespaced control variable. I'm wary of making a definitive statement saying X if that isn't going to make sense for some backend though. I simply don't have enough knowledge of them all, which is why you see me being reluctant to make definitive statements about design.

> > > > > * How should we treat archives for package manager caches?
> > > > > ** Should we unpack the archives to support patching (ex. npm)?
> > > > > ** Should we copy the packed archive to avoid unpacking and packaging (ex. gomod)?
> > > >
> > > > If there are archives left after do_unpack, which task is going to unpack those? Are we expecting the build process in configure/compile to decompress them? Would those management tools accept things if they were extracted earlier? "unpack" would be the correct time to do it but I can see this getting into conflict with the package manager :/.
> > >
> > > Most package managers expect archives. In the npm case the archive is unpacked by the fetcher and repacked by the npm.bbclass to support patching. The gomod fetcher doesn't unpack the downloaded archive and the gomodgit fetcher creates archives from git folders during unpack. It would be possible to always keep the archives, or to always extract the archives and recreate them during build. It is a decision between performance and patchability.
> > >
> > > At the moment it is complicated to work with the different fetchers because every fetcher uses a different concept and it is unclear what the desired approach is.
> >
> > This is a challenge. Can we handle the unpacking with the package manager as a specific step or does it have to be combined with other steps like configure/compile?
>
> It looks like this is possible:
> cargo fetch
> go mod vendor
> npm install
>
> I suspect you're thinking about using the package manager in do_unpack to unpack the archives and patch the unpacked archives afterwards?

I'm wondering about it, yes. I know we've had challenges with patching rust modules for example, so this isn't a theoretical problem :/.

> > > > I did wonder if patches 1-5 of this series could be merged separately too as they look reasonable regardless of the rest of the series?
> > >
> > > Sure. Should I resend the patches as separate series?
> >
> > Yes please, that would then let us remove the bits we can easily review/sort and focus on this other part.
>
> Done.

Thanks.
> I will also resend the go h1 checksum commit separately because it could be useful for the gomod fetcher.

Yes, I was waiting for a new version of that one with the naming tweaked.

> Should I also move the dn / dv parameter patches to a separate series, because they could be useful without the dependency fetcher? I could add the parameters to the fetchers in a backward-compatible way.

I need to think more about that one...

Cheers,

Richard
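The PV construction Richard describes above — one shortened sha per git repository plus an incrementing serial so package-manager version comparisons still sort — can be illustrated with a small sketch. The helper name and the exact output format are made up for illustration; BitBake's real mechanism differs in detail.

```python
# Illustrative sketch only: build a PV string from multiple git revisions.
# One shortened sha per repository means any revision change alters PV,
# and the incrementing serial keeps opkg/dpkg/rpm version sorting sane
# across rebuilds where the shas alone would not compare monotonically.

def construct_pv(base_version, srcrevs, serial):
    parts = [base_version, str(serial)] + ["git" + rev[:10] for rev in srcrevs]
    return "+".join(parts)

pv = construct_pv("1.0", ["a1b2c3d4e5f6a7b8c9d0", "0f1e2d3c4b5a69788776"], 2)
print(pv)  # 1.0+2+gita1b2c3d4e5+git0f1e2d3c4b
```

The serial has to be bumped manually (or derived from build history) whenever the shas change, since a sha substring carries no ordering information on its own.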
Am 07.01.2025 um 12:01 schrieb Richard Purdie:
> On Tue, 2025-01-07 at 10:47 +0100, Stefan Herbrechtsmeier wrote:
>> Am 06.01.2025 um 16:30 schrieb Richard Purdie:
>>> On Mon, 2025-01-06 at 15:35 +0100, Stefan Herbrechtsmeier wrote:
>>>>> I'm a little bit worried about how easily you could sneak a "floating" version into this and make the fetcher non-deterministic. Does (or could?) the code detect and error on that?
>>>>>
>>>> We could raise an error if a checksum is missing in the dependency specification file or make the checksum mandatory for the dependency fetcher. Furthermore we could inspect the dependency URLs to detect a misuse of the file, like a latest string for the version.
>>>>
>>> I think adding such an error would be a requirement for merging this.
>>>
>> Should the dependency fetcher (ex. npmsw) or the language-specific fetcher (ex. npm) fail if the version points to a latest version?
> I think right now it has to error to try and reduce complexity. It is possible to support such things but you have to pass that version information back up the stack so that PV represents the different versions, and that is a new level of complexity.
>
> I guess we should consider how you could theoretically support it as that might influence the design. With multiple git repos in SRC_URI for example, we end up adding multiple shortened shas to construct a PV so that if any of them change, PV changes. We also have to add an incrementing integer so that opkg/dpkg/rpm operations work and versions sort.

Okay. In this case we should add the checks to the dependency resolution. Thereby we prohibit dynamic versions for the dependencies and allow users to add support for them in the fetcher of the package manager.

>>>>> Put another way, could one of these SRC_URIs map to multiple different combinations of underlying component versions?
>>>>
>>>> If you mean the extracted SRC_URI for a single dependency from the dependency specification file (ex. npm-shrinkwrap.json), it could use special URLs to map to the latest version. But this is a misuse of the dependency specification file and could be detected. The tools always generate files with fixed versions because a floating version with a fixed checksum makes no sense.
>>> Even if it shouldn't happen, we need to detect and error for this case as it would become very problematic for us.
>>>
>> Okay. Should we disallow a dynamic version for package manager downloads generally, or do you see a reasonable use case?
> See above.
>
>>>> I also thought it would make sense to generate recipes from the dependency specification files and therefore worked on the recipetool previously. But it looks like the tool isn't really used and I'm afraid nobody will use the recipe to fix dependencies. In most cases it is easy to update a dependency in the native tooling and only provide an updated dependency specification file.
>>>
>>> I think people have wanted a single simple command to translate the specification file into our recipe format to update the recipe. For various reasons people didn't seem to find the recipetool approach was working and created the task workflow based one. There are pros and cons to both and I don't have a strong preference. I would like to see something which makes it clear to users what is going on though and is simple to use.
>>>
>>> People do intuitively understand a .inc file with a list of urls in it. There are challenges in updating it.
>>>
>>> This other approach is not as intuitive as everything is abstracted out of sight.
>>>
>>> One thing for example which worries me is how are the license fields in the recipe going to be updated?
>>>
>>> Currently, if we teach the class, it can set LICENSE variables appropriately.
>>> With the new approach, you don't know the licenses until after unpack has run. Yes it can write it into the SPDX, but it won't work for something like the layer index or forms of analysis which don't build things.
>>>
>>> This does also extend to vulnerability analysis since we can't know what is in a given recipe without actually unpacking it. For example we could know crate XXX at version YYY has a CVE but we can't tell if a recipe uses that crate until after do_unpack, or at least not without expandurl.
>>>
>> The main question is if the metadata should contain all information. If yes, we shouldn't allow any fetcher which requires an external source. This should include the gitsm fetcher and we should replace the single SRC_URI with multiple git SRC_URIs.
>
> If we had tooling that supported that well we could certainly consider it. It isn't straightforward as you can have a git repo containing submodules which then themselves contain submodules which can then contain more levels of submodules. There are therefore multiple levels of expansion possible.

Okay. That makes the git submodule special in comparison to the other dependency fetchers.

>> We can go even further and forbid specific package manager fetchers and use plain https or git SRC_URIs. The python and go-vendor fetchers use this approach.
>>
>> Alternatively we allow dependency fetchers and require that the metadata always be used via bitbake. In this case we could extend the metadata via the fetcher.
>>
>> In both cases it is possible to produce the same metadata. It doesn't matter if we use recipetool, devtool, bbclasses or fetchers. In any case we could resolve the SRC_URIs, checksums or srcrev from a file. The license information could be fetched from the package repositories without integrity checks or could be extracted from the individual package description file inside the downloaded sources (ex. npm). We should skip the license detection from license files for now because it generates manual work and could be discussed later.
> That was the reason the current task based approach doesn't use them, yet! I mention it just to highlight that it can be solved either way; the approach doesn't really change what we need to do. The bigger concern is having information available in the metadata, which I think we need to do to some level regardless of which approach we choose.
>
>> The recipe approach has the advantage that it uses fixed licenses and that license changes could (theoretically) be reviewed during recipe update.
> FWIW that is an important use case and one of our general strengths. We can only do that as the license information is written in recipes and can be compared at update time.

Does this apply to the license of every individual dependency or only to the combined license?

>> In contrast the fetcher approach reduces the update procedure to a simple file rename or SRCREV update (ex. gitsm). Furthermore, the user could simply place a file beside the recipe to update the dependencies. Could we realize the same via devtool integration and a patch?
> This is effectively what the task based approach is aiming for currently. I think the idea was that we could have devtool/recipetool integration around that update task; a task was just a convenient way to capture the code to do it and get things working without needing the tool to be finished.

What is the task based approach? `bitbake -c update xyz`?

>> We have different solutions between the languages (ex. npmsw vs crate vs pypi) and even inside the languages (ex. go-vendor vs gomod). I would like to unify the dependency support. It doesn't matter if we decide to use the bitbake fetcher or a bitbake / devtool command for the dependency and license resolution.
> I do very much prefer having one good way of doing things rather than multiple ways of doing things, each with a potential drawback. I'm therefore broadly in favour of doing that as long as we don't upset too much existing mindshare along the way.

Okay

>>>> I have a WIP to integrate the dependencies into the SPDX. This uses the expanded_urldata / implicit_urldata function to add the dependencies to the process list of archiver and spdx.
>>>>
>>>> https://github.com/weidmueller/poky/tree/feature/dependency-fetcher
>>>>
>>>> Regarding the license we could migrate the functionality from recipetool into a class and detect the licenses at build time. Theoretically the fetcher could fetch the license from the package manager repository, but we have to trust the repository because we have no checksum to detect changes. Maybe we could integrate tools like Syft or ScanCode to detect the licenses at build time. At the moment the best solution is to make sure that the SBOM contains the name and version of the dependencies and let other tools handle the license via SBOM for now. Therefore I propose a common scheme to define the dependency name (dn) and version (dv) in the SRC_URI.
>>>
>>> We could compare what licenses the package manager is showing us with what is in the recipe and error if different. There would then need to be a command to update the licenses in the recipe (in much the way urls currently get updated).
>>>
>> Either we request the licenses from the package manager during package update or during fetch. I wouldn't do both. Instead I would analyze the license file during build and compare the detected license with the recipe or fetcher generated licenses. But the license detection from files is another topic and I would like to postpone it for now.
> Agreed, I mention it just to highlight that supporting them does have impact on the design, so any solution needs to ultimately be able to support it.

>>>>> You're using DL_DIR for that which I suspect isn't a great idea for tmp files.

>>>> Taken over from gitsm.

>>> Probably not the best fetcher and I'd say gitsm should be fixed.

>> I don't see a reason why the gitsm fetcher shouldn't be handled like the other dependency fetchers. We could update the handler after we have a decision for the dependency fetchers.

> In principle perhaps but as mentioned above, gitsm has its own challenges.

Based on your feedback I have the feeling that a dependency fetcher isn't the correct solution. The fetcher makes it impossible to review changes during recipe update. Additionally it needs caching for the resolved fetch and license data.

The alternative is to create an inc file with SRC_URIs, checksums, SRCREVs and LICENSE. Any recommendation on how to integrate the dependency resolution and inc creation into oe-core?

>>>>> The url scheme is clever but also has a potential risk in that you can't really pass parameters to both the top level fetcher and the underlying one. I'm worried that is going to bite us further down the line.

>>>> At the moment I don't see a real problem but maybe you are right. The existing language specific fetchers use fixed paths for their downloads.
>>>>
>>>> What do you propose? Should the fetcher skip the unpack of the source, or should we introduce a sub fetcher which uses the download from another SRC_URI entry? The two entries could be linked via the name parameter. This approach could be combined with your suggestion above. The new fetcher would unpack a lock file from another (default) download.

>>> I'm not really sure what is best right now. I'm trying to spell out the pros/cons of what is going on here in the hope it encourages others to give feedback as well. I agree there isn't a problem right now, but I worry there soon will be by mixing two things together like this. The way we handle git protocol does cause us friction with other url schemes already.

>> The dependency fetcher could simply skip the unpack. In this case the user needs to use a variable to pass the same URL to the git and dependency fetcher, or we could provide a python function to generate two SRC_URI entries with the same base URL.

> I'm starting to wonder about a slightly different approach, basically an optional generated file alongside a recipe which contains "expanded" information which is effectively expensive to generate (in computation or resource like network access/process terms). We could teach bitbake a new phase of parsing where it generated them if missing. There are some other pieces of information which we know during the build process which it would be helpful to know earlier (e.g. which packages a recipe generates). I've wondered about this for a long time and the fetcher issues remind me of it again. It would be a big change with advantages and drawbacks. I think it would put more pressure on a layer maintainer as they'd have to computationally keep this up to date and it would complicate the patch workflow (who should send/regen the files?). I'm putting the idea there, I'm not saying I think we should do it, I'm just considering options.

Do you mean like a cache or like the inc files? Is the file totally auto-generated or is manual editing acceptable?

>> = Open questions
>>
>>>>>> * Where should we download dependencies?
>>>>>> ** Should we use a folder per fetcher (ex. git and npm)?
>>>>>> ** Should we use the main folder (ex. crate)?
>>>>>> ** Should we translate the name into a folder (ex. gomod)?
>>>>>> ** Should we integrate the name into the filename (ex. git)?
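To make the download-layout question above concrete, the per-fetcher conventions described in this thread look roughly like this. This is a simplified sketch reconstructed from the discussion rather than from the code; the `git2/` and `npm2/` directory names are the ones current bitbake uses, the rest is illustrative:

```text
DL_DIR/
├── git2/<escaped-url>/          # git fetcher: own folder, URL mapped to one dir name
├── npm2/<name>-<ver>.tgz        # npm fetcher: own folder
├── <name>-<ver>.crate           # crate fetcher: prefixed files in the top level
└── example.com/mod/@v/v1.0.zip  # gomod fetcher: URL mapped to nested folders
```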
>>>>> DL_DIR is meant to be a complete cache of the source so it would need to be downloaded there. Given it maps to the other fetchers, the existing cache mechanisms likely work for these just fine, the open question is on whether the lock/spec files should be cached after extraction.

>>>> You misunderstand the question. It's about the downloadfilename parameter. At the moment some fetchers use a sub folder inside DL_DIR and others use the main folder. It looks like every fetcher has its own concept to handle file collisions between different fetchers. The git and npm fetchers use their own folder, the crate fetcher uses its own .crate file prefix, the gomod fetcher translates the URL into multiple folders and the git fetcher translates the URL into a single folder name.

>>> That makes more sense. The layout is partially legacy. The wget and local fetchers were first and hence go directly into DL_DIR. git/svn were separated out into their own directories with a plan to have a directory per fetcher. That didn't always work out with each newer fetcher. Each fetcher does have to handle a unique naming of its urls as only the specific fetcher can know all the url parameters and which ones affect the output vs which ones don't.

>> This doesn't explain why the npm but not the gomod and crate fetcher use a sub folder. All fetchers are based on the wget fetcher.

> That is probably "my fault". Put yourself in my position. You get a ton of different patches, all touching very varied aspects of the system. When reviewing them you have to try and remember the original design decisions, the future directions, the ways things broke in the past, a desire to try and have clean consistent APIs and so on. I have tried very hard to move things in a direction where things incrementally improve, without unnecessarily blocking new features. It means that things that merge often aren't perfect. We've tried a few different approaches with the newer programming languages and each approach has had pros and cons. The inconsistency is probably as I missed something in review. Sorry :(.

Sorry, I don't mean to criticize you. I see that you have a lot of work. I want to understand the reasons for the actual design and what it should look like.

> I only have finite time. There are few people who seem to want to dive in and help with review of patches like these. I did ask some people yesterday, one told me they simply couldn't understand these patches.

What can I do to improve the review?

> I'm doing my best to ask the right questions, try and help others understand them, ensure my own concerns I can identify are resolved and I don't want to de-motivate you on this work either, I think the idea of improving this is great and I'd love to see it. Equally, I'm also the first person everyone will complain to if we change something and it causes problems for people.
>
> So the explanation is probably I just missed something in review at some point. The intent was to separate out the fetcher output going forward (unless it makes sense to be shared).
>
> FWIW there are multiple things which bother me about the existing fetcher storage layout but that is a different discussion.

Okay.

>>>>>> * Where should we unpack the dependencies?
>>>>>> ** Should we use a folder inside the parent folder (ex. node_modules)?
>>>>>> ** Should we use a fixed folder inside unpackdir (ex. go/pkg/mod/cache/download and cargo_home/bitbake)?

>>>>> This likely depends on the fetcher as the different mechanisms will have different expectations about how they should be extracted (as npm/etc. would).

>>>> It depends on the fetcher, but the fetchers could use the same approach. At the moment every fetcher uses a different approach. The crate fetcher uses a fixed value. The gomod fetcher uses a variable (GO_MOD_CACHE_DIR) and the npm fetcher uses a parameter (destsuffix). Furthermore the gomod fetcher overrides the common subdir parameter.

>>> I think we really need to standardise that if we can. Each new fetcher has claimed a certain approach is effectively required by the package manager.

What would be your desired solution? Is the variable okay or do you prefer a self-contained SRC_URI?

> I suspect we need a default via a variable and then the option to change the default via parameters. The default value should be a bitbake fetcher namespaced control variable.
>
> I'm wary of making a definitive statement saying X if that isn't going to make sense for some backend though. I simply don't have enough knowledge of them all, which is why you see me being reluctant to make definitive statements about design.

Okay.

>>>>>> * How should we treat archives for package manager caches?
>>>>>> ** Should we unpack the archives to support patching (ex. npm)?
>>>>>> ** Should we copy the packed archive to avoid unpacking and packaging (ex. gomod)?

>>>>> If there are archives left after do_unpack, which task is going to unpack those? Are we expecting the build process in configure/compile to decompress them? Would those management tools accept things if they were extracted earlier? "unpack" would be the correct time to do it but I can see this getting into conflict with the package manager :/.

>>>> Most package managers expect archives. In the npm case the archive is unpacked by the fetcher and packed by the npm.bbclass to support patching. The gomod fetcher doesn't unpack the downloaded archive and the gomodgit fetcher creates archives from git folders during unpack. It would be possible to always keep the archives, or always extract the archives and recreate archives during build.
>>>> It is a decision between performance and patchability.
>>>>
>>>> At the moment it is complicated to work with the different fetchers because every fetcher uses a different concept and it is unclear what the desired approach is.

>>> This is a challenge. Can we handle the unpacking with the package manager as a specific step or does it have to be combined with other steps like configure/compile?

>> It looks like this is possible:
>> cargo fetch
>> go mod vendor
>> npm install
>>
>> I suspect you're thinking about using the package manager in do_unpack to unpack the archives and patch the unpacked archives afterwards?

> I'm wondering about it, yes. I know we've had challenges with patching rust modules for example so this isn't a theoretical problem :/.

It is an interesting idea because most package managers check the integrity before unpacking. Additionally it should simplify and speed up the npm build because it removes the repacking of the packages. The problem is that we need an additional task to patch the dependency specification file and to unpack the files.

>>>>> I did wonder if patches 1-5 of this series could be merged separately too as they look reasonable regardless of the rest of the series?

>>>> Sure. Should I resend the patches as a separate series?

>>> Yes please, that would then let us remove the bits we can easily review/sort and focus on this other part.

>> Done.

> Thanks.

>> I will also resend the go h1 checksum commit separately because it could be useful for the gomod fetcher.

> Yes, I was waiting for a new version of that one with the naming tweaked.

Done.

>> Should I also move the dn / dv parameter patches to a separate series because they could be useful without the dependency fetcher? I could add the parameters to the fetchers in a backward compatible way.

> I need to think more about that one...
The motivation is to include the dependencies with name, version, license and CPE in the SBOM.

Regards
Stefan
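As an illustration of the dn / dv scheme proposed in this thread (the package name, version and registry URL here are invented for the example, and the parameter names are Stefan's proposal, not an existing bitbake feature):

```bitbake
# Hypothetical: annotate a resolved dependency with its name and version
SRC_URI = "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz;dn=lodash;dv=4.17.21"
```

SBOM generation could then take each dependency's name and version directly from the URL parameters instead of re-parsing the lock file.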
Hi all,

I'm going to reply at this point in the thread to at least let everyone know that I've been reading along, but honestly can't say if a few questions that I have have been asked (and answered).

The biggest use case that I have for the layers and recipes that I maintain is about being able to both "easily" patch or update vendor/dependencies of the main application build.

It was unclear to me how I'd do that with these changes.

For the copied/extracted dependencies, I can see that you'd just be able to figure out where they were extracted (and I see the discussions on where to extract/store some of the files) and then write a patch as you would with any recipe. But would there be a way to patch the dependency "lock file"? I definitely don't see a way that I'd be able to tweak a source hash and have an updated dependency pulled in .. but I could have easily missed that.

Those are the primary reasons why I'll stay with explicitly listed / visible dependencies, unless something similar is available in a re-worked / unified fetcher.

I prefer the translation to git, so I have debug source for vendor dependencies as well as a well travelled path to mirror and archive the source, but something like the update task of rust is at least explicit and visible to me, so I can also use it without too many issues.

Bruce

On Tue, Jan 7, 2025 at 11:13 AM Stefan Herbrechtsmeier via lists.openembedded.org <stefan.herbrechtsmeier-oss=weidmueller.com@lists.openembedded.org> wrote:

> Am 07.01.2025 um 12:01 schrieb Richard Purdie:
> On Tue, 2025-01-07 at 10:47 +0100, Stefan Herbrechtsmeier wrote:
> Am 06.01.2025 um 16:30 schrieb Richard Purdie:
> On Mon, 2025-01-06 at 15:35 +0100, Stefan Herbrechtsmeier wrote:
>
> I'm a little bit worried about how easily you could sneak a "floating" version into this and make the fetcher non-deterministic. Does (or could?) the code detect and error on that?
> We could raise an error if a checksum is missing in the dependency specification file or make the checksum mandatory for the dependency fetcher. Furthermore we could inspect the dependency URLs to detect a misuse of the file like a latest string for the version.
>
> I think adding such an error would be a requirement for merging this.
>
> Should the dependency fetcher (ex. npmsw) or the language specific fetcher (ex. npm) fail if the version points to a latest version?
>
> I think right now it has to error to try and reduce complexity. It is possible to support such things but you have to pass that version information back up the stack so that PV represents the different versions and that is a new level of complexity.
>
> I guess we should consider how you could theoretically support it as that might influence the design. With multiple git repos in SRC_URI for example, we end up adding multiple shortened shas to construct a PV so that if any change, PV changes. We also have to add an incrementing integer so that opkg/dpkg/rpm operations work and versions sort.
>
> Okay. In this case we should add the checks to the dependency resolution. Thereby we prohibit dynamic versions for the dependencies and allow users to add support for it to the fetcher of the package manager.
>
> Put another way, could one of these SRC_URIs map to multiple different combinations of underlying component versions?
>
> If you mean the extracted SRC_URI for a single dependency from the dependency specification file (ex. npm-shrinkwrap.json) it could use special URLs to map to the latest version. But this is a misuse of the dependency specification file and could be detected. The tools always generate files with fixed versions because a floating version with a fixed checksum makes no sense.
>
> Even if it shouldn't happen, we need to detect and error for this case as it would become very problematic for us.
>
> Okay. Should we disallow a dynamic version for package manager downloads generally or do you see a reasonable use case?
>
> See above.
>
> I also thought it would make sense to generate recipes from the dependency specification files and therefore worked on the recipetool previously. But it looks like the tool isn't really used and I'm afraid nobody will use the recipe to fix dependencies. In most cases it is easy to update a dependency in the native tooling and only provide an updated dependency specification file.
>
> I think people have wanted a single simple command to translate the specification file into our recipe format to update the recipe. For various reasons people didn't seem to find the recipetool approach was working and created the task workflow based one. There are pros and cons to both and I don't have a strong preference. I would like to see something which makes it clear to users what is going on though and is simple to use.
>
> People do intuitively understand a .inc file with a list of urls in it. There are challenges in updating it.
>
> This other approach is not as intuitive as everything is abstracted out of sight.
>
> One thing for example which worries me is how are the license fields in the recipe going to be updated?
>
> Currently, if we teach the class, it can set LICENSE variables appropriately. With the new approach, you don't know the licenses until after unpack has run. Yes it can write it into the SPDX, but it won't work for something like the layer index or forms of analysis which don't build things.
>
> This does also extend to vulnerability analysis since we can't know what is in a given recipe without actually unpacking it. For example we could know crate XXX at version YYY has a CVE but we can't tell if a recipe uses that crate until after do_unpack, or at least not without expandurl.
>
> The main question is if the meta data should contain all information.
> If yes, we shouldn't allow any fetcher which requires an external > source. This should include the gitsm fetcher and we should replace > the single SRC_URI with multiple git SRC_URIs. > > If we had tooling that supported that well we could certainly consider > it. It isn't straight forward as you can have a git repo containing > submodules which then themselves contain submodules which can then > contain more levels of submodules. There are therefore multiple levels > of expansion possible. > > Okay. That makes the git submodule special in compare to the other > dependency fetcher. > > We can go even further and forbid specific package manager fetchers > and use plain https or git SRC_URIs. The python and go-vendor fetcher > use this approach. > > Alternative we allow dependency fetchers and require that the meta > data be always used via bitbake. In this case we could extend the > meta data via the fetcher. > > In both cases it is possible to produce the same meta data. It > doesn't matter if we use recipetool, devtool, bbclasses or fetcher. > In any case we could resolve the SRC_URIs, checksums or srcrev from a > file. The license information could be fetched from the package > repositories without integrity checks or could be extracted from the > individual package description file inside the downloaded sources > (ex. npm). We should skip the license detection from license files > for now because they generate manual work and could be discuses > later. > > That was the reason the current task based approach doesn't use them, > yet! I mention it just to highlight that it can be solved either way, > the approach doesn't really change what we need to do. The bigger > concern is having information available in the metadata which I think > we need do to some level regardless of which approach we choose. > > > The recipe approach has the advantage that it uses fixed licenses and > that license changes could be (theoretical) reviewed during recipe > update. 
> > FWIW that is an important use case and one of our general strengths. We > can only do that as the license information is written in recipes and > can be compared at update time. > > Does this apply to the license of the every individual dependency or only > to the combined license? > > In contrast the fetcher approach reduces the update procedure to a > simple file rename or SRCREV update (ex. gitsm). Furthermore, the > user could simply place a file beside the recipe to update the > dependencies. Could we realize the same via devtool integration and a > patch? > > This is effectively what the task based approach is aiming for > currently. I think the idea was that we could have devtool/recipetool > integration around that update task, a task was just a convenient way > to capture the code to do it and get things working without needing the > tool to be finished. > > What is the task based approach? `bitbake -c update xyz`? > > We have different solutions between the languages (ex. npmsw vs > crate vs pypi) and even inside the languages (ex. go-vendor vs > gomod). I would like to unify the dependency support. It doesn't > matter if we decide to use the bitbake fetcher or a bitbake / devtool > command for the dependency and license resolution. > > I do very much prefer having one good way of doing things rather than > multiple ways of doing things, each with a potential drawback. I'm > therefore broadly in favour of doing that as long as we don't upset too > much existing mindshare along the way. > > Okay > > > I have a WIP to integrate the the dependencies into the spdx . > This > uses the expanded_urldata / implicit_urldata function to add the > dependencies to the process list of archiver and spdx. > https://github.com/weidmueller/poky/tree/feature/dependency- > fetcher > > Regarding the license we could migrate the functionality from > recipetool into a class and detect the licenses at build time. 
> Theoretically the fetcher could fetch the license from the > package > manager repository but we have to trust the repository because we > have no checksum to detect changes. Maybe we could integrate > tools > like Syft or ScanCode to detect the licenses at build time. At > the > moment the best solution is to make sure that the SBOM contains > the > name and version of the dependencies and let other tools handle > the > license via SBOM for now. Therefore I propose a common scheme to > define the dependency name (dn) and version (dv) in the SRC_URI. > > > > We could compare what licenses the package manager is showing us > with > what is in the recipe and error if different. There would then need > to > be a command to update the licenses in the recipe (in much the way > urls > currently get updated). > > > > Either we request the licenses from the package manager during > package update or during fetch. I wouldn't do both. Instead I would > analyze the the license file during build and compare the detected > license with the recipe or fetcher generated licenses. But the > license detection from files is an other topic and I would like to > postpone it for now. > > Agreed, I mention it just to highlight that supporting them does have > impact on the design so any solution needs to ultimately be able to > support it. > > > You're using DL_DIR for that which I > suspect isn't a great idea for tmp files. > > Take over from gitsm. > > Probably not the best fetcher and I'd say gitsm should be fixed. > > I don't see a reason why the gitsm fetcher shouldn't handled like the > other dependency fetcher. We could update the handler after we have a > decision for the dependency fetchers. > > In principle perhaps but as mentioned above, gitsm has its own challenges. > > Based on your feedback I have the feeling that a dependency fetcher isn't > the correct solution. The fetcher makes it impossible to review changes > during recipe update. 
Additionally it needs caching for the resolved fetch > and license data. > > The alternative is to create an inc file with SRC_URIs, checksums, SRCREVs > and LICENSE. Any recommendation how to integrate the dependency resolution > and inc creation into oe-core? > > The url scheme is clever but also has a potential risk in that you > can't really pass parameters to both the top level fetcher and the > underlying one. I'm worried that is going to bite us further down > the > line. > > At the moment I don't see a real problem but maybe you are right. The > existing language specific fetcher use fixed paths for there > downloads. > > What do you propose? Should the fetcher skip the unpack of the > source or should we introduce a sub fetcher which uses the download > from an other SRC_URI entry. The two entries could be linked via the > name parameter. This approach could be combined with your suggestion > above. The new fetcher will unpack a lock file from an other > (default) download. > > > I'm not really sure what is best right now. I'm trying to spell out the > pros/cons of what is going on here in the hope it encourages others to > give feedback as well. I agree there isn't a problem right now but I > worry there soon will be by mixing two things together like this. The > way we handle git protocol does cause us friction with other urls > schemes already. > > The dependency fetcher could simple skip the unpack. In this case the > user needs to use a variable to pass the same URL to the git and > dependency fetcher or we could provide a python function to generate > two SRC_URI with the same base URL. > > > I'm starting to wonder about a slightly different approach, basically > an optional generated file alongside a recipe which contains "expanded" > information which is effectively expensive to generate (in computation > or resource like network access/process terms). We could teach bitbake > a new phase of parsing where it generated them if missing. 
There are > some other pieces of information which we know during the build process > which it would be helpful to know earlier (e.g. which packages a recipe > generates). I've wondered about this for a long time and the fetcher > issues remind me of it again. It would be a big change with advantages > and drawbacks. I think it would put more pressure on a layer maintainer > as they'd have to computationally keep this up to date and it would > complicate the patch workflow (who should send/regen the files?). I'm > putting the idea there, I'm not saying I think we should do it, I'm > just considering options. > > Do you mean like a cache or like the inc files? Is the file totally auto > generated or is manual editing acceptable? > > = Open questions > > * Where should we download dependencies? > ** Should we use a folder per fetcher (ex. git and npm)? > ** Should we use the main folder (ex. crate)? > ** Should we translate the name into folder (ex. gomod)? > ** Should we integrate the name into the filename (ex. git)? > > > > > > DL_DIR is meant to be a complete cache of the source so it would > need > to be downloaded there. Given it maps to the other fetchers, the > existing cache mechanisms likely work for these just fine, the open > question is on whether the lock/spec files should be cached after > extraction. > > > You misunderstand the question. Its about the downloadfilename > parameter. At the moment some fetcher use sub folder inside DL_DIR > and others use the main folder. It looks like every fetcher has its > own concept to handle file collision between different fetchers. The > git and npm fetcher use there own folder, the crate fetcher use its > own .crate file prefix, the gomod fetcher translate the URL into > multiple folders and the git fetcher translate the URL into a single > folder name. > > That makes more sense. The layout is partially legacy. The wget and > local fetchers were first and hence go directly into DL_DIR. 
git/svn > were separated out into their own directories with a plan to have a > directory per fetcher. That didn't always work out with each newer > fetcher. Each fetcher does have to handle a unique naming of its urls > as only the specific fetcher can know all the urls parameters and which > ones affect the output vs which ones don't. > > > This doesn't explain why the npm but not the gomod and crate fetcher > use a sub folder. All fetchers are based on the wget fetcher. > > That is probably "my fault". Put yourself in my position. You get a ton > of different patches, all touching very varied aspects of the system. > When reviewing them you have to try and remember the original design > decisions, the future directions, the ways things broke in the past, a > desire to try and have clean consistent APIs and so on. I have tried > very hard to move things in a direction where things incrementally > improve, without unnecessarily blocking new features. It means that > things that merge often aren't perfect. We've tried a few different > approaches with the newer programming languages and each approach has > had pros and cons. The inconsistency is probably as I missed something > in review. Sorry :(. > > Sorry, I don't want to criticism you. I see that you have a lot of work. I > want to understand the reasons for the actual design and how it should look > like. > > I only have finite time. There are few people who seem to want to dive > in and help with review of patches like these. I did ask some people > yesterday, one told me they simply couldn't understand these patches. > > What can I do to improve the review? > > I'm doing my best to ask the right questions, try and help others > understand them, ensure my own concerns I can identify are resolved and > I don't want to de-motivate you on this work either, I think the idea > of improving this is great and I'd love to see it. 
Equally, I'm also > the first person everyone will complain to if we change something and > it causes problems for people. > > So the explanation is probably I just missed something in review at > some point. The intent was to separate out the fetcher output going > forward (unless it makes sense to be shared). > > FWIW there are multiple things which bother me about the existing > fetcher storage layout but that is a different discussion. > > Okay. > > * Where should we unpack the dependencies? > ** Should we use a folder inside the parent folder (ex. > node_modules)? > ** Should we use a fixed folder inside unpackdir > (ex. go/pkg/mod/cache/download and cargo_home/bitbake)? > > > This likely depends on the fetcher as the different mechanisms will > have different expectations about how they should be extracted (as > npm/etc. would). > > > It depends on the fetcher but the fetcher could use the same > approach. At the moment every fetcher use a different approach. The > crate fetcher use a fixed value. The gomod fetcher uses a variable > (GO_MOD_CACHE_DIR) and the npm fetcher uses a parameter (destsuffix). > Furthermore the gomod fetcher override the common subdir parameter. > > I think we really need to standardise that if we can. Each new fetcher > has claimed a certain approach is effectively required by the package > manager. > > What would be your desired solution? Is the variable okay or do you prefer a self contain SRC_URI? > > I suspect we need a default via a variable and then the option to > change the default via parameters. The default value should be a > bitbake fetcher namespaced control variable. > > I'm wary of making a definitive statement saying X if that isn't going > to make sense for some backend though. I simply don't have enough > knowledge of them all, which is why you see me being reluctant to make > definitive statements about design. > > Okay. > > * How should we treat archives for package manager caches? 
> ** Should we unpack the archives to support patching (ex. npm)? > ** Should we copy the packed archive to avoid unpacking and > packaging > (ex. gomod)? > > > If there are archives left after do_unpack, which task is going > to unpack those? Are we expecting the build process in > configure/compile to decompress them? Would those management > tools accept things if they were extracted earlier? "unpack" > would be the correct time to do it but I can see this getting > into conflict with the package manager :/. > > > Most package manager expect archives. In the npm case the archive is > unpack by the fetcher and packed by thenpm.bbclass to support > patching. The gomod fetcher doesn't unpack the downloaded archive and > the gomodgit fetcher create archives from git folders during unpack. > It would be possible to always keep the archives or always extract > the archives and recreate archives during build. It is a decision > between performance and patchability. > > At the moment it is complicated to work with the different fetcher > because every fetcher use a different concept and it is unclear what > is the desired approach. > > > This is a challenge. Can we handle the unpacking with the package > manager as a specific step or does it have to be combined with other > steps like configure/compile? > > > It looks like this is possible: > cargo fetch > go mod vendor > npm install > > I suspect you're thinking about using the package manager in > do_unpack to unpack the archives and patch the unpacked archives > afterwards? > > I'm wondering about it, yes. I know we've had challenges with patching > rust modules for example so this isn't a theoretical problem :/. > > It is an interesting idea because most package manager check the integrity > before unpack. Additionally it should simplify and speed up the npm build > because it removes the repack of the packages. 
The problem is that we need > an additional task to patch the dependency specification file and to unpack > the file. > > I did wonder if patches 1-5 of this series could be merged > separately too as they look reasonable regardless of the rest > of the series? > > > Sure. Should I resend the patches as a separate series? > > Yes please, that would then let us remove the bits we can easily > review/sort and focus on this other part. > > > Done. > > Thanks. > > > I will also resend the go h1 checksum commit separately because it > could be useful for the gomod fetcher. > > Yes, I was waiting for a new version of that one with the naming tweaked. > > Done. > > Should I also move the dn / dv parameter patches to a separate series > because they could be useful without the dependency fetcher? I could > add the parameters to the fetchers in a backward-compatible way. > > I need to think more about that one... > > The motivation is to include the dependencies with name, version, license > and cpe into the SBOM. > > Regards > Stefan > > > -=-=-=-=-=-=-=-=-=-=-=- > Links: You receive all messages sent to this group. > View/Reply Online (#16981): > https://lists.openembedded.org/g/bitbake-devel/message/16981 > Mute This Topic: https://lists.openembedded.org/mt/110212697/1050810 > Group Owner: bitbake-devel+owner@lists.openembedded.org > Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [ > bruce.ashfield@gmail.com] > -=-=-=-=-=-=-=-=-=-=-=- > >
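As background to the SBOM motivation above: the proposed dn / dv scheme annotates each dependency URL with the name and version the SBOM should record, roughly like this (the URL and values are illustrative, not taken from the series):

```
# Hypothetical dependency URL carrying SBOM name/version annotations
SRC_URI += "https://registry.npmjs.org/accepts/-/accepts-1.3.8.tgz;dn=accepts;dv=1.3.8"
```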
Am 07.01.2025 um 17:58 schrieb Bruce Ashfield: > Hi all, > > I'm going to reply at this point in the thread to at least let > everyone know that I've been reading along, but honestly can't say if > a few questions that I have have been asked (and answered). > > The biggest use case that I have for the layers and recipes that I > maintain is about being able to both "easily" patch or update > vendor/dependencies of the main application build. > > It was unclear to me how I'd do that with these changes. > > For the copied/extracted dependencies, I can see that you'd just be > able to figure out where they were extracted (and I see the > discussions on where to extract/store some of the files) and then > write a patch as you would with any recipe. But would there be a way > to patch the dependency "lock file" ? I definitely don't see a way > that I'd be able to tweak a source hash and have an updated dependency > pulled in .. but I could have easily missed that. You have to provide your own "lock file" and place it beside the recipe. The "lock file" is fetched via the file fetcher and is used to fetch the dependencies. > Those are the primary reasons why I'll stay with explicitly listed / > visible dependencies, unless something similar is available in a > re-worked / unified fetcher. It is impossible to patch the sources inside the bitbake fetcher. Therefore the dependency resolution must be moved inside a dependency fetch task and an additional dependency patch task needs to be added. > I prefer the translation to git, so I have debug source for vendor > dependencies as well as a well travelled path to mirror and archive > the source Are you referring to the go-vendor implementation? Do you mean the vendor directory? The gomod fetcher should support mirroring and archiving the sources. It should be possible to create a vendor folder from the gomod archives.
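To make the lock file suggestion above concrete: overriding the dependencies comes down to shipping your own lock file next to the recipe, following the syntax from the cover letter. A sketch for the npm shrinkwrap case:

```
# npm-shrinkwrap.json sits beside the recipe and pins every dependency;
# stage one of the fetcher locates it, stage two fetches the entries.
SRC_URI = "npmsw://npm-shrinkwrap.json"
```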
> , but something like the update task of rust is at least explicit and > visible to me, so I can also use it without too many issues. Do you mean `bitbake -c update_crates recipe-name`? Regards Stefan
On Tue, Jan 7, 2025 at 12:46 PM Stefan Herbrechtsmeier < stefan.herbrechtsmeier-oss@weidmueller.com> wrote: > Am 07.01.2025 um 17:58 schrieb Bruce Ashfield: > > Hi all, > > I'm going to reply at this point in the thread to at least let everyone > know that I've been reading along, but honestly can't say if a few > questions that I have have been asked (and answered). > > The biggest use case that I have for the layers and recipes that I > maintain is about being able to both "easily" patch or update > vendor/dependencies of the main application build. > > It was unclear to me how I'd do that with these changes. > > For the copied/extracted dependencies, I can see that you'd just be able > to figure out where they were extracted (and I see the discussions on where > to extract/store some of the files) and then write a patch as you would > with any recipe. But would there be a way to patch the dependency "lock > file" ? I definitely don't see a way that I'd be able to tweak a source > hash and have an updated dependency pulled in .. but I could have easily > missed that. > > You have to provide your own "lock file" and place it beside the recipe. > The "lock file" is fetched via the file fetcher and is used to fetch the > dependencies. > > My requirement would be to individually bump the vendored dependencies. A copy and update of just a single entry in the lock file is possible, which is what I'd do. I'm just pointing out that finer grained control is required when quickly iterating or developing packages. I find a lot of mindshare goes towards just building and creating images, where there's also a need to support development workflows. > Those are the primary reasons why I'll stay with explicitly listed / > visible dependencies, unless something similar is available in a re-worked > / unified fetcher. > > It is impossible to patch the sources inside bitbake. 
Therefore the > dependency resolution must be moved inside a dependency fetch task and an > additional dependency patch task needs to be added. > I'm just talking about being able to patch the vendor sources once they are fetched and placed in their build location. Using normal patch files on the SRC_URI. When the location of the vendor source isn't obvious (because it is calculated or dynamically generated), this becomes more challenging. > I prefer the translation to git, so I have debug source for vendor > dependencies as well as a well travelled path to mirror and archive the > source > > Are you referring to the go-vendor implementation? Do you mean the vendor > directory? The gomod fetcher should support mirroring and archiving the sources. > It should be possible to create a vendor folder from the gomod archives. > Nope. I don't use that either. I have my own tools to locate the source of the dependencies, clone and put them into a vendor directory. The recipe simply clones and copies using git after that. > , but something like the update task of rust is at least explicit and > visible to me, so I can also use it without too many issues. > > Do you mean `bitbake -c update_crates recipe-name`? > Correct. The .inc file updating mechanisms. Bruce > > Regards > Stefan > > On Tue, Jan 7, 2025 at 11:13 AM Stefan Herbrechtsmeier via > lists.openembedded.org <stefan.herbrechtsmeier-oss= > weidmueller.com@lists.openembedded.org> wrote: > >> Am 07.01.2025 um 12:01 schrieb Richard Purdie: >> >> On Tue, 2025-01-07 at 10:47 +0100, Stefan Herbrechtsmeier wrote: >> >> Am 06.01.2025 um 16:30 schrieb Richard Purdie: >> >> On Mon, 2025-01-06 at 15:35 +0100, Stefan Herbrechtsmeier wrote: >> >> I'm a little bit worried about how easily you could sneak a >> "floating" version into this and make the fetcher >> non-deterministic. Does (or could?) the code detect and error on >> that?
>> >> >> We could raise an error if a checksum is missing in the >> dependency specification file or make the checksum mandatory for >> the dependency fetcher. Furthermore we could inspect the >> dependency URLs to detect a misuse of the file like a latest >> string for the version. >> >> >> I think adding such an error would be a requirement for merging >> this. >> >> >> Should the dependency fetcher (ex. npmsw) or the language specific >> fetcher (ex. npm) fail if the version points to a latest version? >> >> I think right now it has to error to try and reduce complexity. It is >> possible to support such things but you have to pass that version >> information back up the stack so that PV represents the different >> versions and that is a new level of complexity. >> >> I guess we should consider how you could theoretically support it as >> that might influence the design. With multiple git repos in SRC_URI for >> example, we end up adding multiple shortened shas to construct a PV so >> that if any change, PV changes. We also have to add an incrementing >> integer so that opkg/dpkg/rpm operations work and versions sort. >> >> Okay. In this case we should add the checks to the dependency resolution. >> Thereby we prohibit dynamic versions for the dependencies and allow users >> to add support for it to the fetcher of the package manager. >> >> Put another way, could one of these SRC_URIs map to multiple >> different combinations of underlying component versions? >> >> If you mean the extracted SRC_URI for a single dependency from >> the dependency specification file (ex. npm-shrinkwrap.json) it >> could use special URLs to map to the latest version. But this is >> a misuse of the dependency specification file and could be >> detected. The tools generate files with fixed versions always >> because a floating version with a fixed checksum makes no sense.
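The error being discussed could be a simple validation pass over the resolved entries before anything is fetched. A minimal sketch, assuming an npm-style lock entry shape; the check itself is hypothetical, not code from the series:

```python
# Sketch: refuse non-deterministic entries in a dependency lock file.
# An entry must carry an integrity checksum and a pinned version.

def check_locked(name, entry):
    """Return a list of error strings for one lock file entry."""
    errors = []
    if not entry.get("integrity"):
        errors.append("%s: missing integrity checksum" % name)
    version = entry.get("version", "")
    # Floating specifiers like "latest", "^1.2.3" or "~1.2" are not pinned.
    if version == "latest" or version[:1] in ("^", "~", ">", "<", "*"):
        errors.append("%s: floating version %r" % (name, version))
    return errors

print(check_locked("accepts", {"version": "1.3.8", "integrity": "sha512-..."}))
print(check_locked("lodash", {"version": "latest"}))
```

A fetcher could run such a check over every entry of the specification file and raise a fetch error on the first hit, making any non-determinism visible at do_fetch time rather than at build time.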
>> Even if it shouldn't happen, we need to detect and error for this case as it would become very problematic for us.

>> Okay. Should we disallow a dynamic version for package manager downloads generally or do you see a reasonable use case?

>> See above.

>> I also thought it would make sense to generate recipes from the dependency specification files and therefore worked on recipetool previously. But it looks like the tool isn't really used and I'm afraid nobody will use the recipe to fix dependencies. In most cases it is easy to update a dependency in the native tooling and only provide an updated dependency specification file.

>> I think people have wanted a single simple command to translate the specification file into our recipe format to update the recipe. For various reasons people didn't seem to find the recipetool approach was working and created the task-workflow-based one. There are pros and cons to both and I don't have a strong preference. I would like to see something which makes it clear to users what is going on though and is simple to use.

>> People do intuitively understand a .inc file with a list of urls in it. There are challenges in updating it.

>> This other approach is not as intuitive as everything is abstracted out of sight.

>> One thing for example which worries me is how are the license fields in the recipe going to be updated?

>> Currently, if we teach the class, it can set LICENSE variables appropriately. With the new approach, you don't know the licenses until after unpack has run. Yes it can write it into the SPDX, but it won't work for something like the layer index or forms of analysis which don't build things.

>> This does also extend to vulnerability analysis since we can't know what is in a given recipe without actually unpacking it.
>> For example we could know crate XXX at version YYY has a CVE but we can't tell if a recipe uses that crate until after do_unpack, or at least not without expandurl.

>> The main question is if the metadata should contain all information. If yes, we shouldn't allow any fetcher which requires an external source. This should include the gitsm fetcher and we should replace the single SRC_URI with multiple git SRC_URIs.

>> If we had tooling that supported that well we could certainly consider it. It isn't straightforward as you can have a git repo containing submodules which then themselves contain submodules which can then contain more levels of submodules. There are therefore multiple levels of expansion possible.

>> Okay. That makes the git submodule special compared to the other dependency fetchers.

>> We can go even further and forbid specific package manager fetchers and use plain https or git SRC_URIs. The python and go-vendor fetchers use this approach.

>> Alternatively, we allow dependency fetchers and require that the metadata always be used via bitbake. In this case we could extend the metadata via the fetcher.

>> In both cases it is possible to produce the same metadata. It doesn't matter if we use recipetool, devtool, bbclasses or fetchers. In any case we could resolve the SRC_URIs, checksums or SRCREVs from a file. The license information could be fetched from the package repositories without integrity checks or could be extracted from the individual package description file inside the downloaded sources (ex. npm). We should skip the license detection from license files for now because it generates manual work and could be discussed later.

>> That was the reason the current task-based approach doesn't use them, yet! I mention it just to highlight that it can be solved either way, the approach doesn't really change what we need to do.
>> The bigger concern is having information available in the metadata, which I think we need to do to some level regardless of which approach we choose.

>> The recipe approach has the advantage that it uses fixed licenses and that license changes could (theoretically) be reviewed during recipe update.

>> FWIW that is an important use case and one of our general strengths. We can only do that as the license information is written in recipes and can be compared at update time.

>> Does this apply to the license of every individual dependency or only to the combined license?

>> In contrast the fetcher approach reduces the update procedure to a simple file rename or SRCREV update (ex. gitsm). Furthermore, the user could simply place a file beside the recipe to update the dependencies. Could we realize the same via devtool integration and a patch?

>> This is effectively what the task-based approach is aiming for currently. I think the idea was that we could have devtool/recipetool integration around that update task; a task was just a convenient way to capture the code to do it and get things working without needing the tool to be finished.

>> What is the task-based approach? `bitbake -c update xyz`?

>> We have different solutions between the languages (ex. npmsw vs crate vs pypi) and even inside the languages (ex. go-vendor vs gomod). I would like to unify the dependency support. It doesn't matter if we decide to use the bitbake fetcher or a bitbake / devtool command for the dependency and license resolution.

>> I do very much prefer having one good way of doing things rather than multiple ways of doing things, each with a potential drawback. I'm therefore broadly in favour of doing that as long as we don't upset too much existing mindshare along the way.

>> Okay.

>> I have a WIP to integrate the dependencies into the SPDX.
>> This uses the expanded_urldata / implicit_urldata function to add the dependencies to the process list of archiver and spdx. https://github.com/weidmueller/poky/tree/feature/dependency-fetcher

>> Regarding the license we could migrate the functionality from recipetool into a class and detect the licenses at build time. Theoretically the fetcher could fetch the license from the package manager repository but we have to trust the repository because we have no checksum to detect changes. Maybe we could integrate tools like Syft or ScanCode to detect the licenses at build time. At the moment the best solution is to make sure that the SBOM contains the name and version of the dependencies and let other tools handle the license via SBOM for now. Therefore I propose a common scheme to define the dependency name (dn) and version (dv) in the SRC_URI.

>> We could compare what licenses the package manager is showing us with what is in the recipe and error if different. There would then need to be a command to update the licenses in the recipe (in much the way urls currently get updated).

>> Either we request the licenses from the package manager during package update or during fetch. I wouldn't do both. Instead I would analyze the license file during build and compare the detected license with the recipe or fetcher-generated licenses. But the license detection from files is another topic and I would like to postpone it for now.

>> Agreed, I mention it just to highlight that supporting them does have an impact on the design, so any solution needs to ultimately be able to support it.

>> You're using DL_DIR for that which I suspect isn't a great idea for tmp files.

>> Taken over from gitsm.

>> Probably not the best fetcher and I'd say gitsm should be fixed.
>> I don't see a reason why the gitsm fetcher shouldn't be handled like the other dependency fetchers. We could update the handler after we have a decision for the dependency fetchers.

>> In principle perhaps but as mentioned above, gitsm has its own challenges.

>> Based on your feedback I have the feeling that a dependency fetcher isn't the correct solution. The fetcher makes it impossible to review changes during recipe update. Additionally it needs caching for the resolved fetch and license data.

>> The alternative is to create an inc file with SRC_URIs, checksums, SRCREVs and LICENSE. Any recommendation on how to integrate the dependency resolution and inc creation into oe-core?

>> The url scheme is clever but also has a potential risk in that you can't really pass parameters to both the top-level fetcher and the underlying one. I'm worried that is going to bite us further down the line.

>> At the moment I don't see a real problem but maybe you are right. The existing language-specific fetchers use fixed paths for their downloads.

>> What do you propose? Should the fetcher skip the unpack of the source or should we introduce a sub-fetcher which uses the download from another SRC_URI entry? The two entries could be linked via the name parameter. This approach could be combined with your suggestion above. The new fetcher would unpack a lock file from another (default) download.

>> I'm not really sure what is best right now. I'm trying to spell out the pros/cons of what is going on here in the hope it encourages others to give feedback as well. I agree there isn't a problem right now but I worry there soon will be by mixing two things together like this. The way we handle the git protocol does cause us friction with other URL schemes already.

>> The dependency fetcher could simply skip the unpack.
In this case the user needs to use a variable to pass the same URL to the git and dependency fetchers, or we could provide a python function to generate two SRC_URIs with the same base URL.

>> I'm starting to wonder about a slightly different approach, basically an optional generated file alongside a recipe which contains "expanded" information which is effectively expensive to generate (in computation or resources like network access/process terms). We could teach bitbake a new phase of parsing where it generates them if missing. There are some other pieces of information which we know during the build process which it would be helpful to know earlier (e.g. which packages a recipe generates). I've wondered about this for a long time and the fetcher issues remind me of it again. It would be a big change with advantages and drawbacks. I think it would put more pressure on a layer maintainer as they'd have to computationally keep this up to date and it would complicate the patch workflow (who should send/regen the files?). I'm putting the idea there, I'm not saying I think we should do it, I'm just considering options.

>> Do you mean like a cache or like the inc files? Is the file totally auto-generated or is manual editing acceptable?

>> = Open questions
>>
>> * Where should we download dependencies?
>> ** Should we use a folder per fetcher (ex. git and npm)?
>> ** Should we use the main folder (ex. crate)?
>> ** Should we translate the name into a folder (ex. gomod)?
>> ** Should we integrate the name into the filename (ex. git)?

>> DL_DIR is meant to be a complete cache of the source so it would need to be downloaded there. Given it maps to the other fetchers, the existing cache mechanisms likely work for these just fine; the open question is whether the lock/spec files should be cached after extraction.

>> You misunderstood the question.
>> It's about the downloadfilename parameter. At the moment some fetchers use a subfolder inside DL_DIR and others use the main folder. It looks like every fetcher has its own concept to handle file collisions between different fetchers. The git and npm fetchers use their own folders, the crate fetcher uses its own .crate file prefix, the gomod fetcher translates the URL into multiple folders and the git fetcher translates the URL into a single folder name.

>> That makes more sense. The layout is partially legacy. The wget and local fetchers were first and hence go directly into DL_DIR. git/svn were separated out into their own directories with a plan to have a directory per fetcher. That didn't always work out with each newer fetcher. Each fetcher does have to handle a unique naming of its urls as only the specific fetcher can know all the url parameters and which ones affect the output vs which ones don't.

>> This doesn't explain why the npm but not the gomod and crate fetchers use a subfolder. All fetchers are based on the wget fetcher.

>> That is probably "my fault". Put yourself in my position. You get a ton of different patches, all touching very varied aspects of the system. When reviewing them you have to try and remember the original design decisions, the future directions, the ways things broke in the past, a desire to try and have clean consistent APIs and so on. I have tried very hard to move things in a direction where things incrementally improve, without unnecessarily blocking new features. It means that things that merge often aren't perfect. We've tried a few different approaches with the newer programming languages and each approach has had pros and cons. The inconsistency is probably as I missed something in review. Sorry :(.

>> Sorry, I don't want to criticize you. I see that you have a lot of work.
>> I want to understand the reasons for the actual design and what it should look like.

>> I only have finite time. There are few people who seem to want to dive in and help with review of patches like these. I did ask some people yesterday; one told me they simply couldn't understand these patches.

>> What can I do to improve the review?

>> I'm doing my best to ask the right questions, try and help others understand them, ensure my own concerns I can identify are resolved, and I don't want to de-motivate you on this work either; I think the idea of improving this is great and I'd love to see it. Equally, I'm also the first person everyone will complain to if we change something and it causes problems for people.

>> So the explanation is probably I just missed something in review at some point. The intent was to separate out the fetcher output going forward (unless it makes sense to be shared).

>> FWIW there are multiple things which bother me about the existing fetcher storage layout but that is a different discussion.

>> Okay.

>> * Where should we unpack the dependencies?
>> ** Should we use a folder inside the parent folder (ex. node_modules)?
>> ** Should we use a fixed folder inside unpackdir (ex. go/pkg/mod/cache/download and cargo_home/bitbake)?

>> This likely depends on the fetcher as the different mechanisms will have different expectations about how they should be extracted (as npm/etc. would).

>> It depends on the fetcher but the fetchers could use the same approach. At the moment every fetcher uses a different approach. The crate fetcher uses a fixed value. The gomod fetcher uses a variable (GO_MOD_CACHE_DIR) and the npm fetcher uses a parameter (destsuffix). Furthermore the gomod fetcher overrides the common subdir parameter.

>> I think we really need to standardise that if we can.
>> Each new fetcher has claimed a certain approach is effectively required by the package manager.

>> What would be your desired solution? Is the variable okay or do you prefer a self-contained SRC_URI?

>> I suspect we need a default via a variable and then the option to change the default via parameters. The default value should be a bitbake-fetcher-namespaced control variable.

>> I'm wary of making a definitive statement saying X if that isn't going to make sense for some backend though. I simply don't have enough knowledge of them all, which is why you see me being reluctant to make definitive statements about design.

>> Okay.

>> * How should we treat archives for package manager caches?
>> ** Should we unpack the archives to support patching (ex. npm)?
>> ** Should we copy the packed archive to avoid unpacking and packaging (ex. gomod)?

>> If there are archives left after do_unpack, which task is going to unpack those? Are we expecting the build process in configure/compile to decompress them? Would those management tools accept things if they were extracted earlier? "unpack" would be the correct time to do it but I can see this getting into conflict with the package manager :/.

>> Most package managers expect archives. In the npm case the archive is unpacked by the fetcher and repacked by the npm.bbclass to support patching. The gomod fetcher doesn't unpack the downloaded archive and the gomodgit fetcher creates archives from git folders during unpack. It would be possible to always keep the archives, or to always extract the archives and recreate them during build. It is a decision between performance and patchability.

>> At the moment it is complicated to work with the different fetchers because every fetcher uses a different concept and it is unclear what the desired approach is.

>> This is a challenge.
>> Can we handle the unpacking with the package manager as a specific step or does it have to be combined with other steps like configure/compile?

>> It looks like this is possible:
>> cargo fetch
>> go mod vendor
>> npm install

>> I suspect you're thinking about using the package manager in do_unpack to unpack the archives and patch the unpacked archives afterwards?

>> I'm wondering about it, yes. I know we've had challenges with patching rust modules for example, so this isn't a theoretical problem :/.

>> It is an interesting idea because most package managers check the integrity before unpack. Additionally it should simplify and speed up the npm build because it removes the repack of the packages. The problem is that we need an additional task to patch the dependency specification file and to unpack the file.

>> I did wonder if patches 1-5 of this series could be merged separately too as they look reasonable regardless of the rest of the series?

>> Sure. Should I resend the patches as a separate series?

>> Yes please, that would then let us remove the bits we can easily review/sort and focus on this other part.

>> Done.

>> Thanks.

>> I will also resend the go h1 checksum commit separately because it could be useful for the gomod fetcher.

>> Yes, I was waiting for a new version of that one with the naming tweaked.

>> Done.

>> Should I also move the dn / dv parameter patches to a separate series because they could be useful without the dependency fetcher? I could add the parameters to the fetchers in a backward-compatible way.

>> I need to think more about that one...

>> The motivation is to include the dependencies with name, version, license and CPE in the SBOM.

>> Regards
>> Stefan
>> View/Reply Online (#16981): https://lists.openembedded.org/g/bitbake-devel/message/16981

> --
> - Thou shalt not follow the NULL pointer, for chaos and madness await thee at its end
> - "Use the force Harry" - Gandalf, Star Trek II
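The "error on floating versions" requirement discussed above could, for the npm case, amount to refusing any lock-file entry that is not fully pinned. A minimal sketch, assuming the package-lock.json v2/v3 `packages` layout; `resolve_npm_dependencies` is a hypothetical helper for illustration, not code from the series:

```python
import json

def resolve_npm_dependencies(lock):
    """Extract pinned (url, integrity) pairs from a parsed
    package-lock.json / npm-shrinkwrap.json (v2/v3 'packages' layout)
    and error out on anything that is not fully pinned."""
    deps = []
    for path, entry in lock.get("packages", {}).items():
        if path == "":           # root package entry, not a dependency
            continue
        if entry.get("link"):    # workspace symlink, nothing to fetch
            continue
        resolved = entry.get("resolved")
        integrity = entry.get("integrity")
        if not resolved:
            raise ValueError(f"{path}: no resolved URL (floating version?)")
        if not integrity and not resolved.startswith("git+"):
            raise ValueError(f"{path}: integrity checksum missing")
        deps.append((resolved, integrity))
    return deps

# Usage: resolve_npm_dependencies(json.load(open("npm-shrinkwrap.json")))
```

Making the check part of the dependency-resolution stage (rather than the language-specific fetcher) would prohibit dynamic versions for all package managers at once, as proposed in the thread.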
On Mon, 6 Jan 2025 at 15:43, Stefan Herbrechtsmeier <stefan.herbrechtsmeier-oss@weidmueller.com> wrote:
> https://github.com/yoctoproject/poky/compare/master...weidmueller:poky:feature/dependency-fetcher
>
> I have migrated the crate recipes to the new fetcher and improved the spdx 2.2 class to include the name and version of the crate dependencies.
>
> You have to inherit the create-spdx-2.2 class and build the librsvg recipe to test the new fetcher.

Thanks, I checked out the branch and ran bitbake -c patch librsvg with the default build/conf/ config. It works and the recipe is short and neat. I'm not sure what create-spdx-2.2 is needed for? I didn't use it, and there were no errors.

Like others, I'm torn on two things:
- visibility
- control

When a recipe explicitly lists what goes into a build, this can be easily seen, audited, and adjusted directly in the recipe. With the new fetchers, you need to actually run a build to produce that list, and it isn't clear where the list is placed, in which format, and what to do if something needs to deviate from the versions prescribed by upstream.

This is not a theoretical concern, I'm thinking specifically of log4j-like vulnerabilities, and how one would check that their product doesn't contain them:
https://lwn.net/Articles/878570/

Alex
On Thu, 9 Jan 2025 at 11:40, Alexander Kanavin via lists.openembedded.org <alex.kanavin=gmail.com@lists.openembedded.org> wrote: > This is not a theoretical concern, I'm thinking specifically of > log4j-like vulnerabilities, and how one would check that their product > doesn't contain them: > https://lwn.net/Articles/878570/ I meant to say 'yocto layer' here, not product. And ideally it should be possible with 'static analysis', e.g. just by looking at the layer content. Alex
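The "static analysis" audit Alex describes, checking a layer for a known-bad dependency version without building anything, only works when the pinned versions sit in files inside the layer. A sketch under that assumption, for npm lock files placed beside recipes; `find_vulnerable` is a hypothetical helper, not part of the series:

```python
import json
from pathlib import Path

def find_vulnerable(layer_dir, package, bad_versions):
    """Scan every npm lock file shipped in a layer for a known-bad
    version of `package` (a log4j-style audit), without running a build."""
    hits = []
    for lockfile in Path(layer_dir).rglob("*.json"):
        if lockfile.name not in ("package-lock.json", "npm-shrinkwrap.json"):
            continue
        lock = json.loads(lockfile.read_text())
        for path, entry in lock.get("packages", {}).items():
            # v2/v3 lock files key entries by their node_modules path
            if path.endswith("node_modules/" + package) \
               and entry.get("version") in bad_versions:
                hits.append((str(lockfile), entry["version"]))
    return hits
```

With the pure fetcher-based approach the same audit would need the resolved dependency list to be written somewhere static, which is exactly the visibility concern raised above.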
Am 08.01.2025 um 16:43 schrieb Bruce Ashfield: > > On Tue, Jan 7, 2025 at 12:46 PM Stefan Herbrechtsmeier > <stefan.herbrechtsmeier-oss@weidmueller.com> wrote: > > Am 07.01.2025 um 17:58 schrieb Bruce Ashfield: >> Hi all, >> >> I'm going to reply at this point in the thread to at least let >> everyone know that I've been reading along, but honestly can't >> say if a few questions that I have have been asked (and answered). >> >> The biggest use case that I have for the layers and recipes that >> I maintain is about being able to both "easily" patch or update >> vendor/dependencies of the main application build. >> >> It was unclear to me how I'd do that with these changes. >> >> For the copied/extracted dependencies, I can see that you'd just >> be able to figure out where they were extracted (and I see the >> discussions on where to extract/store some of the files) and then >> write a patch as you would with any recipe. But would there be a >> way to patch the dependency "lock file" ? I definitely don't see >> a way that I'd be able to tweak a source hash and have an updated >> dependency pulled in .. but I could have easily missed that. > > You have to provide your own "lock file" and place it beside the > recipe. The "lock file" is fetched via the file fetcher and is > used to fetch the dependencies. > > My requirement would be to individually bump the vendored > dependencies. A copy and update of just a single entry in the lock > file is possible, which is what I'd do. I'm just pointing out that > finer grained control is required when quickly iterating or developing > packages. > > I find a lot of mindshare goes towards just building and creating > images, where there's also a need to support development workflows. That's the reason I use the package manager specific lock file as base. Every package manager has tools and workflows to manage, update or override the dependencies. 
These tools not only update the source URL and checksum but also handle the influence on other dependencies, sub-dependencies and version selection. It is much easier to update the lock file with the existing tooling and pass the updated lock file (or a patch) to bitbake.

>> Those are the primary reasons why I'll stay with explicitly listed / visible dependencies, unless something similar is available in a re-worked / unified fetcher.

> It is impossible to patch the sources inside bitbake. Therefore the dependency resolution must be moved inside a dependency fetch task and an additional dependency patch task needs to be added.

> I'm just talking about being able to patch the vendor sources once they are fetched and placed in their build location. Using normal patch files on the SRC_URI. When the location of the vendor source isn't obvious (because it is calculated or dynamically generated), this becomes more challenging.

This should be possible if we use vendoring and create the vendor folder before do_patch:

do_fetch
do_unpack
do_vendor
do_patch

We could use a do_update task to parse the lock file and update an inc file. Or we could add additional tasks to resolve additional fetcher URLs from the spec / lock file:

do_vendor_spec_fetch
do_vendor_spec_unpack
do_vendor_spec_patch
do_vendor_fetch
do_fetch
do_unpack
do_vendor
do_patch

This sequence ensures that do_fetch still downloads all dependencies. Only do_vendor_fetch needs internet access.

>> I prefer the translation to git, so I have debug source for vendor dependencies as well as a well-travelled path to mirror and archive the source

> Are you referring to the go-vendor implementation? Do you mean the vendor directory? The gomod fetcher should support mirroring and archiving the sources. It should be possible to create a vendor folder from the gomod archives.

> Nope. I don't use that either.
I have my own tools to locate the > source of the dependencies, clone and put them into a vendor > directory. The recipe simply clones and copies using git after that. > >> , but something like the update task of rust is at least explicit >> and visible to me, so I can also use it without too many issues. > > Do you mean `bitbake -c update_crates recipe-name`? > > > Correct. The .inc file updating mechanisms. Okay. Regards Stefan >> On Tue, Jan 7, 2025 at 11:13 AM Stefan Herbrechtsmeier via >> lists.openembedded.org <http://lists.openembedded.org> >> <stefan.herbrechtsmeier-oss=weidmueller.com@lists.openembedded.org> >> wrote: >> >> Am 07.01.2025 um 12:01 schrieb Richard Purdie: >>> On Tue, 2025-01-07 at 10:47 +0100, Stefan Herbrechtsmeier wrote: >>>> Am 06.01.2025 um 16:30 schrieb Richard Purdie: >>>>> On Mon, 2025-01-06 at 15:35 +0100, Stefan Herbrechtsmeier wrote: >>>>>>> I'm a little bit worried about how easily you could sneak a >>>>>>> "floating" version into this and make the fetcher non- >>>>>>> deterministic. Does (or could?) the code detect and error on >>>>>>> that? >>>>>>> >>>>>> We could raise an error if a checksum is missing in the >>>>>> dependency specification file or make the checksum mandatory for >>>>>> the dependency fetcher. Furthermore we could inspect the >>>>>> dependency URLs to detect a misuse of the file like a latest >>>>>> string for the version. >>>>>> >>>>> I think adding such an error would be a requirement for merging >>>>> this. >>>>> >>>> Should the dependency fetcher (ex. npmsw) or the language specific >>>> fetcher (ex. npm) fail if the version points to a latest version? >>> I think right now it has to error to try and reduce complexity. It is >>> possible to support such things but you have to pass that version >>> information back up the stack so that PV represents the different >>> versions and that is a new level of complexity. 
>>> >>> I guess we should consider how you could theoretically support it as >>> that might influence the design. With multiple git repos in SRC_URI for >>> example, we end up adding multiple shortened shas to construct a PV so >>> that if any change, PV changes. We also have to add an incrementing >>> integer so that on opkg/dpkg/rpm operations work and versions sort. >> >> Okay. In this case we should add the checks to the dependency >> resolution. Thereby we prohibit dynamic versions for the >> dependencies and allows users to add support for it to the >> fetcher of the package manager. >> >>>>>>> Put another way, could one of these SRC_URIs map to multiple >>>>>>> different combinations of underlying component versions? >>>>>> If you mean the extracted SRC_URI for a single dependency from >>>>>> the dependency specification file (ex. npm-shrinkwrap.json) it >>>>>> could use special URLs to map to the latest version. But this is >>>>>> a missus of the dependency specification file and could be >>>>>> detected. The tools generate files with fixed versions always >>>>>> because a floating version with a fixed checksum make no senses. >>>>> Even if it shouldn't happen, we need to detect and error for this >>>>> case as it would become very problematic for us. >>>>> >>>> Okay. Should we disallow a dynamic version for package manager >>>> downloads generally or do you see a reasonable use case? >>> See above. >>> >>>>>> I also thought it would make sense to generate recipes from the >>>>>> dependency specification files and therefore worked on the >>>>>> recipetool >>>>>> previous. But it looks like the tool isn't really used and I'm >>>>>> afraid >>>>>> nobody will use the recipe to fix dependencies. In most cases it >>>>>> is >>>>>> easy to update a dependency in the native tooling and only >>>>>> provide an >>>>>> updated dependency specification file. 
>>>>>> >>>>> >>>>> I think people have wanted a single simple command to translate the >>>>> specification file into our recipe format to update the recipe. For >>>>> various reasons people didn't seem to find the recipetool approach >>>>> was working and created the task workflow based one. There are pros >>>>> and cons to both and I don't have a strong preference. I would like >>>>> to see something which makes it clear to users what is going on >>>>> though and is simple to use. >>>>> >>>>> People do intuitively understand a .inc file with a list of urls in >>>>> it. There are challenges in updating it. >>>>> >>>>> This other approach is not as intuitive as everything is abstracted >>>>> out of sight. >>>>> >>>>> One thing for example which worries me is how are the license >>>>> fields in the recipe going to be updated? >>>>> >>>>> Currently, if we teach the class, it can set LICENSE variables >>>>> appropriately. With the new approach, you don't know the licenses >>>>> until >>>>> after unpack has run. Yes it can write it into the SPDX, but it >>>>> won't >>>>> work for something like the layer index or forms of analysis which >>>>> don't build things. >>>>> >>>>> This does also extend to vulnerability analysis since we can't know >>>>> what is in a given recipe without actually unpacking it. For >>>>> example we >>>>> could know crate XXX at version YYY has a CVE but we can't tell if >>>>> a >>>>> recipe uses that crate until after do_unpack, or at least not >>>>> without >>>>> expandurl. >>>>> >>>> >>>> The main question is if the meta data should contain all information. >>>> If yes, we shouldn't allow any fetcher which requires an external >>>> source. This should include the gitsm fetcher and we should replace >>>> the single SRC_URI with multiple git SRC_URIs. >>> If we had tooling that supported that well we could certainly consider >>> it. 
It isn't straight forward as you can have a git repo containing >>> submodules which then themselves contain submodules which can then >>> contain more levels of submodules. There are therefore multiple levels >>> of expansion possible. >> >> Okay. That makes the git submodule special in compare to the >> other dependency fetcher. >> >>>> We can go even further and forbid specific package manager fetchers >>>> and use plain https or git SRC_URIs. The python and go-vendor fetcher >>>> use this approach. >>>> >>>> Alternative we allow dependency fetchers and require that the meta >>>> data be always used via bitbake. In this case we could extend the >>>> meta data via the fetcher. >>>> >>>> In both cases it is possible to produce the same meta data. It >>>> doesn't matter if we use recipetool, devtool, bbclasses or fetcher. >>>> In any case we could resolve the SRC_URIs, checksums or srcrev from a >>>> file. The license information could be fetched from the package >>>> repositories without integrity checks or could be extracted from the >>>> individual package description file inside the downloaded sources >>>> (ex. npm). We should skip the license detection from license files >>>> for now because they generate manual work and could be discuses >>>> later. >>> That was the reason the current task based approach doesn't use them, >>> yet! I mention it just to highlight that it can be solved either way, >>> the approach doesn't really change what we need to do. The bigger >>> concern is having information available in the metadata which I think >>> we need do to some level regardless of which approach we choose. >>> >>>> The recipe approach has the advantage that it uses fixed licenses and >>>> that license changes could be (theoretical) reviewed during recipe >>>> update. >>> FWIW that is an important use case and one of our general strengths. We >>> can only do that as the license information is written in recipes and >>> can be compared at update time. 
>>
>> Does this apply to the license of every individual dependency or only to the combined license?
>>
>>>> In contrast the fetcher approach reduces the update procedure to a simple file rename or SRCREV update (ex. gitsm). Furthermore, the user could simply place a file beside the recipe to update the dependencies. Could we realize the same via devtool integration and a patch?
>>> This is effectively what the task based approach is aiming for currently. I think the idea was that we could have devtool/recipetool integration around that update task; a task was just a convenient way to capture the code to do it and get things working without needing the tool to be finished.
>> What is the task based approach? `bitbake -c update xyz`?
>>
>>>> We have different solutions between the languages (ex. npmsw vs crate vs pypi) and even inside the languages (ex. go-vendor vs gomod). I would like to unify the dependency support. It doesn't matter if we decide to use the bitbake fetcher or a bitbake / devtool command for the dependency and license resolution.
>>> I do very much prefer having one good way of doing things rather than multiple ways of doing things, each with a potential drawback. I'm therefore broadly in favour of doing that, as long as we don't upset too much existing mindshare along the way.
>>
>> Okay
>>
>>>>>>
>>>>>> I have a WIP to integrate the dependencies into the SPDX. This uses the expanded_urldata / implicit_urldata function to add the dependencies to the process list of archiver and spdx.
>>>>>>
>>>>>> https://github.com/weidmueller/poky/tree/feature/dependency-fetcher
>>>>>>
>>>>>> Regarding the license, we could migrate the functionality from recipetool into a class and detect the licenses at build time.
>>>>>> Theoretically the fetcher could fetch the license from the package manager repository, but we have to trust the repository because we have no checksum to detect changes. Maybe we could integrate tools like Syft or ScanCode to detect the licenses at build time. At the moment the best solution is to make sure that the SBOM contains the name and version of the dependencies and let other tools handle the license via the SBOM for now. Therefore I propose a common scheme to define the dependency name (dn) and version (dv) in the SRC_URI.
>>>>>
>>>>> We could compare what licenses the package manager is showing us with what is in the recipe, and error out if different. There would then need to be a command to update the licenses in the recipe (in much the way urls currently get updated).
>>>>
>>>> Either we request the licenses from the package manager during package update or during fetch; I wouldn't do both. Instead I would analyze the license file during build and compare the detected license with the recipe or fetcher generated licenses. But the license detection from files is another topic and I would like to postpone it for now.
>>> Agreed, I mention it just to highlight that supporting them does have an impact on the design, so any solution needs to ultimately be able to support it.
>>>
>>>>>>> You're using DL_DIR for that which I suspect isn't a great idea for tmp files.
>>>>>> Taken over from gitsm.
>>>>> Probably not the best fetcher and I'd say gitsm should be fixed.
>>>> I don't see a reason why the gitsm fetcher shouldn't be handled like the other dependency fetchers. We could update the handler after we have a decision for the dependency fetchers.
>>> In principle perhaps, but as mentioned above, gitsm has its own challenges.
>>
>> Based on your feedback I have the feeling that a dependency fetcher isn't the correct solution. The fetcher makes it impossible to review changes during a recipe update. Additionally it needs caching for the resolved fetch and license data.
>>
>> The alternative is to create an inc file with SRC_URIs, checksums, SRCREVs and LICENSE. Do you have any recommendation on how to integrate the dependency resolution and inc file creation into oe-core?
>>
>>>>>>> The url scheme is clever but also has a potential risk in that you can't really pass parameters to both the top level fetcher and the underlying one. I'm worried that is going to bite us further down the line.
>>>>>> At the moment I don't see a real problem, but maybe you are right. The existing language-specific fetchers use fixed paths for their downloads.
>>>>>>
>>>>>> What do you propose? Should the fetcher skip the unpack of the source, or should we introduce a sub fetcher which uses the download from another SRC_URI entry? The two entries could be linked via the name parameter. This approach could be combined with your suggestion above. The new fetcher would unpack a lock file from another (default) download.
>>>>>
>>>>> I'm not really sure what is best right now. I'm trying to spell out the pros/cons of what is going on here in the hope it encourages others to give feedback as well. I agree there isn't a problem right now, but I worry there soon will be by mixing two things together like this. The way we handle the git protocol does cause us friction with other url schemes already.
>>>> The dependency fetcher could simply skip the unpack. In this case the user needs to use a variable to pass the same URL to the git and dependency fetcher, or we could provide a python function to generate two SRC_URI entries with the same base URL.
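[Editor's sketch] The idea of deriving two linked SRC_URI entries from one base URL, tied together via the name parameter, could look roughly as follows. This is illustrative only: the helper name, the cargolock:// scheme and the srcname parameter are assumptions, not part of the series.

```python
# Sketch: derive a pair of SRC_URI entries from one base git URL.
# The first entry fetches the sources; the second tells a (hypothetical)
# dependency fetcher to resolve the lock file from that download.
# Linking both entries via the same name is the idea discussed above;
# none of these exact parameter names are final.

def linked_src_uris(base_url, name, lockfile="Cargo.lock"):
    """Return (source_uri, dependency_uri) sharing one base URL."""
    source = f"{base_url};protocol=https;name={name}"
    # The dependency entry skips its own download and refers to the
    # source entry via the shared name parameter.
    deps = f"cargolock://{lockfile};srcname={name}"
    return source, deps

src, deps = linked_src_uris("git://example.com/foo.git", "foo")
```

A recipe would then list both returned entries in SRC_URI, avoiding the duplicated base URL in a variable.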
>>>>
>>> I'm starting to wonder about a slightly different approach: basically an optional generated file alongside a recipe which contains "expanded" information that is effectively expensive to generate (in computation, or in resource terms like network access/processing). We could teach bitbake a new phase of parsing where it generates them if missing. There are some other pieces of information which we know during the build process which it would be helpful to know earlier (e.g. which packages a recipe generates). I've wondered about this for a long time and the fetcher issues remind me of it again. It would be a big change with advantages and drawbacks. I think it would put more pressure on a layer maintainer, as they'd have to computationally keep this up to date, and it would complicate the patch workflow (who should send/regen the files?). I'm putting the idea out there; I'm not saying I think we should do it, I'm just considering options.
>>
>> Do you mean like a cache or like the inc files? Is the file totally auto-generated or is manual editing acceptable?
>>
>>>> = Open questions
>>>>>>>> * Where should we download dependencies?
>>>>>>>> ** Should we use a folder per fetcher (ex. git and npm)?
>>>>>>>> ** Should we use the main folder (ex. crate)?
>>>>>>>> ** Should we translate the name into a folder (ex. gomod)?
>>>>>>>> ** Should we integrate the name into the filename (ex. git)?
>>>>>>>>
>>>>>>>
>>>>>>> DL_DIR is meant to be a complete cache of the source, so it would need to be downloaded there. Given it maps to the other fetchers, the existing cache mechanisms likely work for these just fine; the open question is whether the lock/spec files should be cached after extraction.
>>>>>>
>>>>>> You misunderstand the question. It's about the downloadfilename parameter.
At the moment some fetchers use a sub folder inside DL_DIR >>>>>> and others use the main folder. It looks like every fetcher has its own concept to handle file collisions between different fetchers. The git and npm fetchers use their own folders, the crate fetcher uses its own .crate file prefix, the gomod fetcher translates the URL into multiple folders, and the git fetcher translates the URL into a single folder name.
>>>>> That makes more sense. The layout is partially legacy. The wget and local fetchers were first and hence go directly into DL_DIR. git/svn were separated out into their own directories with a plan to have a directory per fetcher. That didn't always work out with each newer fetcher. Each fetcher does have to handle a unique naming of its urls, as only the specific fetcher can know all of a url's parameters and which ones affect the output vs which ones don't.
>>>>>
>>>> This doesn't explain why the npm fetcher, but not the gomod and crate fetchers, uses a sub folder. All fetchers are based on the wget fetcher.
>>> That is probably "my fault". Put yourself in my position. You get a ton of different patches, all touching very varied aspects of the system. When reviewing them you have to try and remember the original design decisions, the future directions, the ways things broke in the past, a desire to try and have clean consistent APIs, and so on. I have tried very hard to move things in a direction where things incrementally improve, without unnecessarily blocking new features. It means that things that merge often aren't perfect. We've tried a few different approaches with the newer programming languages and each approach has had pros and cons. The inconsistency is probably because I missed something in review. Sorry :(.
>>
>> Sorry, I don't want to criticize you. I see that you have a lot of work.
I want to understand the reasons for the actual design and how it should look.
>>
>>> I only have finite time. There are few people who seem to want to dive in and help with the review of patches like these. I did ask some people yesterday; one told me they simply couldn't understand these patches.
>>
>> What can I do to improve the review?
>>
>>> I'm doing my best to ask the right questions, try to help others understand the patches, and ensure the concerns I can identify are resolved. I don't want to de-motivate you on this work either; I think the idea of improving this is great and I'd love to see it. Equally, I'm also the first person everyone will complain to if we change something and it causes problems for people.
>>>
>>> So the explanation is probably that I just missed something in review at some point. The intent was to separate out the fetcher output going forward (unless it makes sense to be shared).
>>>
>>> FWIW there are multiple things which bother me about the existing fetcher storage layout, but that is a different discussion.
>>
>> Okay.
>>
>>>>>>>> * Where should we unpack the dependencies?
>>>>>>>> ** Should we use a folder inside the parent folder (ex. node_modules)?
>>>>>>>> ** Should we use a fixed folder inside unpackdir (ex. go/pkg/mod/cache/download and cargo_home/bitbake)?
>>>>>>>
>>>>>>> This likely depends on the fetcher, as the different mechanisms will have different expectations about how they should be extracted (as npm/etc. would).
>>>>>>
>>>>>> It depends on the fetcher, but the fetchers could use the same approach. At the moment every fetcher uses a different approach. The crate fetcher uses a fixed value, the gomod fetcher uses a variable (GO_MOD_CACHE_DIR) and the npm fetcher uses a parameter (destsuffix). Furthermore the gomod fetcher overrides the common subdir parameter.
>>>>> I think we really need to standardise that if we can.
Each new fetcher has claimed a certain approach is effectively required by the package manager.
>>>> What would be your desired solution? Is the variable okay, or do you prefer a self-contained SRC_URI?
>>> I suspect we need a default via a variable and then the option to change the default via parameters. The default value should be a bitbake fetcher namespaced control variable.
>>>
>>> I'm wary of making a definitive statement saying X if that isn't going to make sense for some backend though. I simply don't have enough knowledge of them all, which is why you see me being reluctant to make definitive statements about design.
>>
>> Okay.
>>
>>>>>>>> * How should we treat archives for package manager caches?
>>>>>>>> ** Should we unpack the archives to support patching (ex. npm)?
>>>>>>>> ** Should we copy the packed archive to avoid unpacking and packaging (ex. gomod)?
>>>>>>>>
>>>>>>> If there are archives left after do_unpack, which task is going to unpack those? Are we expecting the build process in configure/compile to decompress them? Would those management tools accept things if they were extracted earlier? "unpack" would be the correct time to do it, but I can see this getting into conflict with the package manager :/.
>>>>>>
>>>>>> Most package managers expect archives. In the npm case the archive is unpacked by the fetcher and repacked by the npm.bbclass to support patching. The gomod fetcher doesn't unpack the downloaded archive, and the gomodgit fetcher creates archives from git folders during unpack. It would be possible to always keep the archives, or to always extract the archives and recreate them during build. It is a decision between performance and patchability.
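[Editor's sketch] The "default via a namespaced variable, overridable via a URL parameter" design suggested above might resolve roughly like this. The variable naming scheme and the use of destsuffix as the override are assumptions for illustration; bitbake's datastore is stood in for by a plain dict.

```python
# Sketch: resolve the unpack destination for a fetcher named `scheme`.
# Precedence: explicit destsuffix URL parameter, then a per-fetcher
# namespaced variable, then a generic per-fetcher fallback directory.
# All variable names here are hypothetical.

def resolve_unpack_dir(scheme, url_params, datastore):
    if "destsuffix" in url_params:                  # per-URL override
        return url_params["destsuffix"]
    value = datastore.get(f"FETCHER_UNPACKDIR_{scheme.upper()}")
    if value:                                       # namespaced default
        return value
    return scheme                                   # fallback: one dir per fetcher

d = {"FETCHER_UNPACKDIR_GOMOD": "go/pkg/mod/cache/download"}
assert resolve_unpack_dir("gomod", {}, d) == "go/pkg/mod/cache/download"
assert resolve_unpack_dir("gomod", {"destsuffix": "vendor"}, d) == "vendor"
assert resolve_unpack_dir("npm", {}, d) == "npm"
```

This keeps recipes short in the common case while still allowing a per-URL deviation, which matches how destsuffix already behaves for the npm fetcher.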
>>>>>>
>>>>>> At the moment it is complicated to work with the different fetchers, because every fetcher uses a different concept and it is unclear what the desired approach is.
>>>>>
>>>>> This is a challenge. Can we handle the unpacking with the package manager as a specific step, or does it have to be combined with other steps like configure/compile?
>>>>>
>>>> It looks like this is possible:
>>>> cargo fetch
>>>> go mod vendor
>>>> npm install
>>>>
>>>> I suspect you're thinking about using the package manager in do_unpack to unpack the archives and patch the unpacked archives afterwards?
>>> I'm wondering about it, yes. I know we've had challenges with patching rust modules, for example, so this isn't a theoretical problem :/.
>>
>> It is an interesting idea, because most package managers check the integrity before unpack. Additionally it should simplify and speed up the npm build because it removes the repacking of the packages. The problem is that we need an additional task to patch the dependency specification file and to unpack the file.
>>
>>>>>>> I did wonder if patches 1-5 of this series could be merged separately too, as they look reasonable regardless of the rest of the series?
>>>>>>
>>>>>> Sure. Should I resend the patches as a separate series?
>>>>> Yes please, that would then let us remove the bits we can easily review/sort and focus on this other part.
>>>>>
>>>> Done.
>>> Thanks.
>>>
>>>> I will also resend the go h1 checksum commit separately, because it could be useful for the gomod fetcher.
>>> Yes, I was waiting for a new version of that one with the naming tweaked.
>>
>> Done.
>>
>>>> Should I also move the dn / dv parameter patches to a separate series, because they could be useful without the dependency fetcher? I could add the parameters to the fetchers in a backward compatible way.
>>> I need to think more about that one...
>>
>> The motivation is to include the dependencies with name, version, license and CPE in the SBOM.
>>
>> Regards
>> Stefan
>>
>> -=-=-=-=-=-=-=-=-=-=-=-
>> Links: You receive all messages sent to this group.
>> View/Reply Online (#16981): https://lists.openembedded.org/g/bitbake-devel/message/16981
>> Mute This Topic: https://lists.openembedded.org/mt/110212697/1050810
>> Group Owner: bitbake-devel+owner@lists.openembedded.org
>> Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [bruce.ashfield@gmail.com]
>> -=-=-=-=-=-=-=-=-=-=-=-
>
> --
> - Thou shalt not follow the NULL pointer, for chaos and madness await thee at its end
> - "Use the force Harry" - Gandalf, Star Trek II
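[Editor's sketch] The proposed dn/dv scheme would let SBOM tooling recover a dependency's name and version from a SRC_URI entry alone. A minimal parser could look like this; the exact ";key=value" placement of dn/dv is my assumption based on the description above.

```python
# Sketch: extract the proposed dependency name (dn) and version (dv)
# parameters from a SRC_URI entry for SBOM generation. Parameter
# splitting follows bitbake's usual ";key=value" URL syntax.

def dependency_id(src_uri_entry):
    """Return (name, version) if the entry carries dn/dv, else None."""
    parts = src_uri_entry.split(";")
    params = dict(p.split("=", 1) for p in parts[1:] if "=" in p)
    if "dn" in params and "dv" in params:
        return params["dn"], params["dv"]
    return None

uri = "https://crates.io/api/v1/crates/glob/0.3.1/download;dn=glob;dv=0.3.1"
```

An SBOM class could then iterate over all SRC_URI entries and emit one package record per non-None result.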
Hi, thanks for looking into this.

With this series applied I've noticed some recipes now showing warnings like:

WARNING: enact-dev-native-6.1.3-r0 do_unpack: Please add support for the url to npm fetcher: https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz
WARNING: enact-dev-native-6.1.3-r0 do_unpack: Please add support for the url to npm fetcher: https://registry.npmjs.org/strip-ansi/-/strip-ansi-6.0.1.tgz
WARNING: enact-dev-native-6.1.3-r0 do_unpack: Please add support for the url to npm fetcher: https://registry.npmjs.org/wrap-ansi/-/wrap-ansi-7.0.0.tgz

The same warnings are shown earlier, from do_fetch.

Not sure what's special about these, but I believe it used to work with the previous npmsw implementation. Any hint what to check?

On Fri, Dec 20, 2024 at 12:26 PM Stefan Herbrechtsmeier via lists.openembedded.org <stefan.herbrechtsmeier-oss=weidmueller.com@lists.openembedded.org> wrote:
>
> From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
>
> The patch series improves the fetcher support for tightly coupled package manager (npm, go and cargo). It adds support for embedded dependency fetcher via a common dependency mixin. The patch series reworks the npm-shrinkwrap.json (package-lock.json) support and adds a fetcher for go.sum and cargo.lock files. The dependency mixin contains two stages. The first stage locates a local specification file or fetches an archive or git repository with a specification file. The second stage resolves the dependency URLs from the specification file and fetches the dependencies.
>
> SRC_URI = "<type>://npm-shrinkwrap.json"
> SRC_URI = "<type>+http://example.com/ npm-shrinkwrap.json"
> SRC_URI = "<type>+http://example.com/${BP}.tar.gz;striplevel=1;subdir=${BP}"
> SRC_URI = "<type>+git://example.com/${BPN}.git;protocol=https"
>
> Additionally, the patch series reworks the npm fetcher to work without a npm binary and external package repository.
It adds support for a common > dependency name and version schema to integrate the dependencies into > the SBOM.
>
> = Background
> Bitbake has diverse concepts and drawbacks for the different tightly coupled package managers. The Python support uses a recipe per dependency and generates common fetcher URLs via a python function. The other languages embed the dependencies inside the recipe. The Node.js support offers a npmsw fetcher which uses a lock file beside the recipe to generate multiple common fetcher URLs on the fly, and thereby hides the real download sources. This leads, for example, to a single source in the SBOM. The Go support contains two parallel implementations: a vendor-based solution with a common fetcher and a go-mod-based solution with a gomod fetcher. The vendor-based solution includes the individual dependencies in the SRC_URI of the recipe and uses a python function to generate common fetcher URLs with additional information for the vendor task. The gomod fetcher uses a proprietary gomod URL. It translates the URL into a common URL and prepares metadata during unpack. The Rust support includes the individual dependencies in the SRC_URI of the recipe and uses proprietary crate URLs. The crate fetcher translates a proprietary URL into a common fetcher URL and prepares metadata during unpack. The recipetool does not support the crate and the gomod fetcher. This leads to missing licenses of the dependencies in the recipe, for example librsvg.
>
> The steps needed to fetch dependencies for Node.js, Go and Rust are similar:
> 1. Extract the dependencies from a specification file (name, version, checksum and URL)
> 2. Generate proprietary fetcher URIs
> a. npm://registry.npmjs.org/;package=glob;version=10.3.15
> b. gomod://golang.org/x/net;version=v0.9.0
> gomodgit://golang.org/x/net;version=v0.9.0;repo=go.googlesource.com/net
> c. crate://crates.io/glob/0.3.1
> 3. Generate wget or git fetcher URIs
> a.
https://registry.npmjs.org/glob/-/glob-10.3.15.tgz;downloadfilename=…
> b. https://proxy.golang.org/golang.org/x/net/@v/v0.9.0.zip;downloadfilename=…
> git://go.googlesource.com/net;protocol=https;subdir=…
> c. https://crates.io/api/v1/crates/glob/0.3.1/download;downloadfilename=…
> 4. Unpack
> 5. Create meta files
> a. Update lockfile and create tar.gz archives
> b. Create go.mod file
> Create info, go.mod file and zip archives
> c. Create .cargo-checksum.json files
>
> It looks like the recipetool is not widely used, and therefore this patch series integrates the dependency resolution into the fetcher. After an agreement on a concept the fetcher could be extended. The fetcher could download the license information per package, and a new build task could run the license cruncher from the recipetool.
>
> = Open questions
>
> * Where should we download dependencies?
> ** Should we use a folder per fetcher (ex. git and npm)?
> ** Should we use the main folder (ex. crate)?
> ** Should we translate the name into a folder (ex. gomod)?
> ** Should we integrate the name into the filename (ex. git)?
> * Where should we unpack the dependencies?
> ** Should we use a folder inside the parent folder (ex. node_modules)?
> ** Should we use a fixed folder inside unpackdir (ex. go/pkg/mod/cache/download and cargo_home/bitbake)?
> * How should we treat archives for package manager caches?
> ** Should we unpack the archives to support patching (ex. npm)?
> ** Should we copy the packed archive to avoid unpacking and packaging (ex. gomod)?
>
> This patch series depends on patch series 20241209103158.20833-1-stefan.herbrechtsmeier-oss@weidmueller.com ("[1/4] tests: fetch: adapt npmsw tests to fixed unpack behavior").
> > > Stefan Herbrechtsmeier (21): > tests: fetch: update npmsw tests to new lockfile format > fetch2: npmsw: remove old lockfile format support > tests: fetch: replace [url] with urls for npm > fetch2: do not prefix embedded checksums > fetch2: read checksum from SRC_URI flag for npm > fetch2: introduce common package manager metadata > fetch2: add unpack support for npm archives > utils: add Go mod h1 checksum support > fetch2: add destdir to FetchData > fetch: npm: rework > tests: fetch: adapt style in npm(sw) class > tests: fetch: move npmsw test cases into npmsw test class > tests: fetch: adapt npm test cases > fetch: add dependency mixin > tests: fetch: add test cases for dependency fetcher > fetch: npmsw: migrate to dependency mixin > tests: fetch: adapt npmsw test cases > fetch: add gosum fetcher > tests: fetch: add test cases for gosum > fetch: add cargolock fetcher > tests: fetch: add test cases for cargolock > > lib/bb/fetch2/__init__.py | 35 +- > lib/bb/fetch2/cargolock.py | 73 +++ > lib/bb/fetch2/dependency.py | 167 +++++++ > lib/bb/fetch2/gomod.py | 5 +- > lib/bb/fetch2/gosum.py | 51 +++ > lib/bb/fetch2/npm.py | 244 +++------- > lib/bb/fetch2/npmsw.py | 347 ++++---------- > lib/bb/tests/fetch.py | 880 +++++++++++++++++------------------- > lib/bb/utils.py | 25 + > 9 files changed, 916 insertions(+), 911 deletions(-) > create mode 100644 lib/bb/fetch2/cargolock.py > create mode 100644 lib/bb/fetch2/dependency.py > create mode 100644 lib/bb/fetch2/gosum.py > > -- > 2.39.5 > > > -=-=-=-=-=-=-=-=-=-=-=- > Links: You receive all messages sent to this group. > View/Reply Online (#16920): https://lists.openembedded.org/g/bitbake-devel/message/16920 > Mute This Topic: https://lists.openembedded.org/mt/110212697/3617156 > Group Owner: bitbake-devel+owner@lists.openembedded.org > Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [martin.jansa@gmail.com] > -=-=-=-=-=-=-=-=-=-=-=- >
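[Editor's sketch] Steps 2 and 3 of the cover letter map a proprietary fetcher URI onto a plain download URL. For the crate case this mapping (crate://crates.io/glob/0.3.1 becoming the crates.io download URL) can be sketched as a pure function; the helper name is illustrative and only the happy path is handled.

```python
# Sketch: translate a proprietary crate:// URI into the plain https
# download URL, as described in steps 2 -> 3 of the cover letter.
# The elided ;downloadfilename=… part of the real URL is left out here.

def crate_to_https(crate_uri):
    """crate://<host>/<name>/<version> -> https download URL."""
    assert crate_uri.startswith("crate://")
    host, name, version = crate_uri[len("crate://"):].rstrip("/").split("/")
    return f"https://{host}/api/v1/crates/{name}/{version}/download"
```

Keeping such translations as small pure functions would make them easy to unit-test in lib/bb/tests/fetch.py, independent of any network access.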
On 09.01.2025 at 11:40, Alexander Kanavin wrote:
> On Mon, 6 Jan 2025 at 15:43, Stefan Herbrechtsmeier <stefan.herbrechtsmeier-oss@weidmueller.com> wrote:
>> https://github.com/yoctoproject/poky/compare/master...weidmueller:poky:feature/dependency-fetcher
>>
>> I have migrated the crate recipes to the new fetcher and improved the spdx 2.2 class to include the name and version of the crate dependencies.
>>
>> You have to inherit the create-spdx-2.2 class and build the librsvg recipe to test the new fetcher.
> Thanks, I checked out the branch and ran bitbake -c patch librsvg with the default build/conf/ config. It works and the recipe is short and neat.

Thanks for your test.

> I'm not sure what create-spdx-2.2 is needed for? I didn't use it, and there were no errors.

The change is needed to add the dependencies and their names and versions to the SBOM.

> Like others, I'm torn on two things:
> - visibility
> - control
>
> When a recipe explicitly lists what goes into a build, this can be easily seen, audited, and adjusted directly in the recipe. With the new fetchers, you need to actually run a build to produce that list, and it isn't clear where the list is placed, in which format, and what to do if something needs to deviate from versions prescribed by upstream.

I missed the appropriate function in the dependency mixin in this series. The list is created on demand (see the archiver or spdx patch). Every deviation needs to be handled in a package manager lock file. Therefore you could place a lock file beside the recipe. You could use an editor or the language-specific tools to manipulate the lock file.

> This is not a theoretical concern, I'm thinking specifically of log4j-like vulnerabilities, and how one would check that their product doesn't contain them:
> https://lwn.net/Articles/878570/

Do you have any tools to check it at the moment? I proposed a common style for the package name and version parameter of a package manager fetch URI (ex.
crate). The information can be included in the SBOM and used outside of bitbake. As a follow-up, we could use the information to create a CPE and add the dependencies to the CVE check.
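[Editor's sketch] The follow-up idea of deriving a CPE from the dependency name and version could be sketched as below. Using the name as the vendor field is a naive placeholder of mine; real CPE matching requires a dictionary lookup against the NVD.

```python
# Sketch: build a CPE 2.3 formatted string from a dependency name and
# version, e.g. for cve-check integration. The vendor defaulting to the
# package name is a simplification; real CPEs come from the NVD dictionary.

def cpe23(name, version, vendor=None):
    vendor = vendor or name
    return f"cpe:2.3:a:{vendor}:{name}:{version}:*:*:*:*:*:*:*"
```

For example, cpe23("glob", "0.3.1") yields a candidate identifier that a CVE scanner could match against its database.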
On 09.01.2025 at 11:50, Alexander Kanavin wrote:
> On Thu, 9 Jan 2025 at 11:40, Alexander Kanavin via lists.openembedded.org <alex.kanavin=gmail.com@lists.openembedded.org> wrote:
>> This is not a theoretical concern, I'm thinking specifically of log4j-like vulnerabilities, and how one would check that their product doesn't contain them:
>> https://lwn.net/Articles/878570/
> I meant to say 'yocto layer' here, not product. And ideally it should be possible with 'static analysis', e.g. just by looking at the layer content.

What is the motivation for that requirement ("just by looking at the layer content")? Why can't we use a SBOM for the vulnerability check? To be really safe you have to scan the code anyway, because of embedded packages (vendoring) or git submodules.
On 09.01.2025 at 12:53, Martin Jansa wrote:
> Hi,
>
> thanks for looking into this.
>
> With this series applied I've noticed some recipes now showing warnings like:
> WARNING: enact-dev-native-6.1.3-r0 do_unpack: Please add support for the url to npm fetcher: https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz
> WARNING: enact-dev-native-6.1.3-r0 do_unpack: Please add support for the url to npm fetcher: https://registry.npmjs.org/strip-ansi/-/strip-ansi-6.0.1.tgz
> WARNING: enact-dev-native-6.1.3-r0 do_unpack: Please add support for the url to npm fetcher: https://registry.npmjs.org/wrap-ansi/-/wrap-ansi-7.0.0.tgz

Thanks for your test. I assume the packages are renamed. I have a fix for that in my WIP branch.

https://github.com/weidmueller/poky/commit/ae988d20777d7a542fe18fcbf95110829eef0b4f

> The same warnings are shown earlier, from do_fetch.
>
> Not sure what's special about these, but I believe it used to work with the previous npmsw implementation. Any hint what to check?

Could you please check whether the entry in the package-lock.json contains a "name" field? It looks like this is an undocumented feature of the package-lock.json.

> On Fri, Dec 20, 2024 at 12:26 PM Stefan Herbrechtsmeier via lists.openembedded.org <stefan.herbrechtsmeier-oss=weidmueller.com@lists.openembedded.org> wrote:
>> [...]
You receive all messages sent to this group. >> View/Reply Online (#16920): https://lists.openembedded.org/g/bitbake-devel/message/16920 >> Mute This Topic: https://lists.openembedded.org/mt/110212697/3617156 >> Group Owner: bitbake-devel+owner@lists.openembedded.org >> Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [martin.jansa@gmail.com] >> -=-=-=-=-=-=-=-=-=-=-=- >>
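[The per-ecosystem steps described in the series cover letter above — extract dependencies from a specification file, then map them to common download URLs — can be sketched in plain Python. This is an illustration only, not the bitbake fetcher API: the URL template and function name are made up, and only the npm v2/v3 "packages" lockfile layout is assumed.]

```python
import json

# Illustrative npm registry URL template for step 3; the real fetcher
# additionally appends downloadfilename= and checksum parameters.
NPM_URL = "https://registry.npmjs.org/{name}/-/{base}-{version}.tgz"

def npm_lockfile_to_urls(lockfile_path):
    """Steps 1 and 3 for npm: extract (name, version, url, integrity)
    tuples from a package-lock.json / npm-shrinkwrap.json in the v2/v3
    format and map them to registry download URLs."""
    with open(lockfile_path) as f:
        lock = json.load(f)
    urls = []
    # v2/v3 lockfiles list dependencies under "packages", keyed by their
    # install path such as "node_modules/glob"; the root package uses
    # the empty key and is skipped here.
    for path, meta in lock.get("packages", {}).items():
        if not path:
            continue
        name = path.split("node_modules/")[-1]
        base = name.split("/")[-1]  # tarball name drops any @scope/ prefix
        urls.append((name, meta["version"],
                     NPM_URL.format(name=name, base=base,
                                    version=meta["version"]),
                     meta.get("integrity")))
    return urls
```

For the glob example from the cover letter, this yields the registry tarball URL https://registry.npmjs.org/glob/-/glob-10.3.15.tgz.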
On Thu, 9 Jan 2025 at 15:00, Stefan Herbrechtsmeier
<stefan.herbrechtsmeier-oss@weidmueller.com> wrote:
> I missed the appropriate function in the dependency mixin in this
> series. The list is created on demand (see archiver or spdx patch).
> Every derivation needs to be handled in a package manager lock file.
> Therefore you could place a lock file beside the recipe. You could use
> an editor or the language specific tools to manipulate the lock file.

I'm not sure, is there code I can try that does this (and if so,
how?), or is this code still to be written?

> What is the motivation for that requirement ("just by looking at the layer content")?
> Why can't we use a SBOM for the vulnerability check? To be really safe you have
> to scan the code because of embedded packages (vendoring) or git submodules.

That's right. I don't disagree with this.

I do however have another concern I want to express: I can't convince
myself that the 'integrated fetcher' is an overall significant,
obvious, major improvement over the 'generate the SRC_URI lists in
.inc files via task in a bbclass' approach.

Just a couple of reasons:

- the 'integrated fetcher' is not trivial, and notably increases the
complexity of the bitbake fetcher codebase. We already struggle to
maintain bitbake, RP is overloaded, and very few other people have
time and knowledge to look at bitbake patches and understand what is
going on. You've already seen this with your patchset, where getting it
properly reviewed by anyone other than RP is an ongoing challenge. On
the other hand, the .inc updaters are fully contained in oe-core
classes, they implement a task in the well-understood 'recipe python'
dialect and thus benefit from a lot more people being able to take
care of them. They're also safer in the sense that any bugs in them
are only triggered when someone needs to update a recipe. Fetchers, on
the other hand, are fairly critical pieces of code and they must work
regardless of host environment, python versions, unforeseen corner
cases in source trees and so on.

- we might be able to remove those long SRC_URI lists by migrating
recipes to the integrated fetcher, but we won't be able to do this
with the licensing information (pointers+checksums to licenses,
license strings) for items that are being fetched. For that, you still
need some way to write it into a recipe with a tool. We don't do this
yet, but we really should.

Alex
On 20.12.2024 at 12:25, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
> From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
>
> Downloads from package manager repositories are identified via registry,
> name, and version. The fetchers use individual styles to define the
> download metadata:
>
> npm://<REGISTRY>;package=<NAME>;version=<VERSION>
>
> crate://<REGISTRY>/<NAME>/<VERSION>
>
> GO_MOD_PROXY = "<REGISTRY>"
> gomod://<NAME>;version=<VERSION>
> gomodgit://<NAME>;version=<VERSION>;repo=<REPOSITORY>
>
> The name and version are important for the SBOM to add usable name,
> version, and CPE to the SBOM entries for the downloaded dependencies.
> Introduce a common style and check the existence of the parameters:
>
> <TYPE>://<REGISTRY | REPOSITORY>;dn=<NAME>;dv=<VERSION>
>
> The style clearly separates the metadata and supports slashes and @
> in the name.
>
> Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
> ---
>
>  lib/bb/fetch2/__init__.py | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
> index d2a30c18f..4b7c01d6a 100644
> --- a/lib/bb/fetch2/__init__.py
> +++ b/lib/bb/fetch2/__init__.py
> @@ -1356,6 +1356,12 @@ class FetchData(object):
>          if hasattr(self.method, "urldata_init"):
>              self.method.urldata_init(self, d)
>
> +        if self.method.require_download_metadata():
> +            if "dn" not in self.parm:
> +                raise MissingParameterError("dn", self.url)
> +            if "dv" not in self.parm:
> +                raise MissingParameterError("dv", self.url)
> +

As an alternative to the short names (dn, dv), we could add an optional
version to the resolution of the checksum and source revision and remove
the version value from the name parameter:

configure_checksum:

    if all(key in self.parm for key in ["name", "version"]):
        checksum_name = "%s@%s.%ssum" % (self.parm["name"], self.parm["version"], checksum_id)

srcrev_internal_helper:

    if name and version:
        attempts.append("SRCREV_%s@%s" % (name, version))

>          for checksum_id in CHECKSUM_LIST:
>              configure_checksum(checksum_id)
>
> @@ -1711,6 +1717,12 @@ class FetchMethod(object):
>          """
>          return []
>
> +    def require_download_metadata(self):
> +        """
> +        The fetcher requires download name (dn) and version (dv) parameters.
> +        """
> +        return False
> +
>
>  class DummyUnpackTracer(object):
>      """
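[The proposed common `;dn=<NAME>;dv=<VERSION>` style can be illustrated with a standalone sketch. The exception class and the parameter splitting below are simplified stand-ins for bitbake's internals (bb.fetch2.MissingParameterError and FetchData), not the actual fetch2 code:]

```python
class MissingParameterError(Exception):
    """Simplified stand-in for bb.fetch2.MissingParameterError."""
    def __init__(self, param, url):
        super().__init__("Missing parameter '%s' in url '%s'" % (param, url))
        self.param = param
        self.url = url

def parse_parm(url):
    """Split the ;key=value parameters off a bitbake-style SRC_URI."""
    base, *params = url.split(";")
    parm = {}
    for p in params:
        key, _, value = p.partition("=")
        parm[key] = value
    return base, parm

def check_download_metadata(url):
    """Enforce the proposed <TYPE>://<REGISTRY>;dn=<NAME>;dv=<VERSION>
    style: both dn and dv must be present."""
    base, parm = parse_parm(url)
    for key in ("dn", "dv"):
        if key not in parm:
            raise MissingParameterError(key, url)
    return parm["dn"], parm["dv"]
```

Because dn is an ordinary parameter value rather than part of the path, names containing slashes (golang.org/x/net) or an @ scope pass through unchanged, which is the point made in the commit message.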
On 09.01.2025 at 20:40, Alexander Kanavin wrote:
> On Thu, 9 Jan 2025 at 15:00, Stefan Herbrechtsmeier
> <stefan.herbrechtsmeier-oss@weidmueller.com> wrote:
>> I missed the appropriate function in the dependency mixin in this
>> series. The list is created on demand (see archiver or spdx patch).
>> Every derivation needs to be handled in a package manager lock file.
>> Therefore you could place a lock file beside the recipe. You could use
>> an editor or the language specific tools to manipulate the lock file.
> I'm not sure, is there code I can try that does this (and if so,
> how?), or is this code still to be written?

You have to inherit create-spdx-2.2. If you use poky as distro you have
to replace the 3.0 in create-spdx with 2.2 because it is impossible to
override the inherit in poky.conf. Afterwards you can create an SBOM
with the following command:

bitbake -c create_spdx librsvg

The same feature will be added to create-spdx-3.0, but for that I need
some recommendations from the spdx experts. After we have an agreement
on how to provide the needed information I will work on the
create-spdx-3.0 support.

>> What is the motivation for that requirement ("just by looking at the layer content")?
>> Why can't we use a SBOM for the vulnerability check? To be really safe you have
>> to scan the code because of embedded packages (vendoring) or git submodules.
> That's right. I don't disagree with this.
>
> I do however have another concern I want to express: I can't convince
> myself that the 'integrated fetcher' is an overall significant,
> obvious, major improvement over the 'generate the SRC_URI lists in
> .inc files via task in a bbclass' approach.
>
> Just a couple of reasons:
>
> - the 'integrated fetcher' is not trivial, and notably increases the
> complexity of the bitbake fetcher codebase. We already struggle to
> maintain bitbake, RP is overloaded, and very few other people have
> time and knowledge to look at bitbake patches and understand what is
> going on. You've already seen this with your patchset, where getting it
> properly reviewed by anyone other than RP is an ongoing challenge. On
> the other hand, the .inc updaters are fully contained in oe-core
> classes, they implement a task in the well-understood 'recipe python'
> dialect and thus benefit from a lot more people being able to take
> care of them. They're also safer in the sense that any bugs in them
> are only triggered when someone needs to update a recipe. Fetchers, on
> the other hand, are fairly critical pieces of code and they must work
> regardless of host environment, python versions, unforeseen corner
> cases in source trees and so on.

You mixed two different points. We have to distinguish between the
bitbake fetcher and the on-the-fly resolution of SRC_URIs.

Regarding the bitbake fetcher, the same reasons are true for the
language specific fetchers. The fetchers are based on the wget or git
fetcher. They only add a preprocessing of the source URI and a
post-processing of the download. There is no requirement to do this
inside the fetcher.

The on-the-fly resolution is also possible in oe-core. I think it isn't
really practicable to manipulate the resolved source URIs because of
dependencies between dependencies and the relationship to other package
manager configuration files. Why shouldn't we use the package manager
specific tools to update the configuration and dependency specification?
I understand that a patch is more straightforward than a new dependency
specification file. The inc file is an OE-specific format of a
dependency specification / lock file without available tools to update
entries with respect to the relationship between entries. Furthermore it
is impossible to use the changes outside of oe for tests or debugging.

What is your opinion regarding gitsm? Should we remove the bitbake
fetcher and use an update task to generate an inc file with the source
URIs and source revisions? Do you really review the changes of the inc
file?

I understand the points but I have the feeling that they are more
theoretical for package manager dependencies or could be solved in
another way (ex. caching).

> - we might be able to remove those long SRC_URI lists by migrating
> recipes to the integrated fetcher, but we won't be able to do this
> with the licensing information (pointers+checksums to licenses,
> license strings) for items that are being fetched. For that, you still
> need some way to write it into a recipe with a tool. We don't do this
> yet, but we really should.

The license topic is independent of the fetcher because the dependency
specification doesn't contain license information. The recipetool shows
that the license topic is really complicated. It is possible to fetch a
license string from the package manager repository but this information
is useless without a pointer to the license file and a checksum. The
automatic determination of the license from a file needs very good
tooling because we need to trust the process and it must minimize the
manual corrections. Furthermore you need a central database, otherwise
you have to fix the same problem twice because the recipes could use the
same dependency.

Even if we put all license information inside the inc file: who should
review the changes? What tooling is used to review the change (license
content)? If we blindly trust the inc file generator, the inc file is
useless and we can generate the information on-the-fly.

I understand the motivation for the update task / inc file but I don't
think it adds any practical benefit. Nevertheless I will move my
implementation to oe-core and add a task to generate an inc file as a
starting point.
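[The "update task generating an .inc file" approach being debated here could, very roughly, boil down to a helper like the one below. This is a sketch only: the function name and the SRC_URI name-flag scheme are made up for illustration and are not oe-core's actual convention.]

```python
def write_src_uri_inc(deps, inc_path):
    """Write resolved dependencies as an includable .inc fragment with
    one SRC_URI entry plus one checksum line per dependency.
    `deps` is a list of (name, version, url, sha256) tuples."""
    lines = ["# Autogenerated from the lock file - do not edit manually.\n"]
    for name, version, url, sha256 in deps:
        # Derive a per-entry name flag; slashes are not valid in flag
        # names, so fold them into dashes (illustrative scheme).
        flag = "%s-%s" % (name.replace("/", "-"), version)
        lines.append('SRC_URI += "%s;name=%s"\n' % (url, flag))
        lines.append('SRC_URI[%s.sha256sum] = "%s"\n' % (flag, sha256))
    with open(inc_path, "w") as f:
        f.writelines(lines)
```

The resulting fragment is what would show up as the "auto-generated noise" in recipe-update diffs mentioned later in the thread: one URL line and one checksum line per dependency.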
On Fri, 10 Jan 2025 at 12:32, Stefan Herbrechtsmeier
<stefan.herbrechtsmeier-oss@weidmueller.com> wrote:
> What is your opinion regarding gitsm? Should we remove the bitbake fetcher and use an update task to generate an inc file with the source URIs and source revisions?

That ship has sailed. We can't remove gitsm, it has users, and they
will be very angry.

> Do you really review the changes of the inc file?
> I understand the points but I have the feeling that they are more theoretical for package manager dependencies or could be solved in another way (ex. caching)

But do you? I have to restate the point: a solution that can be placed
inside a layer is much more scalable and maintainable than adding code
to bitbake. That's why I'm leaning towards drawing the line at
existing fetchers that are wget/git convenience wrappers, and shifting
dependency/lockfile management to layers. It's ultimately RP's call,
but he does seek feedback :)

I'm fine with large SRC_URI/sha256 diffs when recipes get updated to
new versions. And since you asked, no, no one looks at them, they're
auto-generated noise that we learned to block out, just as we learned
to quickly skim over recipe patch changes that are just line number
churn and similar non-functional changes.

> Even if we put all license information inside the inc file: who should review the changes? What tooling is used to review the change (license content)? If we blindly trust the inc file generator, the inc file is useless and we can generate the information on-the-fly.

We won't blindly trust a generator. There are multiple gate-keeping
steps, some of which already work, and some should still be
implemented:

- when creating a recipe with devtool, devtool should discover all
licenses and generate appropriate recipe metadata. For classic unix-y
components this has to rely on 'guessing', but things like crates have
deterministic licensing metadata (a field in Cargo.toml, and LICENSE-*
files if I remember right). We can also propose adding such
determinism upstream if it's not currently good enough.

- when updating a recipe with devtool to a new upstream release, it
uses the file:// entries in LIC_FILES_CHKSUM to generate a diff of
previous license texts and the new ones, and writes that as a comment
into the updated recipe. The diff is reviewed by a human performing
the update, and condensed into an update to the LICENSE field (if
needed), and an explanation of what changed in the License-Update tag
in the commit message. This could be further automated if upstream has
deterministic ways to specify licenses, e.g. LICENSE =
"&".join(all_license_ids).

- when sending the resulting patch for review, there's a mailing list
bot (patchtest), which will check that any update in license checksums
is accompanied by an explanation in the License-Update tag. There are
also humans which will check that the licensing changes are sensible.
Otherwise we do trust that submitters spot important changes in
licensing (from the diff in the previous step or by manual comparison,
if they want) and summarise them in LICENSE correctly.

- finally there are various license checks that run in the recipe_qa
task and are implemented in insane.bbclass. They could be extended to
verify that every dependency has a matching license entry in the recipe
and so on. Anything that can be caught by looking at the source tree
and the license metadata.

> Nevertheless I will move my implementation to oe-core and add a task to generate an inc file as a starting point.

That would be much appreciated. The more I think about it the more I'm
convinced we should have it standardized in core.

Alex
On 10.01.2025 at 14:26, Alexander Kanavin wrote:
> On Fri, 10 Jan 2025 at 12:32, Stefan Herbrechtsmeier
> <stefan.herbrechtsmeier-oss@weidmueller.com> wrote:
>> What is your opinion regarding gitsm? Should we remove the bitbake fetcher and use an update task to generate an inc file with the source URIs and source revisions?
> That ship has sailed. We can't remove gitsm, it has users, and they
> will be very angry.

This makes it impossible to fix wrong design decisions or remove code
with low code quality.

>> Do you really review the changes of the inc file?
>> I understand the points but I have the feeling that they are more theoretical for package manager dependencies or could be solved in another way (ex. caching)
> But do you? I have to restate the point: a solution that can be placed
> inside a layer is much more scalable and maintainable than adding code
> to bitbake. That's why I'm leaning towards drawing the line at
> existing fetchers that are wget/git convenience wrappers, and shifting
> dependency/lockfile management to layers. It's ultimately RP's call,
> but he does seek feedback :)

I'm working on it.

> I'm fine with large SRC_URI/sha256 diffs when recipes get updated to
> new versions. And since you asked, no, no one looks at them, they're
> auto-generated noise that we learned to block out, just as we learned
> to quickly skim over recipe patch changes that are just line number
> churn and similar non-functional changes.

Instead of an inc file the generated SRC_URIs could be saved inside the
work directory of the recipe. This will eliminate the noise and avoid a
manual run of an update task after a recipe changes.

>> Even if we put all license information inside the inc file: who should review the changes? What tooling is used to review the change (license content)? If we blindly trust the inc file generator, the inc file is useless and we can generate the information on-the-fly.
> We won't blindly trust a generator. There are multiple gate-keeping
> steps, some of which already work, and some should still be
> implemented:
>
> - when creating a recipe with devtool, devtool should discover all
> licenses and generate appropriate recipe metadata. For classic unix-y
> components this has to rely on 'guessing', but things like crates have
> deterministic licensing metadata (a field in Cargo.toml, and LICENSE-*
> files if I remember right). We can also propose adding such
> determinism upstream if it's not currently good enough.
>
> - when updating a recipe with devtool to a new upstream release, it
> uses the file:// entries in LIC_FILES_CHKSUM to generate a diff of
> previous license texts and the new ones, and writes that as a comment
> into the updated recipe. The diff is reviewed by a human performing
> the update, and condensed into an update to the LICENSE field (if
> needed), and an explanation of what changed in the License-Update tag
> in the commit message. This could be further automated if upstream has
> deterministic ways to specify licenses, e.g. LICENSE =
> "&".join(all_license_ids).
>
> - when sending the resulting patch for review, there's a mailing list
> bot (patchtest), which will check that any update in license checksums
> is accompanied by an explanation in the License-Update tag. There are
> also humans which will check that the licensing changes are sensible.
> Otherwise we do trust that submitters spot important changes in
> licensing (from the diff in the previous step or by manual comparison,
> if they want) and summarise them in LICENSE correctly.
>
> - finally there are various license checks that run in the recipe_qa
> task and are implemented in insane.bbclass. They could be extended to
> verify that every dependency has a matching license entry in the recipe
> and so on. Anything that can be caught by looking at the source tree
> and the license metadata.

This works for individual projects but becomes complicated for
dependencies because you have to handle the same change multiple times.
But let's stop the discussion for now because license is out of scope of
this series.

>> Nevertheless I will move my implementation to oe-core and add a task to generate an inc file as a starting point.
> That would be much appreciated. The more I think about it the more I'm
> convinced we should have it standardized in core.

What do you mean by standardized?
On Fri, 10 Jan 2025 at 16:04, Stefan Herbrechtsmeier
<stefan.herbrechtsmeier-oss@weidmueller.com> wrote:
> What is your opinion regarding gitsm? Should we remove the bitbake fetcher and use an update task to generate an inc file with the source URIs and source revisions?
>
> That ship has sailed. We can't remove gitsm, it has users, and they
> will be very angry.
>
> This makes it impossible to fix wrong design decisions or remove code with low code quality.

It's still possible, you just can't be heavy-handed and dictatorial
about 'removing' stuff you don't like. When the existing thing works
very well for a lot of people (and gitsm does), then the new thing has
to be obviously better, you need to do your best to convince as many
people as possible of that, and it needs to co-exist with the old thing,
so that users can migrate at their own pace. And some of the users may
never do that, and they will get annoyed at or ignore deprecation
warnings or similar attempts to push them.

> Instead of an inc file the generated SRC_URIs could be saved inside the work directory of the recipe. This will eliminate the noise and avoid a manual run of an update task after a recipe changes.

I would be very interested to see the proof of concept that does this.

> Nevertheless I will move my implementation to oe-core and add a task to generate an inc file as a starting point.
>
> That would be much appreciated. The more I think about it the more I'm
> convinced we should have it standardized in core.
>
> What do you mean by standardized?

Standardized handling of embedded dependencies, lock files and various
other aspects of language-specific package managers, so that adding
support for a new thing would be writing a new
extension/plugin/subclass for the existing framework.

Alex
On Fri, Jan 10, 2025 at 10:04 AM Stefan Herbrechtsmeier via
lists.openembedded.org
<stefan.herbrechtsmeier-oss=weidmueller.com@lists.openembedded.org> wrote:
> Instead of an inc file the generated SRC_URIs could be saved inside the
> work directory of the recipe. This will eliminate the noise and avoid a
> manual run of an update task after a recipe changes.

Except for those that want the .inc file changes to be version
controlled (as well as SRC_URI changes), but maybe I'm misunderstanding
what you described above.

A generated temporary/build file is definitely more visible than
something that is programmatically done and held internally during
recipe processing and build. It opens the door for extension and doing
version control on it. So I don't object to the concept, I just don't
think I have all the details straight in my head.

Cheers,

Bruce
Am 10.01.2025 um 21:24 schrieb Bruce Ashfield: > On Fri, Jan 10, 2025 at 10:04 AM Stefan Herbrechtsmeier via > lists.openembedded.org <http://lists.openembedded.org> > <stefan.herbrechtsmeier-oss=weidmueller.com@lists.openembedded.org> wrote: > > Am 10.01.2025 um 14:26 schrieb Alexander Kanavin: >> On Fri, 10 Jan 2025 at 12:32, Stefan Herbrechtsmeier >> <stefan.herbrechtsmeier-oss@weidmueller.com> <mailto:stefan.herbrechtsmeier-oss@weidmueller.com> wrote: >>> What is your opinion regarding gitsm. Should we remove the bitbake fetcher and use a update task to generate a inc file with the source uris and source revisions? >> That ship has sailed. We can't remove gitsm, it has users, and they >> will be very angry. > > This makes it impossible to fix wrong design decision or remove > code with a low code quality. > >>> Do you really review the changes of the inc file? >>> I understand the points but I have the feeling that they are more theoretically for package manager dependencies or could be solved in an other way (ex. caching) >> But do you? I have to restate the point: a solution that can be placed >> inside a layer is much more scalable and maintainable than adding code >> to bitbake. That's why I'm leaning towards drawing the line at >> existing fetchers that are wget/git convenience wrappers, and shifting >> dependency/lockfile management to layers. It's ultimately RP's call, >> but he does seek feedback :) > I'm working on it. > >> I'm fine with large SRC_URI/sha256 diffs when recipes get updated to >> new versions. And since you asked, no, no one looks at them, they're >> auto-generated noise that we learned to block out, just as we learned >> to quickly skim over recipe patch changes that are just line number >> churn and similar non-functional changes. > > Instead of an inc file the generated SRC_URIs could be saved > inside the work directory of the recipe. This will eliminate the > noise and avoid a manual run of an update task after a recipe changes. 
> > > Except for those that want the .inc file changes to be version > controlled (as well as SRC_URI changes), but maybe I'm > misunderstanding what you described above Why should somebody version control the generated SRC_URI? > A generated temporary/build file is definitely more visible than > something that is programmatically done and held internally during > recipe processing and build. It opens the door for extension and > doing version control on it. So I don't object to the concept, I just > don't think I have all the details straight in my head. A generated build file will be saved in the work directory of the recipe like any other generated build file. It is impossible to add it to the version control system. The update task create a version controlled generated source file. I don't understand why the version control is needed because the source of the generator and the generator are version controlled. Especially if the output is ignored during patch review. I think it is much more straightforward to patch the source (lock file) because it is complicated to handle manual changes during regeneration of a generated file. >>> Even if we put all license information inside the inc file. Who should review the changes ? What tooling is used to review the change (license content)? If we blindly trust the inc file generator, the inc file is useless and we can generate the information on-the-fly. >> We won't blindly trust a generator. There are multiple gate-keeping >> steps, some of which already work, and some should still be >> implemented: >> >> - when creating a recipe with devtool, devtool should discover all >> licenses and generate appropriate recipe metadata. For classic unix-y >> components this has to rely on 'guessing', but things like crates have >> deterministic licensing metadata (a field in Cargo.toml, and LICENSE-* >> files if I remember right). We can also propose adding such >> determinism upstream if it's not currently good enough. 
>> >> - when updating a recipe with devtool to a new upstream release, it >> uses thefile:// entries in LIC_FILES_CHKSUM to generate a diff of >> previous license texts and the new ones, and writes that as a comment >> into the updated recipe. The diff is reviewed by a human performing >> the update, and condensed into an update to the LICENSE field (if >> needed), and an explanation of what changed in the License-Update tag >> in the commit message. This could be further automated if upstream has >> deterministic ways to specify licenses, e.g. LICENSE = >> "&".join(all_license_ids). >> >> - when sending the resulting patch for review, there's a mailing list >> bot (patchtest), which will check that any update in license checksums >> is accompanied by an explanation in License-Update tag. There are also >> humans which will check that the licensing changes are sensible. >> Otherwise we do trust that submitters spot important changes in >> licensing (from the diff in the previous step or by manual comparison, >> if they want) and summarise them in LICENSE correctly. >> >> - finally there are various license checks that run in recipe_qa task >> and implemented in insane.bbclass. They could be extended to verify >> that every dependency has a matching license entry in the recipe and >> so on. Anything that can be caught by looking at the source tree and >> the license metadata. > This works for individual project but become complicated for > dependencies because you have to handle the same change multiple > times. But lets stop the discussion for now because license is out > of scope of this series. > >>> Nevertheless I will move my implementation to oe-core and add a task to generate an inc file as starting point. >> That would be much appreciated. The more I think about it the more I'm >> convinced we should have it standardized in core. > What do you mean by standardized? > > > -=-=-=-=-=-=-=-=-=-=-=- > Links: You receive all messages sent to this group. 
> View/Reply Online (#17006): > https://lists.openembedded.org/g/bitbake-devel/message/17006 > Mute This Topic: https://lists.openembedded.org/mt/110212697/1050810 > Group Owner: bitbake-devel+owner@lists.openembedded.org > <mailto:bitbake-devel%2Bowner@lists.openembedded.org> > Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub > [bruce.ashfield@gmail.com] > -=-=-=-=-=-=-=-=-=-=-=- > > > > -- > - Thou shalt not follow the NULL pointer, for chaos and madness await > thee at its end > - "Use the force Harry" - Gandalf, Star Trek II >
On Mon, Jan 13, 2025 at 2:11 AM Stefan Herbrechtsmeier < stefan.herbrechtsmeier-oss@weidmueller.com> wrote: > Am 10.01.2025 um 21:24 schrieb Bruce Ashfield: > > On Fri, Jan 10, 2025 at 10:04 AM Stefan Herbrechtsmeier via > lists.openembedded.org <stefan.herbrechtsmeier-oss= > weidmueller.com@lists.openembedded.org> wrote: > >> Am 10.01.2025 um 14:26 schrieb Alexander Kanavin: >> >> On Fri, 10 Jan 2025 at 12:32, Stefan Herbrechtsmeier<stefan.herbrechtsmeier-oss@weidmueller.com> <stefan.herbrechtsmeier-oss@weidmueller.com> wrote: >> >> What is your opinion regarding gitsm. Should we remove the bitbake fetcher and use a update task to generate a inc file with the source uris and source revisions? >> >> That ship has sailed. We can't remove gitsm, it has users, and they >> will be very angry. >> >> This makes it impossible to fix wrong design decision or remove code with >> a low code quality. >> >> Do you really review the changes of the inc file? >> I understand the points but I have the feeling that they are more theoretically for package manager dependencies or could be solved in an other way (ex. caching) >> >> But do you? I have to restate the point: a solution that can be placed >> inside a layer is much more scalable and maintainable than adding code >> to bitbake. That's why I'm leaning towards drawing the line at >> existing fetchers that are wget/git convenience wrappers, and shifting >> dependency/lockfile management to layers. It's ultimately RP's call, >> but he does seek feedback :) >> >> I'm working on it. >> >> I'm fine with large SRC_URI/sha256 diffs when recipes get updated to >> new versions. And since you asked, no, no one looks at them, they're >> auto-generated noise that we learned to block out, just as we learned >> to quickly skim over recipe patch changes that are just line number >> churn and similar non-functional changes. >> >> Instead of an inc file the generated SRC_URIs could be saved inside the >> work directory of the recipe. 
This will eliminate the noise and avoid a >> manual run of an update task after a recipe changes. >> > > Except for those that want the .inc file changes to be version controlled > (as well as SRC_URI changes), but maybe I'm misunderstanding what you > described above > > Why should somebody version control the generated SRC_URI? > > > Why wouldn't they ? I'm talking about when the SRC_URI is generated to git fetches (or whatever), that is part of the recipe and version controlled. My point is that this is not throw away / transient information for many use cases. It is something that can be tracked between updates to the recipes. > A generated temporary/build file is definitely more visible than something > that is programmatically done and held internally during recipe processing > and build. It opens the door for extension and doing version control on > it. So I don't object to the concept, I just don't think I have all the > details straight in my head. > > A generated build file will be saved in the work directory of the recipe > like any other generated build file. It is impossible to add it to the > version control system. The update task create a version controlled > generated source file. I don't understand why the version control is needed > because the source of the generator and the generator are version > controlled. Especially if the output is ignored during patch review. I > think it is much more straightforward to patch the source (lock file) > because it is complicated to handle manual changes during regeneration of a > generated file. > *sigh*. I'm quite aware of what can and cannot be done. That's not what I meant. I'm obviously not talking about something in WORKDIR. I'm just saying that if something is written to disk, then depending on how things are implemented it can be viewed, debugged and manipulated. If it is always generated, held internally to the classes and used, I have no options to do that sort of debug. 
Similarly, anything that is generated, it would be ideal if there was a way to re-use a previously generated artifact and not generate it on the fly .. that's the element that opens the door to version control and tracking. We'll agree to disagree on what is or isn't efficient or complicated. Luckily, this is all opt-in, so I'll never really have to use it. I'm just sharing what it would take to get me to consider it based on what I've learned/suffered in my time maintaining quite a few go recipes. Cheers, Bruce > Even if we put all license information inside the inc file. Who should review the changes ? What tooling is used to review the change (license content)? If we blindly trust the inc file generator, the inc file is useless and we can generate the information on-the-fly. >> >> We won't blindly trust a generator. There are multiple gate-keeping >> steps, some of which already work, and some should still be >> implemented: >> >> - when creating a recipe with devtool, devtool should discover all >> licenses and generate appropriate recipe metadata. For classic unix-y >> components this has to rely on 'guessing', but things like crates have >> deterministic licensing metadata (a field in Cargo.toml, and LICENSE-* >> files if I remember right). We can also propose adding such >> determinism upstream if it's not currently good enough. >> >> - when updating a recipe with devtool to a new upstream release, it >> uses the file:// entries in LIC_FILES_CHKSUM to generate a diff of >> previous license texts and the new ones, and writes that as a comment >> into the updated recipe. The diff is reviewed by a human performing >> the update, and condensed into an update to the LICENSE field (if >> needed), and an explanation of what changed in the License-Update tag >> in the commit message. This could be further automated if upstream has >> deterministic ways to specify licenses, e.g. LICENSE = >> "&".join(all_license_ids). 
>> >> - when sending the resulting patch for review, there's a mailing list >> bot (patchtest), which will check that any update in license checksums >> is accompanied by an explanation in License-Update tag. There are also >> humans which will check that the licensing changes are sensible. >> Otherwise we do trust that submitters spot important changes in >> licensing (from the diff in the previous step or by manual comparison, >> if they want) and summarise them in LICENSE correctly. >> >> - finally there are various license checks that run in recipe_qa task >> and implemented in insane.bbclass. They could be extended to verify >> that every dependency has a matching license entry in the recipe and >> so on. Anything that can be caught by looking at the source tree and >> the license metadata. >> >> This works for individual project but become complicated for dependencies >> because you have to handle the same change multiple times. But lets stop >> the discussion for now because license is out of scope of this series. >> >> Nevertheless I will move my implementation to oe-core and add a task to generate an inc file as starting point. >> >> That would be much appreciated. The more I think about it the more I'm >> convinced we should have it standardized in core. >> >> What do you mean by standardized?
On Fri, 17 Jan 2025 at 05:20, Bruce Ashfield <bruce.ashfield@gmail.com> wrote: > *sigh*. I'm quite aware of what can and cannot be done. That's not what I meant. I'm obviously not talking about something in WORKDIR. I'm just saying that if something is written to disk, then depending on how things are implemented it can be viewed, debugged and manipulated. If it is always generated, held internally to the classes and used, I have no options to do that sort of debug. Similarly, anything that is generated, it would be ideal if there was a way to re-use a previously generated artifact and not generate it on the fly .. that's the element that opens the door to version control and tracking. > > We'll agree to disagree on what is or isn't efficient or complicated. Luckily, this is all opt-in, so I'll never really have to use it. I'm just sharing what it would take to get me to consider it based on what I've learned/suffered in my time maintaining quite a few go recipes. I beg to differ, as someone who maintains a few rust/cargo recipes. I haven't once found this ability to track SRC_URIs in recipes useful. It's always been auto-generated noise and I'd be very willing to consider an implementation that keeps it neatly hidden, if this implementation is fully oe-core based. So Stefan, don't let this discourage you. Alex
Am 17.01.2025 um 05:19 schrieb Bruce Ashfield via lists.openembedded.org: > On Mon, Jan 13, 2025 at 2:11 AM Stefan Herbrechtsmeier > <stefan.herbrechtsmeier-oss@weidmueller.com> wrote: > > Am 10.01.2025 um 21:24 schrieb Bruce Ashfield: >> On Fri, Jan 10, 2025 at 10:04 AM Stefan Herbrechtsmeier via >> lists.openembedded.org <http://lists.openembedded.org> >> <stefan.herbrechtsmeier-oss=weidmueller.com@lists.openembedded.org> >> wrote: >> >> Am 10.01.2025 um 14:26 schrieb Alexander Kanavin: >>> On Fri, 10 Jan 2025 at 12:32, Stefan Herbrechtsmeier >>> <stefan.herbrechtsmeier-oss@weidmueller.com> <mailto:stefan.herbrechtsmeier-oss@weidmueller.com> wrote: >>>> What is your opinion regarding gitsm. Should we remove the bitbake fetcher and use a update task to generate a inc file with the source uris and source revisions? >>> That ship has sailed. We can't remove gitsm, it has users, and they >>> will be very angry. >> >> This makes it impossible to fix wrong design decision or >> remove code with a low code quality. >> >>>> Do you really review the changes of the inc file? >>>> I understand the points but I have the feeling that they are more theoretically for package manager dependencies or could be solved in an other way (ex. caching) >>> But do you? I have to restate the point: a solution that can be placed >>> inside a layer is much more scalable and maintainable than adding code >>> to bitbake. That's why I'm leaning towards drawing the line at >>> existing fetchers that are wget/git convenience wrappers, and shifting >>> dependency/lockfile management to layers. It's ultimately RP's call, >>> but he does seek feedback :) >> I'm working on it. >> >>> I'm fine with large SRC_URI/sha256 diffs when recipes get updated to >>> new versions. 
And since you asked, no, no one looks at them, they're >>> auto-generated noise that we learned to block out, just as we learned >>> to quickly skim over recipe patch changes that are just line number >>> churn and similar non-functional changes. >> >> Instead of an inc file the generated SRC_URIs could be saved >> inside the work directory of the recipe. This will eliminate >> the noise and avoid a manual run of an update task after a >> recipe changes. >> >> >> Except for those that want the .inc file changes to be version >> controlled (as well as SRC_URI changes), but maybe I'm >> misunderstanding what you described above > > Why should somebody version control the generated SRC_URI? > > > Why wouldn't they ? I'm talking about when the SRC_URI is generated to > git fetches (or whatever), that is part of the recipe and version > controlled. > > My point is that this is not throw away / transient information for > many use cases. It is something that can be tracked between updates to > the recipes. > >> A generated temporary/build file is definitely more visible than >> something that is programmatically done and held internally >> during recipe processing and build. It opens the door for >> extension and doing version control on it. So I don't object to >> the concept, I just don't think I have all the details straight >> in my head. > > A generated build file will be saved in the work directory of the > recipe like any other generated build file. It is impossible to > add it to the version control system. The update task create a > version controlled generated source file. I don't understand why > the version control is needed because the source of the generator > and the generator are version controlled. Especially if the output > is ignored during patch review. I think it is much more > straightforward to patch the source (lock file) because it is > complicated to handle manual changes during regeneration of a > generated file. > > *sigh*. 
> I'm quite aware of what can and cannot be done. That's not
> what I meant. I'm obviously not talking about something in WORKDIR.
> I'm just saying that if something is written to disk, then depending
> on how things are implemented it can be viewed, debugged and
> manipulated. If it is always generated, held internally to the classes
> and used, I have no options to do that sort of debug. Similarly,
> anything that is generated, it would be ideal if there was a way to
> re-use a previously generated artifact and not generate it on the fly
> .. that's the element that opens the door to version control and tracking.

Why do we need to track the generated file if the source is under
version control and the generated file is cached like any other task
output? I'm working on a prototype with the following steps:

1. Fetch the sources from the recipe (do_fetch)
2. Unpack the sources from the recipe (do_unpack)
3. Apply patches which are marked as early to patch the lock file
   (do_patch_early)
4. Resolve dependencies from the lock file and write them into a file
   (do_vendor_resolve)
5. Fetch dependencies (do_vendor_fetch)
6. Unpack dependencies into a package manager cache (do_vendor_unpack)
7. Create a vendor directory below the source folder (do_vendor)
8. Apply patches (do_patch)

The go, rust and npm fetchers work. The go vendor folder works. I'm
still working on the vendor directory for crate, a solution for npm
without JavaScript, and the integration of the dynamic sources into
the SBOM, archiver and so on.

Do you have a recommendation for an example project for the Rust, Go
and npm fetcher?

> We'll agree to disagree on what is or isn't efficient or complicated.
> Luckily, this is all opt-in, so I'll never really have to use it. I'm
> just sharing what it would take to get me to consider it based on what
> I've learned/suffered in my time maintaining quite a few go recipes.
My problem is understanding the reasons and use cases behind the inc
file for generated content and its version control. I understand that
it must be possible to manipulate the fetched dependencies, to cache
the generated fetcher URIs, to make the fetcher URIs viewable and to
manipulate the fetched dependency sources.

Regards
Stefan
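[Editorial illustration] The resolve stage described in the prototype steps above (do_vendor_resolve) could, for a Cargo.lock-style input, look roughly like this. This is an illustrative sketch, not code from the patch series; the function name, parsing approach and sample checksum values are all made up, while the crates.io download URL pattern follows the cover letter:

```python
# Hypothetical sketch: turn the entries of a Cargo.lock-style lock file
# into plain download URLs plus checksums (the crates.io URL pattern is
# the one listed in the cover letter; checksum values here are made up).
import re

LOCK_SNIPPET = """
[[package]]
name = "glob"
version = "0.3.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d2fabcfbdc87f4758337ca535fb41a6d701b65693ce38287d856d1674551ec9b"

[[package]]
name = "libc"
version = "0.2.155"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "97b3888a4aecf77e811145cadf6eef5901f4782c53886191b2f693f24761847c"
"""

def resolve_cargo_lock(text):
    """Return (url, sha256) pairs for every crates.io package in the lock file."""
    urls = []
    for block in text.split("[[package]]"):
        # Collect the simple 'key = "value"' fields of one package block.
        fields = dict(re.findall(r'(\w+) = "([^"]*)"', block))
        if fields.get("source", "").startswith("registry+"):
            url = ("https://crates.io/api/v1/crates/%s/%s/download"
                   % (fields["name"], fields["version"]))
            urls.append((url, fields["checksum"]))
    return urls

for url, sha in resolve_cargo_lock(LOCK_SNIPPET):
    print(url)
```

In a bbclass-based prototype, the pairs returned here would be written to a file by do_vendor_resolve and consumed by do_vendor_fetch.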
On Fri, Jan 17, 2025 at 2:45 AM Stefan Herbrechtsmeier via lists.openembedded.org <stefan.herbrechtsmeier-oss= weidmueller.com@lists.openembedded.org> wrote: > Am 17.01.2025 um 05:19 schrieb Bruce Ashfield via lists.openembedded.org: > > On Mon, Jan 13, 2025 at 2:11 AM Stefan Herbrechtsmeier < > stefan.herbrechtsmeier-oss@weidmueller.com> wrote: > >> Am 10.01.2025 um 21:24 schrieb Bruce Ashfield: >> >> On Fri, Jan 10, 2025 at 10:04 AM Stefan Herbrechtsmeier via >> lists.openembedded.org <stefan.herbrechtsmeier-oss= >> weidmueller.com@lists.openembedded.org> wrote: >> >>> Am 10.01.2025 um 14:26 schrieb Alexander Kanavin: >>> >>> On Fri, 10 Jan 2025 at 12:32, Stefan Herbrechtsmeier<stefan.herbrechtsmeier-oss@weidmueller.com> <stefan.herbrechtsmeier-oss@weidmueller.com> wrote: >>> >>> What is your opinion regarding gitsm. Should we remove the bitbake fetcher and use a update task to generate a inc file with the source uris and source revisions? >>> >>> That ship has sailed. We can't remove gitsm, it has users, and they >>> will be very angry. >>> >>> This makes it impossible to fix wrong design decision or remove code >>> with a low code quality. >>> >>> Do you really review the changes of the inc file? >>> I understand the points but I have the feeling that they are more theoretically for package manager dependencies or could be solved in an other way (ex. caching) >>> >>> But do you? I have to restate the point: a solution that can be placed >>> inside a layer is much more scalable and maintainable than adding code >>> to bitbake. That's why I'm leaning towards drawing the line at >>> existing fetchers that are wget/git convenience wrappers, and shifting >>> dependency/lockfile management to layers. It's ultimately RP's call, >>> but he does seek feedback :) >>> >>> I'm working on it. >>> >>> I'm fine with large SRC_URI/sha256 diffs when recipes get updated to >>> new versions. 
And since you asked, no, no one looks at them, they're >>> auto-generated noise that we learned to block out, just as we learned >>> to quickly skim over recipe patch changes that are just line number >>> churn and similar non-functional changes. >>> >>> Instead of an inc file the generated SRC_URIs could be saved inside the >>> work directory of the recipe. This will eliminate the noise and avoid a >>> manual run of an update task after a recipe changes. >>> >> >> Except for those that want the .inc file changes to be version controlled >> (as well as SRC_URI changes), but maybe I'm misunderstanding what you >> described above >> >> Why should somebody version control the generated SRC_URI? >> >> >> Why wouldn't they ? I'm talking about when the SRC_URI is generated to > git fetches (or whatever), that is part of the recipe and version > controlled. > > My point is that this is not throw away / transient information for many > use cases. It is something that can be tracked between updates to the > recipes. > > > >> A generated temporary/build file is definitely more visible than >> something that is programmatically done and held internally during recipe >> processing and build. It opens the door for extension and doing version >> control on it. So I don't object to the concept, I just don't think I have >> all the details straight in my head. >> >> A generated build file will be saved in the work directory of the recipe >> like any other generated build file. It is impossible to add it to the >> version control system. The update task create a version controlled >> generated source file. I don't understand why the version control is needed >> because the source of the generator and the generator are version >> controlled. Especially if the output is ignored during patch review. I >> think it is much more straightforward to patch the source (lock file) >> because it is complicated to handle manual changes during regeneration of a >> generated file. >> > *sigh*. 
I'm quite aware of what can and cannot be done. That's not what I > meant. I'm obviously not talking about something in WORKDIR. I'm just > saying that if something is written to disk, then depending on how things > are implemented it can be viewed, debugged and manipulated. If it is always > generated, held internally to the classes and used, I have no options to do > that sort of debug. Similarly, anything that is generated, it would be > ideal if there was a way to re-use a previously generated artifact and not > generate it on the fly .. that's the element that opens the door to version > control and tracking. > > Why do we need to track the generated file if the source is version > control and the generated file is cached like any other task output. I > working on a prototype with the following steps: > > 1. Fetch the sources from the recipe (do_fetch) > 2. Unpack the sources from the recipe (do_unpack) > 2. Apply patches which are marked as early to patch the lock file > (do_patch_early) > 3. Resolve dependencies from the lock file and write it into a file > (do_vendor_resolve) > 4. Fetch dependencies (do_vendor_fetch) > 5. Unpack dependencies into a package manager cache (do_vendor_unpack) > 6. Create a vendor directory below the source folder (do_vendor) > 7. Apply patches (do_patch) > I just track the vendor resolution over time. I've used it many times to figure out what has gone wrong with the go recipes that I maintain when the upstream repositories have done something odd with tags, etc, when I'm doing recipe upgrades. I use that same file to bump SRCREVs on the vendor dependency fetches when picking upstream fixes, etc. because I'm typically working on dependencies that don't have upstream releases that contain what I need and rather than patch a vendor'd file, I just bump the individual dependency or point it somewhere else (typically local to my machine) to fix the problem. 
It's the workflow I've developed after needing to wade into very large go recipes that went to go mod fetched vendor directories quite early on and it ensured that I'm not relying on any proxies, infrastructure or much that is hidden, so I'm able to debug, archive and be relatively sure that I can keep things working over time. I'm not even remotely saying this workflow is for everyone, I'm just trying to see if I could use some of this to resolve those base fetches and be able to use the outputs of it (what I currently have in .inc files) as part of my recipes. The .inc files are the ones that have the fetches listed/resolved, and those are the ones that are part of my recipe, so they are version controlled along with the main recipe. Cheers, Bruce > The go, rust and npm fetchers work. The go vendor folder works. I'm still > working on the vendor directory for crate, a solution for npm without > JavaScript and the integration of the dynamic sources into the SBOM, > archiver and so on. > > Do you have a recommendation for an example project for the Rust, Go and > npm fetcher? > > We'll agree to disagree on what is or isn't efficient or complicated. > Luckily, this is all opt-in, so I'll never really have to use it. I'm just > sharing what it would take to get me to consider it based on what I've > learned/suffered in my time maintaining quite a few go recipes. > > My problem is to understand the reasons or use cases behind the inc for > generated content and its version control. I understand that is must be > possible to manipulate the fetches dependencies, to cache the generated > fetcher URIs, to make the fetcher URIs viewable and to manipulate the > fetched dependency sources. > > Regards > Stefan
From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

The patch series improves the fetcher support for tightly coupled
package managers (npm, go and cargo). It adds support for embedded
dependency fetchers via a common dependency mixin. The patch series
reworks the npm-shrinkwrap.json (package-lock.json) support and adds a
fetcher for go.sum and cargo.lock files. The dependency mixin contains
two stages. The first stage locates a local specification file or
fetches an archive or git repository with a specification file. The
second stage resolves the dependency URLs from the specification file
and fetches the dependencies.

SRC_URI = "<type>://npm-shrinkwrap.json"
SRC_URI = "<type>+http://example.com/ npm-shrinkwrap.json"
SRC_URI = "<type>+http://example.com/${BP}.tar.gz;striplevel=1;subdir=${BP}"
SRC_URI = "<type>+git://example.com/${BPN}.git;protocol=https"

Additionally, the patch series reworks the npm fetcher to work without
an npm binary and an external package repository. It adds support for
a common dependency name and version schema to integrate the
dependencies into the SBOM.

= Background

Bitbake has diverse concepts and drawbacks for the different tightly
coupled package managers. The Python support uses a recipe per
dependency and generates common fetcher URLs via a python function.
The other languages embed the dependencies inside the recipe.

The Node.js support offers an npmsw fetcher which uses a lock file
beside the recipe to generate multiple common fetcher URLs on the fly
and thereby hides the real download sources. This leads, for example,
to a single source in the SBOM.

The Go support contains two parallel implementations: a vendor-based
solution with a common fetcher and a go-mod-based solution with a
gomod fetcher. The vendor-based solution includes the individual
dependencies in the SRC_URI of the recipe and uses a python function
to generate common fetcher URLs with additional information for the
vendor task. The gomod fetcher uses a proprietary gomod URL. It
translates the URL into a common URL and prepares metadata during
unpack.

The Rust support includes the individual dependencies in the SRC_URI
of the recipe and uses proprietary crate URLs. The crate fetcher
translates a proprietary URL into a common fetcher URL and prepares
metadata during unpack.

The recipetool does not support the crate and the gomod fetcher. This
leads to missing licenses for the dependencies in the recipe, for
example in librsvg.

The steps needed to fetch dependencies for Node.js, Go and Rust are
similar:

1. Extract the dependencies from a specification file (name, version,
   checksum and URL)
2. Generate proprietary fetcher URIs
   a. npm://registry.npmjs.org/;package=glob;version=10.3.15
   b. gomod://golang.org/x/net;version=v0.9.0
      gomodgit://golang.org/x/net;version=v0.9.0;repo=go.googlesource.com/net
   c. crate://crates.io/glob/0.3.1
3. Generate wget or git fetcher URIs
   a. https://registry.npmjs.org/glob/-/glob-10.3.15.tgz;downloadfilename=…
   b. https://proxy.golang.org/golang.org/x/net/@v/v0.9.0.zip;downloadfilename=…
      git://go.googlesource.com/net;protocol=https;subdir=…
   c. https://crates.io/api/v1/crates/glob/0.3.1/download;downloadfilename=…
4. Unpack
5. Create meta files
   a. Update lockfile and create tar.gz archives
   b. Create go.mod file
      Create info, go.mod file and zip archives
   c. Create .cargo-checksum.json files

It looks like the recipetool is not widely used and therefore this
patch series integrates the dependency resolution into the fetcher.
After an agreement on a concept the fetcher could be extended. The
fetcher could download the license information per package, and a new
build task could run the license cruncher from the recipetool.

= Open questions

* Where should we download dependencies?
** Should we use a folder per fetcher (ex. git and npm)?
** Should we use the main folder (ex. crate)?
** Should we translate the name into a folder (ex. gomod)?
** Should we integrate the name into the filename (ex. git)?
* Where should we unpack the dependencies?
** Should we use a folder inside the parent folder (ex. node_modules)?
** Should we use a fixed folder inside unpackdir (ex.
   go/pkg/mod/cache/download and cargo_home/bitbake)?
* How should we treat archives for package manager caches?
** Should we unpack the archives to support patching (ex. npm)?
** Should we copy the packed archive to avoid unpacking and packaging
   (ex. gomod)?

This patch series depends on patch series
20241209103158.20833-1-stefan.herbrechtsmeier-oss@weidmueller.com
("[1/4] tests: fetch: adapt npmsw tests to fixed unpack behavior").

Stefan Herbrechtsmeier (21):
  tests: fetch: update npmsw tests to new lockfile format
  fetch2: npmsw: remove old lockfile format support
  tests: fetch: replace [url] with urls for npm
  fetch2: do not prefix embedded checksums
  fetch2: read checksum from SRC_URI flag for npm
  fetch2: introduce common package manager metadata
  fetch2: add unpack support for npm archives
  utils: add Go mod h1 checksum support
  fetch2: add destdir to FetchData
  fetch: npm: rework
  tests: fetch: adapt style in npm(sw) class
  tests: fetch: move npmsw test cases into npmsw test class
  tests: fetch: adapt npm test cases
  fetch: add dependency mixin
  tests: fetch: add test cases for dependency fetcher
  fetch: npmsw: migrate to dependency mixin
  tests: fetch: adapt npmsw test cases
  fetch: add gosum fetcher
  tests: fetch: add test cases for gosum
  fetch: add cargolock fetcher
  tests: fetch: add test cases for cargolock

 lib/bb/fetch2/__init__.py   |  35 +-
 lib/bb/fetch2/cargolock.py  |  73 +++
 lib/bb/fetch2/dependency.py | 167 +++++++
 lib/bb/fetch2/gomod.py      |   5 +-
 lib/bb/fetch2/gosum.py      |  51 +++
 lib/bb/fetch2/npm.py        | 244 +++--
 lib/bb/fetch2/npmsw.py      | 347 ++++----------
 lib/bb/tests/fetch.py       | 880 +++++++++++++++++-------
 lib/bb/utils.py             |  25 +
 9 files changed, 916 insertions(+), 911 deletions(-)
 create mode 100644 lib/bb/fetch2/cargolock.py
 create mode 100644 lib/bb/fetch2/dependency.py
 create mode 100644 lib/bb/fetch2/gosum.py
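[Editorial illustration] The URL generation described in step 3 above can be sketched with a small helper. The URL patterns follow the examples in this cover letter; the function name is made up, and a real Go implementation would additionally need the module proxy's case escaping (uppercase letters encoded as "!" plus lowercase), which is omitted here:

```python
# Hypothetical sketch of step 3: map a resolved (ecosystem, name, version)
# triple to the plain download URL patterns listed in the cover letter.
def download_url(ecosystem, name, version):
    if ecosystem == "npm":
        # https://registry.npmjs.org/<name>/-/<base>-<version>.tgz
        base = name.split("/")[-1]  # strip a @scope/ prefix if present
        return "https://registry.npmjs.org/%s/-/%s-%s.tgz" % (name, base, version)
    if ecosystem == "go":
        # Go module proxy zip; NOTE: real code must also escape uppercase
        # letters in the module path per the module proxy protocol.
        return "https://proxy.golang.org/%s/@v/%s.zip" % (name, version)
    if ecosystem == "crate":
        return "https://crates.io/api/v1/crates/%s/%s/download" % (name, version)
    raise ValueError("unknown ecosystem: %s" % ecosystem)

print(download_url("npm", "glob", "10.3.15"))
print(download_url("go", "golang.org/x/net", "v0.9.0"))
print(download_url("crate", "glob", "0.3.1"))
```

A fetcher built on this would still append downloadfilename= parameters, as the step 3 examples show, to keep the download cache unambiguous.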