Message ID | 20240812122539.573337-3-enguerrand.de-ribaucourt@savoirfairelinux.com |
---|---|
State | New |
Headers | show |
Series | npm: improve fetcher and recipetool compatibility | expand |
On Mon, 2024-08-12 at 14:25 +0200, Enguerrand de Ribaucourt wrote: > Our current emulation mandates that the package.json contains a version > field. Some packages may not provide it when they are not published to > the registry. The actual `npm pack` would allow such packages, so > should we. > > A first commit was required in the npm class. > > For the shrinkwrap part , we can actually use the resolved field which > contains the exact source, including the revision, to pass integrity > tests. > > v3: > - Split bitbake npmsw.py modification in another commit > > Co-authored-by: Tanguy Raufflet <tanguy.raufflet@savoirfairelinux.com> > Signed-off-by: Tanguy Raufflet <tanguy.raufflet@savoirfairelinux.com> > Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com> > --- > bitbake/lib/bb/fetch2/npmsw.py | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/bitbake/lib/bb/fetch2/npmsw.py b/bitbake/lib/bb/fetch2/npmsw.py > index d8ed9df327..a5fa598deb 100644 > --- a/bitbake/lib/bb/fetch2/npmsw.py > +++ b/bitbake/lib/bb/fetch2/npmsw.py > @@ -97,7 +97,7 @@ class NpmShrinkWrap(FetchMethod): > > integrity = params.get("integrity", None) > resolved = params.get("resolved", None) > - version = params.get("version", None) > + version = params.get("version", resolved) > > # Handle registry sources > if is_semver(version) and integrity: I'm not convinced by this reasoning. The whole point of our SRC_URI is to precisely define something that is being built. I think this allows floating versions of dependencies and hence makes this non- reproducible? We should probably error if a version cannot be found and force the user to add one? I don't really know node well at all so if I'm incorrect about that feel free to explain. I do know our fetcher requirements which default to being fully specified and reproducible. Cheers, Richard
On Thu, Aug 15, 2024 at 7:12 AM Richard Purdie via lists.openembedded.org <richard.purdie=linuxfoundation.org@lists.openembedded.org> wrote: %< snip %< > I'm not convinced by this reasoning. The whole point of our SRC_URI is > to precisely define something that is being built. I think this allows > floating versions of dependencies and hence makes this non- > reproducible? > > We should probably error if a version cannot be found and force the > user to add one? > Strong agreement with this point. YOLO'ing dependency information is probably a nice way to do rapid development and prototyping, especially if safety and security is not a concern. Otherwise I simply cannot wrap my mind around the idea of intentionally changing things to make them less traceable and less reproducible. ..Ch:W..
On 15/08/2024 16:12, Richard Purdie wrote: > On Mon, 2024-08-12 at 14:25 +0200, Enguerrand de Ribaucourt wrote: >> Our current emulation mandates that the package.json contains a version >> field. Some packages may not provide it when they are not published to >> the registry. The actual `npm pack` would allow such packages, so >> should we. >> >> A first commit was required in the npm class. >> >> For the shrinkwrap part , we can actually use the resolved field which >> contains the exact source, including the revision, to pass integrity >> tests. >> >> v3: >> - Split bitbake npmsw.py modification in another commit >> >> Co-authored-by: Tanguy Raufflet <tanguy.raufflet@savoirfairelinux.com> >> Signed-off-by: Tanguy Raufflet <tanguy.raufflet@savoirfairelinux.com> >> Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com> >> --- >> bitbake/lib/bb/fetch2/npmsw.py | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/bitbake/lib/bb/fetch2/npmsw.py b/bitbake/lib/bb/fetch2/npmsw.py >> index d8ed9df327..a5fa598deb 100644 >> --- a/bitbake/lib/bb/fetch2/npmsw.py >> +++ b/bitbake/lib/bb/fetch2/npmsw.py >> @@ -97,7 +97,7 @@ class NpmShrinkWrap(FetchMethod): >> >> integrity = params.get("integrity", None) >> resolved = params.get("resolved", None) >> - version = params.get("version", None) >> + version = params.get("version", resolved) >> >> # Handle registry sources >> if is_semver(version) and integrity: > > I'm not convinced by this reasoning. The whole point of our SRC_URI is > to precisely define something that is being built. I think this allows > floating versions of dependencies and hence makes this non- > reproducible? Hi, The version declared in package.json is purely declarative for the NPM package registry. The basic Yocto fetchers will still be used for downloading the SRC_URI (https, git, ...), with the same version reproducibility mechanisms like git hash or md5sum. In the npm.bbclass, we have a second fetch stage (actually during do_configure) to fetch all the dependencies. The npm class here behaves wildly differently than regular frameworks where each dependency is a separate recipe. But since there are myriads of dependencies in the NPM world and it would be rare to share them, the current recipe class is intended to package an application with all its dependencies using the NPM registry. To fetch all those dependencies, we rely on the NPM dependencies mechanism of package.json which lists packages with their declarative version. However this is not reproducible. NPM provides two additional files that NPM developers should include in their repositories to pin the whole dependency tree: package-lock.json and npm-shrinkwrap.json. The npm recipetool will fetch the dependencies the first time with package.json information, and produce an npm-shrinkwrap.json file that pins all versions. Back to our commit, we do not change the aforementioned behavior, we only make it compatible with packages which do not declare a version in their package.json. The npmsw.py code wants to have a version to validate what's being downloaded, so we pass a dummy version. But the npm-shrinkwrap.json file, generated by npm native, still contains the reproducible metadata. Example here with a package that doesn't declare a version field: NPM still added integrity and resolved fields which will ensure reproducibility: ``` "node_modules/cockpit-repo": { "name": "CockpitDevelopmentDependencies", "resolved": "git+ssh://git@github.com/cockpit-project/cockpit.git#d34cabacb8e5e1e028c7eea3d6e3b606d862b8ac", "integrity": "sha512-4nBWhDtVLwF++ZPY/67QtpsXc9y/Jpa/jj1LW/gScSafDKoHmtg1BwAQ7sqpN/fdyi1xrkysM9hxLq3Q16ofaw==", ``` > > We should probably error if a version cannot be found and force the > user to add one? An NPM project can typically pull hundreds of dependencies with some not declaring a version, the user would need to submit patches to all of their upstreams to fix that which is challenging. Meanwhile, the real NPM tool is perfectly fine with accepting that, as illustrated above. I believe we should hence behave like the official NPM tool. > > I don't really know node well at all so if I'm incorrect about that > feel free to explain. I do know our fetcher requirements which default > to being fully specified and reproducible. Let me know if something is still not clear in my answer, Best regards, > > Cheers, > > Richard > >
On Mon, 2024-08-19 at 09:58 +0200, Enguerrand de Ribaucourt wrote: > On 15/08/2024 16:12, Richard Purdie wrote: > > On Mon, 2024-08-12 at 14:25 +0200, Enguerrand de Ribaucourt wrote: > > > Our current emulation mandates that the package.json contains a version > > > field. Some packages may not provide it when they are not published to > > > the registry. The actual `npm pack` would allow such packages, so > > > should we. > > > > > > A first commit was required in the npm class. > > > > > > For the shrinkwrap part , we can actually use the resolved field which > > > contains the exact source, including the revision, to pass integrity > > > tests. > > > > > > v3: > > > - Split bitbake npmsw.py modification in another commit > > > > > > Co-authored-by: Tanguy Raufflet <tanguy.raufflet@savoirfairelinux.com> > > > Signed-off-by: Tanguy Raufflet <tanguy.raufflet@savoirfairelinux.com> > > > Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com> > > > --- > > > bitbake/lib/bb/fetch2/npmsw.py | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/bitbake/lib/bb/fetch2/npmsw.py b/bitbake/lib/bb/fetch2/npmsw.py > > > index d8ed9df327..a5fa598deb 100644 > > > --- a/bitbake/lib/bb/fetch2/npmsw.py > > > +++ b/bitbake/lib/bb/fetch2/npmsw.py > > > @@ -97,7 +97,7 @@ class NpmShrinkWrap(FetchMethod): > > > > > > integrity = params.get("integrity", None) > > > resolved = params.get("resolved", None) > > > - version = params.get("version", None) > > > + version = params.get("version", resolved) > > > > > > # Handle registry sources > > > if is_semver(version) and integrity: > > > > I'm not convinced by this reasoning. The whole point of our SRC_URI is > > to precisely define something that is being built. I think this allows > > floating versions of dependencies and hence makes this non- > > reproducible? > > The version declared in package.json is purely declarative for the NPM > package registry. The basic Yocto fetchers will still be used for > downloading the SRC_URI (https, git, ...), with the same version > reproducibility mechanisms like git hash or md5sum. > > In the npm.bbclass, we have a second fetch stage (actually during > do_configure) to fetch all the dependencies. The npm class here behaves > wildly differently than regular frameworks where each dependency is a > separate recipe. But since there are myriads of dependencies in the NPM > world and it would be rare to share them, the current recipe class is > intended to package an application with all its dependencies using the > NPM registry. > > To fetch all those dependencies, we rely on the NPM dependencies > mechanism of package.json which lists packages with their declarative > version. However this is not reproducible. NPM provides two additional > files that NPM developers should include in their repositories to pin > the whole dependency tree: package-lock.json and npm-shrinkwrap.json. > The npm recipetool will fetch the dependencies the first time with > package.json information, and produce an npm-shrinkwrap.json file that > pins all versions. Are you saying that the codepath in question is only used to generate the npm-shrinkwrap.json file and that a normal do_fetch and a normal build will not fetch any floating version? I appreciate the npm build itself might not be deterministic but the point of our SRC_URI is to make it deterministic. I'm also worried that you implied fetching is happening during do_configure. That should not be the case and if that is happening, it needs to be changed so that all fetching happens during do_fetch. > Back to our commit, we do not change the aforementioned behavior, we > only make it compatible with packages which do not declare a version in > their package.json. The npmsw.py code wants to have a version to > validate what's being downloaded, so we pass a dummy version. But the > npm-shrinkwrap.json file, generated by npm native, still contains the > reproducible metadata. Example here with a package that doesn't declare > a version field: NPM still added integrity and resolved fields which > will ensure reproducibility: > > ``` > "node_modules/cockpit-repo": { > "name": "CockpitDevelopmentDependencies", > "resolved": > "git+ssh://git@github.com/cockpit-project/cockpit.git#d34cabacb8e5e1e028c7eea3d6e3b606d862b8ac", > "integrity": > "sha512-4nBWhDtVLwF++ZPY/67QtpsXc9y/Jpa/jj1LW/gScSafDKoHmtg1BwAQ7sqpN/fdyi1xrkysM9hxLq3Q16ofaw==", > ``` > > > > We should probably error if a version cannot be found and force the > > user to add one? > An NPM project can typically pull hundreds of dependencies with some not > declaring a version, the user would need to submit patches to all of > their upstreams to fix that which is challenging. Meanwhile, the real > NPM tool is perfectly fine with accepting that, as illustrated above. I > believe we should hence behave like the official NPM tool. I'm not asking for the upstream NPM projects to be locked down. I'm asking that our recipes always have versions specified. This is similar to what we do in other places like crates with carg under rust, which probably is the best example of how we can lock down situations like this. Cheers, Richard
On 20/08/2024 18:00, Richard Purdie wrote: > On Mon, 2024-08-19 at 09:58 +0200, Enguerrand de Ribaucourt wrote: >> On 15/08/2024 16:12, Richard Purdie wrote: >>> On Mon, 2024-08-12 at 14:25 +0200, Enguerrand de Ribaucourt wrote: >>>> Our current emulation mandates that the package.json contains a version >>>> field. Some packages may not provide it when they are not published to >>>> the registry. The actual `npm pack` would allow such packages, so >>>> should we. >>>> >>>> A first commit was required in the npm class. >>>> >>>> For the shrinkwrap part , we can actually use the resolved field which >>>> contains the exact source, including the revision, to pass integrity >>>> tests. >>>> >>>> v3: >>>> - Split bitbake npmsw.py modification in another commit >>>> >>>> Co-authored-by: Tanguy Raufflet <tanguy.raufflet@savoirfairelinux.com> >>>> Signed-off-by: Tanguy Raufflet <tanguy.raufflet@savoirfairelinux.com> >>>> Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com> >>>> --- >>>> bitbake/lib/bb/fetch2/npmsw.py | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/bitbake/lib/bb/fetch2/npmsw.py b/bitbake/lib/bb/fetch2/npmsw.py >>>> index d8ed9df327..a5fa598deb 100644 >>>> --- a/bitbake/lib/bb/fetch2/npmsw.py >>>> +++ b/bitbake/lib/bb/fetch2/npmsw.py >>>> @@ -97,7 +97,7 @@ class NpmShrinkWrap(FetchMethod): >>>> >>>> integrity = params.get("integrity", None) >>>> resolved = params.get("resolved", None) >>>> - version = params.get("version", None) >>>> + version = params.get("version", resolved) >>>> >>>> # Handle registry sources >>>> if is_semver(version) and integrity: >>> >>> I'm not convinced by this reasoning. The whole point of our SRC_URI is >>> to precisely define something that is being built. I think this allows >>> floating versions of dependencies and hence makes this non- >>> reproducible? >> >> The version declared in package.json is purely declarative for the NPM >> package registry. The basic Yocto fetchers will still be used for >> downloading the SRC_URI (https, git, ...), with the same version >> reproducibility mechanisms like git hash or md5sum. >> >> In the npm.bbclass, we have a second fetch stage (actually during >> do_configure) to fetch all the dependencies. The npm class here behaves >> wildly differently than regular frameworks where each dependency is a >> separate recipe. But since there are myriads of dependencies in the NPM >> world and it would be rare to share them, the current recipe class is >> intended to package an application with all its dependencies using the >> NPM registry. >> >> To fetch all those dependencies, we rely on the NPM dependencies >> mechanism of package.json which lists packages with their declarative >> version. However this is not reproducible. NPM provides two additional >> files that NPM developers should include in their repositories to pin >> the whole dependency tree: package-lock.json and npm-shrinkwrap.json. >> The npm recipetool will fetch the dependencies the first time with >> package.json information, and produce an npm-shrinkwrap.json file that >> pins all versions. > > Are you saying that the codepath in question is only used to generate > the npm-shrinkwrap.json file and that a normal do_fetch and a normal > build will not fetch any floating version? This codepath is used when fetching the dependencies listed in a npm-shrinkwrap.json. It will imitate what `npm install` does. It converts all the dependencies from the json format into SRC_URIs to pass to traditional bitbake fetchers like git, file and https. The `version` variable which is being modified here is taken from the optional "version" field in the dependency's json block. Before this commit, if a dependency didn't have a version field, we would throw an error and could not build this package although it was technically valid. Our modification allows to fallback to the "resolved" field which is always present in the generated npm-shrinkwrap.json files and contains similar information. As illustrated in the example bellow (cockpit-repo), if it is a git URL, it contains the commit hash which will be passed to the git fetcher. If it is an http URL, then the integrity field is mandatory and checksums the archive. If it's a local file URL, I could not see an integrity mechanism in npmsw.py, but it is not part of the scope of this modification. It looks like "file version" packages are not check-summed in this script but it is independent from my modification. > > I appreciate the npm build itself might not be deterministic but the > point of our SRC_URI is to make it deterministic. > > I'm also worried that you implied fetching is happening during > do_configure. That should not be the case and if that is happening, it > needs to be changed so that all fetching happens during do_fetch. I checked the responsibilities of each fetching task in npm.bbclass: - node modules are properly downloaded in do_fetch actually, they are individually placed in DL_DIR - do_unpack puts them into the workdir/sources-unpack/node_modules - do_configure will then put them in a local npm registry cache in the workdir/npm-cache - finally, do_compile will `npm install` them from the cache into the output node_modules directory of workdir/npm-build/node_modules So fetching is properly done is do_fetch :p > >> Back to our commit, we do not change the aforementioned behavior, we >> only make it compatible with packages which do not declare a version in >> their package.json. The npmsw.py code wants to have a version to >> validate what's being downloaded, so we pass a dummy version. But the >> npm-shrinkwrap.json file, generated by npm native, still contains the >> reproducible metadata. Example here with a package that doesn't declare >> a version field: NPM still added integrity and resolved fields which >> will ensure reproducibility: >> >> ``` >> "node_modules/cockpit-repo": { >> "name": "CockpitDevelopmentDependencies", >> "resolved": >> "git+ssh://git@github.com/cockpit-project/cockpit.git#d34cabacb8e5e1e028c7eea3d6e3b606d862b8ac", >> "integrity": >> "sha512-4nBWhDtVLwF++ZPY/67QtpsXc9y/Jpa/jj1LW/gScSafDKoHmtg1BwAQ7sqpN/fdyi1xrkysM9hxLq3Q16ofaw==", >> ``` >>> >>> We should probably error if a version cannot be found and force the >>> user to add one? >> An NPM project can typically pull hundreds of dependencies with some not >> declaring a version, the user would need to submit patches to all of >> their upstreams to fix that which is challenging. Meanwhile, the real >> NPM tool is perfectly fine with accepting that, as illustrated above. I >> believe we should hence behave like the official NPM tool. > > I'm not asking for the upstream NPM projects to be locked down. I'm > asking that our recipes always have versions specified. This is similar > to what we do in other places like crates with carg under rust, which > probably is the best example of how we can lock down situations like > this. This patch doesn't introduce nondeterministic fetching because the resolved and integrity field contain checksums/srcrev that will be handled just as before to ensure reproducibility. This is true for https and git dependencies. However from my analysis, it looks like if a npm-shrinkwrap.json is pointing to a local "file" version as a dependency, there is no checksum mechanism, but it may very well be a limitation of the npm-shrinkwrap.json format. This will only concern users who rely on local files which shouldn't be the norm. But this was the pre-existing behavior and is not caused by this patch. Specific npm-shrinkwrap.json pointing to local files without a version field couldn't be built without this modification. With this patch, they will now be accepted but inherit this lack of an existing integrity mechanism for local tarballs as it would have been even with a version field (which would not be used). I haven't played with `file` dependencies but maybe this is something we can improve in a distinct patchset. If the integrity field is always generated in npm-shrinkwrap.json, then we could add the integrity check for this kind of dependency. TLDR: Even if the version field is not specified, the fallback resolved/integrity fields contain the git commit SHA, or the HTTP tarball's checksum. However local file tarballs do not have an integrity mechanism, but it was already the case even if they declared a "version" field. > > Cheers, > > Richard >
On 21/08/2024 10:42, Herbrechtsmeier, Stefan wrote: > Hi, > > I'm not that deep into the topic anymore, but I can hopefully explain some implementation details. > >> -----Ursprüngliche Nachricht----- >> Von: Richard Purdie <richard.purdie@linuxfoundation.org> >> Gesendet: Dienstag, 20. August 2024 18:00 >> An: Enguerrand de Ribaucourt <enguerrand.de- >> ribaucourt@savoirfairelinux.com>; bitbake-devel@lists.openembedded.org >> Cc: tanguy.raufflet@savoirfairelinux.com; alexandre.belloni@bootlin.com; >> Herbrechtsmeier, Stefan <Stefan.Herbrechtsmeier@weidmueller.com> >> Betreff: [EXTERNAL] Re: [bitbake-devel][PATCH v4 2/2] npmsw: accept >> unspecified versions in package.json >> >> CAUTION: This email originated from outside the organization. Do not click >> links or open attachments unless you recognize the sender and know the >> content is safe. >> >> >> >> On Mon, 2024-08-19 at 09:58 +0200, Enguerrand de Ribaucourt wrote: >>> On 15/08/2024 16:12, Richard Purdie wrote: >>>> On Mon, 2024-08-12 at 14:25 +0200, Enguerrand de Ribaucourt wrote: >>>>> Our current emulation mandates that the package.json contains a >>>>> version field. Some packages may not provide it when they are not >>>>> published to the registry. The actual `npm pack` would allow such >>>>> packages, so should we. >>>>> >>>>> A first commit was required in the npm class. >>>>> >>>>> For the shrinkwrap part , we can actually use the resolved field >>>>> which contains the exact source, including the revision, to pass >>>>> integrity tests. >>>>> >>>>> v3: >>>>> - Split bitbake npmsw.py modification in another commit >>>>> >>>>> Co-authored-by: Tanguy Raufflet >>>>> <tanguy.raufflet@savoirfairelinux.com> >>>>> Signed-off-by: Tanguy Raufflet >>>>> <tanguy.raufflet@savoirfairelinux.com> >>>>> Signed-off-by: Enguerrand de Ribaucourt >>>>> <enguerrand.de-ribaucourt@savoirfairelinux.com> >>>>> --- >>>>> bitbake/lib/bb/fetch2/npmsw.py | 2 +- >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>> >>>>> diff --git a/bitbake/lib/bb/fetch2/npmsw.py >>>>> b/bitbake/lib/bb/fetch2/npmsw.py index d8ed9df327..a5fa598deb >>>>> 100644 >>>>> --- a/bitbake/lib/bb/fetch2/npmsw.py >>>>> +++ b/bitbake/lib/bb/fetch2/npmsw.py >>>>> @@ -97,7 +97,7 @@ class NpmShrinkWrap(FetchMethod): >>>>> >>>>> integrity = params.get("integrity", None) >>>>> resolved = params.get("resolved", None) >>>>> - version = params.get("version", None) >>>>> + version = params.get("version", resolved) >>>>> >>>>> # Handle registry sources >>>>> if is_semver(version) and integrity: >>>> >>>> I'm not convinced by this reasoning. The whole point of our SRC_URI >>>> is to precisely define something that is being built. I think this >>>> allows floating versions of dependencies and hence makes this non- >>>> reproducible? >>> >>> The version declared in package.json is purely declarative for the NPM >>> package registry. The basic Yocto fetchers will still be used for >>> downloading the SRC_URI (https, git, ...), with the same version >>> reproducibility mechanisms like git hash or md5sum. >>> >>> In the npm.bbclass, we have a second fetch stage (actually during >>> do_configure) to fetch all the dependencies. The npm class here >>> behaves wildly differently than regular frameworks where each >>> dependency is a separate recipe. But since there are myriads of >>> dependencies in the NPM world and it would be rare to share them, the >>> current recipe class is intended to package an application with all >>> its dependencies using the NPM registry. >>> >>> To fetch all those dependencies, we rely on the NPM dependencies >>> mechanism of package.json which lists packages with their declarative >>> version. However this is not reproducible. NPM provides two additional >>> files that NPM developers should include in their repositories to pin >>> the whole dependency tree: package-lock.json and npm-shrinkwrap.json. >>> The npm recipetool will fetch the dependencies the first time with >>> package.json information, and produce an npm-shrinkwrap.json file that >>> pins all versions. >> >> Are you saying that the codepath in question is only used to generate the >> npm-shrinkwrap.json file and that a normal do_fetch and a normal build will >> not fetch any floating version? > > The npm-shrinkwrap.json is generated by npm. In contrast to rust and go the lock file isn't processed during recipe creation but during build. This leads to the problem that the recipe author need to fix the npm-shrinkwrap.json. > > The npm-shrinkwrap.json is a common lock file and locks the download sources. It contains the download source inside the `resolved` field [1]. In case of git it contains the full git url with commit sha. This means the fetch use an unspecified release version but not a floating version. > > [1] https://docs.npmjs.com/cli/v10/configuring-npm/package-lock-json#packages > > >> I appreciate the npm build itself might not be deterministic but the point of >> our SRC_URI is to make it deterministic. >> >> I'm also worried that you implied fetching is happening during do_configure. >> That should not be the case and if that is happening, it needs to be changed so >> that all fetching happens during do_fetch. > > The npm command in do_configure fetches the packages from a local temporary repository. The local repository is filled by pre fetch packages. The packages are fetched during do_fetch. Therefore bitbake parses the npm-shrinkwrap.json during fetch and creates a list of fetch uris on-the-fly . > > >>> Back to our commit, we do not change the aforementioned behavior, we >>> only make it compatible with packages which do not declare a version >>> in their package.json. The npmsw.py code wants to have a version to >>> validate what's being downloaded, so we pass a dummy version. But the >>> npm-shrinkwrap.json file, generated by npm native, still contains the >>> reproducible metadata. Example here with a package that doesn't >>> declare a version field: NPM still added integrity and resolved fields >>> which will ensure reproducibility: >>> >>> ``` >>> "node_modules/cockpit-repo": { >>> "name": "CockpitDevelopmentDependencies", >>> "resolved": >>> "git+ssh://git@github.com/cockpit- >> project/cockpit.git#d34cabacb8e5e1e028c7eea3d6e3b606d862b8ac", >>> "integrity": >>> "sha512- >> 4nBWhDtVLwF++ZPY/67QtpsXc9y/Jpa/jj1LW/gScSafDKoHmtg1BwAQ7sqpN/ >>> fdyi1xrkysM9hxLq3Q16ofaw==", >>> ``` >>>> >>>> We should probably error if a version cannot be found and force the >>>> user to add one? >>> An NPM project can typically pull hundreds of dependencies with some >>> not declaring a version, the user would need to submit patches to all >>> of their upstreams to fix that which is challenging. Meanwhile, the >>> real NPM tool is perfectly fine with accepting that, as illustrated >>> above. I believe we should hence behave like the official NPM tool. >> >> I'm not asking for the upstream NPM projects to be locked down. I'm asking >> that our recipes always have versions specified. This is similar to what we do in >> other places like crates with carg under rust, which probably is the best >> example of how we can lock down situations like this. > > The rust solution has the advantage, that the lock is parsed during recipe creation and the recipe author can rework the result. > > But the real problem is that the project wasn't published to the registry and has no release. Both cargo and npm support git repositories without a version. Maybe the parser should set the version to `0-git<short git commit sha>` and extract the hash from the `resolved` field. We do declare a dummy version in another patchset somewhere else indeed to pass some checks and allow building such packages. However in this specific place, the version field is what's used to fetch the packages and it needs a URL (git://, ...) which is conveniently provided by the `resolved` field! > > Regards > Stefan > > ________________________________ > Kommanditgesellschaft - Sitz: Detmold - Amtsgericht Lemgo HRA 2790 - > Komplementärin: Weidmüller Interface Führungsgesellschaft mbH - > Sitz: Detmold - Amtsgericht Lemgo HRB 3924; > Geschäftsführer: Dr. Timo Berger, Volker Bibelhausen, Dr. Sebastian Durst, André Sombecki; > USt-ID-Nr. DE124599660
This still doesn't entirely compute. We do have npm fetcher tests in bitbake, but they test for existence of 'npm', and are skipped if npm is not there. And so they're never run on the autobuilder, and npm fetcher regularly regresses. On the other hand, you seem to be claiming that the fetcher does not require npm. But then this skipping is unnecessary and should be dropped, and all tests should pass. Once we resolve that point, we should ensure that tests do check both possibilities: - modules that have checksum information result in a successful fetch opetation - if something does not have such information, the fetch fails, and that is interpreted as the test succeeding This would reassure everyone that the fetcher does fulfil source integrity requirements a lot more than a hand-written description of how it operates. Alex On Wed, 21 Aug 2024 at 12:08, Enguerrand de Ribaucourt via lists.openembedded.org <enguerrand.de-ribaucourt=savoirfairelinux.com@lists.openembedded.org> wrote: > > > > On 21/08/2024 10:42, Herbrechtsmeier, Stefan wrote: > > Hi, > > > > I'm not that deep into the topic anymore, but I can hopefully explain some implementation details. > > > >> -----Ursprüngliche Nachricht----- > >> Von: Richard Purdie <richard.purdie@linuxfoundation.org> > >> Gesendet: Dienstag, 20. August 2024 18:00 > >> An: Enguerrand de Ribaucourt <enguerrand.de- > >> ribaucourt@savoirfairelinux.com>; bitbake-devel@lists.openembedded.org > >> Cc: tanguy.raufflet@savoirfairelinux.com; alexandre.belloni@bootlin.com; > >> Herbrechtsmeier, Stefan <Stefan.Herbrechtsmeier@weidmueller.com> > >> Betreff: [EXTERNAL] Re: [bitbake-devel][PATCH v4 2/2] npmsw: accept > >> unspecified versions in package.json > >> > >> CAUTION: This email originated from outside the organization. Do not click > >> links or open attachments unless you recognize the sender and know the > >> content is safe. > >> > >> > >> > >> On Mon, 2024-08-19 at 09:58 +0200, Enguerrand de Ribaucourt wrote: > >>> On 15/08/2024 16:12, Richard Purdie wrote: > >>>> On Mon, 2024-08-12 at 14:25 +0200, Enguerrand de Ribaucourt wrote: > >>>>> Our current emulation mandates that the package.json contains a > >>>>> version field. Some packages may not provide it when they are not > >>>>> published to the registry. The actual `npm pack` would allow such > >>>>> packages, so should we. > >>>>> > >>>>> A first commit was required in the npm class. > >>>>> > >>>>> For the shrinkwrap part , we can actually use the resolved field > >>>>> which contains the exact source, including the revision, to pass > >>>>> integrity tests. > >>>>> > >>>>> v3: > >>>>> - Split bitbake npmsw.py modification in another commit > >>>>> > >>>>> Co-authored-by: Tanguy Raufflet > >>>>> <tanguy.raufflet@savoirfairelinux.com> > >>>>> Signed-off-by: Tanguy Raufflet > >>>>> <tanguy.raufflet@savoirfairelinux.com> > >>>>> Signed-off-by: Enguerrand de Ribaucourt > >>>>> <enguerrand.de-ribaucourt@savoirfairelinux.com> > >>>>> --- > >>>>> bitbake/lib/bb/fetch2/npmsw.py | 2 +- > >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>>>> > >>>>> diff --git a/bitbake/lib/bb/fetch2/npmsw.py > >>>>> b/bitbake/lib/bb/fetch2/npmsw.py index d8ed9df327..a5fa598deb > >>>>> 100644 > >>>>> --- a/bitbake/lib/bb/fetch2/npmsw.py > >>>>> +++ b/bitbake/lib/bb/fetch2/npmsw.py > >>>>> @@ -97,7 +97,7 @@ class NpmShrinkWrap(FetchMethod): > >>>>> > >>>>> integrity = params.get("integrity", None) > >>>>> resolved = params.get("resolved", None) > >>>>> - version = params.get("version", None) > >>>>> + version = params.get("version", resolved) > >>>>> > >>>>> # Handle registry sources > >>>>> if is_semver(version) and integrity: > >>>> > >>>> I'm not convinced by this reasoning. The whole point of our SRC_URI > >>>> is to precisely define something that is being built. I think this > >>>> allows floating versions of dependencies and hence makes this non- > >>>> reproducible? > >>> > >>> The version declared in package.json is purely declarative for the NPM > >>> package registry. The basic Yocto fetchers will still be used for > >>> downloading the SRC_URI (https, git, ...), with the same version > >>> reproducibility mechanisms like git hash or md5sum. > >>> > >>> In the npm.bbclass, we have a second fetch stage (actually during > >>> do_configure) to fetch all the dependencies. The npm class here > >>> behaves wildly differently than regular frameworks where each > >>> dependency is a separate recipe. But since there are myriads of > >>> dependencies in the NPM world and it would be rare to share them, the > >>> current recipe class is intended to package an application with all > >>> its dependencies using the NPM registry. > >>> > >>> To fetch all those dependencies, we rely on the NPM dependencies > >>> mechanism of package.json which lists packages with their declarative > >>> version. However this is not reproducible. NPM provides two additional > >>> files that NPM developers should include in their repositories to pin > >>> the whole dependency tree: package-lock.json and npm-shrinkwrap.json. > >>> The npm recipetool will fetch the dependencies the first time with > >>> package.json information, and produce an npm-shrinkwrap.json file that > >>> pins all versions. > >> > >> Are you saying that the codepath in question is only used to generate the > >> npm-shrinkwrap.json file and that a normal do_fetch and a normal build will > >> not fetch any floating version? > > > > The npm-shrinkwrap.json is generated by npm. In contrast to rust and go the lock file isn't processed during recipe creation but during build. This leads to the problem that the recipe author need to fix the npm-shrinkwrap.json. > > > > The npm-shrinkwrap.json is a common lock file and locks the download sources. It contains the download source inside the `resolved` field [1]. In case of git it contains the full git url with commit sha. This means the fetch use an unspecified release version but not a floating version. > > > > [1] https://docs.npmjs.com/cli/v10/configuring-npm/package-lock-json#packages > > > > > >> I appreciate the npm build itself might not be deterministic but the point of > >> our SRC_URI is to make it deterministic. > >> > >> I'm also worried that you implied fetching is happening during do_configure. > >> That should not be the case and if that is happening, it needs to be changed so > >> that all fetching happens during do_fetch. > > > > The npm command in do_configure fetches the packages from a local temporary repository. The local repository is filled by pre fetch packages. The packages are fetched during do_fetch. Therefore bitbake parses the npm-shrinkwrap.json during fetch and creates a list of fetch uris on-the-fly . > > > > > >>> Back to our commit, we do not change the aforementioned behavior, we > >>> only make it compatible with packages which do not declare a version > >>> in their package.json. The npmsw.py code wants to have a version to > >>> validate what's being downloaded, so we pass a dummy version. But the > >>> npm-shrinkwrap.json file, generated by npm native, still contains the > >>> reproducible metadata. Example here with a package that doesn't > >>> declare a version field: NPM still added integrity and resolved fields > >>> which will ensure reproducibility: > >>> > >>> ``` > >>> "node_modules/cockpit-repo": { > >>> "name": "CockpitDevelopmentDependencies", > >>> "resolved": > >>> "git+ssh://git@github.com/cockpit- > >> project/cockpit.git#d34cabacb8e5e1e028c7eea3d6e3b606d862b8ac", > >>> "integrity": > >>> "sha512- > >> 4nBWhDtVLwF++ZPY/67QtpsXc9y/Jpa/jj1LW/gScSafDKoHmtg1BwAQ7sqpN/ > >>> fdyi1xrkysM9hxLq3Q16ofaw==", > >>> ``` > >>>> > >>>> We should probably error if a version cannot be found and force the > >>>> user to add one? > >>> An NPM project can typically pull hundreds of dependencies with some > >>> not declaring a version, the user would need to submit patches to all > >>> of their upstreams to fix that which is challenging. Meanwhile, the > >>> real NPM tool is perfectly fine with accepting that, as illustrated > >>> above. I believe we should hence behave like the official NPM tool. > >> > >> I'm not asking for the upstream NPM projects to be locked down. I'm asking > >> that our recipes always have versions specified. This is similar to what we do in > >> other places like crates with carg under rust, which probably is the best > >> example of how we can lock down situations like this. > > > > The rust solution has the advantage, that the lock is parsed during recipe creation and the recipe author can rework the result. > > > > But the real problem is that the project wasn't published to the registry and has no release. Both cargo and npm support git repositories without a version. Maybe the parser should set the version to `0-git<short git commit sha>` and extract the hash from the `resolved` field. > We do declare a dummy version in another patchset somewhere else indeed > to pass some checks and allow building such packages. However in this > specific place, the version field is what's used to fetch the packages > and it needs a URL (git://, ...) which is conveniently provided by the > `resolved` field! > > > > Regards > > Stefan > > > > ________________________________ > > Kommanditgesellschaft - Sitz: Detmold - Amtsgericht Lemgo HRA 2790 - > > Komplementärin: Weidmüller Interface Führungsgesellschaft mbH - > > Sitz: Detmold - Amtsgericht Lemgo HRB 3924; > > Geschäftsführer: Dr. Timo Berger, Volker Bibelhausen, Dr. Sebastian Durst, André Sombecki; > > USt-ID-Nr. DE124599660 > > -- > Savoir-faire Linux > Enguerrand de Ribaucourt > > -=-=-=-=-=-=-=-=-=-=-=- > Links: You receive all messages sent to this group. > View/Reply Online (#16505): https://lists.openembedded.org/g/bitbake-devel/message/16505 > Mute This Topic: https://lists.openembedded.org/mt/107855179/1686489 > Group Owner: bitbake-devel+owner@lists.openembedded.org > Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [alex.kanavin@gmail.com] > -=-=-=-=-=-=-=-=-=-=-=- >
[snip] >>>>> >>>> Back to our commit, we do not change the aforementioned behavior, we >>>> only make it compatible with packages which do not declare a version >>>> in their package.json. The npmsw.py code wants to have a version to >>>> validate what's being downloaded, so we pass a dummy version. But the >>>> npm-shrinkwrap.json file, generated by npm native, still contains the >>>> reproducible metadata. Example here with a package that doesn't >>>> declare a version field: NPM still added integrity and resolved fields >>>> which will ensure reproducibility: >>>> >>>> ``` >>>> "node_modules/cockpit-repo": { >>>> "name": "CockpitDevelopmentDependencies", >>>> "resolved": >>>> "git+ssh://git@github.com/cockpit- >>> project/cockpit.git#d34cabacb8e5e1e028c7eea3d6e3b606d862b8ac", >>>> "integrity": >>>> "sha512- >>> 4nBWhDtVLwF++ZPY/67QtpsXc9y/Jpa/jj1LW/gScSafDKoHmtg1BwAQ7sqpN/ >>>> fdyi1xrkysM9hxLq3Q16ofaw==", >>>> ``` >>>>> >>>>> We should probably error if a version cannot be found and force the >>>>> user to add one? >>>> An NPM project can typically pull hundreds of dependencies with some >>>> not declaring a version, the user would need to submit patches to all >>>> of their upstreams to fix that which is challenging. Meanwhile, the >>>> real NPM tool is perfectly fine with accepting that, as illustrated >>>> above. I believe we should hence behave like the official NPM tool. >>> >>> I'm not asking for the upstream NPM projects to be locked down. I'm >>> asking >>> that our recipes always have versions specified. This is similar to >>> what we do in >>> other places like crates with carg under rust, which probably is the >>> best >>> example of how we can lock down situations like this. >> >> The rust solution has the advantage, that the lock is parsed during >> recipe creation and the recipe author can rework the result. >> >> But the real problem is that the project wasn't published to the >> registry and has no release. Both cargo and npm support git >> repositories without a version. Maybe the parser should set the >> version to `0-git<short git commit sha>` and extract the hash from >> the `resolved` field. > We do declare a dummy version in another patchset somewhere else > indeed to pass some checks and allow building such packages. However > in this specific place, the version field is what's used to fetch the > packages and it needs a URL (git://, ...) which is conveniently > provided by the `resolved` field! You are right the npm version field isn't always a semantic version. Maybe the git sources handler should always use the `resolved` field.
On Wed, 2024-08-21 at 12:06 +0200, Enguerrand de Ribaucourt wrote: > This patch doesn't introduce nondeterministic fetching because the > resolved and integrity field contain checksums/srcrev that will be > handled just as before to ensure reproducibility. This is true for https > and git dependencies. > > However from my analysis, it looks like if a npm-shrinkwrap.json is > pointing to a local "file" version as a dependency, there is no checksum > mechanism, but it may very well be a limitation of the > npm-shrinkwrap.json format. This will only concern users who rely on > local files which shouldn't be the norm. But this was the pre-existing > behavior and is not caused by this patch. Specific npm-shrinkwrap.json > pointing to local files without a version field couldn't be built > without this modification. With this patch, they will now be accepted > but inherit this lack of an existing integrity mechanism for local > tarballs as it would have been even with a version field (which would > not be used). > > I haven't played with `file` dependencies but maybe this is something we > can improve in a distinct patchset. If the integrity field is always > generated in npm-shrinkwrap.json, then we could add the integrity check > for this kind of dependency. > > TLDR: Even if the version field is not specified, the fallback > resolved/integrity fields contain the git commit SHA, or the HTTP > tarball's checksum. However local file tarballs do not have an integrity > mechanism, but it was already the case even if they declared a "version" > field. > I think this all makes sense and I agree, it doesn't sound like there is an issue here with the use of the resolved field. What I would like to ask is that the commit message be improved so that this is better documented and the commit more easily understood from it. In particular, it should be clear why it doesn't break determinism. I suspect we also need to find a way to test the npm fetcher on the autobuilder, maybe in a standalone job that pulls in meta-oe and builds the tools it needs. We can then start expanding the test cases to cover these issues as they arise. For better or worse, the fetcher had a ton of corner cases and we need to account for them all with test cases so we can make changes to the code without fear of regressing anything. Outside of npm, I think we do a reasonable job but right now we don't run the npm cases sadly. Cheers, Richard
diff --git a/bitbake/lib/bb/fetch2/npmsw.py b/bitbake/lib/bb/fetch2/npmsw.py index d8ed9df327..a5fa598deb 100644 --- a/bitbake/lib/bb/fetch2/npmsw.py +++ b/bitbake/lib/bb/fetch2/npmsw.py @@ -97,7 +97,7 @@ class NpmShrinkWrap(FetchMethod): integrity = params.get("integrity", None) resolved = params.get("resolved", None) - version = params.get("version", None) + version = params.get("version", resolved) # Handle registry sources if is_semver(version) and integrity: