diff mbox series

[bitbake-devel,v4,2/2] npmsw: accept unspecified versions in package.json

Message ID 20240812122539.573337-3-enguerrand.de-ribaucourt@savoirfairelinux.com
State New
Headers show
Series npm: improve fetcher and recipetool compatibility | expand

Commit Message

Enguerrand de Ribaucourt Aug. 12, 2024, 12:25 p.m. UTC
Our current emulation mandates that the package.json contains a version
field. Some packages may not provide it when they are not published to
the registry. The actual `npm pack` would allow such packages, so
should we.

A first commit was required in the npm class.

For the shrinkwrap part , we can actually use the resolved field which
contains the exact source, including the revision, to pass integrity
tests.

v3:
 - Split bitbake npmsw.py modification in another commit

Co-authored-by: Tanguy Raufflet <tanguy.raufflet@savoirfairelinux.com>
Signed-off-by: Tanguy Raufflet <tanguy.raufflet@savoirfairelinux.com>
Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com>
---
 bitbake/lib/bb/fetch2/npmsw.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Richard Purdie Aug. 15, 2024, 2:12 p.m. UTC | #1
On Mon, 2024-08-12 at 14:25 +0200, Enguerrand de Ribaucourt wrote:
> Our current emulation mandates that the package.json contains a version
> field. Some packages may not provide it when they are not published to
> the registry. The actual `npm pack` would allow such packages, so
> should we.
> 
> A first commit was required in the npm class.
> 
> For the shrinkwrap part , we can actually use the resolved field which
> contains the exact source, including the revision, to pass integrity
> tests.
> 
> v3:
>  - Split bitbake npmsw.py modification in another commit
> 
> Co-authored-by: Tanguy Raufflet <tanguy.raufflet@savoirfairelinux.com>
> Signed-off-by: Tanguy Raufflet <tanguy.raufflet@savoirfairelinux.com>
> Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com>
> ---
>  bitbake/lib/bb/fetch2/npmsw.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/bitbake/lib/bb/fetch2/npmsw.py b/bitbake/lib/bb/fetch2/npmsw.py
> index d8ed9df327..a5fa598deb 100644
> --- a/bitbake/lib/bb/fetch2/npmsw.py
> +++ b/bitbake/lib/bb/fetch2/npmsw.py
> @@ -97,7 +97,7 @@ class NpmShrinkWrap(FetchMethod):
>  
>              integrity = params.get("integrity", None)
>              resolved = params.get("resolved", None)
> -            version = params.get("version", None)
> +            version = params.get("version", resolved)
>  
>              # Handle registry sources
>              if is_semver(version) and integrity:

I'm not convinced by this reasoning. The whole point of our SRC_URI is
to precisely define something that is being built. I think this allows
floating versions of dependencies and hence makes this non-
reproducible?

We should probably error if a version cannot be found and force the
user to add one?

I don't really know node well at all so if I'm incorrect about that
feel free to explain. I do know our fetcher requirements which default
to being fully specified and reproducible.

Cheers,

Richard
Chuck Wolber Aug. 16, 2024, 7:12 p.m. UTC | #2
On Thu, Aug 15, 2024 at 7:12 AM Richard Purdie via lists.openembedded.org
<richard.purdie=linuxfoundation.org@lists.openembedded.org> wrote:

%< snip %<


> I'm not convinced by this reasoning. The whole point of our SRC_URI is
> to precisely define something that is being built. I think this allows
> floating versions of dependencies and hence makes this non-
> reproducible?
>
> We should probably error if a version cannot be found and force the
> user to add one?
>

Strong agreement with this point.

YOLO'ing dependency information is probably a nice way to do rapid
development
and prototyping, especially if safety and security is not a concern.
Otherwise
I simply cannot wrap my mind around the idea of intentionally changing
things
to make them less traceable and less reproducible.

..Ch:W..
Enguerrand de Ribaucourt Aug. 19, 2024, 7:58 a.m. UTC | #3
On 15/08/2024 16:12, Richard Purdie wrote:
> On Mon, 2024-08-12 at 14:25 +0200, Enguerrand de Ribaucourt wrote:
>> Our current emulation mandates that the package.json contains a version
>> field. Some packages may not provide it when they are not published to
>> the registry. The actual `npm pack` would allow such packages, so
>> should we.
>>
>> A first commit was required in the npm class.
>>
>> For the shrinkwrap part , we can actually use the resolved field which
>> contains the exact source, including the revision, to pass integrity
>> tests.
>>
>> v3:
>>   - Split bitbake npmsw.py modification in another commit
>>
>> Co-authored-by: Tanguy Raufflet <tanguy.raufflet@savoirfairelinux.com>
>> Signed-off-by: Tanguy Raufflet <tanguy.raufflet@savoirfairelinux.com>
>> Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com>
>> ---
>>   bitbake/lib/bb/fetch2/npmsw.py | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/bitbake/lib/bb/fetch2/npmsw.py b/bitbake/lib/bb/fetch2/npmsw.py
>> index d8ed9df327..a5fa598deb 100644
>> --- a/bitbake/lib/bb/fetch2/npmsw.py
>> +++ b/bitbake/lib/bb/fetch2/npmsw.py
>> @@ -97,7 +97,7 @@ class NpmShrinkWrap(FetchMethod):
>>   
>>               integrity = params.get("integrity", None)
>>               resolved = params.get("resolved", None)
>> -            version = params.get("version", None)
>> +            version = params.get("version", resolved)
>>   
>>               # Handle registry sources
>>               if is_semver(version) and integrity:
> 
> I'm not convinced by this reasoning. The whole point of our SRC_URI is
> to precisely define something that is being built. I think this allows
> floating versions of dependencies and hence makes this non-
> reproducible?
Hi,

The version declared in package.json is purely declarative for the NPM 
package registry. The basic Yocto fetchers will still be used for 
downloading the SRC_URI (https, git, ...), with the same version 
reproducibility mechanisms like git hash or md5sum.

In the npm.bbclass, we have a second fetch stage (actually during 
do_configure) to fetch all the dependencies. The npm class here behaves 
wildly differently than regular frameworks where each dependency is a 
separate recipe. But since there are myriads of dependencies in the NPM 
world and it would be rare to share them, the current recipe class is 
intended to package an application with all its dependencies using the 
NPM registry.

To fetch all those dependencies, we rely on the NPM dependencies 
mechanism of package.json which lists packages with their declarative 
version. However this is not reproducible. NPM provides two additional 
files that NPM developers should include in their repositories to pin 
the whole dependency tree: package-lock.json and npm-shrinkwrap.json. 
The npm recipetool will fetch the dependencies the first time with 
package.json information, and produce an npm-shrinkwrap.json file that 
pins all versions.

Back to our commit, we do not change the aforementioned behavior, we 
only make it compatible with packages which do not declare a version in 
their package.json. The npmsw.py code wants to have a version to 
validate what's being downloaded, so we pass a dummy version. But the 
npm-shrinkwrap.json file, generated by npm native, still contains the 
reproducible metadata. Example here with a package that doesn't declare 
a version field: NPM still added integrity and resolved fields which 
will ensure reproducibility:

```
     "node_modules/cockpit-repo": {
       "name": "CockpitDevelopmentDependencies",
       "resolved": 
"git+ssh://git@github.com/cockpit-project/cockpit.git#d34cabacb8e5e1e028c7eea3d6e3b606d862b8ac",
       "integrity": 
"sha512-4nBWhDtVLwF++ZPY/67QtpsXc9y/Jpa/jj1LW/gScSafDKoHmtg1BwAQ7sqpN/fdyi1xrkysM9hxLq3Q16ofaw==",
```
> 
> We should probably error if a version cannot be found and force the
> user to add one?
An NPM project can typically pull hundreds of dependencies with some not 
declaring a version, the user would need to submit patches to all of 
their upstreams to fix that which is challenging. Meanwhile, the real 
NPM tool is perfectly fine with accepting that, as illustrated above. I 
believe we should hence behave like the official NPM tool.
> 
> I don't really know node well at all so if I'm incorrect about that
> feel free to explain. I do know our fetcher requirements which default
> to being fully specified and reproducible.
Let me know if something is still not clear in my answer,

Best regards,
>
> Cheers,
> 
> Richard
> 
>
Richard Purdie Aug. 20, 2024, 4 p.m. UTC | #4
On Mon, 2024-08-19 at 09:58 +0200, Enguerrand de Ribaucourt wrote:
> On 15/08/2024 16:12, Richard Purdie wrote:
> > On Mon, 2024-08-12 at 14:25 +0200, Enguerrand de Ribaucourt wrote:
> > > Our current emulation mandates that the package.json contains a version
> > > field. Some packages may not provide it when they are not published to
> > > the registry. The actual `npm pack` would allow such packages, so
> > > should we.
> > > 
> > > A first commit was required in the npm class.
> > > 
> > > For the shrinkwrap part , we can actually use the resolved field which
> > > contains the exact source, including the revision, to pass integrity
> > > tests.
> > > 
> > > v3:
> > >   - Split bitbake npmsw.py modification in another commit
> > > 
> > > Co-authored-by: Tanguy Raufflet <tanguy.raufflet@savoirfairelinux.com>
> > > Signed-off-by: Tanguy Raufflet <tanguy.raufflet@savoirfairelinux.com>
> > > Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com>
> > > ---
> > >   bitbake/lib/bb/fetch2/npmsw.py | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/bitbake/lib/bb/fetch2/npmsw.py b/bitbake/lib/bb/fetch2/npmsw.py
> > > index d8ed9df327..a5fa598deb 100644
> > > --- a/bitbake/lib/bb/fetch2/npmsw.py
> > > +++ b/bitbake/lib/bb/fetch2/npmsw.py
> > > @@ -97,7 +97,7 @@ class NpmShrinkWrap(FetchMethod):
> > >   
> > >               integrity = params.get("integrity", None)
> > >               resolved = params.get("resolved", None)
> > > -            version = params.get("version", None)
> > > +            version = params.get("version", resolved)
> > >   
> > >               # Handle registry sources
> > >               if is_semver(version) and integrity:
> > 
> > I'm not convinced by this reasoning. The whole point of our SRC_URI is
> > to precisely define something that is being built. I think this allows
> > floating versions of dependencies and hence makes this non-
> > reproducible?
> 
> The version declared in package.json is purely declarative for the NPM 
> package registry. The basic Yocto fetchers will still be used for 
> downloading the SRC_URI (https, git, ...), with the same version 
> reproducibility mechanisms like git hash or md5sum.
> 
> In the npm.bbclass, we have a second fetch stage (actually during 
> do_configure) to fetch all the dependencies. The npm class here behaves 
> wildly differently than regular frameworks where each dependency is a
> separate recipe. But since there are myriads of dependencies in the NPM 
> world and it would be rare to share them, the current recipe class is
> intended to package an application with all its dependencies using the 
> NPM registry.
> 
> To fetch all those dependencies, we rely on the NPM dependencies 
> mechanism of package.json which lists packages with their declarative
> version. However this is not reproducible. NPM provides two additional 
> files that NPM developers should include in their repositories to pin
> the whole dependency tree: package-lock.json and npm-shrinkwrap.json.
> The npm recipetool will fetch the dependencies the first time with 
> package.json information, and produce an npm-shrinkwrap.json file that 
> pins all versions.

Are you saying that the codepath in question is only used to generate
the npm-shrinkwrap.json file and that a normal do_fetch and a normal
build will not fetch any floating version?

I appreciate the npm build itself might not be deterministic but the
point of our SRC_URI is to make it deterministic.

I'm also worried that you implied fetching is happening during
do_configure. That should not be the case and if that is happening, it
needs to be changed so that all fetching happens during do_fetch.

> Back to our commit, we do not change the aforementioned behavior, we 
> only make it compatible with packages which do not declare a version in 
> their package.json. The npmsw.py code wants to have a version to 
> validate what's being downloaded, so we pass a dummy version. But the
> npm-shrinkwrap.json file, generated by npm native, still contains the
> reproducible metadata. Example here with a package that doesn't declare 
> a version field: NPM still added integrity and resolved fields which 
> will ensure reproducibility:
> 
> ```
>      "node_modules/cockpit-repo": {
>        "name": "CockpitDevelopmentDependencies",
>        "resolved": 
> "git+ssh://git@github.com/cockpit-project/cockpit.git#d34cabacb8e5e1e028c7eea3d6e3b606d862b8ac",
>        "integrity": 
> "sha512-4nBWhDtVLwF++ZPY/67QtpsXc9y/Jpa/jj1LW/gScSafDKoHmtg1BwAQ7sqpN/fdyi1xrkysM9hxLq3Q16ofaw==",
> ```
> > 
> > We should probably error if a version cannot be found and force the
> > user to add one?
> An NPM project can typically pull hundreds of dependencies with some not 
> declaring a version, the user would need to submit patches to all of 
> their upstreams to fix that which is challenging. Meanwhile, the real
> NPM tool is perfectly fine with accepting that, as illustrated above. I 
> believe we should hence behave like the official NPM tool.

I'm not asking for the upstream NPM projects to be locked down. I'm
asking that our recipes always have versions specified. This is similar
to what we do in other places like crates with carg under rust, which
probably is the best example of how we can lock down situations like
this.

Cheers,

Richard
Enguerrand de Ribaucourt Aug. 21, 2024, 10:06 a.m. UTC | #5
On 20/08/2024 18:00, Richard Purdie wrote:
> On Mon, 2024-08-19 at 09:58 +0200, Enguerrand de Ribaucourt wrote:
>> On 15/08/2024 16:12, Richard Purdie wrote:
>>> On Mon, 2024-08-12 at 14:25 +0200, Enguerrand de Ribaucourt wrote:
>>>> Our current emulation mandates that the package.json contains a version
>>>> field. Some packages may not provide it when they are not published to
>>>> the registry. The actual `npm pack` would allow such packages, so
>>>> should we.
>>>>
>>>> A first commit was required in the npm class.
>>>>
>>>> For the shrinkwrap part , we can actually use the resolved field which
>>>> contains the exact source, including the revision, to pass integrity
>>>> tests.
>>>>
>>>> v3:
>>>>    - Split bitbake npmsw.py modification in another commit
>>>>
>>>> Co-authored-by: Tanguy Raufflet <tanguy.raufflet@savoirfairelinux.com>
>>>> Signed-off-by: Tanguy Raufflet <tanguy.raufflet@savoirfairelinux.com>
>>>> Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com>
>>>> ---
>>>>    bitbake/lib/bb/fetch2/npmsw.py | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/bitbake/lib/bb/fetch2/npmsw.py b/bitbake/lib/bb/fetch2/npmsw.py
>>>> index d8ed9df327..a5fa598deb 100644
>>>> --- a/bitbake/lib/bb/fetch2/npmsw.py
>>>> +++ b/bitbake/lib/bb/fetch2/npmsw.py
>>>> @@ -97,7 +97,7 @@ class NpmShrinkWrap(FetchMethod):
>>>>    
>>>>                integrity = params.get("integrity", None)
>>>>                resolved = params.get("resolved", None)
>>>> -            version = params.get("version", None)
>>>> +            version = params.get("version", resolved)
>>>>    
>>>>                # Handle registry sources
>>>>                if is_semver(version) and integrity:
>>>
>>> I'm not convinced by this reasoning. The whole point of our SRC_URI is
>>> to precisely define something that is being built. I think this allows
>>> floating versions of dependencies and hence makes this non-
>>> reproducible?
>>
>> The version declared in package.json is purely declarative for the NPM
>> package registry. The basic Yocto fetchers will still be used for
>> downloading the SRC_URI (https, git, ...), with the same version
>> reproducibility mechanisms like git hash or md5sum.
>>
>> In the npm.bbclass, we have a second fetch stage (actually during
>> do_configure) to fetch all the dependencies. The npm class here behaves
>> wildly differently than regular frameworks where each dependency is a
>> separate recipe. But since there are myriads of dependencies in the NPM
>> world and it would be rare to share them, the current recipe class is
>> intended to package an application with all its dependencies using the
>> NPM registry.
>>
>> To fetch all those dependencies, we rely on the NPM dependencies
>> mechanism of package.json which lists packages with their declarative
>> version. However this is not reproducible. NPM provides two additional
>> files that NPM developers should include in their repositories to pin
>> the whole dependency tree: package-lock.json and npm-shrinkwrap.json.
>> The npm recipetool will fetch the dependencies the first time with
>> package.json information, and produce an npm-shrinkwrap.json file that
>> pins all versions.
> 
> Are you saying that the codepath in question is only used to generate
> the npm-shrinkwrap.json file and that a normal do_fetch and a normal
> build will not fetch any floating version?
This codepath is used when fetching the dependencies listed in a 
npm-shrinkwrap.json. It will imitate what `npm install` does. It 
converts all the dependencies from the json format into SRC_URIs to pass 
to traditional bitbake fetchers like git, file and https. The `version` 
variable which is being modified here is taken from the optional 
"version" field in the dependency's json block. Before this commit, if a 
dependency didn't have a version field, we would throw an error and 
could not build this package although it was technically valid.

Our modification allows to fallback to the "resolved" field which is 
always present in the generated npm-shrinkwrap.json files and contains 
similar information. As illustrated in the example bellow 
(cockpit-repo), if it is a git URL, it contains the commit hash which 
will be passed to the git fetcher. If it is an http URL, then the 
integrity field is mandatory and checksums the archive. If it's a local 
file URL, I could not see an integrity mechanism in npmsw.py, but it is 
not part of the scope of this modification. It looks like "file version" 
packages are not check-summed in this script but it is independent from 
my modification.
> 
> I appreciate the npm build itself might not be deterministic but the
> point of our SRC_URI is to make it deterministic.
> 
> I'm also worried that you implied fetching is happening during
> do_configure. That should not be the case and if that is happening, it
> needs to be changed so that all fetching happens during do_fetch.
I checked the responsibilities of each fetching task in npm.bbclass:

  - node modules are properly downloaded in do_fetch actually,
    they are individually placed in DL_DIR
  - do_unpack puts them into the workdir/sources-unpack/node_modules
  - do_configure will then put them in a local npm registry cache in the 
workdir/npm-cache
  - finally, do_compile will `npm install` them from the cache into the 
output node_modules directory of workdir/npm-build/node_modules

So fetching is properly done is do_fetch :p
> 
>> Back to our commit, we do not change the aforementioned behavior, we
>> only make it compatible with packages which do not declare a version in
>> their package.json. The npmsw.py code wants to have a version to
>> validate what's being downloaded, so we pass a dummy version. But the
>> npm-shrinkwrap.json file, generated by npm native, still contains the
>> reproducible metadata. Example here with a package that doesn't declare
>> a version field: NPM still added integrity and resolved fields which
>> will ensure reproducibility:
>>
>> ```
>>       "node_modules/cockpit-repo": {
>>         "name": "CockpitDevelopmentDependencies",
>>         "resolved":
>> "git+ssh://git@github.com/cockpit-project/cockpit.git#d34cabacb8e5e1e028c7eea3d6e3b606d862b8ac",
>>         "integrity":
>> "sha512-4nBWhDtVLwF++ZPY/67QtpsXc9y/Jpa/jj1LW/gScSafDKoHmtg1BwAQ7sqpN/fdyi1xrkysM9hxLq3Q16ofaw==",
>> ```
>>>
>>> We should probably error if a version cannot be found and force the
>>> user to add one?
>> An NPM project can typically pull hundreds of dependencies with some not
>> declaring a version, the user would need to submit patches to all of
>> their upstreams to fix that which is challenging. Meanwhile, the real
>> NPM tool is perfectly fine with accepting that, as illustrated above. I
>> believe we should hence behave like the official NPM tool.
> 
> I'm not asking for the upstream NPM projects to be locked down. I'm
> asking that our recipes always have versions specified. This is similar
> to what we do in other places like crates with carg under rust, which
> probably is the best example of how we can lock down situations like
> this.
This patch doesn't introduce nondeterministic fetching because the 
resolved and integrity field contain checksums/srcrev that will be 
handled just as before to ensure reproducibility. This is true for https 
and git dependencies.

However from my analysis, it looks like if a npm-shrinkwrap.json is 
pointing to a local "file" version as a dependency, there is no checksum 
mechanism, but it may very well be a limitation of the 
npm-shrinkwrap.json format. This will only concern users who rely on 
local files which shouldn't be the norm. But this was the pre-existing 
behavior and is not caused by this patch. Specific npm-shrinkwrap.json 
pointing to local files without a version field couldn't be built 
without this modification. With this patch, they will now be accepted 
but inherit this lack of an existing integrity mechanism for local 
tarballs as it would have been even with a version field (which would 
not be used).

I haven't played with `file` dependencies but maybe this is something we 
can improve in a distinct patchset. If the integrity field is always 
generated in npm-shrinkwrap.json, then we could add the integrity check 
for this kind of dependency.

TLDR: Even if the version field is not specified, the fallback 
resolved/integrity fields contain the git commit SHA, or the HTTP 
tarball's checksum. However local file tarballs do not have an integrity 
mechanism, but it was already the case even if they declared a "version" 
field.
> 
> Cheers,
> 
> Richard
>
Enguerrand de Ribaucourt Aug. 21, 2024, 10:08 a.m. UTC | #6
On 21/08/2024 10:42, Herbrechtsmeier, Stefan wrote:
> Hi,
> 
> I'm not that deep into the topic anymore, but I can hopefully explain some implementation details.
> 
>> -----Ursprüngliche Nachricht-----
>> Von: Richard Purdie <richard.purdie@linuxfoundation.org>
>> Gesendet: Dienstag, 20. August 2024 18:00
>> An: Enguerrand de Ribaucourt <enguerrand.de-
>> ribaucourt@savoirfairelinux.com>; bitbake-devel@lists.openembedded.org
>> Cc: tanguy.raufflet@savoirfairelinux.com; alexandre.belloni@bootlin.com;
>> Herbrechtsmeier, Stefan <Stefan.Herbrechtsmeier@weidmueller.com>
>> Betreff: [EXTERNAL] Re: [bitbake-devel][PATCH v4 2/2] npmsw: accept
>> unspecified versions in package.json
>>
>> CAUTION: This email originated from outside the organization. Do not click
>> links or open attachments unless you recognize the sender and know the
>> content is safe.
>>
>>
>>
>> On Mon, 2024-08-19 at 09:58 +0200, Enguerrand de Ribaucourt wrote:
>>> On 15/08/2024 16:12, Richard Purdie wrote:
>>>> On Mon, 2024-08-12 at 14:25 +0200, Enguerrand de Ribaucourt wrote:
>>>>> Our current emulation mandates that the package.json contains a
>>>>> version field. Some packages may not provide it when they are not
>>>>> published to the registry. The actual `npm pack` would allow such
>>>>> packages, so should we.
>>>>>
>>>>> A first commit was required in the npm class.
>>>>>
>>>>> For the shrinkwrap part , we can actually use the resolved field
>>>>> which contains the exact source, including the revision, to pass
>>>>> integrity tests.
>>>>>
>>>>> v3:
>>>>>    - Split bitbake npmsw.py modification in another commit
>>>>>
>>>>> Co-authored-by: Tanguy Raufflet
>>>>> <tanguy.raufflet@savoirfairelinux.com>
>>>>> Signed-off-by: Tanguy Raufflet
>>>>> <tanguy.raufflet@savoirfairelinux.com>
>>>>> Signed-off-by: Enguerrand de Ribaucourt
>>>>> <enguerrand.de-ribaucourt@savoirfairelinux.com>
>>>>> ---
>>>>>    bitbake/lib/bb/fetch2/npmsw.py | 2 +-
>>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/bitbake/lib/bb/fetch2/npmsw.py
>>>>> b/bitbake/lib/bb/fetch2/npmsw.py index d8ed9df327..a5fa598deb
>>>>> 100644
>>>>> --- a/bitbake/lib/bb/fetch2/npmsw.py
>>>>> +++ b/bitbake/lib/bb/fetch2/npmsw.py
>>>>> @@ -97,7 +97,7 @@ class NpmShrinkWrap(FetchMethod):
>>>>>
>>>>>                integrity = params.get("integrity", None)
>>>>>                resolved = params.get("resolved", None)
>>>>> -            version = params.get("version", None)
>>>>> +            version = params.get("version", resolved)
>>>>>
>>>>>                # Handle registry sources
>>>>>                if is_semver(version) and integrity:
>>>>
>>>> I'm not convinced by this reasoning. The whole point of our SRC_URI
>>>> is to precisely define something that is being built. I think this
>>>> allows floating versions of dependencies and hence makes this non-
>>>> reproducible?
>>>
>>> The version declared in package.json is purely declarative for the NPM
>>> package registry. The basic Yocto fetchers will still be used for
>>> downloading the SRC_URI (https, git, ...), with the same version
>>> reproducibility mechanisms like git hash or md5sum.
>>>
>>> In the npm.bbclass, we have a second fetch stage (actually during
>>> do_configure) to fetch all the dependencies. The npm class here
>>> behaves wildly differently than regular frameworks where each
>>> dependency is a separate recipe. But since there are myriads of
>>> dependencies in the NPM world and it would be rare to share them, the
>>> current recipe class is intended to package an application with all
>>> its dependencies using the NPM registry.
>>>
>>> To fetch all those dependencies, we rely on the NPM dependencies
>>> mechanism of package.json which lists packages with their declarative
>>> version. However this is not reproducible. NPM provides two additional
>>> files that NPM developers should include in their repositories to pin
>>> the whole dependency tree: package-lock.json and npm-shrinkwrap.json.
>>> The npm recipetool will fetch the dependencies the first time with
>>> package.json information, and produce an npm-shrinkwrap.json file that
>>> pins all versions.
>>
>> Are you saying that the codepath in question is only used to generate the
>> npm-shrinkwrap.json file and that a normal do_fetch and a normal build will
>> not fetch any floating version?
> 
> The npm-shrinkwrap.json is generated by npm. In contrast to rust and go the lock file isn't processed during recipe creation but during build. This leads to the problem that the recipe author need to fix the npm-shrinkwrap.json.
> 
> The npm-shrinkwrap.json is a common lock file and locks the download sources. It contains the download source inside the `resolved` field [1]. In case of git it contains the full git url with commit sha. This means the fetch use an unspecified release version but not a floating version.
> 
> [1] https://docs.npmjs.com/cli/v10/configuring-npm/package-lock-json#packages
> 
> 
>> I appreciate the npm build itself might not be deterministic but the point of
>> our SRC_URI is to make it deterministic.
>>
>> I'm also worried that you implied fetching is happening during do_configure.
>> That should not be the case and if that is happening, it needs to be changed so
>> that all fetching happens during do_fetch.
> 
> The npm command in do_configure fetches the packages from a local temporary repository. The local repository is filled by pre fetch packages. The packages are fetched during do_fetch. Therefore bitbake parses the npm-shrinkwrap.json during fetch and creates a list of fetch uris on-the-fly .
> 
> 
>>> Back to our commit, we do not change the aforementioned behavior, we
>>> only make it compatible with packages which do not declare a version
>>> in their package.json. The npmsw.py code wants to have a version to
>>> validate what's being downloaded, so we pass a dummy version. But the
>>> npm-shrinkwrap.json file, generated by npm native, still contains the
>>> reproducible metadata. Example here with a package that doesn't
>>> declare a version field: NPM still added integrity and resolved fields
>>> which will ensure reproducibility:
>>>
>>> ```
>>>       "node_modules/cockpit-repo": {
>>>         "name": "CockpitDevelopmentDependencies",
>>>         "resolved":
>>> "git+ssh://git@github.com/cockpit-
>> project/cockpit.git#d34cabacb8e5e1e028c7eea3d6e3b606d862b8ac",
>>>         "integrity":
>>> "sha512-
>> 4nBWhDtVLwF++ZPY/67QtpsXc9y/Jpa/jj1LW/gScSafDKoHmtg1BwAQ7sqpN/
>>> fdyi1xrkysM9hxLq3Q16ofaw==",
>>> ```
>>>>
>>>> We should probably error if a version cannot be found and force the
>>>> user to add one?
>>> An NPM project can typically pull hundreds of dependencies with some
>>> not declaring a version, the user would need to submit patches to all
>>> of their upstreams to fix that which is challenging. Meanwhile, the
>>> real NPM tool is perfectly fine with accepting that, as illustrated
>>> above. I believe we should hence behave like the official NPM tool.
>>
>> I'm not asking for the upstream NPM projects to be locked down. I'm asking
>> that our recipes always have versions specified. This is similar to what we do in
>> other places like crates with carg under rust, which probably is the best
>> example of how we can lock down situations like this.
> 
> The rust solution has the advantage, that the lock is parsed during recipe creation and the recipe author can rework the result.
> 
> But the real problem is that the project wasn't published to the registry and has no release. Both cargo and npm support git repositories without a version. Maybe the parser should set the version to `0-git<short git commit sha>` and extract the hash from the `resolved` field.
We do declare a dummy version in another patchset somewhere else indeed 
to pass some checks and allow building such packages. However in this 
specific place, the version field is what's used to fetch the packages 
and it needs a URL (git://, ...) which is conveniently provided by the 
`resolved` field!
> 
> Regards
>    Stefan
> 
> ________________________________
> Kommanditgesellschaft - Sitz: Detmold - Amtsgericht Lemgo HRA 2790 -
> Komplementärin: Weidmüller Interface Führungsgesellschaft mbH -
> Sitz: Detmold - Amtsgericht Lemgo HRB 3924;
> Geschäftsführer: Dr. Timo Berger, Volker Bibelhausen, Dr. Sebastian Durst, André Sombecki;
> USt-ID-Nr. DE124599660
Alexander Kanavin Aug. 21, 2024, 10:32 a.m. UTC | #7
This still doesn't entirely compute. We do have npm fetcher tests in
bitbake, but they test for existence of 'npm', and are skipped if npm
is not there. And so they're never run on the autobuilder, and npm
fetcher regularly regresses.

On the other hand, you seem to be claiming that the fetcher does not
require npm. But then this skipping is unnecessary and should be
dropped, and all tests should pass.

Once we resolve that point, we should ensure that tests do check both
possibilities:

- modules that have checksum information result in a successful fetch opetation
- if something does not have such information, the fetch fails, and
that is interpreted as the test succeeding

This would reassure everyone that the fetcher does fulfil source
integrity requirements a lot more than a hand-written description of
how it operates.

Alex


On Wed, 21 Aug 2024 at 12:08, Enguerrand de Ribaucourt via
lists.openembedded.org
<enguerrand.de-ribaucourt=savoirfairelinux.com@lists.openembedded.org>
wrote:
>
>
>
> On 21/08/2024 10:42, Herbrechtsmeier, Stefan wrote:
> > Hi,
> >
> > I'm not that deep into the topic anymore, but I can hopefully explain some implementation details.
> >
> >> -----Ursprüngliche Nachricht-----
> >> Von: Richard Purdie <richard.purdie@linuxfoundation.org>
> >> Gesendet: Dienstag, 20. August 2024 18:00
> >> An: Enguerrand de Ribaucourt <enguerrand.de-
> >> ribaucourt@savoirfairelinux.com>; bitbake-devel@lists.openembedded.org
> >> Cc: tanguy.raufflet@savoirfairelinux.com; alexandre.belloni@bootlin.com;
> >> Herbrechtsmeier, Stefan <Stefan.Herbrechtsmeier@weidmueller.com>
> >> Betreff: [EXTERNAL] Re: [bitbake-devel][PATCH v4 2/2] npmsw: accept
> >> unspecified versions in package.json
> >>
> >> CAUTION: This email originated from outside the organization. Do not click
> >> links or open attachments unless you recognize the sender and know the
> >> content is safe.
> >>
> >>
> >>
> >> On Mon, 2024-08-19 at 09:58 +0200, Enguerrand de Ribaucourt wrote:
> >>> On 15/08/2024 16:12, Richard Purdie wrote:
> >>>> On Mon, 2024-08-12 at 14:25 +0200, Enguerrand de Ribaucourt wrote:
> >>>>> Our current emulation mandates that the package.json contains a
> >>>>> version field. Some packages may not provide it when they are not
> >>>>> published to the registry. The actual `npm pack` would allow such
> >>>>> packages, so should we.
> >>>>>
> >>>>> A first commit was required in the npm class.
> >>>>>
> >>>>> For the shrinkwrap part , we can actually use the resolved field
> >>>>> which contains the exact source, including the revision, to pass
> >>>>> integrity tests.
> >>>>>
> >>>>> v3:
> >>>>>    - Split bitbake npmsw.py modification in another commit
> >>>>>
> >>>>> Co-authored-by: Tanguy Raufflet
> >>>>> <tanguy.raufflet@savoirfairelinux.com>
> >>>>> Signed-off-by: Tanguy Raufflet
> >>>>> <tanguy.raufflet@savoirfairelinux.com>
> >>>>> Signed-off-by: Enguerrand de Ribaucourt
> >>>>> <enguerrand.de-ribaucourt@savoirfairelinux.com>
> >>>>> ---
> >>>>>    bitbake/lib/bb/fetch2/npmsw.py | 2 +-
> >>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/bitbake/lib/bb/fetch2/npmsw.py
> >>>>> b/bitbake/lib/bb/fetch2/npmsw.py index d8ed9df327..a5fa598deb
> >>>>> 100644
> >>>>> --- a/bitbake/lib/bb/fetch2/npmsw.py
> >>>>> +++ b/bitbake/lib/bb/fetch2/npmsw.py
> >>>>> @@ -97,7 +97,7 @@ class NpmShrinkWrap(FetchMethod):
> >>>>>
> >>>>>                integrity = params.get("integrity", None)
> >>>>>                resolved = params.get("resolved", None)
> >>>>> -            version = params.get("version", None)
> >>>>> +            version = params.get("version", resolved)
> >>>>>
> >>>>>                # Handle registry sources
> >>>>>                if is_semver(version) and integrity:
> >>>>
> >>>> I'm not convinced by this reasoning. The whole point of our SRC_URI
> >>>> is to precisely define something that is being built. I think this
> >>>> allows floating versions of dependencies and hence makes this non-
> >>>> reproducible?
> >>>
> >>> The version declared in package.json is purely declarative for the NPM
> >>> package registry. The basic Yocto fetchers will still be used for
> >>> downloading the SRC_URI (https, git, ...), with the same version
> >>> reproducibility mechanisms like git hash or md5sum.
> >>>
> >>> In the npm.bbclass, we have a second fetch stage (actually during
> >>> do_configure) to fetch all the dependencies. The npm class here
> >>> behaves wildly differently than regular frameworks where each
> >>> dependency is a separate recipe. But since there are myriads of
> >>> dependencies in the NPM world and it would be rare to share them, the
> >>> current recipe class is intended to package an application with all
> >>> its dependencies using the NPM registry.
> >>>
> >>> To fetch all those dependencies, we rely on the NPM dependencies
> >>> mechanism of package.json which lists packages with their declarative
> >>> version. However this is not reproducible. NPM provides two additional
> >>> files that NPM developers should include in their repositories to pin
> >>> the whole dependency tree: package-lock.json and npm-shrinkwrap.json.
> >>> The npm recipetool will fetch the dependencies the first time with
> >>> package.json information, and produce an npm-shrinkwrap.json file that
> >>> pins all versions.
> >>
> >> Are you saying that the codepath in question is only used to generate the
> >> npm-shrinkwrap.json file and that a normal do_fetch and a normal build will
> >> not fetch any floating version?
> >
> > The npm-shrinkwrap.json is generated by npm. In contrast to rust and go the lock file isn't processed during recipe creation but during build. This leads to the problem that the recipe author need to fix the npm-shrinkwrap.json.
> >
> > The npm-shrinkwrap.json is a common lock file and locks the download sources. It contains the download source inside the `resolved` field [1]. In case of git it contains the full git url with commit sha. This means the fetch use an unspecified release version but not a floating version.
> >
> > [1] https://docs.npmjs.com/cli/v10/configuring-npm/package-lock-json#packages
> >
> >
> >> I appreciate the npm build itself might not be deterministic but the point of
> >> our SRC_URI is to make it deterministic.
> >>
> >> I'm also worried that you implied fetching is happening during do_configure.
> >> That should not be the case and if that is happening, it needs to be changed so
> >> that all fetching happens during do_fetch.
> >
> > The npm command in do_configure fetches the packages from a local temporary repository. The local repository is filled by pre fetch packages. The packages are fetched during do_fetch. Therefore bitbake parses the npm-shrinkwrap.json during fetch and creates a list of fetch uris on-the-fly .
> >
> >
> >>> Back to our commit, we do not change the aforementioned behavior, we
> >>> only make it compatible with packages which do not declare a version
> >>> in their package.json. The npmsw.py code wants to have a version to
> >>> validate what's being downloaded, so we pass a dummy version. But the
> >>> npm-shrinkwrap.json file, generated by npm native, still contains the
> >>> reproducible metadata. Example here with a package that doesn't
> >>> declare a version field: NPM still added integrity and resolved fields
> >>> which will ensure reproducibility:
> >>>
> >>> ```
> >>>       "node_modules/cockpit-repo": {
> >>>         "name": "CockpitDevelopmentDependencies",
> >>>         "resolved":
> >>> "git+ssh://git@github.com/cockpit-
> >> project/cockpit.git#d34cabacb8e5e1e028c7eea3d6e3b606d862b8ac",
> >>>         "integrity":
> >>> "sha512-
> >> 4nBWhDtVLwF++ZPY/67QtpsXc9y/Jpa/jj1LW/gScSafDKoHmtg1BwAQ7sqpN/
> >>> fdyi1xrkysM9hxLq3Q16ofaw==",
> >>> ```
> >>>>
> >>>> We should probably error if a version cannot be found and force the
> >>>> user to add one?
> >>> An NPM project can typically pull hundreds of dependencies with some
> >>> not declaring a version, the user would need to submit patches to all
> >>> of their upstreams to fix that which is challenging. Meanwhile, the
> >>> real NPM tool is perfectly fine with accepting that, as illustrated
> >>> above. I believe we should hence behave like the official NPM tool.
> >>
> >> I'm not asking for the upstream NPM projects to be locked down. I'm asking
> >> that our recipes always have versions specified. This is similar to what we do in
> >> other places like crates with carg under rust, which probably is the best
> >> example of how we can lock down situations like this.
> >
> > The rust solution has the advantage, that the lock is parsed during recipe creation and the recipe author can rework the result.
> >
> > But the real problem is that the project wasn't published to the registry and has no release. Both cargo and npm support git repositories without a version. Maybe the parser should set the version to `0-git<short git commit sha>` and extract the hash from the `resolved` field.
> We do declare a dummy version in another patchset somewhere else indeed
> to pass some checks and allow building such packages. However in this
> specific place, the version field is what's used to fetch the packages
> and it needs a URL (git://, ...) which is conveniently provided by the
> `resolved` field!
> >
> > Regards
> >    Stefan
> >
> > ________________________________
> > Kommanditgesellschaft - Sitz: Detmold - Amtsgericht Lemgo HRA 2790 -
> > Komplementärin: Weidmüller Interface Führungsgesellschaft mbH -
> > Sitz: Detmold - Amtsgericht Lemgo HRB 3924;
> > Geschäftsführer: Dr. Timo Berger, Volker Bibelhausen, Dr. Sebastian Durst, André Sombecki;
> > USt-ID-Nr. DE124599660
>
> --
> Savoir-faire Linux
> Enguerrand de Ribaucourt
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#16505): https://lists.openembedded.org/g/bitbake-devel/message/16505
> Mute This Topic: https://lists.openembedded.org/mt/107855179/1686489
> Group Owner: bitbake-devel+owner@lists.openembedded.org
> Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [alex.kanavin@gmail.com]
> -=-=-=-=-=-=-=-=-=-=-=-
>
Stefan Herbrechtsmeier Aug. 21, 2024, 10:43 a.m. UTC | #8
[snip]

>>>>>
>>>> Back to our commit, we do not change the aforementioned behavior, we
>>>> only make it compatible with packages which do not declare a version
>>>> in their package.json. The npmsw.py code wants to have a version to
>>>> validate what's being downloaded, so we pass a dummy version. But the
>>>> npm-shrinkwrap.json file, generated by npm native, still contains the
>>>> reproducible metadata. Example here with a package that doesn't
>>>> declare a version field: NPM still added integrity and resolved fields
>>>> which will ensure reproducibility:
>>>>
>>>> ```
>>>>       "node_modules/cockpit-repo": {
>>>>         "name": "CockpitDevelopmentDependencies",
>>>>         "resolved":
>>>> "git+ssh://git@github.com/cockpit-
>>> project/cockpit.git#d34cabacb8e5e1e028c7eea3d6e3b606d862b8ac",
>>>>         "integrity":
>>>> "sha512-
>>> 4nBWhDtVLwF++ZPY/67QtpsXc9y/Jpa/jj1LW/gScSafDKoHmtg1BwAQ7sqpN/
>>>> fdyi1xrkysM9hxLq3Q16ofaw==",
>>>> ```
>>>>>
>>>>> We should probably error if a version cannot be found and force the
>>>>> user to add one?
>>>> An NPM project can typically pull hundreds of dependencies with some
>>>> not declaring a version, the user would need to submit patches to all
>>>> of their upstreams to fix that which is challenging. Meanwhile, the
>>>> real NPM tool is perfectly fine with accepting that, as illustrated
>>>> above. I believe we should hence behave like the official NPM tool.
>>>
>>> I'm not asking for the upstream NPM projects to be locked down. I'm 
>>> asking
>>> that our recipes always have versions specified. This is similar to 
>>> what we do in
>>> other places like crates with carg under rust, which probably is the 
>>> best
>>> example of how we can lock down situations like this.
>>
>> The rust solution has the advantage, that the lock is parsed during 
>> recipe creation and the recipe author can rework the result.
>>
>> But the real problem is that the project wasn't published to the 
>> registry and has no release. Both cargo and npm support git 
>> repositories without a version. Maybe the parser should set the 
>> version to `0-git<short git commit sha>` and extract the hash from 
>> the `resolved` field.
> We do declare a dummy version in another patchset somewhere else 
> indeed to pass some checks and allow building such packages. However 
> in this specific place, the version field is what's used to fetch the 
> packages and it needs a URL (git://, ...) which is conveniently 
> provided by the `resolved` field!

You are right the npm version field isn't always a semantic version. 
Maybe the git sources handler should always use the `resolved` field.
Richard Purdie Aug. 21, 2024, 1 p.m. UTC | #9
On Wed, 2024-08-21 at 12:06 +0200, Enguerrand de Ribaucourt wrote:
> This patch doesn't introduce nondeterministic fetching because the 
> resolved and integrity field contain checksums/srcrev that will be 
> handled just as before to ensure reproducibility. This is true for https 
> and git dependencies.
> 
> However from my analysis, it looks like if a npm-shrinkwrap.json is 
> pointing to a local "file" version as a dependency, there is no checksum 
> mechanism, but it may very well be a limitation of the 
> npm-shrinkwrap.json format. This will only concern users who rely on 
> local files which shouldn't be the norm. But this was the pre-existing 
> behavior and is not caused by this patch. Specific npm-shrinkwrap.json 
> pointing to local files without a version field couldn't be built 
> without this modification. With this patch, they will now be accepted
> but inherit this lack of an existing integrity mechanism for local 
> tarballs as it would have been even with a version field (which would
> not be used).
> 
> I haven't played with `file` dependencies but maybe this is something we 
> can improve in a distinct patchset. If the integrity field is always 
> generated in npm-shrinkwrap.json, then we could add the integrity check 
> for this kind of dependency.
> 
> TLDR: Even if the version field is not specified, the fallback 
> resolved/integrity fields contain the git commit SHA, or the HTTP 
> tarball's checksum. However local file tarballs do not have an integrity 
> mechanism, but it was already the case even if they declared a "version" 
> field.
> 

I think this all makes sense and I agree, it doesn't sound like there
is an issue here with the use of the resolved field.

What I would like to ask is that the commit message be improved so that
this is better documented and the commit more easily understood from
it. In particular, it should be clear why it doesn't break determinism.

I suspect we also need to find a way to test the npm fetcher on the
autobuilder, maybe in a standalone job that pulls in meta-oe and builds
the tools it needs. We can then start expanding the test cases to cover
these issues as they arise.

For better or worse, the fetcher had a ton of corner cases and we need
to account for them all with test cases so we can make changes to the
code without fear of regressing anything. Outside of npm, I think we do
a reasonable job but right now we don't run the npm cases sadly.

Cheers,

Richard
diff mbox series

Patch

diff --git a/bitbake/lib/bb/fetch2/npmsw.py b/bitbake/lib/bb/fetch2/npmsw.py
index d8ed9df327..a5fa598deb 100644
--- a/bitbake/lib/bb/fetch2/npmsw.py
+++ b/bitbake/lib/bb/fetch2/npmsw.py
@@ -97,7 +97,7 @@  class NpmShrinkWrap(FetchMethod):
 
             integrity = params.get("integrity", None)
             resolved = params.get("resolved", None)
-            version = params.get("version", None)
+            version = params.get("version", resolved)
 
             # Handle registry sources
             if is_semver(version) and integrity: