| Message ID | 20250902065507.35737-1-stefan.herbrechtsmeier-oss@weidmueller.com |
|---|---|
| Series | fetch2: add support for implicit urls |
On Tue Sep 2, 2025 at 8:55 AM CEST, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
> From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
>
> The patch series adds support for implicit URLs inside the fetcher.
> Implicit URLs can be defined inside a source such as a version control
> system (git submodule) or a lock file (package-lock.json, cargo.lock or
> go.sum). Integrating implicit URLs alongside explicit URLs simplifies
> the fetcher classes and avoids bugs caused by iterations between the
> Fetch and FetchMethod classes.
>
> The series removes most methods from the gitsm fetcher and leaves only
> the parsing of the git submodules and the unpack functionality. It
> allows the gitsm fetcher to use the premirror-only feature. The current
> implementation leads to problems because the download of the git
> submodules is triggered via the download method, which is called deep
> inside the fetcher code.
>
>
> Stefan Herbrechtsmeier (6):
>   fetch2: rename u to url in Fetch class
>   fetch2: call functions within loops of Fetch class
>   fetch2: add helper to get urldata in Fetch class
>   fetch2: add support for implicit urls
>   fetch2: gitsm: use implicit urls feature
>   tests: fetch: add test case for gitsm implicit local paths
>
>  lib/bb/fetch2/__init__.py | 128 +++++++++++++++++++++++++++-----------
>  lib/bb/fetch2/gitsm.py    |  46 ++------------
>  lib/bb/tests/fetch.py     |  12 ++++
>  3 files changed, 109 insertions(+), 77 deletions(-)

Hi Stefan,

I know it is just an RFC so far, but I did launch a build on the autobuilder. It was mostly correct but fails a selftest:

ERROR: git-submodule-test-1.0-r0 do_ar_mirror: Error executing a python function in exec_func_python() autogenerated:
...
File: '/srv/pokybuild/yocto-worker/oe-selftest-debian/build/bitbake/lib/bb/fetch2/__init__.py', lineno: 2102, function: expand_urldata
...
Exception: UnboundLocalError: cannot access local variable 'urldata' where it is not associated with a value
...
2025-09-03 06:21:46,098 - oe-selftest - INFO - archiver.Archiver.test_archiver_mode_mirror_gitsm (subunit.RemotedTestCase)
2025-09-03 06:21:46,099 - oe-selftest - INFO - ... FAIL

And a similar error with archiver.Archiver.test_archiver_mode_mirror_gitsm_shallow.

Thanks,
Mathieu
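For readers unfamiliar with this error class, the following is a minimal sketch of how an UnboundLocalError of this kind typically arises in Python. It is not the code from fetch2/__init__.py, and the thread does not show the actual cause; the function names `expand_first` and `expand_first_safe` are made up for illustration.

```python
def expand_first(urls):
    """Toy function: 'urldata' is only bound inside the loop body."""
    for url in urls:
        urldata = {"url": url}
    # With an empty 'urls' the loop body never runs, so 'urldata' was
    # never assigned and this line raises UnboundLocalError.
    return urldata


def expand_first_safe(urls):
    """Same logic, but the name is bound before the loop."""
    urldata = None
    for url in urls:
        urldata = {"url": url}
    return urldata


if __name__ == "__main__":
    try:
        expand_first([])
    except UnboundLocalError as exc:
        print("failed:", exc)
    print("safe variant returns:", expand_first_safe([]))
```

The usual fixes are to bind the name on every code path or to return early when there is nothing to process.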
Hi Mathieu,

Thank you for testing. Until now I had only run the bitbake tests. I will look into the error and add a test case to bitbake to catch the problem early.

Thanks
Stefan
On 04.09.2025 at 08:00, Mathieu Dubois-Briand wrote:
> I know it is just an RFC so far, but I did launch a build on the
> autobuilder. It was mostly correct but fails a selftest:
>
> ERROR: git-submodule-test-1.0-r0 do_ar_mirror: Error executing a python function in exec_func_python() autogenerated:
> ...
> File: '/srv/pokybuild/yocto-worker/oe-selftest-debian/build/bitbake/lib/bb/fetch2/__init__.py', lineno: 2102, function: expand_urldata
> ...
> Exception: UnboundLocalError: cannot access local variable 'urldata' where it is not associated with a value
> ...
> 2025-09-03 06:21:46,098 - oe-selftest - INFO - archiver.Archiver.test_archiver_mode_mirror_gitsm (subunit.RemotedTestCase)
> 2025-09-03 06:21:46,099 - oe-selftest - INFO - ... FAIL
>
> And a similar error with
> archiver.Archiver.test_archiver_mode_mirror_gitsm_shallow.

I've fixed the issue and added a BitBake test case. Perhaps you could test the second version of the two patch series.

Regards
Stefan
On Tue, 2025-09-02 at 08:55 +0200, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
> The patch series adds support for implicit URLs inside the fetcher.
> Implicit URLs can be defined inside a source such as a version control
> system (git submodule) or a lock file (package-lock.json, cargo.lock or
> go.sum). Integrating implicit URLs alongside explicit URLs simplifies
> the fetcher classes and avoids bugs caused by iterations between the
> Fetch and FetchMethod classes.
>
> The series removes most methods from the gitsm fetcher and leaves only
> the parsing of the git submodules and the unpack functionality. It
> allows the gitsm fetcher to use the premirror-only feature. The current
> implementation leads to problems because the download of the git
> submodules is triggered via the download method, which is called deep
> inside the fetcher code.

We had the discussion a while back and the conclusion seemed to be that implicit URLs were disliked by a significant number of people, as they were too unclear about what was going on behind the scenes and also made things like software manifests harder. There was a strong preference for metadata helpers and explicit lists of components, which we have for crates/rust and now for go too.

It feels like this series is moving us back in the other direction. Is that correct and, if so, what has changed in the approach since the last discussion?

Cheers,
Richard
On 07.09.2025 at 17:52, Richard Purdie via lists.openembedded.org wrote:
> We had the discussion a while back and the conclusion seemed to be that
> implicit URLs were disliked by a significant number of people, as they
> were too unclear about what was going on behind the scenes and also
> made things like software manifests harder.

It looks like I missed some discussion, and especially the conclusion. I assume that by software manifests you mean recipes.

> There was a strong preference for metadata helpers and explicit lists
> of components, which we have for crates/rust and now for go too.

Does this mean the npmsw and gitsm fetchers are obsolete and should be replaced by metadata helpers to fix open issues?

> It feels like this series is moving us back in the other direction. Is
> that correct and, if so, what has changed in the approach since the
> last discussion?

Do you mean the response to my last RFC? In that case it wasn't clear to me that the project is against implicit URLs and that the npmsw and gitsm fetchers are the wrong direction.

This series is only a cleanup of the existing functionality. The gitsm fetcher uses implicit URLs but doesn't work correctly because of the misuse of the download code.

It is useless to start the discussion again. The project doesn't like implicit URLs. It prefers a special task as a replacement for the separate recipetool. It has decided against on-the-fly parsing inside the fetcher.

How should I proceed? I have working code which parses the cargo.lock, go.sum and package-lock.json files and only uses the Git and Wget fetchers. The code uses the vendor feature of the package managers to create a patchable folder of the sources. It simplifies the npm class and adds additional classes to build packages. The code integrates valid URLs into the SBOM and creates components with name and version per dependency. Furthermore, I have reworked the gitsm fetcher to hopefully fix some open issues. I can convert my cargo, go, npm and gitsm parsers into metadata helpers, but this is useless if the project dislikes python functions that generate URLs, like the pypi class, or prefers package-manager-specific code inside the fetcher.

Regards
Stefan
On Mon, 2025-09-08 at 11:20 +0200, Stefan Herbrechtsmeier wrote:
> On 07.09.2025 at 17:52, Richard Purdie via lists.openembedded.org wrote:
> > We had the discussion a while back and the conclusion seemed to be that
> > implicit URLs were disliked by a significant number of people, as they
> > were too unclear about what was going on behind the scenes and also
> > made things like software manifests harder.
>
> It looks like I missed some discussion, and especially the conclusion.
> I assume that by software manifests you mean recipes.

No, I did mean software manifests. If the URLs are explicit, it makes generating manifests of the sources being used more obvious for people to understand. Yes, there are programmatic ways of doing it with implicit URLs, but people don't like them.

By conclusions, I was taking that as the outcome of the last set of discussions, but there were a lot of different emails and it was hard to follow. Perhaps I got the conclusion wrong, I don't know.

> > There was a strong preference for metadata helpers and explicit lists
> > of components, which we have for crates/rust and now for go too.
>
> Does this mean the npmsw and gitsm fetchers are obsolete and should
> be replaced by metadata helpers to fix open issues?

I really don't know about npmsw. I don't use it and I don't really follow development there. I don't know much about the current set of issues it may have.

With gitsm, I think that is generally accepted by people and I don't see a strong reason to change it at present. I would be interested in a clear summary of what the known issues are (e.g. the premirror issue you mentioned).

> > It feels like this series is moving us back in the other direction.
> > Is that correct and, if so, what has changed in the approach since
> > the last discussion?
>
> Do you mean the response to my last RFC? In that case it wasn't clear
> to me that the project is against implicit URLs and that the npmsw
> and gitsm fetchers are the wrong direction.

I'm trying to read the "mood" of our developer community and, right now, it feels like putting a lot of complexity hidden in the fetcher isn't what people want to see, as they don't understand it and can't "see" what is going on. During the discussions, I think we identified some key fundamental issues with implicit URLs for some fetch types too.

There is some hard work that needs to be done in summarising those discussions and writing down the "results" so that we don't have to redo this every time a new patch series comes along. By that, I mean a non-emotive list of the current advantages, disadvantages and known bugs of the current approach and of any proposed alternatives we might choose. It perhaps falls to me as the developer lead for bitbake to try and do it, but I'd very much welcome help from anyone else, as I simply don't have the mental bandwidth for this topic right now (due to e.g. bitbake-setup). I certainly don't have to be the one who does it. I appreciate it isn't a fun task though.

> This series is only a cleanup of the existing functionality. The gitsm
> fetcher uses implicit URLs but doesn't work correctly because of the
> misuse of the download code.

I'm worried about where the series is trying to take the project and codebase though, hence the questions about intent.

> It is useless to start the discussion again. The project doesn't like
> implicit URLs. It prefers a special task as a replacement for the
> separate recipetool. It has decided against on-the-fly parsing inside
> the fetcher.

The developers using the project feel much happier with that approach, yes.

> How should I proceed? I have working code which parses the cargo.lock,
> go.sum and package-lock.json files and only uses the Git and Wget
> fetchers. The code uses the vendor feature of the package managers to
> create a patchable folder of the sources. It simplifies the npm class
> and adds additional classes to build packages. The code integrates
> valid URLs into the SBOM and creates components with name and
> version per dependency. Furthermore, I have reworked the gitsm fetcher
> to hopefully fix some open issues. I can convert my cargo, go, npm
> and gitsm parsers into metadata helpers, but this is useless if the
> project dislikes python functions that generate URLs, like the pypi
> class, or prefers package-manager-specific code inside the fetcher.

So you are proposing we drop the crate and gomod fetchers in favour of implicit URLs?

I'd suggest sharing a branch of your changes so that others can see and understand the implications and hopefully experiment a bit, to see if it can convince some people that implicit URLs are the way forward.

Cheers,
Richard
On 08.09.2025 at 12:26, Richard Purdie wrote:
> On Mon, 2025-09-08 at 11:20 +0200, Stefan Herbrechtsmeier wrote:
>> I assume that by software manifests you mean recipes.
> No, I did mean software manifests. If the URLs are explicit, it makes
> generating manifests of the sources being used more obvious for people
> to understand.

Do you think the following is really helpful for a software manifest?

npm://registry.npmjs.org/;package=xyz;version=1.2.3
gomod://xyz;version=1.2.3
crate://crates.io/xyz/1.2.3

I assume software manifests need the real URLs and a common style for the name and version.

How do we solve the problem if we keep the gitsm fetcher and don't propose an alternative solution?

> Yes, there are programmatic ways of doing it with
> implicit URLs, but people don't like them.

Based on your reaction regarding the gitsm fetcher, I would say a lot of people like implicit URLs. But without details it is impossible to understand the real problem. Maybe a task that generates a generic include file, or a file in the work directory, with explicit and implicit URLs would solve their problems.

> By conclusions, I was taking that as the outcome of the last set of
> discussions, but there were a lot of different emails and it was hard
> to follow. Perhaps I got the conclusion wrong, I don't know.

My understanding was that implicit URLs have drawbacks, but nothing we couldn't overcome.

>>> There was a strong preference for metadata helpers and explicit lists
>>> of components, which we have for crates/rust and now for go too.
>> Does this mean the npmsw and gitsm fetchers are obsolete and should
>> be replaced by metadata helpers to fix open issues?
> I really don't know about npmsw. I don't use it and I don't really
> follow development there. I don't know much about the current set of
> issues it may have.
>
> With gitsm, I think that is generally accepted by people and I don't
> see a strong reason to change it at present.

Didn't you say that people don't like implicit URLs and that they are problematic for software manifests?

> I would be interested in a clear summary of what the known issues
> are (e.g. the premirror issue you mentioned).

https://bugzilla.yoctoproject.org/show_bug.cgi?id=15875

The combination of BB_FETCH_PREMIRRORONLY and gitsm doesn't work because the download of the submodules is handled inside the try_mirror_url function, after BB_NO_NETWORK has been enabled.

The current code has a recursion between the download method of the Fetch class and the download function of the FetchMethod class. The need_update method is used to trigger a download of the submodules, and the extract_urldata function is only usable after a download.

It doesn't matter whether bitbake supports one or more fetchers with implicit URLs; any user needs to handle them anyway. Either we should deprecate implicit URLs to simplify the code, or we should support them inside the Fetch class.

>>> It feels like this series is moving us back in the other direction.
>>> Is that correct and, if so, what has changed in the approach since
>>> the last discussion?
>> Do you mean the response to my last RFC? In that case it wasn't clear
>> to me that the project is against implicit URLs and that the npmsw
>> and gitsm fetchers are the wrong direction.
> I'm trying to read the "mood" of our developer community and, right now,
> it feels like putting a lot of complexity hidden in the fetcher isn't
> what people want to see, as they don't understand it and can't "see"
> what is going on. During the discussions, I think we identified some
> key fundamental issues with implicit URLs for some fetch types too.

That's the reason for this series. It adds native support for implicit URLs and doesn't misuse the need_update function. The implicit URL fetchers only need to implement the implicit_urls function.

> There is some hard work that needs to be done in summarising those
> discussions and writing down the "results" so that we don't have to
> redo this every time a new patch series comes along. By that, I mean a
> non-emotive list of the current advantages, disadvantages and known bugs
> of the current approach and of any proposed alternatives we might choose.
> It perhaps falls to me as the developer lead for bitbake to try and do
> it, but I'd very much welcome help from anyone else, as I simply don't
> have the mental bandwidth for this topic right now (due to e.g.
> bitbake-setup). I certainly don't have to be the one who does it. I
> appreciate it isn't a fun task though.

What is the current approach? Do you mean implicit vs explicit URLs?

>> This series is only a cleanup of the existing functionality. The gitsm
>> fetcher uses implicit URLs but doesn't work correctly because of the
>> misuse of the download code.
> I'm worried about where the series is trying to take the project and
> codebase though, hence the questions about intent.

I only rework the code and make features explicit. The fetchers already use implicit URLs, but now it is clear that the localpaths method doesn't return the local paths of the implicit URLs. The implicit URLs are handled directly by the download method of the Fetch class, not deep inside it via the download method of the fetcher. A recursion inside the git submodules shouldn't be a problem, because the outer download is finished before the inner download starts.

>> It is useless to start the discussion again. The project doesn't like
>> implicit URLs. It prefers a special task as a replacement for the
>> separate recipetool. It has decided against on-the-fly parsing inside
>> the fetcher.
> The developers using the project feel much happier with that approach,
> yes.

I wonder why these developers prefer a single gitsm SRC_URI entry over a list of git entries, because they lose fine-grained control in that case.

>> How should I proceed? I have working code which parses the cargo.lock,
>> go.sum and package-lock.json files and only uses the Git and Wget
>> fetchers. The code uses the vendor feature of the package managers to
>> create a patchable folder of the sources. It simplifies the npm class
>> and adds additional classes to build packages. The code integrates
>> valid URLs into the SBOM and creates components with name and
>> version per dependency. Furthermore, I have reworked the gitsm fetcher
>> to hopefully fix some open issues. I can convert my cargo, go, npm
>> and gitsm parsers into metadata helpers, but this is useless if the
>> project dislikes python functions that generate URLs, like the pypi
>> class, or prefers package-manager-specific code inside the fetcher.
> So you are proposing we drop the crate and gomod fetchers in favour of
> implicit URLs?

No. The two approaches are mutually exclusive. In my case the crate and gomod fetchers aren't needed.

> I'd suggest sharing a branch of your changes so that others can see and
> understand the implications and hopefully experiment a bit, to see if it
> can convince some people that implicit URLs are the way forward.

I will do so, but I need some time because I am focusing on the rework of the gitsm fetcher.

Regards
Stefan
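The control flow described in this reply, where the top-level download loop asks each fetcher for its implicit URLs instead of the fetcher recursing into the download code itself, can be sketched roughly as follows. This is a toy model, not the code from the series: `ToyFetch` and `ToyMethod` are hypothetical classes, and only the name `implicit_urls` comes from the thread (its real signature is not shown there).

```python
class ToyMethod:
    """Hypothetical fetch method; 'nested' stands in for parsed submodules."""

    def __init__(self, nested):
        # Maps a URL to the URLs that are declared inside its source.
        self.nested = nested

    def download(self, url):
        # A real fetcher would clone or download here.
        return f"downloaded {url}"

    def implicit_urls(self, url):
        # Only meaningful once 'url' itself has been downloaded.
        return self.nested.get(url, [])


class ToyFetch:
    """Hypothetical top-level loop that handles explicit and implicit URLs."""

    def __init__(self, urls, method):
        self.urls = list(urls)
        self.method = method

    def download(self):
        log = []
        pending = list(self.urls)
        seen = set()
        while pending:
            url = pending.pop(0)
            if url in seen:
                continue
            seen.add(url)
            # The outer download finishes before any inner URL is queued.
            log.append(self.method.download(url))
            pending.extend(self.method.implicit_urls(url))
        return log


if __name__ == "__main__":
    nested = {"gitsm://a": ["git://b", "git://c"], "git://b": ["git://d"]}
    for line in ToyFetch(["gitsm://a"], ToyMethod(nested)).download():
        print(line)
```

Because every URL passes through the same loop, a policy such as premirror-only would apply uniformly to explicit and implicit URLs in this model. Whether the actual series achieves that in the same way is not shown in the thread.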
From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

The patch series adds support for implicit URLs inside the fetcher. Implicit URLs can be defined inside a source such as a version control system (git submodule) or a lock file (package-lock.json, cargo.lock or go.sum). Integrating implicit URLs alongside explicit URLs simplifies the fetcher classes and avoids bugs caused by iterations between the Fetch and FetchMethod classes.

The series removes most methods from the gitsm fetcher and leaves only the parsing of the git submodules and the unpack functionality. It allows the gitsm fetcher to use the premirror-only feature. The current implementation leads to problems because the download of the git submodules is triggered via the download method, which is called deep inside the fetcher code.

Stefan Herbrechtsmeier (6):
  fetch2: rename u to url in Fetch class
  fetch2: call functions within loops of Fetch class
  fetch2: add helper to get urldata in Fetch class
  fetch2: add support for implicit urls
  fetch2: gitsm: use implicit urls feature
  tests: fetch: add test case for gitsm implicit local paths

 lib/bb/fetch2/__init__.py | 128 +++++++++++++++++++++++++++-----------
 lib/bb/fetch2/gitsm.py    |  46 ++------------
 lib/bb/tests/fetch.py     |  12 ++++
 3 files changed, 109 insertions(+), 77 deletions(-)
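As an illustration of what "parsing of the git submodules" involves, the sketch below reads a .gitmodules file and lists the URLs it declares. It is not taken from the series or from gitsm.py: the function name `submodule_urls` and the example content are assumptions, and it uses Python's `configparser` purely for brevity. Relative submodule URLs such as `../foo.git` are returned unresolved.

```python
import configparser

# Hypothetical .gitmodules content used only for this example.
EXAMPLE_GITMODULES = """\
[submodule "libfoo"]
\tpath = libs/foo
\turl = https://example.com/libfoo.git
[submodule "libbar"]
\tpath = libs/bar
\turl = https://example.com/libbar.git
"""


def submodule_urls(gitmodules_text):
    """Return a list of (name, path, url) tuples, one per submodule section."""
    # Only '=' is a delimiter, so the ':' inside URLs is left alone, and
    # interpolation is disabled so '%' in a URL is not treated specially.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(gitmodules_text)
    prefix = 'submodule "'
    result = []
    for section in parser.sections():
        if not section.startswith(prefix):
            continue
        name = section[len(prefix):-1]  # strip the prefix and closing quote
        result.append((name, parser[section]["path"], parser[section]["url"]))
    return result


if __name__ == "__main__":
    for name, path, url in submodule_urls(EXAMPLE_GITMODULES):
        print(f"{name}: {url} -> {path}")
```

In the terms of the cover letter, each returned URL would be an implicit URL: it is not listed in SRC_URI but is discovered from the fetched source.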