Message ID | 20250528035625.1262266-1-patrick@stwcx.xyz |
---|---|
State | New |
Headers | show |
Series | systemd.bbclass: fix postinst for real systemd-systemctl-native | expand |
On Wed, 28 May 2025 at 05:56, Patrick Williams via lists.openembedded.org <patrick=stwcx.xyz@lists.openembedded.org> wrote: > The commit 7a580800db39 switched from a small Python implementation > of `systemctl` to using the real systemctl executable from the > systemd package. In the systemd.bbclass, systemd-systemctl-native > is used to default-enable services, based on the SYSTEMD_SERVICES > variable, but this was only done if `systemctl` can be executed > without error. The problem is that the real systemctl executable > treats a zero argument call as if `systemctl list-units` were ran. > This cannot be done when cross-compiling and yields: > > ``` > Failed to connect to system scope bus via local transport: Operation > not permitted (consider using --machine=<user>@.host --user to > connect to bus of other user) > ``` > > The end result is that the `systemd_postinst` effectively turns into > a silent no-op and services are not correctly enabled. > > Switch the systemd.bbclass to use `systemctl --help` instead, which > does not require any dbus access to be functional. > > Fixes: 7a580800db39 ("systemd: Build the systemctl executable") I don't disagree with the patch, but I have to ask, why isn't this causing failures in our testing? There are images built using native systemctl, they contain services, and they do not produce fails on boot. There has to be a gap in testing somewhere, or systemctl somehow manages to do what it's supposed to. Can you set up poky-altcfg qemux86-64 on master, build core-image-weston and core-image-sato-sdk, and check what is happening? (e.g. by booting them in with runqemu). Alex
On Wed May 28, 2025 at 5:56 AM CEST, Patrick Williams wrote: > The commit 7a580800db39 switched from a small Python implementation > of `systemctl` to using the real systemctl executable from the > systemd package. In the systemd.bbclass, systemd-systemctl-native > is used to default-enable services, based on the SYSTEMD_SERVICES > variable, but this was only done if `systemctl` can be executed > without error. The problem is that the real systemctl executable > treats a zero argument call as if `systemctl list-units` were ran. > This cannot be done when cross-compiling and yields: > > ``` > Failed to connect to system scope bus via local transport: Operation > not permitted (consider using --machine=<user>@.host --user to > connect to bus of other user) > ``` > > The end result is that the `systemd_postinst` effectively turns into > a silent no-op and services are not correctly enabled. > > Switch the systemd.bbclass to use `systemctl --help` instead, which > does not require any dbus access to be functional. > > Fixes: 7a580800db39 ("systemd: Build the systemctl executable") > Signed-off-by: Patrick Williams <patrick@stwcx.xyz> > --- Hi Patrick, Thanks for your patch. It looks like this is breaking postinstall in some configurations, with the following error: ERROR: core-image-sato-1.0-r0 do_rootfs: Postinstall scriptlets of ['bluez5', 'ofono', 'connman', 'xserver-nodm-init', 'avahi-daemon', 'rpcbind', 'run-postinsts'] have failed. If the intention is to defer them to first boot, then please place them into pkg_postinst_ontarget:${PN} (). Deferring to first boot via 'exit 1' is no longer supported. https://autobuilder.yoctoproject.org/valkyrie/#/builders/26/builds/1704 Looking at the build, I believe following configuration should be enough to reproduce on core-image-sato or even core-image-minimal. INIT_MANAGER = 'systemd' DISTRO_FEATURES_BACKFILL_CONSIDERED:remove = 'sysvinit' TEST_SUITES:append = ' systemd' Can you have a look at the source of this issue please?
Hi Alexander, On Wed, May 28, 2025 at 11:07:54AM +0200, Alexander Kanavin wrote: > On Wed, 28 May 2025 at 05:56, Patrick Williams via > lists.openembedded.org <patrick=stwcx.xyz@lists.openembedded.org> > wrote: > > The commit 7a580800db39 switched from a small Python implementation > > of `systemctl` to using the real systemctl executable from the > > systemd package. In the systemd.bbclass, systemd-systemctl-native > > is used to default-enable services, based on the SYSTEMD_SERVICES > > variable, but this was only done if `systemctl` can be executed > > without error. The problem is that the real systemctl executable > > treats a zero argument call as if `systemctl list-units` were ran. > > This cannot be done when cross-compiling and yields: > > > > ``` > > Failed to connect to system scope bus via local transport: Operation > > not permitted (consider using --machine=<user>@.host --user to > > connect to bus of other user) > > ``` > > > > The end result is that the `systemd_postinst` effectively turns into > > a silent no-op and services are not correctly enabled. > > > > Switch the systemd.bbclass to use `systemctl --help` instead, which > > does not require any dbus access to be functional. > > > > Fixes: 7a580800db39 ("systemd: Build the systemctl executable") > > I don't disagree with the patch, but I have to ask, why isn't this > causing failures in our testing? There are images built using native > systemctl, they contain services, and they do not produce fails on > boot. There has to be a gap in testing somewhere, or systemctl somehow > manages to do what it's supposed to. > > Can you set up poky-altcfg qemux86-64 on master, build > core-image-weston and core-image-sato-sdk, and check what is > happening? (e.g. by booting them in with runqemu). I did fire this up in poky-altcfg, and you are correct that we do have services enabled correctly there. As an example, looking at the rootfs we have symlinks out there as expected: ``` $ find tmp/work/qemux86_64-poky-linux/core-image-sato/1.0/rootfs/etc/systemd/ -type l ... tmp/work/qemux86_64-poky-linux/core-image-sato/1.0/rootfs/etc/systemd/system/multi-user.target.wants/avahi-daemon.service ``` I then dug into the `log.do_rootfs` to figure out where this is coming from and this is what is going on: ``` DEBUG: Executing shell function systemd_handle_machine_id ... Created symlink '/home/apwillia/local/sync/openbmc-sources/poky/build/tmp/work/qemux86_64-poky-linux/core-image-sato/1.0/rootfs/etc/systemd/system/dbus-org.freedesktop.Avahi.service' → '/usr/lib/systemd/system/avahi-daemon.service'. ... DEBUG: Shell function systemd_handle_machine_id finished ``` The symlinks we are currently getting are a side-effect of a `systemctl --preset-mode=enable-only preset-all` call in `systemd_handle_machine_id`. This is different from the `systemd_postinst` hook which is what I was changing here. The preset files are also being generated by `systemd_create_presets` in systemd.bbclass: ``` $ cat usr/lib/systemd/system-preset/98-avahi-daemon.preset enable avahi-daemon.service ``` Both me and Martin[1] are experiencing the issue not with typical services but with template services though. There are no template services in poky-altcfg except for the `serial-getty@.service` files which have symlinks manually created in systemd-serialgetty.bb. I've dug into that a big and what I've found is that the systemctl.preset manpage[2] has a different format for template services than what `systemd_create_presets` is currently generating. What I see generated is: ``` $ cat ./usr/lib/systemd/system-preset/98-phosphor-ipmi-net.preset enable phosphor-ipmi-net@eth0.service enable phosphor-ipmi-net@eth0.socket ``` What the systemd manpage gives as an example is: ``` # /usr/lib/systemd/system-preset/80-dirsrv.preset enable dirsrv@.service foo bar baz ``` The old Python fake-systemctl `enable` use to handle initializing the template instances and calling the systemd-systemctl-native one does as well. The problem is that systemd-systemctl-native is also much more picky about service files that we call `systemctl enable` against and as Mathieu[3] points out, a number of packages currently fail using it. So, this patch that I have here doesn't actually work for _those_ packages. Walnascar is currently broken for template services and so is master. My suggestion would be: - Revert 7a580800db39 in walnascar. - Improve systemd_create_presets to handle template services in master. - [Maybe] fix all the services failing per Mathieu on master. - [Maybe] apply this change so that `systemd_postinst` runs as it use to. [1]: https://lore.kernel.org/openembedded-core/Z2ch.1747051947055246176.oktf@lists.openembedded.org/ [2]: https://www.freedesktop.org/software/systemd/man/latest/systemd.preset.html [3]: https://lore.kernel.org/openembedded-core/DA7SP4NN2WXR.1NUSV0GI05ZO5@bootlin.com/
On Wed, 28 May 2025 at 21:47, Patrick Williams <patrick@stwcx.xyz> wrote: > Walnascar is currently broken for template services and so is master. > My suggestion would be: > > - Revert 7a580800db39 in walnascar. > - Improve systemd_create_presets to handle template services in master. > - [Maybe] fix all the services failing per Mathieu on master. > - [Maybe] apply this change so that `systemd_postinst` runs > as it use to. Thanks, I appreciate you digging into the issue. That said, the plan to fix it needs to be reordered. We have a strict policy of fixing issues in master first, for good reason. Also, backporting the fixes to stable branches is much easier to get accepted than major reverts like that. Also, if something currently quietly fails in poky-altcfg (it wasn't entirely clear from your analysis), then the fix shouldn't just address the issue, it should also make any possible fail a loud and hard one. So it doesn't quietly regress again. Alex
On Fri, May 30, 2025 at 01:19:25PM +0200, Alexander Kanavin wrote: > On Wed, 28 May 2025 at 21:47, Patrick Williams <patrick@stwcx.xyz> wrote: > > > Walnascar is currently broken for template services and so is master. > > My suggestion would be: > > > > - Revert 7a580800db39 in walnascar. > > - Improve systemd_create_presets to handle template services in master. > > - [Maybe] fix all the services failing per Mathieu on master. > > - [Maybe] apply this change so that `systemd_postinst` runs > > as it use to. > > Thanks, I appreciate you digging into the issue. Thank you for the feedback Alex. > That said, the plan to fix it needs to be reordered. We have a strict > policy of fixing issues in master first, for good reason. Also, > backporting the fixes to stable branches is much easier to get > accepted than major reverts like that. Agreed. I tried to do a simple revert locally and it had conflicts already. It was easier to just do the needful in systemd_create_presets. This is done and ready for feedback: https://lore.kernel.org/openembedded-core/20250529174427.2731440-1-patrick@stwcx.xyz/T/#u > Also, if something currently quietly fails in poky-altcfg (it wasn't > entirely clear from your analysis), then the fix shouldn't just > address the issue, it should also make any possible fail a loud and > hard one. So it doesn't quietly regress again. poky-altcfg does not currently fail (or fail to perform). - There are no template instances in any package in poky-altcfg that rely on the use of SYSTEMD_SERVICE. - The only template instance in poky-altcfg is created manually with symlinks (see `systemd-serialgetty.bb:do_install`). This problem only affects template instances: SYSTEMD_SERVICE = "foo@bar.service" # Template instance SYSTEMD_SERVICE = "foo.service" # Non-template instance Prior to 7a580800db39, template instances listed in SYSTEMD_SERVICE would have been enabled by the systemd_postinst hook that called [fake] systemctl. After 7a580800db39, the systemd_postinst hook silently skips. Prior to 7a580800db39, non-template instances listed in SYSTEMD_SERVICE were enabled by systemd_create_presets files (and also duplicatively enabled by systemd_postinst). After 7a580800db39, non-template instances are enabled only by the systemd_create_presets. With my change linked above, the template instances will be enabled by systemd_create_presets as well. Thus all instances (template or not) will be handled by systemd_create_presets, and the systemd_postinst hook only has effect when installing via an RPM and not when building a rootfs. entirely.
diff --git a/meta/classes-recipe/systemd.bbclass b/meta/classes-recipe/systemd.bbclass index 4c9f51d33d..e1fe4ed072 100644 --- a/meta/classes-recipe/systemd.bbclass +++ b/meta/classes-recipe/systemd.bbclass @@ -29,7 +29,7 @@ python __anonymous() { } systemd_postinst() { -if systemctl >/dev/null 2>/dev/null; then +if systemctl --help >/dev/null 2>/dev/null; then OPTS="" if [ -n "$D" ]; then @@ -66,7 +66,7 @@ fi } systemd_prerm() { -if systemctl >/dev/null 2>/dev/null; then +if systemctl --help >/dev/null 2>/dev/null; then if [ -z "$D" ]; then if [ -n "${@systemd_filter_services("${SYSTEMD_SERVICE_ESCAPED}", False, d)}" ]; then systemctl stop ${@systemd_filter_services("${SYSTEMD_SERVICE_ESCAPED}", False, d)}
The commit 7a580800db39 switched from a small Python implementation of `systemctl` to using the real systemctl executable from the systemd package. In the systemd.bbclass, systemd-systemctl-native is used to default-enable services, based on the SYSTEMD_SERVICES variable, but this was only done if `systemctl` can be executed without error. The problem is that the real systemctl executable treats a zero argument call as if `systemctl list-units` were ran. This cannot be done when cross-compiling and yields: ``` Failed to connect to system scope bus via local transport: Operation not permitted (consider using --machine=<user>@.host --user to connect to bus of other user) ``` The end result is that the `systemd_postinst` effectively turns into a silent no-op and services are not correctly enabled. Switch the systemd.bbclass to use `systemctl --help` instead, which does not require any dbus access to be functional. Fixes: 7a580800db39 ("systemd: Build the systemctl executable") Signed-off-by: Patrick Williams <patrick@stwcx.xyz> --- meta/classes-recipe/systemd.bbclass | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)