diff mbox series

systemd.bbclass: fix postinst for real systemd-systemctl-native

Message ID 20250528035625.1262266-1-patrick@stwcx.xyz
State New
Headers show
Series systemd.bbclass: fix postinst for real systemd-systemctl-native | expand

Commit Message

Patrick Williams May 28, 2025, 3:56 a.m. UTC
The commit 7a580800db39 switched from a small Python implementation
of `systemctl` to using the real systemctl executable from the
systemd package.  In the systemd.bbclass, systemd-systemctl-native
is used to default-enable services, based on the SYSTEMD_SERVICES
variable, but this was only done if `systemctl` can be executed
without error.  The problem is that the real systemctl executable
treats a zero argument call as if `systemctl list-units` were ran.
This cannot be done when cross-compiling and yields:

```
    Failed to connect to system scope bus via local transport: Operation
    not permitted (consider using --machine=<user>@.host --user to
    connect to bus of other user)
```

The end result is that the `systemd_postinst` effectively turns into
a silent no-op and services are not correctly enabled.

Switch the systemd.bbclass to use `systemctl --help` instead, which
does not require any dbus access to be functional.

Fixes: 7a580800db39 ("systemd: Build the systemctl executable")
Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
---
 meta/classes-recipe/systemd.bbclass | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Alexander Kanavin May 28, 2025, 9:07 a.m. UTC | #1
On Wed, 28 May 2025 at 05:56, Patrick Williams via
lists.openembedded.org <patrick=stwcx.xyz@lists.openembedded.org>
wrote:
> The commit 7a580800db39 switched from a small Python implementation
> of `systemctl` to using the real systemctl executable from the
> systemd package.  In the systemd.bbclass, systemd-systemctl-native
> is used to default-enable services, based on the SYSTEMD_SERVICES
> variable, but this was only done if `systemctl` can be executed
> without error.  The problem is that the real systemctl executable
> treats a zero argument call as if `systemctl list-units` were ran.
> This cannot be done when cross-compiling and yields:
>
> ```
>     Failed to connect to system scope bus via local transport: Operation
>     not permitted (consider using --machine=<user>@.host --user to
>     connect to bus of other user)
> ```
>
> The end result is that the `systemd_postinst` effectively turns into
> a silent no-op and services are not correctly enabled.
>
> Switch the systemd.bbclass to use `systemctl --help` instead, which
> does not require any dbus access to be functional.
>
> Fixes: 7a580800db39 ("systemd: Build the systemctl executable")

I don't disagree with the patch, but I have to ask, why isn't this
causing failures in our testing? There are images built using native
systemctl, they contain services, and they do not produce fails on
boot. There has to be a gap in testing somewhere, or systemctl somehow
manages to do what it's supposed to.

Can you set up poky-altcfg qemux86-64 on master, build
core-image-weston and core-image-sato-sdk, and check what is
happening? (e.g. by booting them in with runqemu).

Alex
Mathieu Dubois-Briand May 28, 2025, 12:33 p.m. UTC | #2
On Wed May 28, 2025 at 5:56 AM CEST, Patrick Williams wrote:
> The commit 7a580800db39 switched from a small Python implementation
> of `systemctl` to using the real systemctl executable from the
> systemd package.  In the systemd.bbclass, systemd-systemctl-native
> is used to default-enable services, based on the SYSTEMD_SERVICES
> variable, but this was only done if `systemctl` can be executed
> without error.  The problem is that the real systemctl executable
> treats a zero argument call as if `systemctl list-units` were ran.
> This cannot be done when cross-compiling and yields:
>
> ```
>     Failed to connect to system scope bus via local transport: Operation
>     not permitted (consider using --machine=<user>@.host --user to
>     connect to bus of other user)
> ```
>
> The end result is that the `systemd_postinst` effectively turns into
> a silent no-op and services are not correctly enabled.
>
> Switch the systemd.bbclass to use `systemctl --help` instead, which
> does not require any dbus access to be functional.
>
> Fixes: 7a580800db39 ("systemd: Build the systemctl executable")
> Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
> ---

Hi Patrick,

Thanks for your patch.

It looks like this is breaking postinstall in some configurations, with
the following error:

ERROR: core-image-sato-1.0-r0 do_rootfs: Postinstall scriptlets of ['bluez5', 'ofono', 'connman', 'xserver-nodm-init', 'avahi-daemon', 'rpcbind', 'run-postinsts'] have failed. If the intention is to defer them to first boot,
then please place them into pkg_postinst_ontarget:${PN} ().
Deferring to first boot via 'exit 1' is no longer supported.

https://autobuilder.yoctoproject.org/valkyrie/#/builders/26/builds/1704

Looking at the build, I believe following configuration should be enough
to reproduce on core-image-sato or even core-image-minimal.

INIT_MANAGER = 'systemd'
DISTRO_FEATURES_BACKFILL_CONSIDERED:remove = 'sysvinit'
TEST_SUITES:append = ' systemd'

Can you have a look at the source of this issue please?
Patrick Williams May 28, 2025, 7:47 p.m. UTC | #3
Hi Alexander,

On Wed, May 28, 2025 at 11:07:54AM +0200, Alexander Kanavin wrote:
> On Wed, 28 May 2025 at 05:56, Patrick Williams via
> lists.openembedded.org <patrick=stwcx.xyz@lists.openembedded.org>
> wrote:
> > The commit 7a580800db39 switched from a small Python implementation
> > of `systemctl` to using the real systemctl executable from the
> > systemd package.  In the systemd.bbclass, systemd-systemctl-native
> > is used to default-enable services, based on the SYSTEMD_SERVICES
> > variable, but this was only done if `systemctl` can be executed
> > without error.  The problem is that the real systemctl executable
> > treats a zero argument call as if `systemctl list-units` were ran.
> > This cannot be done when cross-compiling and yields:
> >
> > ```
> >     Failed to connect to system scope bus via local transport: Operation
> >     not permitted (consider using --machine=<user>@.host --user to
> >     connect to bus of other user)
> > ```
> >
> > The end result is that the `systemd_postinst` effectively turns into
> > a silent no-op and services are not correctly enabled.
> >
> > Switch the systemd.bbclass to use `systemctl --help` instead, which
> > does not require any dbus access to be functional.
> >
> > Fixes: 7a580800db39 ("systemd: Build the systemctl executable")
> 
> I don't disagree with the patch, but I have to ask, why isn't this
> causing failures in our testing? There are images built using native
> systemctl, they contain services, and they do not produce fails on
> boot. There has to be a gap in testing somewhere, or systemctl somehow
> manages to do what it's supposed to.
> 
> Can you set up poky-altcfg qemux86-64 on master, build
> core-image-weston and core-image-sato-sdk, and check what is
> happening? (e.g. by booting them in with runqemu).

I did fire this up in poky-altcfg, and you are correct that we do have
services enabled correctly there.  As an example, looking at the rootfs
we have symlinks out there as expected:

```
$ find tmp/work/qemux86_64-poky-linux/core-image-sato/1.0/rootfs/etc/systemd/ -type l
...
tmp/work/qemux86_64-poky-linux/core-image-sato/1.0/rootfs/etc/systemd/system/multi-user.target.wants/avahi-daemon.service
```

I then dug into the `log.do_rootfs` to figure out where this is coming
from and this is what is going on:

```
DEBUG: Executing shell function systemd_handle_machine_id
...

Created symlink '/home/apwillia/local/sync/openbmc-sources/poky/build/tmp/work/qemux86_64-poky-linux/core-image-sato/1.0/rootfs/etc/systemd/system/dbus-org.freedesktop.Avahi.service' → '/usr/lib/systemd/system/avahi-daemon.service'.
...
DEBUG: Shell function systemd_handle_machine_id finished
```

The symlinks we are currently getting are a side-effect of a `systemctl
--preset-mode=enable-only preset-all` call in
`systemd_handle_machine_id`.  This is different from the
`systemd_postinst` hook which is what I was changing here.  The preset
files are also being generated by `systemd_create_presets` in
systemd.bbclass:

```
$ cat usr/lib/systemd/system-preset/98-avahi-daemon.preset
enable avahi-daemon.service
```

Both me and Martin[1] are experiencing the issue not with
typical services but with template services though.  There are no
template services in poky-altcfg except for the `serial-getty@.service`
files which have symlinks manually created in systemd-serialgetty.bb.

I've dug into that a big and what I've found is that the
systemctl.preset manpage[2] has a different format for template services
than what `systemd_create_presets` is currently generating.  What I see
generated is:

```
$ cat ./usr/lib/systemd/system-preset/98-phosphor-ipmi-net.preset              
enable phosphor-ipmi-net@eth0.service
enable phosphor-ipmi-net@eth0.socket
```

What the systemd manpage gives as an example is:

```
# /usr/lib/systemd/system-preset/80-dirsrv.preset
enable dirsrv@.service foo bar baz
```

The old Python fake-systemctl `enable` use to  handle initializing the template
instances and calling the systemd-systemctl-native one does as well.
The problem is that systemd-systemctl-native is also much more picky
about service files that we call `systemctl enable` against and as
Mathieu[3] points out, a number of packages currently fail using it.
So, this patch that I have here doesn't actually work for _those_
packages.

Walnascar is currently broken for template services and so is master.
My suggestion would be:

   - Revert 7a580800db39 in walnascar.
   - Improve systemd_create_presets to handle template services in master.
   - [Maybe] fix all the services failing per Mathieu on master.
   - [Maybe] apply this change so that `systemd_postinst` runs
             as it use to.

[1]: https://lore.kernel.org/openembedded-core/Z2ch.1747051947055246176.oktf@lists.openembedded.org/
[2]: https://www.freedesktop.org/software/systemd/man/latest/systemd.preset.html
[3]: https://lore.kernel.org/openembedded-core/DA7SP4NN2WXR.1NUSV0GI05ZO5@bootlin.com/
Alexander Kanavin May 30, 2025, 11:19 a.m. UTC | #4
On Wed, 28 May 2025 at 21:47, Patrick Williams <patrick@stwcx.xyz> wrote:

> Walnascar is currently broken for template services and so is master.
> My suggestion would be:
>
>    - Revert 7a580800db39 in walnascar.
>    - Improve systemd_create_presets to handle template services in master.
>    - [Maybe] fix all the services failing per Mathieu on master.
>    - [Maybe] apply this change so that `systemd_postinst` runs
>              as it use to.

Thanks, I appreciate you digging into the issue.

That said, the plan to fix it needs to be reordered. We have a strict
policy of fixing issues in master first, for good reason. Also,
backporting the fixes to stable branches is much easier to get
accepted than major reverts like that.

Also, if something currently quietly fails in poky-altcfg (it wasn't
entirely clear from your analysis), then the fix shouldn't just
address the issue, it should also make any possible fail a loud and
hard one. So it doesn't quietly regress again.

Alex
Patrick Williams May 30, 2025, 1:04 p.m. UTC | #5
On Fri, May 30, 2025 at 01:19:25PM +0200, Alexander Kanavin wrote:
> On Wed, 28 May 2025 at 21:47, Patrick Williams <patrick@stwcx.xyz> wrote:
> 
> > Walnascar is currently broken for template services and so is master.
> > My suggestion would be:
> >
> >    - Revert 7a580800db39 in walnascar.
> >    - Improve systemd_create_presets to handle template services in master.
> >    - [Maybe] fix all the services failing per Mathieu on master.
> >    - [Maybe] apply this change so that `systemd_postinst` runs
> >              as it use to.
> 
> Thanks, I appreciate you digging into the issue.

Thank you for the feedback Alex.

> That said, the plan to fix it needs to be reordered. We have a strict
> policy of fixing issues in master first, for good reason. Also,
> backporting the fixes to stable branches is much easier to get
> accepted than major reverts like that.

Agreed.  I tried to do a simple revert locally and it had conflicts
already.  It was easier to just do the needful in
systemd_create_presets.

This is done and ready for feedback:
    https://lore.kernel.org/openembedded-core/20250529174427.2731440-1-patrick@stwcx.xyz/T/#u

> Also, if something currently quietly fails in poky-altcfg (it wasn't
> entirely clear from your analysis), then the fix shouldn't just
> address the issue, it should also make any possible fail a loud and
> hard one. So it doesn't quietly regress again.

poky-altcfg does not currently fail (or fail to perform).
    - There are no template instances in any package in poky-altcfg that
      rely on the use of SYSTEMD_SERVICE.
    - The only template instance in poky-altcfg is created manually with
      symlinks (see `systemd-serialgetty.bb:do_install`).

This problem only affects template instances: 
    SYSTEMD_SERVICE = "foo@bar.service" # Template instance
    SYSTEMD_SERVICE = "foo.service" # Non-template instance

Prior to 7a580800db39, template instances listed in SYSTEMD_SERVICE
would have been enabled by the systemd_postinst hook that called [fake]
systemctl.  After 7a580800db39, the systemd_postinst hook silently
skips.

Prior to 7a580800db39, non-template instances listed in SYSTEMD_SERVICE
were enabled by systemd_create_presets files (and also duplicatively
enabled by systemd_postinst).  After 7a580800db39, non-template
instances are enabled only by the systemd_create_presets.

With my change linked above, the template instances will be enabled by
systemd_create_presets as well.  Thus all instances (template or not) will be
handled by systemd_create_presets, and the systemd_postinst hook only has
effect when installing via an RPM and not when building a rootfs.
entirely.
diff mbox series

Patch

diff --git a/meta/classes-recipe/systemd.bbclass b/meta/classes-recipe/systemd.bbclass
index 4c9f51d33d..e1fe4ed072 100644
--- a/meta/classes-recipe/systemd.bbclass
+++ b/meta/classes-recipe/systemd.bbclass
@@ -29,7 +29,7 @@  python __anonymous() {
 }
 
 systemd_postinst() {
-if systemctl >/dev/null 2>/dev/null; then
+if systemctl --help >/dev/null 2>/dev/null; then
 	OPTS=""
 
 	if [ -n "$D" ]; then
@@ -66,7 +66,7 @@  fi
 }
 
 systemd_prerm() {
-if systemctl >/dev/null 2>/dev/null; then
+if systemctl --help >/dev/null 2>/dev/null; then
 	if [ -z "$D" ]; then
 		if [ -n "${@systemd_filter_services("${SYSTEMD_SERVICE_ESCAPED}", False, d)}" ]; then
 			systemctl stop ${@systemd_filter_services("${SYSTEMD_SERVICE_ESCAPED}", False, d)}