[PATCHv6,0/5] Display manager proposal for x11 and wayland

Message ID 20250520233935.740242-1-rs@ti.com

Message

Randolph Sapp May 20, 2025, 11:39 p.m. UTC
From: Randolph Sapp <rs@ti.com>

We've recently run into some issues with weston-init attempting to start Weston
before all DRM devices have been registered. There's no good, scriptable
mechanism for listening to device registration events that works with the
existing weston-init package; at least, none that doesn't involve polling
files or adding further dependencies on the init system in use.

I also see a lot of scripting around starting X11 (xserver-nodm-init) that,
from my limited review, should suffer from the same issue.

I'd like to introduce a display manager to oe-core: emptty [1]. As described
upstream, it is a "Dead simple CLI Display Manager on TTY". It supports both X11
and Wayland sessions, with toggleable build parameters to completely remove the
X11 and PAM dependencies. It's MIT licensed, which shouldn't be an issue for any
users. (It is written in Go, if you have opinions about that.)

With this, both the weston-init and xserver-nodm-init packages can be reworked
to use this display manager and simply add a user and an emptty config for an
autologin session. This can resolve the startup ordering issue across init
systems without additional scripting, and move some development out of this
layer.
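To make "a user and an emptty config for an autologin session" concrete, here is a minimal Python sketch that renders such a config as KEY=value lines. The key names (TTY_NUMBER, DEFAULT_USER, AUTOLOGIN, AUTOLOGIN_SESSION, AUTOLOGIN_MAX_RETRY) are based on emptty's documented options as best we understand them. The values (user and session "weston", TTY 7) are placeholders, not the contents of the emptty.conf files in this series.

```python
from typing import Mapping


def render_emptty_conf(settings: Mapping[str, object]) -> str:
    """Render settings as emptty-style KEY=value lines.

    Booleans are written as lowercase true/false; everything else via str().
    """
    lines = []
    for key, value in settings.items():
        if isinstance(value, bool):
            text = "true" if value else "false"
        else:
            text = str(value)
        lines.append(f"{key}={text}")
    return "\n".join(lines) + "\n"


# Hypothetical autologin settings for a weston-init style image.
# Key names follow emptty's documented options as we understand them;
# the user, session name and TTY number are placeholders.
WESTON_AUTOLOGIN = {
    "TTY_NUMBER": 7,
    "DEFAULT_USER": "weston",
    "AUTOLOGIN": True,
    "AUTOLOGIN_SESSION": "weston",
    "AUTOLOGIN_MAX_RETRY": 2,
}

if __name__ == "__main__":
    print(render_emptty_conf(WESTON_AUTOLOGIN), end="")
```

The script prints one setting per line, for example `AUTOLOGIN=true`. A recipe would more likely ship a static file, as the emptty.conf entries in the diffstat suggest.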

This series lists me as the maintainer of emptty, as well as of xserver-nodm-init
and xuser-account, since the latter two are currently unassigned and I've
reworked them significantly here.

Sorry for the delay on this series. I found a few bugs in emptty that I wanted
to address before submitting this officially.

[1] https://github.com/tvrzna/emptty

v2:
	- Address spelling issues in commit messages
	- Attempt to resolve some test-related issues with weston
	- Add additional logs to X11-related tests
v3:
	- Reset AUTOLOGIN_MAX_RETRY to the default value of 2. When running
	  under QEMU, the first auth attempt almost always fails.
v4:
	- Add a tmpfile entry for the x11 domain socket directory.
	- Remove some scripts associated with weston-init that were being
	  shipped with weston
v5:
	- Move tmpfile data to individual files
	- Add explicit entries for these in the FILES variable
v6:
	- Do not attempt to ship a tmpfiles.d entry in libx11


Randolph Sapp (5):
  libx11: create tmpfile dir for x11 domain socket
  emptty: add version 0.14.0
  weston-init: convert to virtual-emptty-conf
  weston: remove deprecated weston-start scripts
  xserver-nodm-init: convert to virtual-emptty-conf

 .../conf/distro/include/default-providers.inc |   1 +
 meta/conf/distro/include/maintainers.inc      |   6 +-
 meta/lib/oeqa/runtime/cases/weston.py         |  18 +-
 meta/lib/oeqa/runtime/cases/xorg.py           |   8 +
 meta/recipes-graphics/emptty/emptty-conf.bb   |  14 +
 meta/recipes-graphics/emptty/emptty.inc       |  26 ++
 .../recipes-graphics/emptty/emptty/emptty.tab |   1 +
 meta/recipes-graphics/emptty/emptty/pamconf   |  10 +
 meta/recipes-graphics/emptty/emptty_0.14.0.bb |  53 +++
 meta/recipes-graphics/wayland/weston-init.bb  |  61 +--
 .../wayland/weston-init/emptty.conf           |  77 ++++
 .../recipes-graphics/wayland/weston-init/init |  54 ---
 .../wayland/weston-init/weston-autologin      |  11 -
 .../wayland/weston-init/weston-socket.sh      |  20 -
 .../wayland/weston-init/weston-start          |  76 ----
 .../wayland/weston-init/weston.env            |   0
 .../wayland/weston-init/weston.service        |  71 ----
 .../wayland/weston-init/weston.socket         |  14 -
 .../weston/systemd-notify.weston-start        |   9 -
 .../wayland/weston/xwayland.weston-start      |   6 -
 .../recipes-graphics/wayland/weston_14.0.1.bb |  10 -
 .../x11-common/xserver-nodm-init/X11/Xsession |  38 --
 .../X11/Xsession.d/13xdgbasedirs.sh           |  19 -
 .../X11/Xsession.d/89xdgautostart.sh          |   7 -
 .../X11/Xsession.d/90XWindowManager.sh        |   7 -
 .../x11-common/xserver-nodm-init/Xserver      |  25 --
 .../xserver-nodm-init/capability.conf         |   2 -
 .../xserver-nodm-init/default.desktop         |   5 +
 .../xserver-nodm-init/emptty.conf.in          |  77 ++++
 .../xserver-nodm-init/gplv2-license.patch     | 355 ------------------
 .../x11-common/xserver-nodm-init/xserver-nodm |  75 ----
 .../xserver-nodm-init/xserver-nodm.conf.in    |   7 -
 .../xserver-nodm-init/xserver-nodm.service.in |  11 -
 .../x11-common/xserver-nodm-init_3.0.bb       |  57 +--
 meta/recipes-graphics/xorg-lib/libx11/99_x11  |   1 +
 .../xorg-lib/libx11_1.8.12.bb                 |  15 +-
 .../user-creation/xuser-account_0.1.bb        |   3 +-
 37 files changed, 332 insertions(+), 918 deletions(-)
 create mode 100644 meta/recipes-graphics/emptty/emptty-conf.bb
 create mode 100644 meta/recipes-graphics/emptty/emptty.inc
 create mode 100644 meta/recipes-graphics/emptty/emptty/emptty.tab
 create mode 100644 meta/recipes-graphics/emptty/emptty/pamconf
 create mode 100644 meta/recipes-graphics/emptty/emptty_0.14.0.bb
 create mode 100644 meta/recipes-graphics/wayland/weston-init/emptty.conf
 delete mode 100644 meta/recipes-graphics/wayland/weston-init/init
 delete mode 100644 meta/recipes-graphics/wayland/weston-init/weston-autologin
 delete mode 100755 meta/recipes-graphics/wayland/weston-init/weston-socket.sh
 delete mode 100755 meta/recipes-graphics/wayland/weston-init/weston-start
 delete mode 100644 meta/recipes-graphics/wayland/weston-init/weston.env
 delete mode 100644 meta/recipes-graphics/wayland/weston-init/weston.service
 delete mode 100644 meta/recipes-graphics/wayland/weston-init/weston.socket
 delete mode 100644 meta/recipes-graphics/wayland/weston/systemd-notify.weston-start
 delete mode 100644 meta/recipes-graphics/wayland/weston/xwayland.weston-start
 delete mode 100644 meta/recipes-graphics/x11-common/xserver-nodm-init/X11/Xsession
 delete mode 100644 meta/recipes-graphics/x11-common/xserver-nodm-init/X11/Xsession.d/13xdgbasedirs.sh
 delete mode 100644 meta/recipes-graphics/x11-common/xserver-nodm-init/X11/Xsession.d/89xdgautostart.sh
 delete mode 100644 meta/recipes-graphics/x11-common/xserver-nodm-init/X11/Xsession.d/90XWindowManager.sh
 delete mode 100644 meta/recipes-graphics/x11-common/xserver-nodm-init/Xserver
 delete mode 100644 meta/recipes-graphics/x11-common/xserver-nodm-init/capability.conf
 create mode 100644 meta/recipes-graphics/x11-common/xserver-nodm-init/default.desktop
 create mode 100644 meta/recipes-graphics/x11-common/xserver-nodm-init/emptty.conf.in
 delete mode 100644 meta/recipes-graphics/x11-common/xserver-nodm-init/gplv2-license.patch
 delete mode 100755 meta/recipes-graphics/x11-common/xserver-nodm-init/xserver-nodm
 delete mode 100644 meta/recipes-graphics/x11-common/xserver-nodm-init/xserver-nodm.conf.in
 delete mode 100644 meta/recipes-graphics/x11-common/xserver-nodm-init/xserver-nodm.service.in
 create mode 100644 meta/recipes-graphics/xorg-lib/libx11/99_x11

Comments

Mathieu Dubois-Briand May 21, 2025, 3:58 p.m. UTC | #1
On Wed May 21, 2025 at 1:39 AM CEST, rs wrote:
> [snip]

Hi Randolph,

Thanks for the new version, but I'm seeing a previously reported error again.
Sorry for the bad news :(

RESULTS - xorg.XorgTest.test_xorg_running: FAILED (1.31s)
...
AssertionError: 1 != 0 : Xorg does not appear to be running   PID USER       VSZ STAT COMMAND
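In the assertion above, `1 != 0` compares the status of the Xorg check against the expected 0. In a grep-based check, a status of 1 means no matching process line was found. The trailing `PID USER VSZ STAT COMMAND` appears to be the start of the captured ps listing. The Python sketch below models that logic only. It assumes the test filters ps output for `Xorg` while ignoring `xinit` lines; the actual code in meta/lib/oeqa/runtime/cases/xorg.py may differ.

```python
def xorg_running(ps_output: str) -> bool:
    """Return True if a ps line (other than the header) shows an Xorg process.

    Rough Python stand-in for a `ps | grep -v xinit | grep [X]org` style
    check: lines mentioning xinit are skipped, then any line containing
    "Xorg" counts. (In shell, the [X]org bracket trick only stops grep
    from matching its own command line.)
    """
    for line in ps_output.splitlines()[1:]:  # first line is the ps header
        if "xinit" in line:
            continue
        if "Xorg" in line:
            return True
    return False


def check_status(ps_output: str) -> int:
    """Mimic grep's exit status: 0 when a match is found, 1 otherwise."""
    return 0 if xorg_running(ps_output) else 1


if __name__ == "__main__":
    header = "  PID USER       VSZ STAT COMMAND"
    print(check_status(header))  # 1, i.e. the "1 != 0" failure
    print(check_status(header + "\n  400 xuser  90000 S    /usr/bin/Xorg :0"))  # 0
```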

https://autobuilder.yoctoproject.org/valkyrie/#/builders/20/builds/1637
https://autobuilder.yoctoproject.org/valkyrie/#/builders/95/builds/1638
https://autobuilder.yoctoproject.org/valkyrie/#/builders/9/builds/1649

I will drop the patch on my side, but I will keep the a-full build
running; we might see other failures in the coming hours:

https://autobuilder.yoctoproject.org/valkyrie/#/builders/29/builds/1630
Randolph Sapp May 27, 2025, 10:16 p.m. UTC | #2
On Wed May 21, 2025 at 10:58 AM CDT, Mathieu Dubois-Briand wrote:
> On Wed May 21, 2025 at 1:39 AM CEST, rs wrote:
>> [snip]
>
> Hi Randolph,
>
> Thanks for the new version, but I'm seeing a previously reported error again.
> Sorry for the bad news :(
>
> RESULTS - xorg.XorgTest.test_xorg_running: FAILED (1.31s)
> ...
> AssertionError: 1 != 0 : Xorg does not appear to be running   PID USER       VSZ STAT COMMAND
>
> https://autobuilder.yoctoproject.org/valkyrie/#/builders/20/builds/1637
> https://autobuilder.yoctoproject.org/valkyrie/#/builders/95/builds/1638
> https://autobuilder.yoctoproject.org/valkyrie/#/builders/9/builds/1649
>
> I will drop the patch on my side, but I will keep the a-full build
> running; we might see other failures in the coming hours:
>
> https://autobuilder.yoctoproject.org/valkyrie/#/builders/29/builds/1630

Ah, the auth failures persist. Normally this happens when the user attempts to
log in before the user account has been created, or something similarly unusual.
I don't suppose there's any way this could be happening on the test machines?

I'm not able to reproduce this locally, but my build config is different: I
tend to use oe-core and nodistro instead of poky. They aren't doing anything
unusual with the reproducible build date settings, are they?

That said, I did see this on occasion. The first login attempt almost always
failed, but subsequent attempts were fine. Bumping the attempt count back up to
the default (2) was enough to resolve it on my end, but evidently it's not
enough here.
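The retry reasoning above can be modeled in a few lines of Python. This sketch assumes AUTOLOGIN_MAX_RETRY counts retries after the first attempt, so a value of 2 allows three attempts in total. That reading is an assumption and should be checked against emptty's documentation. `attempt_login` is a hypothetical stand-in for one autologin attempt. The model shows why the retry budget absorbs a transient failure on the first attempt, but cannot absorb a failure that recurs on every attempt, whatever the budget.

```python
from typing import Callable, Optional


def autologin(attempt_login: Callable[[int], bool], max_retry: int) -> Optional[int]:
    """Try one initial login plus up to `max_retry` retries.

    Returns the 1-based number of the attempt that succeeded, or None if the
    budget ran out. Assumes max_retry counts retries, not total attempts.
    """
    for attempt in range(1, max_retry + 2):
        if attempt_login(attempt):
            return attempt
    return None


def flaky_first(attempt: int) -> bool:
    """Transient failure: only the first attempt fails."""
    return attempt > 1


def always_fails(attempt: int) -> bool:
    """Deterministic failure, e.g. a crash on every attempt."""
    return False


if __name__ == "__main__":
    print(autologin(flaky_first, max_retry=2))   # 2
    print(autologin(always_fails, max_retry=2))  # None
```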

This is not happening on hardware from what I'm seeing.
Mathieu Dubois-Briand May 28, 2025, 11:53 a.m. UTC | #3
On Wed May 28, 2025 at 12:16 AM CEST, Randolph Sapp wrote:
> On Wed May 21, 2025 at 10:58 AM CDT, Mathieu Dubois-Briand wrote:
>> [snip]
>
> Ah, the auth failures persist. Normally this happens when the user attempts to
> log in before the user account has been created, or something similarly unusual.
> I don't suppose there's any way this could be happening on the test machines?
>
> I'm not able to reproduce this locally, but my build config is different: I
> tend to use oe-core and nodistro instead of poky. They aren't doing anything
> unusual with the reproducible build date settings, are they?
>
> That said, I did see this on occasion. The first login attempt almost always
> failed, but subsequent attempts were fine. Bumping the attempt count back up to
> the default (2) was enough to resolve it on my end, but evidently it's not
> enough here.
>
> This is not happening on hardware from what I'm seeing.

Hi Randolph,

I was able to reproduce it locally, with the following configuration:
- https://git.yoctoproject.org/poky-ci-archive on tag
  autobuilder.yoctoproject.org/valkyrie/a-full-1630.
- template local.conf, with the following modifications:
  MACHINE = "qemux86"
  DISTRO = "poky-altcfg"
  SDKMACHINE = "x86_64"
  PACKAGE_CLASSES = "package_rpm package_deb package_ipk"
  INHERIT += 'image-buildinfo'
  IMAGE_BUILDINFO_VARS:append = ' IMAGE_BASENAME IMAGE_NAME'
  PACKAGE_CLASSES = 'package_ipk package_rpm package_deb'
  IMAGE_ROOTFS_EXTRA_SPACE:append = '${@bb.utils.contains('IMAGE_FEATURES', 'package-management', ' + 262144', '', d)}'
  IMAGE_INSTALL:append = ' ssh-pregen-hostkeys'
  SANITY_TESTED_DISTROS = ''
  OE_FRAGMENTS += 'core/yocto-autobuilder/autobuilder core/yocto-autobuilder/autobuilder-resource-constraints'
  # (Plus my own sstate cache and downloads dir).
- bitbake core-image-sato
- bitbake core-image-sato:do_testimage

Of course, using the tag from poky-ci-archive is not mandatory; you
should see the same behaviour with the master branch plus your patches.

I also saw an oe-selftest failing on the autobuilder, with an error
related to emptty. You can reproduce it with:

oe-selftest -r sstatetests.SStateHashSameSigs3.test_sstate_sametune_samesigs
Randolph Sapp June 5, 2025, 12:12 a.m. UTC | #4
On Wed May 28, 2025 at 6:53 AM CDT, Mathieu Dubois-Briand wrote:
> On Wed May 28, 2025 at 12:16 AM CEST, Randolph Sapp wrote:

[snip]

> Hi Randolph,
>
> I was able to reproduce it locally, with the following configuration:
> - https://git.yoctoproject.org/poky-ci-archive on tag
>   autobuilder.yoctoproject.org/valkyrie/a-full-1630.
> - template local.conf, with the following modifications:
>   MACHINE = "qemux86"
>   DISTRO = "poky-altcfg"
>   SDKMACHINE = "x86_64"
>   PACKAGE_CLASSES = "package_rpm package_deb package_ipk"
>   INHERIT += 'image-buildinfo'
>   IMAGE_BUILDINFO_VARS:append = ' IMAGE_BASENAME IMAGE_NAME'
>   PACKAGE_CLASSES = 'package_ipk package_rpm package_deb'
>   IMAGE_ROOTFS_EXTRA_SPACE:append = '${@bb.utils.contains('IMAGE_FEATURES', 'package-management', ' + 262144', '', d)}'
>   IMAGE_INSTALL:append = ' ssh-pregen-hostkeys'
>   SANITY_TESTED_DISTROS = ''
>   OE_FRAGMENTS += 'core/yocto-autobuilder/autobuilder core/yocto-autobuilder/autobuilder-resource-constraints'
>   # (Plus my own sstate cache and downloads dir).
> - bitbake core-image-sato
> - bitbake core-image-sato:do_testimage
>
> Of course, using the tag from poky-ci-archive is not mandatory; you
> should see the same behaviour with the master branch plus your patches.
>
> I also saw an oe-selftest failing on the autobuilder, with an error
> related to emptty. You can reproduce it with:
>
> oe-selftest -r sstatetests.SStateHashSameSigs3.test_sstate_sametune_samesigs

Status report, since it's been a little bit.

This seems to be an i386/x86 fault that is only exposed in this specific
environment; I was unable to reproduce it on Archlinux32. I've traced it back to
a segfault when we attempt to execute mcookie as the target user. Odd stuff.
Delve and gdb aren't helping me here. I'm running some more tests to see if I
can corner this some other way.

- Randolph
Randolph Sapp June 6, 2025, 11:49 p.m. UTC | #5
On Wed Jun 4, 2025 at 7:12 PM CDT, Randolph Sapp wrote:
> On Wed May 28, 2025 at 6:53 AM CDT, Mathieu Dubois-Briand wrote:
>> On Wed May 28, 2025 at 12:16 AM CEST, Randolph Sapp wrote:
>
> [snip]
>
> Status report, since it's been a little bit.
>
> This seems to be an i386/x86 fault that is only exposed in this specific
> environment; I was unable to reproduce it on Archlinux32. I've traced it back to
> a segfault when we attempt to execute mcookie as the target user. Odd stuff.
> Delve and gdb aren't helping me here. I'm running some more tests to see if I
> can corner this some other way.
>
> - Randolph

Well, I managed to determine that this is specifically an issue that occurs
after loading libpam and attempting to authenticate.

It seems that loading both common-auth and common-account before calling
pam_acct_mgmt results in the segfault described above.

There are three PAM calls made after the initial authentication:
	1. pam_acct_mgmt
	2. pam_set_item
	3. pam_setcred

With only common-account loaded, pam_acct_mgmt fails when the PAM_SILENT flag
is set.

Removing that flag makes the later pam_setcred fail, but allows pam_acct_mgmt
and pam_set_item to work correctly. Removing the call to pam_acct_mgmt
altogether seems to allow pam_setcred to work.

Removing the call to pam_acct_mgmt also allows both common-auth and
common-account to be loaded without triggering the segfault.

Very unusual. Everything points at the obvious "arch-specific problems with the
Go PAM wrapper", but other distributions don't seem to have this issue, and
nothing seems particularly wrong with the wrapper itself. It could be a
manifestation of multiple issues, though, considering how much the behavior
changes depending on which PAM modules are loaded.

Guess I have some reading to do this weekend.

- Randolph
Randolph Sapp June 14, 2025, 1:04 a.m. UTC | #6
On Fri Jun 6, 2025 at 6:49 PM CDT, Randolph Sapp via lists.openembedded.org wrote:
> On Wed Jun 4, 2025 at 7:12 PM CDT, Randolph Sapp wrote:
>> On Wed May 28, 2025 at 6:53 AM CDT, Mathieu Dubois-Briand wrote:
>>> On Wed May 28, 2025 at 12:16 AM CEST, Randolph Sapp wrote:
>>
>> [snip]
>>
>
> Well, I managed to determine that this is specifically an issue that occurs
> after loading libpam and attempting to authenticate.
>
> It seems that loading both common-auth and common-account before calling
> pam_acct_mgmt results in the segfault described above.
>
> There are three PAM calls made after the initial authentication:
> 	1. pam_acct_mgmt
> 	2. pam_set_item
> 	3. pam_setcred
>
> With only common-account loaded, pam_acct_mgmt fails when the PAM_SILENT flag
> is set.
>
> Removing that flag makes the later pam_setcred fail, but allows pam_acct_mgmt
> and pam_set_item to work correctly. Removing the call to pam_acct_mgmt
> altogether seems to allow pam_setcred to work.
>
> Removing the call to pam_acct_mgmt also allows both common-auth and
> common-account to be loaded without triggering the segfault.
>
> Very unusual. Everything points at the obvious "arch-specific problems with the
> Go PAM wrapper", but other distributions don't seem to have this issue, and
> nothing seems particularly wrong with the wrapper itself. It could be a
> manifestation of multiple issues, though, considering how much the behavior
> changes depending on which PAM modules are loaded.
>
> Guess I have some reading to do this weekend.
>
> - Randolph

I found that specifying the noreap option for pam_unix in the account context
fixes things. I suppose Go was having issues with the new signal handler being
registered in _unix_run_verify_binary. It's still weird that this was only
really an issue on i386/x86, and only in this environment.
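For context, unless noreap is given, Linux-PAM's pam_unix resets the SIGCHLD disposition to the default before forking its helper binary. After waitpid() it restores the caller's previous handler, so the application does not receive a signal it is not expecting. The POSIX-only Python sketch below models just that save/replace/restore pattern using a hypothetical `run_helper`. It is not PAM code and does not reproduce the segfault. It shows that, in the default mode, the host application's own SIGCHLD handler never observes the helper's exit, while with noreap it does.

```python
import os
import signal
import time

received = []  # SIGCHLD deliveries seen by the "application" handler


def app_sigchld_handler(signum, frame):
    """Stand-in for the host application's own SIGCHLD handler."""
    received.append(signum)


def run_helper(noreap: bool) -> int:
    """Fork a short-lived child and reap it, pam_unix style (simplified).

    Without noreap, SIGCHLD is reset to SIG_DFL around fork/waitpid and the
    previous handler is restored afterwards. With noreap, the caller's
    handler stays installed the whole time.
    """
    old = None
    if not noreap:
        old = signal.signal(signal.SIGCHLD, signal.SIG_DFL)
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: stand-in for the helper binary exiting
    _, status = os.waitpid(pid, 0)
    if not noreap:
        signal.signal(signal.SIGCHLD, old)  # restore the previous handler
    time.sleep(0.1)  # give Python a chance to run any pending handler
    return status


def count_deliveries(noreap: bool) -> int:
    """Install the app handler, run the helper, report handler invocations."""
    received.clear()
    signal.signal(signal.SIGCHLD, app_sigchld_handler)
    run_helper(noreap)
    signal.signal(signal.SIGCHLD, signal.SIG_DFL)
    return len(received)


if __name__ == "__main__":
    print("default (handler swapped):", count_deliveries(noreap=False))  # 0
    print("noreap (handler kept):", count_deliveries(noreap=True))  # 1
```

While the disposition is SIG_DFL, the kernel discards the child's SIGCHLD (its default action is to ignore it), so nothing is left pending when the old handler comes back.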

This means we can't include common-account; we'll have to require pam_unix
directly in the PAM config. There's also a conditional fork in the pam_unix
password path, with another signal handler, that we should probably guard
against to prevent issues with SELinux configs.
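Below is a hedged sketch of the configuration change described above. Instead of pulling in common-account, the account stack lists pam_unix.so directly with noreap. The service-file contents and the `check_account_stack` helper (a possible build-time sanity check) are hypothetical. Only the usual `type control module [args]` line format and the standard Linux-PAM `include` control are assumed.

```python
from typing import List, Tuple

Entry = Tuple[str, str, str, List[str]]

# Hypothetical emptty PAM service file: the account stack uses pam_unix
# directly with noreap instead of including common-account.
EMPTTY_PAM = """\
auth      include   common-auth
account   required  pam_unix.so noreap
password  include   common-password
session   include   common-session
"""


def parse_pam(text: str) -> List[Entry]:
    """Split a PAM service file into (type, control, module, args) entries.

    Assumes simple `type control module [args...]` lines; bracketed
    [value=action] controls and Debian-style @include lines are not handled.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"unsupported PAM line: {raw!r}")
        entries.append((fields[0], fields[1], fields[2], fields[3:]))
    return entries


def check_account_stack(text: str) -> List[str]:
    """Return problems found in the account stack (an empty list means OK)."""
    account = [e for e in parse_pam(text) if e[0] == "account"]
    problems = []
    for _type, control, module, args in account:
        if control == "include" and module == "common-account":
            problems.append("account stack includes common-account")
        if module == "pam_unix.so" and "noreap" not in args:
            problems.append("pam_unix.so in account stack lacks noreap")
    if not any(module == "pam_unix.so" for _, _, module, _ in account):
        problems.append("account stack does not list pam_unix.so directly")
    return problems


if __name__ == "__main__":
    print(check_account_stack(EMPTTY_PAM))  # []
    print(check_account_stack("account include common-account\n"))
```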

I'm still not entirely satisfied with this debugging. Something seems off, but
given Go's use of goroutines, it would make sense for this to cause *some*
issue. Still odd. I'll poke around a bit more.

- Randolph