
gcc: fix buildpaths QA with LTO

Message ID 20260325113951.1278864-1-patrick@stwcx.xyz
State Changes Requested
Series gcc: fix buildpaths QA with LTO

Commit Message

Patrick Williams March 25, 2026, 11:39 a.m. UTC
When LTO is enabled, due to a gcc bug[1], the linker needs the same
DEBUG_PREFIX_MAP flags as the compiler.  Without them, the buildpaths
QA check can fail because of unstripped build directory strings in the
DWARF data.

With GCC 15.2 this can be reproduced on many meson-built packages,
such as systemd, by setting:

    EXTRA_OEMESON:append:class-target = " -Db_lto=true"

Add DEBUG_PREFIX_MAP to TARGET_LDFLAGS for gcc.

While lto.inc enables LTO across the whole image, some packages
enable LTO on their own, and downstream recipe maintainers have
explicitly enabled it for specific packages, so setting this only in
lto.inc is not sufficient.

[1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109805

Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
---
 meta/classes/toolchain/gcc.bbclass | 5 +++++
 1 file changed, 5 insertions(+)
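
The review below argues that such workarounds belong with the configuration that turns LTO on. A minimal sketch of that downstream approach, assuming a hypothetical distro include file (its name is illustrative only); both assignments are copied from this patch and its reproducer:

    # Hypothetical downstream file, e.g. conf/distro/include/my-meson-lto.inc
    # (name is illustrative; include it from your own distro configuration).

    # Enable LTO for all meson-built target recipes, as in the reproducer.
    EXTRA_OEMESON:append:class-target = " -Db_lto=true"

    # Work around https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109805: with
    # LTO, GCC does final code generation (including part of the debug info)
    # at link time, so the link step needs the same prefix maps as compiling.
    TARGET_LDFLAGS:append:class-target = " ${DEBUG_PREFIX_MAP}"

Keeping both lines in one file means the extra linker flags are only added where LTO is actually enabled.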

Comments

Patrick Williams March 25, 2026, 11:46 a.m. UTC | #1
On Wed, Mar 25, 2026 at 07:39:51AM -0400, Patrick Williams wrote:
> ...

A few other mailing list threads also talk about this problem.

One alternative is to re-add it to the TARGET_LDFLAGS in bitbake.conf by
reverting 1797741aad02b8bf429fac4b81e30cdda64b5448 [1], but that seems to
have problems for cgo applications which might be resolved by another
patch[2].  Adding to lto.inc[3] isn't sufficient for the reasons I
mentioned above.

[1]: https://lore.kernel.org/openembedded-core/20260122194959.13457-2-rs@ti.com/#t
[2]: https://lore.kernel.org/openembedded-core/20260127170344.2960247-2-rs@ti.com/
[3]: https://lore.kernel.org/openembedded-core/20260204052638.284617-1-changqing.li@windriver.com/
Richard Purdie March 25, 2026, 12:34 p.m. UTC | #2
On Wed, 2026-03-25 at 07:46 -0400, Patrick Williams via lists.openembedded.org wrote:
> On Wed, Mar 25, 2026 at 07:39:51AM -0400, Patrick Williams wrote:
> > ...
> 
> A few other mailing list threads also talk about this problem.
> 
> One alternative is to re-add it to the TARGET_LDFLAGS in bitbake.conf by
> reverting 1797741aad02b8bf429fac4b81e30cdda64b5448 [1], but that seems to
> have problems for cgo applications which might be resolved by another
> patch[2].  Adding to lto.inc[3] isn't sufficient for the reasons I
> mentioned above.
> 
> [1]: https://lore.kernel.org/openembedded-core/20260122194959.13457-2-rs@ti.com/#t
> [2]: https://lore.kernel.org/openembedded-core/20260127170344.2960247-2-rs@ti.com/
> [3]: https://lore.kernel.org/openembedded-core/20260204052638.284617-1-changqing.li@windriver.com/

We're not seeing the error with OE-Core today in any of our automated
testing. Reading the above, it implies that we should see some kind of
failure with some components auto-selecting it? Something therefore
isn't adding up.

Whilst there is an open gcc bug report, that bug report is basically
asking for specific test cases to debug and fix the issue further.

So I think we need more info on where/when things are breaking since
we're not seeing it out of the box.

Cheers,

Richard
Patrick Williams March 25, 2026, 12:44 p.m. UTC | #3
On Wed, Mar 25, 2026 at 12:34:52PM +0000, Richard Purdie wrote:
> > > With GCC 15.2 this can be noticed by setting many meson-built packages,
> > > such as systemd, with:
> > > 
> > >     EXTRA_OEMESON:append:class-target = " -Db_lto=true"
> > > 
...
> 
> We're not seeing the error with OE-Core today in any of our automated
> testing. Reading the above, it implies that we should see some kind of
> failure with some components auto-selecting it? Something therefore
> isn't adding up.

I tried to give reproduction instructions here.  OE-Core doesn't actually
enable LTO on these packages, so you won't see it unless you enable it.

OpenBMC has a large number of meson-built packages which are generally
safe to enable LTO on (in the past I've run into issues with other
packages not building with LTO, but now with lto.inc I should revisit this).

We have a global enable for all meson packages:
    https://github.com/openbmc/openbmc/blob/47d900012a93d79ab536ca172fc01cd89645a0d8/meta-phosphor/conf/distro/include/phosphor-defaults.inc#L153

When I upgraded to an OE-Core that had 1797741aad02b8bf429fac4b81e30cdda64b5448,
almost all of our packages started failing the buildpaths QA check.  I was
able to track it down to this problem.  systemd and libpam are two OE or
meta-openembedded packages that I saw fail this way, so I gave systemd as
a clear example.  Like I said, we have a lot of packages in
meta-phosphor[1] that we have enabled LTO on, and they all fail the
buildpaths QA check unless DEBUG_PREFIX_MAP is passed along to the linker.

If it helps, I can upload the objdump content.  There was DWARF data
containing my build path, inserted from the GIMPLE intermediate, which
the linker wasn't stripping.

[1]: https://github.com/openbmc/openbmc/tree/master/meta-phosphor
Richard Purdie March 25, 2026, 12:53 p.m. UTC | #4
On Wed, 2026-03-25 at 08:44 -0400, Patrick Williams wrote:
> ...

My question still stands though - why don't we see this with systemd or
libpam in OE-Core?

You're asserting we need to add this to our default config as the
output is broken. We don't see those failures anywhere in our testing
(which definitely includes systemd).

Are you passing specific config to systemd or libpam to enable lto?

I believe that if this breaks when you set certain configs, these
workarounds belong with those configs.

Cheers,

Richard
Patrick Williams March 25, 2026, 1:09 p.m. UTC | #5
On Wed, Mar 25, 2026 at 12:53:28PM +0000, Richard Purdie wrote:
> ...
> 
> My question still stands though - why don't we see this with systemd or
> libpam in OE-Core?

Because you do not enable LTO on those.

> You're asserting we need to add this to our default config as the
> output is broken. We don't see those failures anywhere in our testing
> (which definitely includes systemd).
> 
> Are you passing specific config to systemd or libpam to enable lto?

Yes, via a global EXTRA_OEMESON I pointed to above.  I only referred to
those because they are packages you have that can easily show this
symptom.

> I believe that if this breaks when you set certain configs, these
> workarounds belong with those configs.

Ok.  We'll just keep it downstream if that's what you want.

I pointed to [1].  I do think this should at least be included in lto.inc
because otherwise lto.inc is similarly broken for everyone.  I'm guessing
you don't have an upstream test that enables lto.inc (yet).

I will argue, though, that it is a really bad experience for people to
_not_ have this in the default config.  It took me about 4 hours to track
down what was causing buildpaths issues in previously working packages
when upgrading the OE-Core base, including tracking down the underlying
gcc bug.

[1]: https://lore.kernel.org/openembedded-core/20260204052638.284617-1-changqing.li@windriver.com/
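
For the case raised in this thread where only specific recipes opt in to LTO, the same pairing can be done per recipe. A minimal sketch, assuming a hypothetical bbappend in a downstream layer (path and recipe are illustrative only). Scoping the flags this way also keeps them away from recipes, such as cgo-based ones, that the thread notes can break when DEBUG_PREFIX_MAP is passed to the linker globally:

    # Hypothetical downstream recipes-core/systemd/systemd_%.bbappend
    # (path and recipe chosen for illustration only).

    # Opt just this recipe into LTO via its meson option...
    EXTRA_OEMESON:append:class-target = " -Db_lto=true"

    # ...and pass the compiler's prefix maps to the link step as well,
    # working around https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109805.
    TARGET_LDFLAGS:append:class-target = " ${DEBUG_PREFIX_MAP}"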
Richard Purdie March 25, 2026, 2:25 p.m. UTC | #6
On Wed, 2026-03-25 at 09:09 -0400, Patrick Williams wrote:
> On Wed, Mar 25, 2026 at 12:53:28PM +0000, Richard Purdie wrote:
> > ...
> > 
> > My question still stands though - why don't we see this with systemd or
> > libpam in OE-Core?
> 
> Because you do not enable LTO on those.
> 
> > You're asserting we need to add this to our default config as the
> > output is broken. We don't see those failures anywhere in our testing
> > (which definitely includes systemd).
> > 
> > Are you passing specific config to systemd or libpam to enable lto?
> 
> Yes, via a global EXTRA_OEMESON I pointed to above.  I only referred to
> those because they are packages you have that can easily show this
> symptom.

Ok, now I understand. Thanks for making that clear.

> > I believe that if this breaks when you set certain configs, these
> > workarounds belong with those configs.
> 
> Ok.  We'll just keep it downstream if that's what you want.
> 
> I pointed to [1].  I do think this should at least be included in lto.inc
> because otherwise lto.inc is similarly broken for everyone.  I'm guessing
> you don't have an upstream test that enables lto.inc (yet).

Correct, unfortunately we don't. Adding one would mean having people
willing/able to fix bugs in that too though.

> I will argue, though, that it is a really bad experience for people to
> _not_ have this in the default config.  It took me about 4 hours to track
> down what was causing buildpaths issues in previously working packages
> when upgrading the OE-Core base, including tracking down the underlying
> gcc bug.
> 
> [1]: https://lore.kernel.org/openembedded-core/20260204052638.284617-1-changqing.li@windriver.com/

I agree it should be in the lto config and I am happy to take that bit.
Equally, if we just bandaid this everywhere, people will continue to
ignore the issue and the bug will remain unsolved.

Having to pass huge linking commands around when they're not supposed
to be needed is pretty poor as well and breaks other things like Go, as
you noticed.

Cheers,

Richard

Patch

diff --git a/meta/classes/toolchain/gcc.bbclass b/meta/classes/toolchain/gcc.bbclass
index 0ed49ba892..2df5281ee2 100644
--- a/meta/classes/toolchain/gcc.bbclass
+++ b/meta/classes/toolchain/gcc.bbclass
@@ -32,4 +32,9 @@  PREFERRED_PROVIDER_virtual/nativesdk-compilerlibs:class-cross-canadian = "native
 
 DEBUG_PREFIX_MAP_EXTRA = "-fcanon-prefix-map"
 
+# With LTO, GCC may inject build paths into the output unless the
+# DEBUG_PREFIX_MAP flags are also passed at link time:
+#   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109805
+TARGET_LDFLAGS:append:class-target = " ${DEBUG_PREFIX_MAP}"
+
 TCOVERRIDE = "toolchain-gcc"