
[kirkstone,V2,1/8] pthreads NPTL: lost wakeup fix 2

Message ID 20250612095639.3630518-2-sunilkumar.dora@windriver.com
State Changes Requested
Delegated to: Steve Sakoman
Headers show
Series glibc: Fix lost wakeup in pthread_cond_wait (PR25847) | expand

Commit Message

Dora, Sunil Kumar June 12, 2025, 9:56 a.m. UTC
From: Sunil Dora <sunilkumar.dora@windriver.com>

The following commit has been cherry-picked from the glibc master branch:
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=25847

Upstream-Status: Backport [1db84775f831a1494993ce9c118deaf9537cc50a]

Signed-off-by: Sunil Dora <sunilkumar.dora@windriver.com>
---
 .../glibc/glibc/0026-PR25847-1.patch          | 453 ++++++++++++++++++
 meta/recipes-core/glibc/glibc_2.35.bb         |   1 +
 2 files changed, 454 insertions(+)
 create mode 100644 meta/recipes-core/glibc/glibc/0026-PR25847-1.patch
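
For context, bug 25847 is a lost wakeup in pthread_cond_wait(): under an unlucky multi-waiter interleaving, the futex wakeup issued by pthread_cond_signal() could be consumed ("stolen") by a waiter whose group was already closed, leaving another waiter blocked even though its predicate had been set. The sketch below is a generic illustration of the affected API usage, not code taken from this series; the function names are placeholders. Build with gcc -pthread.

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool ready;              /* predicate protected by 'lock' */

/* Waiter side: textbook predicate loop.  Under the bug, this call could
   remain blocked even after wake_one() below had run.  */
void *
waiter (void *arg)
{
  pthread_mutex_lock (&lock);
  while (!ready)
    pthread_cond_wait (&cond, &lock);
  pthread_mutex_unlock (&lock);
  return arg;
}

/* Signaler side: set the predicate, then signal.  */
void
wake_one (void)
{
  pthread_mutex_lock (&lock);
  ready = true;
  pthread_cond_signal (&cond);
  pthread_mutex_unlock (&lock);
}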

Comments

Steve Sakoman June 12, 2025, 4:04 p.m. UTC | #1
Hi Sunil,

Could you please send a V2 with:

1. A cover letter explaining the reason for this patch series and an
overview of what is being changed
2. A link to the upstream commit being backported in the
Upstream-status lines (rather than just a commit id)

Thanks!

Steve

On Thu, Jun 12, 2025 at 2:57 AM Dora, Sunil Kumar via
lists.openembedded.org
<SunilKumar.Dora=windriver.com@lists.openembedded.org> wrote:
>
> From: Sunil Dora <sunilkumar.dora@windriver.com>
>
> The following commits have been cherry-picked from Glibc master branch:
> Bug : https://sourceware.org/bugzilla/show_bug.cgi?id=25847
>
> Upstream-Status: Backport [1db84775f831a1494993ce9c118deaf9537cc50a]
>
> Signed-off-by: Sunil Dora <sunilkumar.dora@windriver.com>
> ---
>  .../glibc/glibc/0026-PR25847-1.patch          | 453 ++++++++++++++++++
>  meta/recipes-core/glibc/glibc_2.35.bb         |   1 +
>  2 files changed, 454 insertions(+)
>  create mode 100644 meta/recipes-core/glibc/glibc/0026-PR25847-1.patch
>
> diff --git a/meta/recipes-core/glibc/glibc/0026-PR25847-1.patch b/meta/recipes-core/glibc/glibc/0026-PR25847-1.patch
> new file mode 100644
> index 0000000000..67a6fa1e62
> --- /dev/null
> +++ b/meta/recipes-core/glibc/glibc/0026-PR25847-1.patch
> @@ -0,0 +1,453 @@
> +From 6f030a8faf242b897e714cb77c65254b14682cfc Mon Sep 17 00:00:00 2001
> +From: Frank Barrus <frankbarrus_sw@shaggy.cc>
> +Date: Mon, 9 Jun 2025 02:32:52 -0700
> +Subject: [PATCH] pthreads NPTL: lost wakeup fix 2
> +
> +This fixes the lost wakeup (from a bug in signal stealing) with a change
> +in the usage of g_signals[] in the condition variable internal state.
> +It also completely eliminates the concept and handling of signal stealing,
> +as well as the need for signalers to block to wait for waiters to wake
> +up every time there is a G1/G2 switch.  This greatly reduces the average
> +and maximum latency for pthread_cond_signal.
> +
> +The g_signals[] field now contains a signal count that is relative to
> +the current g1_start value.  Since it is a 32-bit field, and the LSB is
> +still reserved (though not currently used anymore), it has a 31-bit value
> +that corresponds to the low 31 bits of the sequence number in g1_start.
> +(since g1_start also has an LSB flag, this means bits 31:1 in g_signals
> +correspond to bits 31:1 in g1_start, plus the current signal count)
> +
> +By making the signal count relative to g1_start, there is no longer
> +any ambiguity or A/B/A issue, and thus any checks before blocking,
> +including the futex call itself, are guaranteed not to block if the G1/G2
> +switch occurs, even if the signal count remains the same.  This allows
> +initially safely blocking in G2 until the switch to G1 occurs, and
> +then transitioning from G1 to a new G1 or G2, and always being able to
> +distinguish the state change.  This removes the race condition and A/B/A
> +problems that otherwise ocurred if a late (pre-empted) waiter were to
> +resume just as the futex call attempted to block on g_signal since
> +otherwise there was no last opportunity to re-check things like whether
> +the current G1 group was already closed.
> +
> +By fixing these issues, the signal stealing code can be eliminated,
> +since there is no concept of signal stealing anymore.  The code to block
> +for all waiters to exit g_refs can also be removed, since any waiters
> +that are still in the g_refs region can be guaranteed to safely wake
> +up and exit.  If there are still any left at this time, they are all
> +sent one final futex wakeup to ensure that they are not blocked any
> +longer, but there is no need for the signaller to block and wait for
> +them to wake up and exit the g_refs region.
> +
> +The signal count is then effectively "zeroed" but since it is now
> +relative to g1_start, this is done by advancing it to a new value that
> +can be observed by any pending blocking waiters.  Any late waiters can
> +always tell the difference, and can thus just cleanly exit if they are
> +in a stale G1 or G2.  They can never steal a signal from the current
> +G1 if they are not in the current G1, since the signal value that has
> +to match in the cmpxchg has the low 31 bits of the g1_start value
> +contained in it, and that's first checked, and then it won't match if
> +there's a G1/G2 change.
> +
> +Note: the 31-bit sequence number used in g_signals is designed to
> +handle wrap-around when checking the signal count, but if the entire
> +31-bit wraparound (2 billion signals) occurs while there is still a
> +late waiter that has not yet resumed, and it happens to then match
> +the current g1_start low bits, and the pre-emption occurs after the
> +normal "closed group" checks (which are 64-bit) but then hits the
> +futex syscall and signal consuming code, then an A/B/A issue could
> +still result and cause an incorrect assumption about whether it
> +should block.  This particular scenario seems unlikely in practice.
> +Note that once awake from the futex, the waiter would notice the
> +closed group before consuming the signal (since that's still a 64-bit
> +check that would not be aliased in the wrap-around in g_signals),
> +so the biggest impact would be blocking on the futex until the next
> +full wakeup from a G1/G2 switch.
> +
> +The following commits have been cherry-picked from Glibc master branch:
> +Bug : https://sourceware.org/bugzilla/show_bug.cgi?id=25847
> +
> +Upstream-Status: Backport [1db84775f831a1494993ce9c118deaf9537cc50a]
> +
> +Signed-off-by: Sunil Dora <sunilkumar.dora@windriver.com>
> +---
> + nptl/pthread_cond_common.c | 105 +++++++++------------------
> + nptl/pthread_cond_wait.c   | 144 ++++++++++++-------------------------
> + 2 files changed, 81 insertions(+), 168 deletions(-)
> +
> +diff --git a/nptl/pthread_cond_common.c b/nptl/pthread_cond_common.c
> +index fb035f72..a55eee3e 100644
> +--- a/nptl/pthread_cond_common.c
> ++++ b/nptl/pthread_cond_common.c
> +@@ -201,7 +201,6 @@ static bool __attribute__ ((unused))
> + __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq,
> +     unsigned int *g1index, int private)
> + {
> +-  const unsigned int maxspin = 0;
> +   unsigned int g1 = *g1index;
> +
> +   /* If there is no waiter in G2, we don't do anything.  The expression may
> +@@ -222,84 +221,46 @@ __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq,
> +      * New waiters arriving concurrently with the group switching will all go
> +        into G2 until we atomically make the switch.  Waiters existing in G2
> +        are not affected.
> +-     * Waiters in G1 will be closed out immediately by setting a flag in
> +-       __g_signals, which will prevent waiters from blocking using a futex on
> +-       __g_signals and also notifies them that the group is closed.  As a
> +-       result, they will eventually remove their group reference, allowing us
> +-       to close switch group roles.  */
> +-
> +-  /* First, set the closed flag on __g_signals.  This tells waiters that are
> +-     about to wait that they shouldn't do that anymore.  This basically
> +-     serves as an advance notificaton of the upcoming change to __g1_start;
> +-     waiters interpret it as if __g1_start was larger than their waiter
> +-     sequence position.  This allows us to change __g1_start after waiting
> +-     for all existing waiters with group references to leave, which in turn
> +-     makes recovery after stealing a signal simpler because it then can be
> +-     skipped if __g1_start indicates that the group is closed (otherwise,
> +-     we would have to recover always because waiters don't know how big their
> +-     groups are).  Relaxed MO is fine.  */
> +-  atomic_fetch_or_relaxed (cond->__data.__g_signals + g1, 1);
> +-
> +-  /* Wait until there are no group references anymore.  The fetch-or operation
> +-     injects us into the modification order of __g_refs; release MO ensures
> +-     that waiters incrementing __g_refs after our fetch-or see the previous
> +-     changes to __g_signals and to __g1_start that had to happen before we can
> +-     switch this G1 and alias with an older group (we have two groups, so
> +-     aliasing requires switching group roles twice).  Note that nobody else
> +-     can have set the wake-request flag, so we do not have to act upon it.
> +-
> +-     Also note that it is harmless if older waiters or waiters from this G1
> +-     get a group reference after we have quiesced the group because it will
> +-     remain closed for them either because of the closed flag in __g_signals
> +-     or the later update to __g1_start.  New waiters will never arrive here
> +-     but instead continue to go into the still current G2.  */
> +-  unsigned r = atomic_fetch_or_release (cond->__data.__g_refs + g1, 0);
> +-  while ((r >> 1) > 0)
> +-    {
> +-      for (unsigned int spin = maxspin; ((r >> 1) > 0) && (spin > 0); spin--)
> +-      {
> +-        /* TODO Back off.  */
> +-        r = atomic_load_relaxed (cond->__data.__g_refs + g1);
> +-      }
> +-      if ((r >> 1) > 0)
> +-      {
> +-        /* There is still a waiter after spinning.  Set the wake-request
> +-           flag and block.  Relaxed MO is fine because this is just about
> +-           this futex word.
> +-
> +-           Update r to include the set wake-request flag so that the upcoming
> +-           futex_wait only blocks if the flag is still set (otherwise, we'd
> +-           violate the basic client-side futex protocol).  */
> +-        r = atomic_fetch_or_relaxed (cond->__data.__g_refs + g1, 1) | 1;
> +-
> +-        if ((r >> 1) > 0)
> +-          futex_wait_simple (cond->__data.__g_refs + g1, r, private);
> +-        /* Reload here so we eventually see the most recent value even if we
> +-           do not spin.   */
> +-        r = atomic_load_relaxed (cond->__data.__g_refs + g1);
> +-      }
> +-    }
> +-  /* Acquire MO so that we synchronize with the release operation that waiters
> +-     use to decrement __g_refs and thus happen after the waiters we waited
> +-     for.  */
> +-  atomic_thread_fence_acquire ();
> ++     * Waiters in G1 will be closed out immediately by the advancing of
> ++       __g_signals to the next "lowseq" (low 31 bits of the new g1_start),
> ++       which will prevent waiters from blocking using a futex on
> ++       __g_signals since it provides enough signals for all possible
> ++       remaining waiters.  As a result, they can each consume a signal
> ++       and they will eventually remove their group reference.  */
> +
> +   /* Update __g1_start, which finishes closing this group.  The value we add
> +      will never be negative because old_orig_size can only be zero when we
> +      switch groups the first time after a condvar was initialized, in which
> +-     case G1 will be at index 1 and we will add a value of 1.  See above for
> +-     why this takes place after waiting for quiescence of the group.
> ++     case G1 will be at index 1 and we will add a value of 1.
> +      Relaxed MO is fine because the change comes with no additional
> +      constraints that others would have to observe.  */
> +   __condvar_add_g1_start_relaxed (cond,
> +       (old_orig_size << 1) + (g1 == 1 ? 1 : - 1));
> +
> +-  /* Now reopen the group, thus enabling waiters to again block using the
> +-     futex controlled by __g_signals.  Release MO so that observers that see
> +-     no signals (and thus can block) also see the write __g1_start and thus
> +-     that this is now a new group (see __pthread_cond_wait_common for the
> +-     matching acquire MO loads).  */
> +-  atomic_store_release (cond->__data.__g_signals + g1, 0);
> ++  unsigned int lowseq = ((old_g1_start + old_orig_size) << 1) & ~1U;
> ++
> ++  /* If any waiters still hold group references (and thus could be blocked),
> ++     then wake them all up now and prevent any running ones from blocking.
> ++     This is effectively a catch-all for any possible current or future
> ++     bugs that can allow the group size to reach 0 before all G1 waiters
> ++     have been awakened or at least given signals to consume, or any
> ++     other case that can leave blocked (or about to block) older waiters..  */
> ++  if ((atomic_fetch_or_release (cond->__data.__g_refs + g1, 0) >> 1) > 0)
> ++   {
> ++    /* First advance signals to the end of the group (i.e. enough signals
> ++       for the entire G1 group) to ensure that waiters which have not
> ++       yet blocked in the futex will not block.
> ++       Note that in the vast majority of cases, this should never
> ++       actually be necessary, since __g_signals will have enough
> ++       signals for the remaining g_refs waiters.  As an optimization,
> ++       we could check this first before proceeding, although that
> ++       could still leave the potential for futex lost wakeup bugs
> ++       if the signal count was non-zero but the futex wakeup
> ++       was somehow lost.  */
> ++    atomic_store_release (cond->__data.__g_signals + g1, lowseq);
> ++
> ++    futex_wake (cond->__data.__g_signals + g1, INT_MAX, private);
> ++   }
> +
> +   /* At this point, the old G1 is now a valid new G2 (but not in use yet).
> +      No old waiter can neither grab a signal nor acquire a reference without
> +@@ -311,6 +272,10 @@ __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq,
> +   g1 ^= 1;
> +   *g1index ^= 1;
> +
> ++  /* Now advance the new G1 g_signals to the new lowseq, giving it
> ++     an effective signal count of 0 to start.  */
> ++  atomic_store_release (cond->__data.__g_signals + g1, lowseq);
> ++
> +   /* These values are just observed by signalers, and thus protected by the
> +      lock.  */
> +   unsigned int orig_size = wseq - (old_g1_start + old_orig_size);
> +diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c
> +index 20c348a5..1cb3dbf7 100644
> +--- a/nptl/pthread_cond_wait.c
> ++++ b/nptl/pthread_cond_wait.c
> +@@ -238,9 +238,7 @@ __condvar_cleanup_waiting (void *arg)
> +    signaled), and a reference count.
> +
> +    The group reference count is used to maintain the number of waiters that
> +-   are using the group's futex.  Before a group can change its role, the
> +-   reference count must show that no waiters are using the futex anymore; this
> +-   prevents ABA issues on the futex word.
> ++   are using the group's futex.
> +
> +    To represent which intervals in the waiter sequence the groups cover (and
> +    thus also which group slot contains G1 or G2), we use a 64b counter to
> +@@ -300,11 +298,12 @@ __condvar_cleanup_waiting (void *arg)
> +        last reference.
> +      * Reference count used by waiters concurrently with signalers that have
> +        acquired the condvar-internal lock.
> +-   __g_signals: The number of signals that can still be consumed.
> ++   __g_signals: The number of signals that can still be consumed, relative to
> ++     the current g1_start.  (i.e. bits 31 to 1 of __g_signals are bits
> ++     31 to 1 of g1_start with the signal count added)
> +      * Used as a futex word by waiters.  Used concurrently by waiters and
> +        signalers.
> +-     * LSB is true iff this group has been completely signaled (i.e., it is
> +-       closed).
> ++     * LSB is currently reserved and 0.
> +    __g_size: Waiters remaining in this group (i.e., which have not been
> +      signaled yet.
> +      * Accessed by signalers and waiters that cancel waiting (both do so only
> +@@ -328,18 +327,6 @@ __condvar_cleanup_waiting (void *arg)
> +    sufficient because if a waiter can see a sufficiently large value, it could
> +    have also consume a signal in the waiters group.
> +
> +-   Waiters try to grab a signal from __g_signals without holding a reference
> +-   count, which can lead to stealing a signal from a more recent group after
> +-   their own group was already closed.  They cannot always detect whether they
> +-   in fact did because they do not know when they stole, but they can
> +-   conservatively add a signal back to the group they stole from; if they
> +-   did so unnecessarily, all that happens is a spurious wake-up.  To make this
> +-   even less likely, __g1_start contains the index of the current g2 too,
> +-   which allows waiters to check if there aliasing on the group slots; if
> +-   there wasn't, they didn't steal from the current G1, which means that the
> +-   G1 they stole from must have been already closed and they do not need to
> +-   fix anything.
> +-
> +    It is essential that the last field in pthread_cond_t is __g_signals[1]:
> +    The previous condvar used a pointer-sized field in pthread_cond_t, so a
> +    PTHREAD_COND_INITIALIZER from that condvar implementation might only
> +@@ -435,6 +422,9 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
> +     {
> +       while (1)
> +       {
> ++          uint64_t g1_start = __condvar_load_g1_start_relaxed (cond);
> ++          unsigned int lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U;
> ++
> +         /* Spin-wait first.
> +            Note that spinning first without checking whether a timeout
> +            passed might lead to what looks like a spurious wake-up even
> +@@ -446,35 +436,45 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
> +            having to compare against the current time seems to be the right
> +            choice from a performance perspective for most use cases.  */
> +         unsigned int spin = maxspin;
> +-        while (signals == 0 && spin > 0)
> ++        while (spin > 0 && ((int)(signals - lowseq) < 2))
> +           {
> +             /* Check that we are not spinning on a group that's already
> +                closed.  */
> +-            if (seq < (__condvar_load_g1_start_relaxed (cond) >> 1))
> +-              goto done;
> ++            if (seq < (g1_start >> 1))
> ++              break;
> +
> +             /* TODO Back off.  */
> +
> +             /* Reload signals.  See above for MO.  */
> +             signals = atomic_load_acquire (cond->__data.__g_signals + g);
> ++              g1_start = __condvar_load_g1_start_relaxed (cond);
> ++              lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U;
> +             spin--;
> +           }
> +
> +-        /* If our group will be closed as indicated by the flag on signals,
> +-           don't bother grabbing a signal.  */
> +-        if (signals & 1)
> +-          goto done;
> +-
> +-        /* If there is an available signal, don't block.  */
> +-        if (signals != 0)
> ++          if (seq < (g1_start >> 1))
> ++          {
> ++              /* If the group is closed already,
> ++               then this waiter originally had enough extra signals to
> ++               consume, up until the time its group was closed.  */
> ++             goto done;
> ++            }
> ++
> ++        /* If there is an available signal, don't block.
> ++             If __g1_start has advanced at all, then we must be in G1
> ++           by now, perhaps in the process of switching back to an older
> ++           G2, but in either case we're allowed to consume the available
> ++           signal and should not block anymore.  */
> ++        if ((int)(signals - lowseq) >= 2)
> +           break;
> +
> +         /* No signals available after spinning, so prepare to block.
> +            We first acquire a group reference and use acquire MO for that so
> +            that we synchronize with the dummy read-modify-write in
> +            __condvar_quiesce_and_switch_g1 if we read from that.  In turn,
> +-           in this case this will make us see the closed flag on __g_signals
> +-           that designates a concurrent attempt to reuse the group's slot.
> ++           in this case this will make us see the advancement of __g_signals
> ++           to the upcoming new g1_start that occurs with a concurrent
> ++           attempt to reuse the group's slot.
> +            We use acquire MO for the __g_signals check to make the
> +            __g1_start check work (see spinning above).
> +            Note that the group reference acquisition will not mask the
> +@@ -482,15 +482,24 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
> +            an atomic read-modify-write operation and thus extend the release
> +            sequence.  */
> +         atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);
> +-        if (((atomic_load_acquire (cond->__data.__g_signals + g) & 1) != 0)
> +-            || (seq < (__condvar_load_g1_start_relaxed (cond) >> 1)))
> ++        signals = atomic_load_acquire (cond->__data.__g_signals + g);
> ++          g1_start = __condvar_load_g1_start_relaxed (cond);
> ++          lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U;
> ++
> ++          if (seq < (g1_start >> 1))
> +           {
> +-            /* Our group is closed.  Wake up any signalers that might be
> +-               waiting.  */
> ++              /* group is closed already, so don't block */
> +             __condvar_dec_grefs (cond, g, private);
> +             goto done;
> +           }
> +
> ++        if ((int)(signals - lowseq) >= 2)
> ++          {
> ++            /* a signal showed up or G1/G2 switched after we grabbed the refcount */
> ++            __condvar_dec_grefs (cond, g, private);
> ++            break;
> ++            }
> ++
> +         // Now block.
> +         struct _pthread_cleanup_buffer buffer;
> +         struct _condvar_cleanup_buffer cbuffer;
> +@@ -501,7 +510,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
> +         __pthread_cleanup_push (&buffer, __condvar_cleanup_waiting, &cbuffer);
> +
> +         err = __futex_abstimed_wait_cancelable64 (
> +-          cond->__data.__g_signals + g, 0, clockid, abstime, private);
> ++          cond->__data.__g_signals + g, signals, clockid, abstime, private);
> +
> +         __pthread_cleanup_pop (&buffer, 0);
> +
> +@@ -524,6 +533,8 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
> +         signals = atomic_load_acquire (cond->__data.__g_signals + g);
> +       }
> +
> ++       if (seq < (__condvar_load_g1_start_relaxed (cond) >> 1))
> ++       goto done;
> +     }
> +   /* Try to grab a signal.  Use acquire MO so that we see an up-to-date value
> +      of __g1_start below (see spinning above for a similar case).  In
> +@@ -532,69 +543,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
> +   while (!atomic_compare_exchange_weak_acquire (cond->__data.__g_signals + g,
> +                                               &signals, signals - 2));
> +
> +-  /* We consumed a signal but we could have consumed from a more recent group
> +-     that aliased with ours due to being in the same group slot.  If this
> +-     might be the case our group must be closed as visible through
> +-     __g1_start.  */
> +-  uint64_t g1_start = __condvar_load_g1_start_relaxed (cond);
> +-  if (seq < (g1_start >> 1))
> +-    {
> +-      /* We potentially stole a signal from a more recent group but we do not
> +-       know which group we really consumed from.
> +-       We do not care about groups older than current G1 because they are
> +-       closed; we could have stolen from these, but then we just add a
> +-       spurious wake-up for the current groups.
> +-       We will never steal a signal from current G2 that was really intended
> +-       for G2 because G2 never receives signals (until it becomes G1).  We
> +-       could have stolen a signal from G2 that was conservatively added by a
> +-       previous waiter that also thought it stole a signal -- but given that
> +-       that signal was added unnecessarily, it's not a problem if we steal
> +-       it.
> +-       Thus, the remaining case is that we could have stolen from the current
> +-       G1, where "current" means the __g1_start value we observed.  However,
> +-       if the current G1 does not have the same slot index as we do, we did
> +-       not steal from it and do not need to undo that.  This is the reason
> +-       for putting a bit with G2's index into__g1_start as well.  */
> +-      if (((g1_start & 1) ^ 1) == g)
> +-      {
> +-        /* We have to conservatively undo our potential mistake of stealing
> +-           a signal.  We can stop trying to do that when the current G1
> +-           changes because other spinning waiters will notice this too and
> +-           __condvar_quiesce_and_switch_g1 has checked that there are no
> +-           futex waiters anymore before switching G1.
> +-           Relaxed MO is fine for the __g1_start load because we need to
> +-           merely be able to observe this fact and not have to observe
> +-           something else as well.
> +-           ??? Would it help to spin for a little while to see whether the
> +-           current G1 gets closed?  This might be worthwhile if the group is
> +-           small or close to being closed.  */
> +-        unsigned int s = atomic_load_relaxed (cond->__data.__g_signals + g);
> +-        while (__condvar_load_g1_start_relaxed (cond) == g1_start)
> +-          {
> +-            /* Try to add a signal.  We don't need to acquire the lock
> +-               because at worst we can cause a spurious wake-up.  If the
> +-               group is in the process of being closed (LSB is true), this
> +-               has an effect similar to us adding a signal.  */
> +-            if (((s & 1) != 0)
> +-                || atomic_compare_exchange_weak_relaxed
> +-                     (cond->__data.__g_signals + g, &s, s + 2))
> +-              {
> +-                /* If we added a signal, we also need to add a wake-up on
> +-                   the futex.  We also need to do that if we skipped adding
> +-                   a signal because the group is being closed because
> +-                   while __condvar_quiesce_and_switch_g1 could have closed
> +-                   the group, it might stil be waiting for futex waiters to
> +-                   leave (and one of those waiters might be the one we stole
> +-                   the signal from, which cause it to block using the
> +-                   futex).  */
> +-                futex_wake (cond->__data.__g_signals + g, 1, private);
> +-                break;
> +-              }
> +-            /* TODO Back off.  */
> +-          }
> +-      }
> +-    }
> +-
> +  done:
> +
> +   /* Confirm that we have been woken.  We do that before acquiring the mutex
> +--
> +2.49.0
> +
> diff --git a/meta/recipes-core/glibc/glibc_2.35.bb b/meta/recipes-core/glibc/glibc_2.35.bb
> index 9073e04537..2d23c17a48 100644
> --- a/meta/recipes-core/glibc/glibc_2.35.bb
> +++ b/meta/recipes-core/glibc/glibc_2.35.bb
> @@ -61,6 +61,7 @@ SRC_URI =  "${GLIBC_GIT_URI};branch=${SRCBRANCH};name=glibc \
>             file://0022-sysdeps-gnu-configure.ac-Set-libc_cv_rootsbindir-onl.patch \
>             file://0023-timezone-Make-shell-interpreter-overridable-in-tzsel.patch \
>             file://0024-fix-create-thread-failed-in-unprivileged-process-BZ-.patch \
> +           file://0026-PR25847-1.patch \
>             \
>             file://0001-Revert-Linux-Implement-a-useful-version-of-_startup_.patch \
>             file://0002-get_nscd_addresses-Fix-subscript-typos-BZ-29605.patch \
> --
> 2.49.0
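
For readers skimming the quoted diff above, the core waiter-side change can be condensed into the standalone sketch below. This is an illustrative paraphrase of the logic in the pthread_cond_wait.c hunks, not the actual glibc code; the function name and the example values in main() are made up for this note.

#include <stdint.h>
#include <stdio.h>

/* With this patch, __g_signals[g] carries the low 31 bits of g1_start plus
   twice the number of available signals (the LSB stays reserved), so
   "is a signal available?" becomes a signed 32-bit difference against a
   per-group baseline rather than a comparison against zero.  */
static int
signal_available (unsigned int signals, uint64_t g1_start, unsigned int g)
{
  /* g1_start's LSB holds the index of the current G2.  If our slot 'g' is
     that G2, no signal can be pending, so the baseline is our own reading
     of the signal word; otherwise the baseline is g1_start's low bits.  */
  unsigned int lowseq = ((g1_start & 1) == g)
                        ? signals : (unsigned int) (g1_start & ~1ULL);

  /* The signed difference also tolerates the 31-bit wrap-around that the
     commit message discusses.  */
  return (int) (signals - lowseq) >= 2;
}

int
main (void)
{
  uint64_t g1_start = 40;       /* group-start sequence 20, G2 in slot 0 */
  unsigned int g = 1;           /* this waiter sits in the G1 slot */

  printf ("%d\n", signal_available (40, g1_start, g));  /* 0: no signal yet */
  printf ("%d\n", signal_available (42, g1_start, g));  /* 1: one signal posted */
  return 0;
}

On top of this check, the real code still compares seq against g1_start >> 1 to detect an already-closed group, and only then falls back to a futex wait on the observed signals value; that combination is what guarantees the futex cannot block across a G1/G2 switch even when the signal count itself is unchanged.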
Dora, Sunil Kumar June 12, 2025, 6:55 p.m. UTC | #2
Hi Steve,

Regarding the cover letter: I had included a cover letter in the V2 series [1] that provided an overview of the changes and the rationale for the backport; it may have been overlooked.
I will resend the cover letter with the V3 series and make sure it is clear and includes the required information.

( https://lists.openembedded.org/g/openembedded-core/topic/kirkstone_patch_v2_0_8/113603743 )

For the upstream commit links: I will update each patch to include the full upstream commit link in the Upstream-status line in the V3 series.
Additionally, as per Gyorgy's feedback on the V2 series, I will add the "glibc:" prefix to each commit subject line to make the history clearer.
I will send out V3 shortly.

Thanks,
Sunil Dora
________________________________
From: Steve Sakoman <steve@sakoman.com>
Sent: Thursday, June 12, 2025 9:34 PM
To: Dora, Sunil Kumar <SunilKumar.Dora@windriver.com>
Cc: openembedded-core@lists.openembedded.org <openembedded-core@lists.openembedded.org>; MacLeod, Randy <Randy.MacLeod@windriver.com>; Kokkonda, Sundeep <Sundeep.Kokkonda@windriver.com>; =SunilKumar.Dora@windriver.com <=SunilKumar.Dora@windriver.com>
Subject: Re: [OE-core] [kirkstone][PATCH V2 1/8] pthreads NPTL: lost wakeup fix 2

Hi Sunil,

Could you please send a V2 with:

1. A cover letter explaining the reason for this patch series and an
overview of what is being changed
2. A link to the upstream commit being backported in the
Upstream-status lines (rather than just a commit id)

Thanks!

Steve

On Thu, Jun 12, 2025 at 2:57 AM Dora, Sunil Kumar via
lists.openembedded.org
<SunilKumar.Dora=windriver.com@lists.openembedded.org> wrote:
>
> From: Sunil Dora <sunilkumar.dora@windriver.com>
>
> The following commits have been cherry-picked from Glibc master branch:
> Bug : https://sourceware.org/bugzilla/show_bug.cgi?id=25847
>
> Upstream-Status: Backport [1db84775f831a1494993ce9c118deaf9537cc50a]
>
> Signed-off-by: Sunil Dora <sunilkumar.dora@windriver.com>
> ---
>  .../glibc/glibc/0026-PR25847-1.patch          | 453 ++++++++++++++++++
>  meta/recipes-core/glibc/glibc_2.35.bb         |   1 +
>  2 files changed, 454 insertions(+)
>  create mode 100644 meta/recipes-core/glibc/glibc/0026-PR25847-1.patch
>
> diff --git a/meta/recipes-core/glibc/glibc/0026-PR25847-1.patch b/meta/recipes-core/glibc/glibc/0026-PR25847-1.patch
> new file mode 100644
> index 0000000000..67a6fa1e62
> --- /dev/null
> +++ b/meta/recipes-core/glibc/glibc/0026-PR25847-1.patch
> @@ -0,0 +1,453 @@
> +From 6f030a8faf242b897e714cb77c65254b14682cfc Mon Sep 17 00:00:00 2001
> +From: Frank Barrus <frankbarrus_sw@shaggy.cc>
> +Date: Mon, 9 Jun 2025 02:32:52 -0700
> +Subject: [PATCH] pthreads NPTL: lost wakeup fix 2
> +
> +This fixes the lost wakeup (from a bug in signal stealing) with a change
> +in the usage of g_signals[] in the condition variable internal state.
> +It also completely eliminates the concept and handling of signal stealing,
> +as well as the need for signalers to block to wait for waiters to wake
> +up every time there is a G1/G2 switch.  This greatly reduces the average
> +and maximum latency for pthread_cond_signal.
> +
> +The g_signals[] field now contains a signal count that is relative to
> +the current g1_start value.  Since it is a 32-bit field, and the LSB is
> +still reserved (though not currently used anymore), it has a 31-bit value
> +that corresponds to the low 31 bits of the sequence number in g1_start.
> +(since g1_start also has an LSB flag, this means bits 31:1 in g_signals
> +correspond to bits 31:1 in g1_start, plus the current signal count)
> +
> +By making the signal count relative to g1_start, there is no longer
> +any ambiguity or A/B/A issue, and thus any checks before blocking,
> +including the futex call itself, are guaranteed not to block if the G1/G2
> +switch occurs, even if the signal count remains the same.  This allows
> +initially safely blocking in G2 until the switch to G1 occurs, and
> +then transitioning from G1 to a new G1 or G2, and always being able to
> +distinguish the state change.  This removes the race condition and A/B/A
> +problems that otherwise ocurred if a late (pre-empted) waiter were to
> +resume just as the futex call attempted to block on g_signal since
> +otherwise there was no last opportunity to re-check things like whether
> +the current G1 group was already closed.
> +
> +By fixing these issues, the signal stealing code can be eliminated,
> +since there is no concept of signal stealing anymore.  The code to block
> +for all waiters to exit g_refs can also be removed, since any waiters
> +that are still in the g_refs region can be guaranteed to safely wake
> +up and exit.  If there are still any left at this time, they are all
> +sent one final futex wakeup to ensure that they are not blocked any
> +longer, but there is no need for the signaller to block and wait for
> +them to wake up and exit the g_refs region.
> +
> +The signal count is then effectively "zeroed" but since it is now
> +relative to g1_start, this is done by advancing it to a new value that
> +can be observed by any pending blocking waiters.  Any late waiters can
> +always tell the difference, and can thus just cleanly exit if they are
> +in a stale G1 or G2.  They can never steal a signal from the current
> +G1 if they are not in the current G1, since the signal value that has
> +to match in the cmpxchg has the low 31 bits of the g1_start value
> +contained in it, and that's first checked, and then it won't match if
> +there's a G1/G2 change.
> +
> +Note: the 31-bit sequence number used in g_signals is designed to
> +handle wrap-around when checking the signal count, but if the entire
> +31-bit wraparound (2 billion signals) occurs while there is still a
> +late waiter that has not yet resumed, and it happens to then match
> +the current g1_start low bits, and the pre-emption occurs after the
> +normal "closed group" checks (which are 64-bit) but then hits the
> +futex syscall and signal consuming code, then an A/B/A issue could
> +still result and cause an incorrect assumption about whether it
> +should block.  This particular scenario seems unlikely in practice.
> +Note that once awake from the futex, the waiter would notice the
> +closed group before consuming the signal (since that's still a 64-bit
> +check that would not be aliased in the wrap-around in g_signals),
> +so the biggest impact would be blocking on the futex until the next
> +full wakeup from a G1/G2 switch.
> +
> +The following commits have been cherry-picked from Glibc master branch:
> +Bug : https://sourceware.org/bugzilla/show_bug.cgi?id=25847
> +
> +Upstream-Status: Backport [1db84775f831a1494993ce9c118deaf9537cc50a]
> +
> +Signed-off-by: Sunil Dora <sunilkumar.dora@windriver.com>
> +---
> + nptl/pthread_cond_common.c | 105 +++++++++------------------
> + nptl/pthread_cond_wait.c   | 144 ++++++++++++-------------------------
> + 2 files changed, 81 insertions(+), 168 deletions(-)
> +
> +diff --git a/nptl/pthread_cond_common.c b/nptl/pthread_cond_common.c
> +index fb035f72..a55eee3e 100644
> +--- a/nptl/pthread_cond_common.c
> ++++ b/nptl/pthread_cond_common.c
> +@@ -201,7 +201,6 @@ static bool __attribute__ ((unused))
> + __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq,
> +     unsigned int *g1index, int private)
> + {
> +-  const unsigned int maxspin = 0;
> +   unsigned int g1 = *g1index;
> +
> +   /* If there is no waiter in G2, we don't do anything.  The expression may
> +@@ -222,84 +221,46 @@ __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq,
> +      * New waiters arriving concurrently with the group switching will all go
> +        into G2 until we atomically make the switch.  Waiters existing in G2
> +        are not affected.
> +-     * Waiters in G1 will be closed out immediately by setting a flag in
> +-       __g_signals, which will prevent waiters from blocking using a futex on
> +-       __g_signals and also notifies them that the group is closed.  As a
> +-       result, they will eventually remove their group reference, allowing us
> +-       to close switch group roles.  */
> +-
> +-  /* First, set the closed flag on __g_signals.  This tells waiters that are
> +-     about to wait that they shouldn't do that anymore.  This basically
> +-     serves as an advance notificaton of the upcoming change to __g1_start;
> +-     waiters interpret it as if __g1_start was larger than their waiter
> +-     sequence position.  This allows us to change __g1_start after waiting
> +-     for all existing waiters with group references to leave, which in turn
> +-     makes recovery after stealing a signal simpler because it then can be
> +-     skipped if __g1_start indicates that the group is closed (otherwise,
> +-     we would have to recover always because waiters don't know how big their
> +-     groups are).  Relaxed MO is fine.  */
> +-  atomic_fetch_or_relaxed (cond->__data.__g_signals + g1, 1);
> +-
> +-  /* Wait until there are no group references anymore.  The fetch-or operation
> +-     injects us into the modification order of __g_refs; release MO ensures
> +-     that waiters incrementing __g_refs after our fetch-or see the previous
> +-     changes to __g_signals and to __g1_start that had to happen before we can
> +-     switch this G1 and alias with an older group (we have two groups, so
> +-     aliasing requires switching group roles twice).  Note that nobody else
> +-     can have set the wake-request flag, so we do not have to act upon it.
> +-
> +-     Also note that it is harmless if older waiters or waiters from this G1
> +-     get a group reference after we have quiesced the group because it will
> +-     remain closed for them either because of the closed flag in __g_signals
> +-     or the later update to __g1_start.  New waiters will never arrive here
> +-     but instead continue to go into the still current G2.  */
> +-  unsigned r = atomic_fetch_or_release (cond->__data.__g_refs + g1, 0);
> +-  while ((r >> 1) > 0)
> +-    {
> +-      for (unsigned int spin = maxspin; ((r >> 1) > 0) && (spin > 0); spin--)
> +-      {
> +-        /* TODO Back off.  */
> +-        r = atomic_load_relaxed (cond->__data.__g_refs + g1);
> +-      }
> +-      if ((r >> 1) > 0)
> +-      {
> +-        /* There is still a waiter after spinning.  Set the wake-request
> +-           flag and block.  Relaxed MO is fine because this is just about
> +-           this futex word.
> +-
> +-           Update r to include the set wake-request flag so that the upcoming
> +-           futex_wait only blocks if the flag is still set (otherwise, we'd
> +-           violate the basic client-side futex protocol).  */
> +-        r = atomic_fetch_or_relaxed (cond->__data.__g_refs + g1, 1) | 1;
> +-
> +-        if ((r >> 1) > 0)
> +-          futex_wait_simple (cond->__data.__g_refs + g1, r, private);
> +-        /* Reload here so we eventually see the most recent value even if we
> +-           do not spin.   */
> +-        r = atomic_load_relaxed (cond->__data.__g_refs + g1);
> +-      }
> +-    }
> +-  /* Acquire MO so that we synchronize with the release operation that waiters
> +-     use to decrement __g_refs and thus happen after the waiters we waited
> +-     for.  */
> +-  atomic_thread_fence_acquire ();
> ++     * Waiters in G1 will be closed out immediately by the advancing of
> ++       __g_signals to the next "lowseq" (low 31 bits of the new g1_start),
> ++       which will prevent waiters from blocking using a futex on
> ++       __g_signals since it provides enough signals for all possible
> ++       remaining waiters.  As a result, they can each consume a signal
> ++       and they will eventually remove their group reference.  */
> +
> +   /* Update __g1_start, which finishes closing this group.  The value we add
> +      will never be negative because old_orig_size can only be zero when we
> +      switch groups the first time after a condvar was initialized, in which
> +-     case G1 will be at index 1 and we will add a value of 1.  See above for
> +-     why this takes place after waiting for quiescence of the group.
> ++     case G1 will be at index 1 and we will add a value of 1.
> +      Relaxed MO is fine because the change comes with no additional
> +      constraints that others would have to observe.  */
> +   __condvar_add_g1_start_relaxed (cond,
> +       (old_orig_size << 1) + (g1 == 1 ? 1 : - 1));
> +
> +-  /* Now reopen the group, thus enabling waiters to again block using the
> +-     futex controlled by __g_signals.  Release MO so that observers that see
> +-     no signals (and thus can block) also see the write __g1_start and thus
> +-     that this is now a new group (see __pthread_cond_wait_common for the
> +-     matching acquire MO loads).  */
> +-  atomic_store_release (cond->__data.__g_signals + g1, 0);
> ++  unsigned int lowseq = ((old_g1_start + old_orig_size) << 1) & ~1U;
> ++
> ++  /* If any waiters still hold group references (and thus could be blocked),
> ++     then wake them all up now and prevent any running ones from blocking.
> ++     This is effectively a catch-all for any possible current or future
> ++     bugs that can allow the group size to reach 0 before all G1 waiters
> ++     have been awakened or at least given signals to consume, or any
> ++     other case that can leave blocked (or about to block) older waiters..  */
> ++  if ((atomic_fetch_or_release (cond->__data.__g_refs + g1, 0) >> 1) > 0)
> ++   {
> ++    /* First advance signals to the end of the group (i.e. enough signals
> ++       for the entire G1 group) to ensure that waiters which have not
> ++       yet blocked in the futex will not block.
> ++       Note that in the vast majority of cases, this should never
> ++       actually be necessary, since __g_signals will have enough
> ++       signals for the remaining g_refs waiters.  As an optimization,
> ++       we could check this first before proceeding, although that
> ++       could still leave the potential for futex lost wakeup bugs
> ++       if the signal count was non-zero but the futex wakeup
> ++       was somehow lost.  */
> ++    atomic_store_release (cond->__data.__g_signals + g1, lowseq);
> ++
> ++    futex_wake (cond->__data.__g_signals + g1, INT_MAX, private);
> ++   }
> +
> +   /* At this point, the old G1 is now a valid new G2 (but not in use yet).
> +      No old waiter can neither grab a signal nor acquire a reference without
> +@@ -311,6 +272,10 @@ __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq,
> +   g1 ^= 1;
> +   *g1index ^= 1;
> +
> ++  /* Now advance the new G1 g_signals to the new lowseq, giving it
> ++     an effective signal count of 0 to start.  */
> ++  atomic_store_release (cond->__data.__g_signals + g1, lowseq);
> ++
> +   /* These values are just observed by signalers, and thus protected by the
> +      lock.  */
> +   unsigned int orig_size = wseq - (old_g1_start + old_orig_size);
> +diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c
> +index 20c348a5..1cb3dbf7 100644
> +--- a/nptl/pthread_cond_wait.c
> ++++ b/nptl/pthread_cond_wait.c
> +@@ -238,9 +238,7 @@ __condvar_cleanup_waiting (void *arg)
> +    signaled), and a reference count.
> +
> +    The group reference count is used to maintain the number of waiters that
> +-   are using the group's futex.  Before a group can change its role, the
> +-   reference count must show that no waiters are using the futex anymore; this
> +-   prevents ABA issues on the futex word.
> ++   are using the group's futex.
> +
> +    To represent which intervals in the waiter sequence the groups cover (and
> +    thus also which group slot contains G1 or G2), we use a 64b counter to
> +@@ -300,11 +298,12 @@ __condvar_cleanup_waiting (void *arg)
> +        last reference.
> +      * Reference count used by waiters concurrently with signalers that have
> +        acquired the condvar-internal lock.
> +-   __g_signals: The number of signals that can still be consumed.
> ++   __g_signals: The number of signals that can still be consumed, relative to
> ++     the current g1_start.  (i.e. bits 31 to 1 of __g_signals are bits
> ++     31 to 1 of g1_start with the signal count added)
> +      * Used as a futex word by waiters.  Used concurrently by waiters and
> +        signalers.
> +-     * LSB is true iff this group has been completely signaled (i.e., it is
> +-       closed).
> ++     * LSB is currently reserved and 0.
> +    __g_size: Waiters remaining in this group (i.e., which have not been
> +      signaled yet.
> +      * Accessed by signalers and waiters that cancel waiting (both do so only
> +@@ -328,18 +327,6 @@ __condvar_cleanup_waiting (void *arg)
> +    sufficient because if a waiter can see a sufficiently large value, it could
> +    have also consume a signal in the waiters group.
> +
> +-   Waiters try to grab a signal from __g_signals without holding a reference
> +-   count, which can lead to stealing a signal from a more recent group after
> +-   their own group was already closed.  They cannot always detect whether they
> +-   in fact did because they do not know when they stole, but they can
> +-   conservatively add a signal back to the group they stole from; if they
> +-   did so unnecessarily, all that happens is a spurious wake-up.  To make this
> +-   even less likely, __g1_start contains the index of the current g2 too,
> +-   which allows waiters to check if there aliasing on the group slots; if
> +-   there wasn't, they didn't steal from the current G1, which means that the
> +-   G1 they stole from must have been already closed and they do not need to
> +-   fix anything.
> +-
> +    It is essential that the last field in pthread_cond_t is __g_signals[1]:
> +    The previous condvar used a pointer-sized field in pthread_cond_t, so a
> +    PTHREAD_COND_INITIALIZER from that condvar implementation might only
> +@@ -435,6 +422,9 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
> +     {
> +       while (1)
> +       {
> ++          uint64_t g1_start = __condvar_load_g1_start_relaxed (cond);
> ++          unsigned int lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U;
> ++
> +         /* Spin-wait first.
> +            Note that spinning first without checking whether a timeout
> +            passed might lead to what looks like a spurious wake-up even
> +@@ -446,35 +436,45 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
> +            having to compare against the current time seems to be the right
> +            choice from a performance perspective for most use cases.  */
> +         unsigned int spin = maxspin;
> +-        while (signals == 0 && spin > 0)
> ++        while (spin > 0 && ((int)(signals - lowseq) < 2))
> +           {
> +             /* Check that we are not spinning on a group that's already
> +                closed.  */
> +-            if (seq < (__condvar_load_g1_start_relaxed (cond) >> 1))
> +-              goto done;
> ++            if (seq < (g1_start >> 1))
> ++              break;
> +
> +             /* TODO Back off.  */
> +
> +             /* Reload signals.  See above for MO.  */
> +             signals = atomic_load_acquire (cond->__data.__g_signals + g);
> ++              g1_start = __condvar_load_g1_start_relaxed (cond);
> ++              lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U;
> +             spin--;
> +           }
> +
> +-        /* If our group will be closed as indicated by the flag on signals,
> +-           don't bother grabbing a signal.  */
> +-        if (signals & 1)
> +-          goto done;
> +-
> +-        /* If there is an available signal, don't block.  */
> +-        if (signals != 0)
> ++          if (seq < (g1_start >> 1))
> ++          {
> ++              /* If the group is closed already,
> ++               then this waiter originally had enough extra signals to
> ++               consume, up until the time its group was closed.  */
> ++             goto done;
> ++            }
> ++
> ++        /* If there is an available signal, don't block.
> ++             If __g1_start has advanced at all, then we must be in G1
> ++           by now, perhaps in the process of switching back to an older
> ++           G2, but in either case we're allowed to consume the available
> ++           signal and should not block anymore.  */
> ++        if ((int)(signals - lowseq) >= 2)
> +           break;
> +
> +         /* No signals available after spinning, so prepare to block.
> +            We first acquire a group reference and use acquire MO for that so
> +            that we synchronize with the dummy read-modify-write in
> +            __condvar_quiesce_and_switch_g1 if we read from that.  In turn,
> +-           in this case this will make us see the closed flag on __g_signals
> +-           that designates a concurrent attempt to reuse the group's slot.
> ++           in this case this will make us see the advancement of __g_signals
> ++           to the upcoming new g1_start that occurs with a concurrent
> ++           attempt to reuse the group's slot.
> +            We use acquire MO for the __g_signals check to make the
> +            __g1_start check work (see spinning above).
> +            Note that the group reference acquisition will not mask the
> +@@ -482,15 +482,24 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
> +            an atomic read-modify-write operation and thus extend the release
> +            sequence.  */
> +         atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);
> +-        if (((atomic_load_acquire (cond->__data.__g_signals + g) & 1) != 0)
> +-            || (seq < (__condvar_load_g1_start_relaxed (cond) >> 1)))
> ++        signals = atomic_load_acquire (cond->__data.__g_signals + g);
> ++          g1_start = __condvar_load_g1_start_relaxed (cond);
> ++          lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U;
> ++
> ++          if (seq < (g1_start >> 1))
> +           {
> +-            /* Our group is closed.  Wake up any signalers that might be
> +-               waiting.  */
> ++              /* group is closed already, so don't block */
> +             __condvar_dec_grefs (cond, g, private);
> +             goto done;
> +           }
> +
> ++        if ((int)(signals - lowseq) >= 2)
> ++          {
> ++            /* a signal showed up or G1/G2 switched after we grabbed the refcount */
> ++            __condvar_dec_grefs (cond, g, private);
> ++            break;
> ++            }
> ++
> +         // Now block.
> +         struct _pthread_cleanup_buffer buffer;
> +         struct _condvar_cleanup_buffer cbuffer;
> +@@ -501,7 +510,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
> +         __pthread_cleanup_push (&buffer, __condvar_cleanup_waiting, &cbuffer);
> +
> +         err = __futex_abstimed_wait_cancelable64 (
> +-          cond->__data.__g_signals + g, 0, clockid, abstime, private);
> ++          cond->__data.__g_signals + g, signals, clockid, abstime, private);
> +
> +         __pthread_cleanup_pop (&buffer, 0);
> +
> +@@ -524,6 +533,8 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
> +         signals = atomic_load_acquire (cond->__data.__g_signals + g);
> +       }
> +
> ++       if (seq < (__condvar_load_g1_start_relaxed (cond) >> 1))
> ++       goto done;
> +     }
> +   /* Try to grab a signal.  Use acquire MO so that we see an up-to-date value
> +      of __g1_start below (see spinning above for a similar case).  In
> +@@ -532,69 +543,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
> +   while (!atomic_compare_exchange_weak_acquire (cond->__data.__g_signals + g,
> +                                               &signals, signals - 2));
> +
> +-  /* We consumed a signal but we could have consumed from a more recent group
> +-     that aliased with ours due to being in the same group slot.  If this
> +-     might be the case our group must be closed as visible through
> +-     __g1_start.  */
> +-  uint64_t g1_start = __condvar_load_g1_start_relaxed (cond);
> +-  if (seq < (g1_start >> 1))
> +-    {
> +-      /* We potentially stole a signal from a more recent group but we do not
> +-       know which group we really consumed from.
> +-       We do not care about groups older than current G1 because they are
> +-       closed; we could have stolen from these, but then we just add a
> +-       spurious wake-up for the current groups.
> +-       We will never steal a signal from current G2 that was really intended
> +-       for G2 because G2 never receives signals (until it becomes G1).  We
> +-       could have stolen a signal from G2 that was conservatively added by a
> +-       previous waiter that also thought it stole a signal -- but given that
> +-       that signal was added unnecessarily, it's not a problem if we steal
> +-       it.
> +-       Thus, the remaining case is that we could have stolen from the current
> +-       G1, where "current" means the __g1_start value we observed.  However,
> +-       if the current G1 does not have the same slot index as we do, we did
> +-       not steal from it and do not need to undo that.  This is the reason
> +-       for putting a bit with G2's index into__g1_start as well.  */
> +-      if (((g1_start & 1) ^ 1) == g)
> +-      {
> +-        /* We have to conservatively undo our potential mistake of stealing
> +-           a signal.  We can stop trying to do that when the current G1
> +-           changes because other spinning waiters will notice this too and
> +-           __condvar_quiesce_and_switch_g1 has checked that there are no
> +-           futex waiters anymore before switching G1.
> +-           Relaxed MO is fine for the __g1_start load because we need to
> +-           merely be able to observe this fact and not have to observe
> +-           something else as well.
> +-           ??? Would it help to spin for a little while to see whether the
> +-           current G1 gets closed?  This might be worthwhile if the group is
> +-           small or close to being closed.  */
> +-        unsigned int s = atomic_load_relaxed (cond->__data.__g_signals + g);
> +-        while (__condvar_load_g1_start_relaxed (cond) == g1_start)
> +-          {
> +-            /* Try to add a signal.  We don't need to acquire the lock
> +-               because at worst we can cause a spurious wake-up.  If the
> +-               group is in the process of being closed (LSB is true), this
> +-               has an effect similar to us adding a signal.  */
> +-            if (((s & 1) != 0)
> +-                || atomic_compare_exchange_weak_relaxed
> +-                     (cond->__data.__g_signals + g, &s, s + 2))
> +-              {
> +-                /* If we added a signal, we also need to add a wake-up on
> +-                   the futex.  We also need to do that if we skipped adding
> +-                   a signal because the group is being closed because
> +-                   while __condvar_quiesce_and_switch_g1 could have closed
> +-                   the group, it might stil be waiting for futex waiters to
> +-                   leave (and one of those waiters might be the one we stole
> +-                   the signal from, which cause it to block using the
> +-                   futex).  */
> +-                futex_wake (cond->__data.__g_signals + g, 1, private);
> +-                break;
> +-              }
> +-            /* TODO Back off.  */
> +-          }
> +-      }
> +-    }
> +-
> +  done:
> +
> +   /* Confirm that we have been woken.  We do that before acquiring the mutex
> +--
> +2.49.0
> +
> diff --git a/meta/recipes-core/glibc/glibc_2.35.bb b/meta/recipes-core/glibc/glibc_2.35.bb
> index 9073e04537..2d23c17a48 100644
> --- a/meta/recipes-core/glibc/glibc_2.35.bb
> +++ b/meta/recipes-core/glibc/glibc_2.35.bb
> @@ -61,6 +61,7 @@ SRC_URI =  "${GLIBC_GIT_URI};branch=${SRCBRANCH};name=glibc \
>             file://0022-sysdeps-gnu-configure.ac-Set-libc_cv_rootsbindir-onl.patch \
>             file://0023-timezone-Make-shell-interpreter-overridable-in-tzsel.patch \
>             file://0024-fix-create-thread-failed-in-unprivileged-process-BZ-.patch \
> +           file://0026-PR25847-1.patch \
>             \
>             file://0001-Revert-Linux-Implement-a-useful-version-of-_startup_.patch \
>             file://0002-get_nscd_addresses-Fix-subscript-typos-BZ-29605.patch \
> --
> 2.49.0
>
>

Patch

diff --git a/meta/recipes-core/glibc/glibc/0026-PR25847-1.patch b/meta/recipes-core/glibc/glibc/0026-PR25847-1.patch
new file mode 100644
index 0000000000..67a6fa1e62
--- /dev/null
+++ b/meta/recipes-core/glibc/glibc/0026-PR25847-1.patch
@@ -0,0 +1,453 @@ 
+From 6f030a8faf242b897e714cb77c65254b14682cfc Mon Sep 17 00:00:00 2001
+From: Frank Barrus <frankbarrus_sw@shaggy.cc>
+Date: Mon, 9 Jun 2025 02:32:52 -0700
+Subject: [PATCH] pthreads NPTL: lost wakeup fix 2
+
+This fixes the lost wakeup (from a bug in signal stealing) with a change
+in the usage of g_signals[] in the condition variable internal state.
+It also completely eliminates the concept and handling of signal stealing,
+as well as the need for signalers to block to wait for waiters to wake
+up every time there is a G1/G2 switch.  This greatly reduces the average
+and maximum latency for pthread_cond_signal.
+
+The g_signals[] field now contains a signal count that is relative to
+the current g1_start value.  Since it is a 32-bit field, and the LSB is
+still reserved (though not currently used anymore), it has a 31-bit value
+that corresponds to the low 31 bits of the sequence number in g1_start.
+(since g1_start also has an LSB flag, this means bits 31:1 in g_signals
+correspond to bits 31:1 in g1_start, plus the current signal count)
+
+By making the signal count relative to g1_start, there is no longer
+any ambiguity or A/B/A issue, and thus any checks before blocking,
+including the futex call itself, are guaranteed not to block if the G1/G2
+switch occurs, even if the signal count remains the same.  This allows
+initially safely blocking in G2 until the switch to G1 occurs, and
+then transitioning from G1 to a new G1 or G2, and always being able to
+distinguish the state change.  This removes the race condition and A/B/A
+problems that otherwise occurred if a late (pre-empted) waiter were to
+resume just as the futex call attempted to block on g_signal since
+otherwise there was no last opportunity to re-check things like whether
+the current G1 group was already closed.
+
+By fixing these issues, the signal stealing code can be eliminated,
+since there is no concept of signal stealing anymore.  The code to block
+for all waiters to exit g_refs can also be removed, since any waiters
+that are still in the g_refs region can be guaranteed to safely wake
+up and exit.  If there are still any left at this time, they are all
+sent one final futex wakeup to ensure that they are not blocked any
+longer, but there is no need for the signaller to block and wait for
+them to wake up and exit the g_refs region.
+
+The signal count is then effectively "zeroed" but since it is now
+relative to g1_start, this is done by advancing it to a new value that
+can be observed by any pending blocking waiters.  Any late waiters can
+always tell the difference, and can thus just cleanly exit if they are
+in a stale G1 or G2.  They can never steal a signal from the current
+G1 if they are not in the current G1, since the signal value that has
+to match in the cmpxchg has the low 31 bits of the g1_start value
+contained in it, and that's first checked, and then it won't match if
+there's a G1/G2 change.
+
+Note: the 31-bit sequence number used in g_signals is designed to
+handle wrap-around when checking the signal count, but if the entire
+31-bit wraparound (2 billion signals) occurs while there is still a
+late waiter that has not yet resumed, and it happens to then match
+the current g1_start low bits, and the pre-emption occurs after the
+normal "closed group" checks (which are 64-bit) but then hits the
+futex syscall and signal consuming code, then an A/B/A issue could
+still result and cause an incorrect assumption about whether it
+should block.  This particular scenario seems unlikely in practice.
+Note that once awake from the futex, the waiter would notice the
+closed group before consuming the signal (since that's still a 64-bit
+check that would not be aliased in the wrap-around in g_signals),
+so the biggest impact would be blocking on the futex until the next
+full wakeup from a G1/G2 switch.
+
+The following commit has been cherry-picked from the glibc master branch:
+Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=25847
+
+Upstream-Status: Backport [1db84775f831a1494993ce9c118deaf9537cc50a]
+
+Signed-off-by: Sunil Dora <sunilkumar.dora@windriver.com>
+---
+ nptl/pthread_cond_common.c | 105 +++++++++------------------
+ nptl/pthread_cond_wait.c   | 144 ++++++++++++-------------------------
+ 2 files changed, 81 insertions(+), 168 deletions(-)
+
+diff --git a/nptl/pthread_cond_common.c b/nptl/pthread_cond_common.c
+index fb035f72..a55eee3e 100644
+--- a/nptl/pthread_cond_common.c
++++ b/nptl/pthread_cond_common.c
+@@ -201,7 +201,6 @@ static bool __attribute__ ((unused))
+ __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq,
+     unsigned int *g1index, int private)
+ {
+-  const unsigned int maxspin = 0;
+   unsigned int g1 = *g1index;
+ 
+   /* If there is no waiter in G2, we don't do anything.  The expression may
+@@ -222,84 +221,46 @@ __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq,
+      * New waiters arriving concurrently with the group switching will all go
+        into G2 until we atomically make the switch.  Waiters existing in G2
+        are not affected.
+-     * Waiters in G1 will be closed out immediately by setting a flag in
+-       __g_signals, which will prevent waiters from blocking using a futex on
+-       __g_signals and also notifies them that the group is closed.  As a
+-       result, they will eventually remove their group reference, allowing us
+-       to close switch group roles.  */
+-
+-  /* First, set the closed flag on __g_signals.  This tells waiters that are
+-     about to wait that they shouldn't do that anymore.  This basically
+-     serves as an advance notificaton of the upcoming change to __g1_start;
+-     waiters interpret it as if __g1_start was larger than their waiter
+-     sequence position.  This allows us to change __g1_start after waiting
+-     for all existing waiters with group references to leave, which in turn
+-     makes recovery after stealing a signal simpler because it then can be
+-     skipped if __g1_start indicates that the group is closed (otherwise,
+-     we would have to recover always because waiters don't know how big their
+-     groups are).  Relaxed MO is fine.  */
+-  atomic_fetch_or_relaxed (cond->__data.__g_signals + g1, 1);
+-
+-  /* Wait until there are no group references anymore.  The fetch-or operation
+-     injects us into the modification order of __g_refs; release MO ensures
+-     that waiters incrementing __g_refs after our fetch-or see the previous
+-     changes to __g_signals and to __g1_start that had to happen before we can
+-     switch this G1 and alias with an older group (we have two groups, so
+-     aliasing requires switching group roles twice).  Note that nobody else
+-     can have set the wake-request flag, so we do not have to act upon it.
+-
+-     Also note that it is harmless if older waiters or waiters from this G1
+-     get a group reference after we have quiesced the group because it will
+-     remain closed for them either because of the closed flag in __g_signals
+-     or the later update to __g1_start.  New waiters will never arrive here
+-     but instead continue to go into the still current G2.  */
+-  unsigned r = atomic_fetch_or_release (cond->__data.__g_refs + g1, 0);
+-  while ((r >> 1) > 0)
+-    {
+-      for (unsigned int spin = maxspin; ((r >> 1) > 0) && (spin > 0); spin--)
+-	{
+-	  /* TODO Back off.  */
+-	  r = atomic_load_relaxed (cond->__data.__g_refs + g1);
+-	}
+-      if ((r >> 1) > 0)
+-	{
+-	  /* There is still a waiter after spinning.  Set the wake-request
+-	     flag and block.  Relaxed MO is fine because this is just about
+-	     this futex word.
+-
+-	     Update r to include the set wake-request flag so that the upcoming
+-	     futex_wait only blocks if the flag is still set (otherwise, we'd
+-	     violate the basic client-side futex protocol).  */
+-	  r = atomic_fetch_or_relaxed (cond->__data.__g_refs + g1, 1) | 1;
+-
+-	  if ((r >> 1) > 0)
+-	    futex_wait_simple (cond->__data.__g_refs + g1, r, private);
+-	  /* Reload here so we eventually see the most recent value even if we
+-	     do not spin.   */
+-	  r = atomic_load_relaxed (cond->__data.__g_refs + g1);
+-	}
+-    }
+-  /* Acquire MO so that we synchronize with the release operation that waiters
+-     use to decrement __g_refs and thus happen after the waiters we waited
+-     for.  */
+-  atomic_thread_fence_acquire ();
++     * Waiters in G1 will be closed out immediately by the advancing of
++       __g_signals to the next "lowseq" (low 31 bits of the new g1_start),
++       which will prevent waiters from blocking using a futex on
++       __g_signals since it provides enough signals for all possible
++       remaining waiters.  As a result, they can each consume a signal
++       and they will eventually remove their group reference.  */
+ 
+   /* Update __g1_start, which finishes closing this group.  The value we add
+      will never be negative because old_orig_size can only be zero when we
+      switch groups the first time after a condvar was initialized, in which
+-     case G1 will be at index 1 and we will add a value of 1.  See above for
+-     why this takes place after waiting for quiescence of the group.
++     case G1 will be at index 1 and we will add a value of 1.
+      Relaxed MO is fine because the change comes with no additional
+      constraints that others would have to observe.  */
+   __condvar_add_g1_start_relaxed (cond,
+       (old_orig_size << 1) + (g1 == 1 ? 1 : - 1));
+ 
+-  /* Now reopen the group, thus enabling waiters to again block using the
+-     futex controlled by __g_signals.  Release MO so that observers that see
+-     no signals (and thus can block) also see the write __g1_start and thus
+-     that this is now a new group (see __pthread_cond_wait_common for the
+-     matching acquire MO loads).  */
+-  atomic_store_release (cond->__data.__g_signals + g1, 0);
++  unsigned int lowseq = ((old_g1_start + old_orig_size) << 1) & ~1U;
++
++  /* If any waiters still hold group references (and thus could be blocked),
++     then wake them all up now and prevent any running ones from blocking.
++     This is effectively a catch-all for any possible current or future
++     bugs that can allow the group size to reach 0 before all G1 waiters
++     have been awakened or at least given signals to consume, or any
++     other case that can leave blocked (or about to block) older waiters..  */
++  if ((atomic_fetch_or_release (cond->__data.__g_refs + g1, 0) >> 1) > 0)
++   {
++    /* First advance signals to the end of the group (i.e. enough signals
++       for the entire G1 group) to ensure that waiters which have not
++       yet blocked in the futex will not block.
++       Note that in the vast majority of cases, this should never
++       actually be necessary, since __g_signals will have enough
++       signals for the remaining g_refs waiters.  As an optimization,
++       we could check this first before proceeding, although that
++       could still leave the potential for futex lost wakeup bugs
++       if the signal count was non-zero but the futex wakeup
++       was somehow lost.  */
++    atomic_store_release (cond->__data.__g_signals + g1, lowseq);
++
++    futex_wake (cond->__data.__g_signals + g1, INT_MAX, private);
++   }
+ 
+   /* At this point, the old G1 is now a valid new G2 (but not in use yet).
+      No old waiter can neither grab a signal nor acquire a reference without
+@@ -311,6 +272,10 @@ __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq,
+   g1 ^= 1;
+   *g1index ^= 1;
+ 
++  /* Now advance the new G1 g_signals to the new lowseq, giving it
++     an effective signal count of 0 to start.  */
++  atomic_store_release (cond->__data.__g_signals + g1, lowseq);
++
+   /* These values are just observed by signalers, and thus protected by the
+      lock.  */
+   unsigned int orig_size = wseq - (old_g1_start + old_orig_size);
+diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c
+index 20c348a5..1cb3dbf7 100644
+--- a/nptl/pthread_cond_wait.c
++++ b/nptl/pthread_cond_wait.c
+@@ -238,9 +238,7 @@ __condvar_cleanup_waiting (void *arg)
+    signaled), and a reference count.
+ 
+    The group reference count is used to maintain the number of waiters that
+-   are using the group's futex.  Before a group can change its role, the
+-   reference count must show that no waiters are using the futex anymore; this
+-   prevents ABA issues on the futex word.
++   are using the group's futex.
+ 
+    To represent which intervals in the waiter sequence the groups cover (and
+    thus also which group slot contains G1 or G2), we use a 64b counter to
+@@ -300,11 +298,12 @@ __condvar_cleanup_waiting (void *arg)
+        last reference.
+      * Reference count used by waiters concurrently with signalers that have
+        acquired the condvar-internal lock.
+-   __g_signals: The number of signals that can still be consumed.
++   __g_signals: The number of signals that can still be consumed, relative to
++     the current g1_start.  (i.e. bits 31 to 1 of __g_signals are bits
++     31 to 1 of g1_start with the signal count added)
+      * Used as a futex word by waiters.  Used concurrently by waiters and
+        signalers.
+-     * LSB is true iff this group has been completely signaled (i.e., it is
+-       closed).
++     * LSB is currently reserved and 0.
+    __g_size: Waiters remaining in this group (i.e., which have not been
+      signaled yet.
+      * Accessed by signalers and waiters that cancel waiting (both do so only
+@@ -328,18 +327,6 @@ __condvar_cleanup_waiting (void *arg)
+    sufficient because if a waiter can see a sufficiently large value, it could
+    have also consume a signal in the waiters group.
+ 
+-   Waiters try to grab a signal from __g_signals without holding a reference
+-   count, which can lead to stealing a signal from a more recent group after
+-   their own group was already closed.  They cannot always detect whether they
+-   in fact did because they do not know when they stole, but they can
+-   conservatively add a signal back to the group they stole from; if they
+-   did so unnecessarily, all that happens is a spurious wake-up.  To make this
+-   even less likely, __g1_start contains the index of the current g2 too,
+-   which allows waiters to check if there aliasing on the group slots; if
+-   there wasn't, they didn't steal from the current G1, which means that the
+-   G1 they stole from must have been already closed and they do not need to
+-   fix anything.
+-
+    It is essential that the last field in pthread_cond_t is __g_signals[1]:
+    The previous condvar used a pointer-sized field in pthread_cond_t, so a
+    PTHREAD_COND_INITIALIZER from that condvar implementation might only
+@@ -435,6 +422,9 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
+     {
+       while (1)
+ 	{
++          uint64_t g1_start = __condvar_load_g1_start_relaxed (cond);
++          unsigned int lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U;
++
+ 	  /* Spin-wait first.
+ 	     Note that spinning first without checking whether a timeout
+ 	     passed might lead to what looks like a spurious wake-up even
+@@ -446,35 +436,45 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
+ 	     having to compare against the current time seems to be the right
+ 	     choice from a performance perspective for most use cases.  */
+ 	  unsigned int spin = maxspin;
+-	  while (signals == 0 && spin > 0)
++	  while (spin > 0 && ((int)(signals - lowseq) < 2))
+ 	    {
+ 	      /* Check that we are not spinning on a group that's already
+ 		 closed.  */
+-	      if (seq < (__condvar_load_g1_start_relaxed (cond) >> 1))
+-		goto done;
++	      if (seq < (g1_start >> 1))
++		break;
+ 
+ 	      /* TODO Back off.  */
+ 
+ 	      /* Reload signals.  See above for MO.  */
+ 	      signals = atomic_load_acquire (cond->__data.__g_signals + g);
++              g1_start = __condvar_load_g1_start_relaxed (cond);
++              lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U;
+ 	      spin--;
+ 	    }
+ 
+-	  /* If our group will be closed as indicated by the flag on signals,
+-	     don't bother grabbing a signal.  */
+-	  if (signals & 1)
+-	    goto done;
+-
+-	  /* If there is an available signal, don't block.  */
+-	  if (signals != 0)
++          if (seq < (g1_start >> 1))
++	    {
++              /* If the group is closed already,
++	         then this waiter originally had enough extra signals to
++	         consume, up until the time its group was closed.  */
++	       goto done;
++            }
++
++	  /* If there is an available signal, don't block.
++             If __g1_start has advanced at all, then we must be in G1
++	     by now, perhaps in the process of switching back to an older
++	     G2, but in either case we're allowed to consume the available
++	     signal and should not block anymore.  */
++	  if ((int)(signals - lowseq) >= 2)
+ 	    break;
+ 
+ 	  /* No signals available after spinning, so prepare to block.
+ 	     We first acquire a group reference and use acquire MO for that so
+ 	     that we synchronize with the dummy read-modify-write in
+ 	     __condvar_quiesce_and_switch_g1 if we read from that.  In turn,
+-	     in this case this will make us see the closed flag on __g_signals
+-	     that designates a concurrent attempt to reuse the group's slot.
++	     in this case this will make us see the advancement of __g_signals
++	     to the upcoming new g1_start that occurs with a concurrent
++	     attempt to reuse the group's slot.
+ 	     We use acquire MO for the __g_signals check to make the
+ 	     __g1_start check work (see spinning above).
+ 	     Note that the group reference acquisition will not mask the
+@@ -482,15 +482,24 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
+ 	     an atomic read-modify-write operation and thus extend the release
+ 	     sequence.  */
+ 	  atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);
+-	  if (((atomic_load_acquire (cond->__data.__g_signals + g) & 1) != 0)
+-	      || (seq < (__condvar_load_g1_start_relaxed (cond) >> 1)))
++	  signals = atomic_load_acquire (cond->__data.__g_signals + g);
++          g1_start = __condvar_load_g1_start_relaxed (cond);
++          lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U;
++
++          if (seq < (g1_start >> 1))
+ 	    {
+-	      /* Our group is closed.  Wake up any signalers that might be
+-		 waiting.  */
++              /* group is closed already, so don't block */
+ 	      __condvar_dec_grefs (cond, g, private);
+ 	      goto done;
+ 	    }
+ 
++	  if ((int)(signals - lowseq) >= 2)
++	    {
++	      /* a signal showed up or G1/G2 switched after we grabbed the refcount */
++	      __condvar_dec_grefs (cond, g, private);
++	      break;
++            }
++
+ 	  // Now block.
+ 	  struct _pthread_cleanup_buffer buffer;
+ 	  struct _condvar_cleanup_buffer cbuffer;
+@@ -501,7 +510,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
+ 	  __pthread_cleanup_push (&buffer, __condvar_cleanup_waiting, &cbuffer);
+ 
+ 	  err = __futex_abstimed_wait_cancelable64 (
+-	    cond->__data.__g_signals + g, 0, clockid, abstime, private);
++	    cond->__data.__g_signals + g, signals, clockid, abstime, private);
+ 
+ 	  __pthread_cleanup_pop (&buffer, 0);
+ 
+@@ -524,6 +533,8 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
+ 	  signals = atomic_load_acquire (cond->__data.__g_signals + g);
+ 	}
+ 
++       if (seq < (__condvar_load_g1_start_relaxed (cond) >> 1))
++	 goto done;
+     }
+   /* Try to grab a signal.  Use acquire MO so that we see an up-to-date value
+      of __g1_start below (see spinning above for a similar case).  In
+@@ -532,69 +543,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
+   while (!atomic_compare_exchange_weak_acquire (cond->__data.__g_signals + g,
+ 						&signals, signals - 2));
+ 
+-  /* We consumed a signal but we could have consumed from a more recent group
+-     that aliased with ours due to being in the same group slot.  If this
+-     might be the case our group must be closed as visible through
+-     __g1_start.  */
+-  uint64_t g1_start = __condvar_load_g1_start_relaxed (cond);
+-  if (seq < (g1_start >> 1))
+-    {
+-      /* We potentially stole a signal from a more recent group but we do not
+-	 know which group we really consumed from.
+-	 We do not care about groups older than current G1 because they are
+-	 closed; we could have stolen from these, but then we just add a
+-	 spurious wake-up for the current groups.
+-	 We will never steal a signal from current G2 that was really intended
+-	 for G2 because G2 never receives signals (until it becomes G1).  We
+-	 could have stolen a signal from G2 that was conservatively added by a
+-	 previous waiter that also thought it stole a signal -- but given that
+-	 that signal was added unnecessarily, it's not a problem if we steal
+-	 it.
+-	 Thus, the remaining case is that we could have stolen from the current
+-	 G1, where "current" means the __g1_start value we observed.  However,
+-	 if the current G1 does not have the same slot index as we do, we did
+-	 not steal from it and do not need to undo that.  This is the reason
+-	 for putting a bit with G2's index into__g1_start as well.  */
+-      if (((g1_start & 1) ^ 1) == g)
+-	{
+-	  /* We have to conservatively undo our potential mistake of stealing
+-	     a signal.  We can stop trying to do that when the current G1
+-	     changes because other spinning waiters will notice this too and
+-	     __condvar_quiesce_and_switch_g1 has checked that there are no
+-	     futex waiters anymore before switching G1.
+-	     Relaxed MO is fine for the __g1_start load because we need to
+-	     merely be able to observe this fact and not have to observe
+-	     something else as well.
+-	     ??? Would it help to spin for a little while to see whether the
+-	     current G1 gets closed?  This might be worthwhile if the group is
+-	     small or close to being closed.  */
+-	  unsigned int s = atomic_load_relaxed (cond->__data.__g_signals + g);
+-	  while (__condvar_load_g1_start_relaxed (cond) == g1_start)
+-	    {
+-	      /* Try to add a signal.  We don't need to acquire the lock
+-		 because at worst we can cause a spurious wake-up.  If the
+-		 group is in the process of being closed (LSB is true), this
+-		 has an effect similar to us adding a signal.  */
+-	      if (((s & 1) != 0)
+-		  || atomic_compare_exchange_weak_relaxed
+-		       (cond->__data.__g_signals + g, &s, s + 2))
+-		{
+-		  /* If we added a signal, we also need to add a wake-up on
+-		     the futex.  We also need to do that if we skipped adding
+-		     a signal because the group is being closed because
+-		     while __condvar_quiesce_and_switch_g1 could have closed
+-		     the group, it might stil be waiting for futex waiters to
+-		     leave (and one of those waiters might be the one we stole
+-		     the signal from, which cause it to block using the
+-		     futex).  */
+-		  futex_wake (cond->__data.__g_signals + g, 1, private);
+-		  break;
+-		}
+-	      /* TODO Back off.  */
+-	    }
+-	}
+-    }
+-
+  done:
+ 
+   /* Confirm that we have been woken.  We do that before acquiring the mutex
+-- 
+2.49.0
+
diff --git a/meta/recipes-core/glibc/glibc_2.35.bb b/meta/recipes-core/glibc/glibc_2.35.bb
index 9073e04537..2d23c17a48 100644
--- a/meta/recipes-core/glibc/glibc_2.35.bb
+++ b/meta/recipes-core/glibc/glibc_2.35.bb
@@ -61,6 +61,7 @@ SRC_URI =  "${GLIBC_GIT_URI};branch=${SRCBRANCH};name=glibc \
            file://0022-sysdeps-gnu-configure.ac-Set-libc_cv_rootsbindir-onl.patch \
            file://0023-timezone-Make-shell-interpreter-overridable-in-tzsel.patch \
            file://0024-fix-create-thread-failed-in-unprivileged-process-BZ-.patch \
+           file://0026-PR25847-1.patch \
            \
            file://0001-Revert-Linux-Implement-a-useful-version-of-_startup_.patch \
            file://0002-get_nscd_addresses-Fix-subscript-typos-BZ-29605.patch \
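
As a cross-check of the new __g_signals semantics described in the backported commit message above, here is a minimal standalone sketch (plain C, not glibc code; the helper names make_g1_start/signal_available and the example sequence numbers are invented for illustration) of how a waiter in group slot g decides whether a signal is available once the count is kept relative to g1_start:

#include <stdio.h>
#include <stdint.h>

/* g1_start as stored in the condvar: (starting waiter sequence << 1)
   with the LSB holding the index of the current G2 slot.  */
static uint64_t make_g1_start (uint64_t seq, unsigned int g2_index)
{
  return (seq << 1) | g2_index;
}

/* Mirrors the "(int) (signals - lowseq) >= 2" test from the patch:
   a waiter in slot g may consume a signal only if the 31-bit relative
   count in signals is at least one step (2, since the LSB stays
   reserved) above the base derived from g1_start.  */
static int signal_available (uint64_t g1_start, unsigned int g,
                             unsigned int signals)
{
  unsigned int lowseq = ((g1_start & 1) == g)
                        ? signals                            /* slot g is the current G2 */
                        : ((unsigned int) g1_start & ~1U);   /* slot g is the current G1 */
  return (int) (signals - lowseq) >= 2;
}

int main (void)
{
  /* Suppose the current G1 starts at waiter sequence 100 and lives in
     slot 0, so slot 1 is G2.  */
  uint64_t g1_start = make_g1_start (100, 1);
  unsigned int base = (unsigned int) g1_start & ~1U;

  printf ("G1, no signals yet:      %d\n", signal_available (g1_start, 0, base));
  printf ("G1, one signal posted:   %d\n", signal_available (g1_start, 0, base + 2));
  printf ("G2, never sees signals:  %d\n", signal_available (g1_start, 1, base + 2));

  /* After a G1/G2 switch the sequence base advances (here by an assumed
     group size of 30) and the slot roles flip, so a stale waiter that
     still compares against the old base can tell the groups apart even
     if the raw count happens to repeat.  */
  uint64_t next_g1_start = make_g1_start (130, 0);
  printf ("base moves on switch:    %u -> %u\n",
          base, (unsigned int) next_g1_start & ~1U);
  return 0;
}

On the signaler side the patch stores the corresponding new lowseq (((old_g1_start + old_orig_size) << 1) with the LSB cleared) into __g_signals during the switch, so the futex compare value a late waiter passes can no longer alias a value from the previous life of the slot, barring the 31-bit wrap-around caveat already noted in the commit message.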