[kirkstone,00/11] glibc: Fix lost wakeup in pthread_cond_wait (BZ#25847)

Message ID 20251014144347.536537-1-sunilkumar.dora@windriver.com

Sunil Kumar Dora Oct. 14, 2025, 2:43 p.m. UTC
From: Sunil Dora <sunilkumar.dora@windriver.com>

This series backports the full 10-commit upstream fix for a race condition in
pthread_cond_wait that can cause lost wake-ups under high thread contention,
particularly affecting applications with many threads waiting on condition
variables.

BUG: https://sourceware.org/bugzilla/show_bug.cgi?id=25847

Previously, an 8-patch partial backport was submitted, excluding commit
c36fc50781995e6758cae2b6927839d0157f213c ("nptl: Remove g_refs from condition
variables") and its dependent dbc5a50d12eff4cb3f782129029d04b8a76f58e7 ("nptl:
PTHREAD_COND_INITIALIZER compatibility with pre-2.41 versions (bug 32786)")
following guidance from glibc maintainer Florian Weimer (BZ#25847 comment #74,
2025-04-18). This exclusion aimed to avoid potential ABI incompatibilities in
pthread_cond_t layout for process-shared condition variables in stable branches.

However, per subsequent recommendation from glibc maintainer Carlos O'Donell
(BZ#25847 comment #75, 2025-04-29) to backport the exact series from main
development for consistency, this resubmission includes the full 10 commits.

The initial patch in this series removes the prior partial backport to enable
clean application of the complete series.

Glibc upstream status: fixed in master and in release branches back to 2.38

[Root Cause Analysis]
 The lost wakeup occurs when all of these happen simultaneously:
 1. A waiter thread gets preempted between signal decrement and group check
 2. Multiple group rotations occur during preemption
 3. Signal accounting becomes corrupted
 4. Final signals are delivered to empty groups

[Impact]
 Applications using pthread condition variables could experience hangs due to lost wake-ups.
 This particularly affects high-contention scenarios with many threads.
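
 For readers unfamiliar with the affected pattern, the following is a minimal
 sketch of the kind of code that depends on every signal reaching a waiter.
 It is not the Bugzilla reproducer and is not expected to trigger the race on
 its own. All names (struct handoff, waiter, run_handoff, MAX_WAITERS) are
 hypothetical. The final broadcast is a fallback that lets this sketch
 terminate even if a signal were lost; an application without such a fallback
 would simply stay blocked in pthread_cond_wait.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WAITERS 64          /* illustrative limit, not from the series */

/* Hypothetical shared state; every name here is made up for illustration. */
struct handoff {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int pending;                /* tokens produced but not yet consumed */
    int consumed;               /* tokens consumed by all waiters so far */
    int done;                   /* set once the producer has finished */
};

static void *waiter(void *arg)
{
    struct handoff *h = arg;

    pthread_mutex_lock(&h->lock);
    for (;;) {
        /* Predicate loop: tolerates spurious wake-ups. If a signal is lost,
           a blocked waiter stays here until some later wake-up arrives. */
        while (h->pending == 0 && !h->done)
            pthread_cond_wait(&h->cond, &h->lock);
        if (h->pending == 0)
            break;              /* producer is done and nothing is left */
        assert(h->pending > 0);
        h->pending--;
        h->consumed++;
    }
    pthread_mutex_unlock(&h->lock);
    return NULL;
}

/* Starts nwaiters threads, hands them ntokens tokens one signal at a time,
   and returns how many tokens were consumed (-1 on setup failure). */
int run_handoff(int nwaiters, int ntokens)
{
    struct handoff h = { .pending = 0, .consumed = 0, .done = 0 };
    pthread_t tids[MAX_WAITERS];
    int started = 0, result;

    if (nwaiters < 1 || nwaiters > MAX_WAITERS || ntokens < 0)
        return -1;
    pthread_mutex_init(&h.lock, NULL);
    pthread_cond_init(&h.cond, NULL);

    while (started < nwaiters &&
           pthread_create(&tids[started], NULL, waiter, &h) == 0)
        started++;

    /* Produce tokens only if every waiter thread was created. */
    for (int i = 0; started == nwaiters && i < ntokens; i++) {
        pthread_mutex_lock(&h.lock);
        h.pending++;
        pthread_cond_signal(&h.cond);
        pthread_mutex_unlock(&h.lock);
    }

    /* Tell the waiters to drain what is left and exit. */
    pthread_mutex_lock(&h.lock);
    h.done = 1;
    pthread_cond_broadcast(&h.cond);
    pthread_mutex_unlock(&h.lock);

    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    result = (started == nwaiters) ? h.consumed : -1;
    pthread_cond_destroy(&h.cond);
    pthread_mutex_destroy(&h.lock);
    return result;
}
```

 Calling run_handoff(8, 100000) from a small main() and comparing the return
 value against the token count gives a quick sanity check that no token was
 dropped; it exercises the signal path but is far less targeted than the
 attached test case.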

[Fix Details]
 The fix prevents signal stealing by:
 - Broadening the scope of g_refs to cover the entire wait operation
 - Removing the complex signal stealing handling code
 - Properly maintaining signal accounting invariants

Upstream Status: Fixed in glibc 2.41 and master:
 [https://sourceware.org/bugzilla/show_bug.cgi?id=25847#c72]


[Testing]
 Verified on x86_64 with:
 - Custom reproduction test case
 - Stress testing with high thread counts

Reproduction Steps:
 1. Build glibc (2.35) with an injected delay (usleep(10)) in __pthread_cond_wait_common,
     just before: uint64_t g1_start = __condvar_load_g1_start_relaxed (cond);
     File: nptl/pthread_cond_wait.c
     (Note: the injected delay is for testing only, to reproduce the race condition reliably.)
 2. Compile test program with custom glibc:
     [https://sourceware.org/bugzilla/attachment.cgi?id=14360]
     gcc -g -o pthread_cond_bug pthread_cond_bug.c \
     -Wl,-dynamic-linker=<CUSTOM_GLIBC>/lib/ld-linux-x86-64.so.2 \
     -Wl,-rpath=<CUSTOM_GLIBC>/lib -L<CUSTOM_GLIBC>/lib \
     -I<CUSTOM_GLIBC>/include -lpthread
 3. Run with CPU pinning: taskset -c 0 ./pthread_cond_bug

[Behavior]
 Without the fix:
 - Process hangs for 300 seconds
 - Eventually crashes with core dump
 - Verifiable via gdb

 With the fix:
 - Process runs smoothly even with forced delay.

Sunil Dora (11):
  glibc: Remove partial BZ#25847 backport patches
  glibc: pthreads NPTL lost wakeup fix 2
  glibc: nptl Update comments and indentation for new condvar
    implementation
  glibc: nptl Remove unnecessary catch-all-wake in condvar group switch
  glibc: nptl Remove unnecessary quadruple check in pthread_cond_wait
  glibc: Remove g_refs from condition variables
  glibc: nptl Use a single loop in pthread_cond_wait instaed of a nested
    loop
  glibc: nptl Fix indentation
  glibc: nptl rename __condvar_quiesce_and_switch_g1
  glibc: nptl Use all of g1_start and g_signals
  glibc: : PTHREAD_COND_INITIALIZER compatibility with pre-2.41 versions
    (bug 32786)

 .../glibc/glibc/0026-PR25847-1.patch          |  24 +-
 .../glibc/glibc/0026-PR25847-10.patch         |  54 ++++
 .../glibc/glibc/0026-PR25847-2.patch          |  13 +-
 .../glibc/glibc/0026-PR25847-3.patch          |  18 +-
 .../glibc/glibc/0026-PR25847-4.patch          |  11 +-
 .../glibc/glibc/0026-PR25847-5.patch          | 237 ++++++++++-----
 .../glibc/glibc/0026-PR25847-6.patch          | 220 +++++---------
 .../glibc/glibc/0026-PR25847-7.patch          | 277 +++++++++---------
 .../glibc/glibc/0026-PR25847-8.patch          | 269 ++++++++---------
 .../glibc/glibc/0026-PR25847-9.patch          | 193 ++++++++++++
 meta/recipes-core/glibc/glibc_2.35.bb         |   2 +
 11 files changed, 773 insertions(+), 545 deletions(-)
 create mode 100644 meta/recipes-core/glibc/glibc/0026-PR25847-10.patch
 create mode 100644 meta/recipes-core/glibc/glibc/0026-PR25847-9.patch