[kirkstone,V3,0/8] glibc: Fix lost wakeup in pthread_cond_wait (PR25847)

Message ID 20250617100855.2696492-1-sunilkumar.dora@windriver.com

Dora, Sunil Kumar June 17, 2025, 10:08 a.m. UTC
From: Sunil Dora <sunilkumar.dora@windriver.com>

V3 Changes:
  - Split the previous single large patch into 8 individual patches,
    each corresponding to an upstream commit, for easier review as requested.
  - Added the recipe name (glibc) at the beginning of each commit message for clarity.
  - In the Upstream-Status field, replaced the upstream commit ID with a direct link to the
    corresponding commit on the Sourceware repository for better traceability.

PR: https://sourceware.org/bugzilla/show_bug.cgi?id=25847

This series fixes a race condition in pthread_cond_wait that could cause
lost wake-ups under high thread contention. The issue particularly affects
applications with many threads waiting on condition variables.

[Root Cause Analysis]
 The lost wakeup occurs when all of the following happen together:
 1. A waiter thread gets preempted between signal decrement and group check
 2. Multiple group rotations occur during preemption
 3. Signal accounting becomes corrupted
 4. Final signals are delivered to empty groups

[Impact]
 Applications using pthread condition variables could experience hangs due to lost wake-ups.
 This particularly affects high-contention scenarios with many threads.
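
 For readers less familiar with the pattern, the following is a minimal
 sketch of the kind of handoff that depends on condition variable wake-ups.
 It is not taken from the patches or from the Bugzilla test case, and the
 names (struct handoff, waiter, run_handoff) are hypothetical. With a single
 waiter it is not expected to trigger PR25847; it only shows what is at
 stake: if the wake-up were lost while the waiter is blocked, pthread_join
 would never return.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state for this sketch; not part of glibc or the patches. */
struct handoff {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;   /* predicate protected by lock */
};

static void *waiter(void *arg)
{
    struct handoff *h = arg;

    pthread_mutex_lock(&h->lock);
    /* Always re-check the predicate: spurious wake-ups are allowed. */
    while (!h->ready)
        pthread_cond_wait(&h->cond, &h->lock);
    pthread_mutex_unlock(&h->lock);
    return NULL;
}

/* Returns 0 once the waiter has observed the predicate and been joined. */
int run_handoff(void)
{
    struct handoff h;
    pthread_t tid;
    int rc;

    pthread_mutex_init(&h.lock, NULL);
    pthread_cond_init(&h.cond, NULL);
    h.ready = false;

    rc = pthread_create(&tid, NULL, waiter, &h);
    assert(rc == 0);

    /* Set the predicate and signal while holding the lock. */
    pthread_mutex_lock(&h.lock);
    h.ready = true;
    pthread_cond_signal(&h.cond);
    pthread_mutex_unlock(&h.lock);

    /* A lost wake-up would show up here as a join that never completes. */
    rc = pthread_join(tid, NULL);
    assert(rc == 0);

    pthread_cond_destroy(&h.cond);
    pthread_mutex_destroy(&h.lock);
    return rc;
}
```

 Calling run_handoff() from a small main() and checking for a zero return is
 enough to exercise the sketch.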

[Fix Details]
 The fix prevents signal stealing by:
 - Broadening scope of g_refs to cover entire wait operation
 - Removing the complex signal stealing handling code
 - Properly maintaining signal accounting invariants

Upstream Status: Fixed in glibc 2.41 and master:
 [https://sourceware.org/bugzilla/show_bug.cgi?id=25847#c72]

According to https://sourceware.org/bugzilla/show_bug.cgi?id=25847#c74, 
commit c36fc50781995e6758cae2b6927839d0157f213c is unsuitable for backporting to older branches 
and has therefore been excluded.

[Testing]
 Verified on x86_64 with:
 - Custom reproduction test case
 - Stress testing with high thread counts
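
 As an illustration of what a high-thread-count stress test can look like,
 here is a hedged sketch. It is not the test program used for this series
 (that is the Bugzilla attachment referenced in the reproduction steps), and
 the names (struct pool, consumer, run_stress) and the 5-second deadline are
 assumptions. It uses pthread_cond_timedwait so that a suspected lost
 wake-up is counted instead of hanging the process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stress harness state; not part of glibc or the patches. */
struct pool {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             tokens;  /* items posted but not yet consumed */
    int             late;    /* waiters released by timeout, not by a signal */
};

static void *consumer(void *arg)
{
    struct pool *p = arg;
    struct timespec deadline;
    int rc = 0;

    /* Absolute deadline on CLOCK_REALTIME, the default condvar clock. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 5;   /* assumed deadline, generous for a healthy run */

    pthread_mutex_lock(&p->lock);
    while (p->tokens == 0 && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&p->cond, &p->lock, &deadline);
    if (rc == ETIMEDOUT)
        p->late++;          /* suspected lost wake-up or a stalled producer */
    if (p->tokens > 0)
        p->tokens--;
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

/* Starts nthreads consumers, posts one token per consumer, and returns
   the number of consumers that had to be released by the timeout. */
int run_stress(int nthreads)
{
    struct pool p;
    pthread_t *tids = malloc(sizeof *tids * (size_t)nthreads);
    int i, rc, late;

    assert(tids != NULL);
    pthread_mutex_init(&p.lock, NULL);
    pthread_cond_init(&p.cond, NULL);
    p.tokens = 0;
    p.late = 0;

    for (i = 0; i < nthreads; i++) {
        rc = pthread_create(&tids[i], NULL, consumer, &p);
        assert(rc == 0);
    }
    for (i = 0; i < nthreads; i++) {
        pthread_mutex_lock(&p.lock);
        p.tokens++;
        pthread_cond_signal(&p.cond);
        pthread_mutex_unlock(&p.lock);
    }
    for (i = 0; i < nthreads; i++) {
        rc = pthread_join(tids[i], NULL);
        assert(rc == 0);
    }
    (void)rc;

    late = p.late;
    pthread_cond_destroy(&p.cond);
    pthread_mutex_destroy(&p.lock);
    free(tids);
    return late;
}
```

 A non-zero return from run_stress() only flags a run worth investigating;
 on a heavily loaded machine a timeout can also have causes unrelated to
 this bug.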

Reproduction Steps:
 1. Build glibc (2.35) with an injected delay (usleep(10)) in
     __pthread_cond_wait_common, in nptl/pthread_cond_wait.c, just before the line:
       uint64_t g1_start = __condvar_load_g1_start_relaxed (cond);
     (Note: the injected delay is for testing purposes only, to reproduce the race condition reliably.)
 2. Compile test program with custom glibc:
     [https://sourceware.org/bugzilla/attachment.cgi?id=14360]
     gcc -g -o pthread_cond_bug pthread_cond_bug.c \
     -Wl,-dynamic-linker=<CUSTOM_GLIBC>/lib/ld-linux-x86-64.so.2 \
     -Wl,-rpath=<CUSTOM_GLIBC>/lib -L<CUSTOM_GLIBC>/lib \
     -I<CUSTOM_GLIBC>/include -lpthread
 3. Run with CPU pinning: taskset -c 0 ./pthread_cond_bug

[Behavior]
 Without the fix:
 - Process hangs for 300 seconds
 - Eventually crashes with core dump
 - Verifiable via gdb

 With the fix:
 - Process runs smoothly even with forced delay.

Sunil Dora (8):
  glibc: pthreads NPTL lost wakeup fix 2
  glibc: nptl Update comments and indentation for new condvar
    implementation
  glibc: nptl Remove unnecessary catch-all-wake in condvar group switch
  glibc: nptl Remove unnecessary quadruple check in pthread_cond_wait
  glibc: nptl Use a single loop in pthread_cond_wait instaed of a nested
    loop
  glibc: nptl Fix indentation
  glibc: nptl rename __condvar_quiesce_and_switch_g1
  glibc: nptl Use all of g1_start and g_signals

 .../glibc/glibc/0026-PR25847-1.patch          | 455 ++++++++++++++++++
 .../glibc/glibc/0026-PR25847-2.patch          | 144 ++++++
 .../glibc/glibc/0026-PR25847-3.patch          |  77 +++
 .../glibc/glibc/0026-PR25847-4.patch          | 117 +++++
 .../glibc/glibc/0026-PR25847-5.patch          | 105 ++++
 .../glibc/glibc/0026-PR25847-6.patch          | 169 +++++++
 .../glibc/glibc/0026-PR25847-7.patch          | 160 ++++++
 .../glibc/glibc/0026-PR25847-8.patch          | 192 ++++++++
 meta/recipes-core/glibc/glibc_2.35.bb         |   8 +
 9 files changed, 1427 insertions(+)
 create mode 100644 meta/recipes-core/glibc/glibc/0026-PR25847-1.patch
 create mode 100644 meta/recipes-core/glibc/glibc/0026-PR25847-2.patch
 create mode 100644 meta/recipes-core/glibc/glibc/0026-PR25847-3.patch
 create mode 100644 meta/recipes-core/glibc/glibc/0026-PR25847-4.patch
 create mode 100644 meta/recipes-core/glibc/glibc/0026-PR25847-5.patch
 create mode 100644 meta/recipes-core/glibc/glibc/0026-PR25847-6.patch
 create mode 100644 meta/recipes-core/glibc/glibc/0026-PR25847-7.patch
 create mode 100644 meta/recipes-core/glibc/glibc/0026-PR25847-8.patch