diff mbox series

[kirkstone,10/10] glibc: Add single-threaded fast path to rand()

Message ID 19fcc012160c6b8782be0e6fc54797c88bf084ba.1744145328.git.steve@sakoman.com
State RFC
Delegated to: Steve Sakoman
Headers show
Series [kirkstone,01/10] curl: ignore CVE-2025-0725 | expand

Commit Message

Steve Sakoman April 8, 2025, 8:51 p.m. UTC
From: Haixiao Yan <haixiao.yan.cn@windriver.com>

Backport a patch [1] to improve performance of rand() and __random()[2]
by adding a single-threaded fast path.

[1] https://sourceware.org/git/?p=glibc.git;a=commit;h=be0cfd848d9ad7378800d6302bc11467cf2b514f
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=32777
Signed-off-by: Haixiao Yan <haixiao.yan.cn@windriver.com>
Signed-off-by: Steve Sakoman <steve@sakoman.com>
---
 ...dd-single-threaded-fast-path-to-rand.patch | 47 +++++++++++++++++++
 meta/recipes-core/glibc/glibc_2.35.bb         |  1 +
 2 files changed, 48 insertions(+)
 create mode 100644 meta/recipes-core/glibc/glibc/0001-stdlib-Add-single-threaded-fast-path-to-rand.patch

Comments

Richard Purdie April 10, 2025, 11:33 a.m. UTC | #1
On Tue, 2025-04-08 at 13:51 -0700, Steve Sakoman via
lists.openembedded.org wrote:
> From: Haixiao Yan <haixiao.yan.cn@windriver.com>
> 
> Backport a patch [1] to improve performance of rand() and
> __random()[2]
> by adding a single-threaded fast path.
> 
> [1]
> https://sourceware.org/git/?p=glibc.git;a=commit;h=be0cfd848d9ad7378800d6302bc11467cf2b514f
> [2] https://sourceware.org/bugzilla/show_bug.cgi?id=32777
> Signed-off-by: Haixiao Yan <haixiao.yan.cn@windriver.com>
> Signed-off-by: Steve Sakoman <steve@sakoman.com>
> ---
>  ...dd-single-threaded-fast-path-to-rand.patch | 47
> +++++++++++++++++++
>  meta/recipes-core/glibc/glibc_2.35.bb         |  1 +
>  2 files changed, 48 insertions(+)
>  create mode 100644 meta/recipes-core/glibc/glibc/0001-stdlib-Add-
> single-threaded-fast-path-to-rand.patch

This isn't in walnascar yet.

Cheers,

Richard
Steve Sakoman April 10, 2025, 1:49 p.m. UTC | #2
On Thu, Apr 10, 2025 at 4:33 AM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:
>
> On Tue, 2025-04-08 at 13:51 -0700, Steve Sakoman via
> lists.openembedded.org wrote:
> > From: Haixiao Yan <haixiao.yan.cn@windriver.com>
> >
> > Backport a patch [1] to improve performance of rand() and
> > __random()[2]
> > by adding a single-threaded fast path.
> >
> > [1]
> > https://sourceware.org/git/?p=glibc.git;a=commit;h=be0cfd848d9ad7378800d6302bc11467cf2b514f
> > [2] https://sourceware.org/bugzilla/show_bug.cgi?id=32777
> > Signed-off-by: Haixiao Yan <haixiao.yan.cn@windriver.com>
> > Signed-off-by: Steve Sakoman <steve@sakoman.com>
> > ---
> >  ...dd-single-threaded-fast-path-to-rand.patch | 47
> > +++++++++++++++++++
> >  meta/recipes-core/glibc/glibc_2.35.bb         |  1 +
> >  2 files changed, 48 insertions(+)
> >  create mode 100644 meta/recipes-core/glibc/glibc/0001-stdlib-Add-
> > single-threaded-fast-path-to-rand.patch
>
> This isn't in walnascar yet.

I've got it and "qemu 8.2.7: ignore CVE-2023-1386" in my local
walnascar branch. I will hold off merging to all stable branches till
after walnascar release.

Thanks for catching this!

Steve
Randy MacLeod April 10, 2025, 3:30 p.m. UTC | #3
On 2025-04-10 9:49 a.m., Steve Sakoman via lists.openembedded.org wrote:
> On Thu, Apr 10, 2025 at 4:33 AM Richard Purdie
> <richard.purdie@linuxfoundation.org> wrote:
>> On Tue, 2025-04-08 at 13:51 -0700, Steve Sakoman via
>> lists.openembedded.org wrote:
>>> From: Haixiao Yan<haixiao.yan.cn@windriver.com>
>>>
>>> Backport a patch [1] to improve performance of rand() and
>>> __random()[2]
>>> by adding a single-threaded fast path.
>>>
>>> [1]
>>> https://sourceware.org/git/?p=glibc.git;a=commit;h=be0cfd848d9ad7378800d6302bc11467cf2b514f
>>> [2]https://sourceware.org/bugzilla/show_bug.cgi?id=32777
>>> Signed-off-by: Haixiao Yan<haixiao.yan.cn@windriver.com>
>>> Signed-off-by: Steve Sakoman<steve@sakoman.com>
>>> ---
>>>   ...dd-single-threaded-fast-path-to-rand.patch | 47
>>> +++++++++++++++++++
>>>   meta/recipes-core/glibc/glibc_2.35.bb         |  1 +
>>>   2 files changed, 48 insertions(+)
>>>   create mode 100644 meta/recipes-core/glibc/glibc/0001-stdlib-Add-
>>> single-threaded-fast-path-to-rand.patch
>> This isn't in walnascar yet.
> I've got it and "qemu 8.2.7: ignore CVE-2023-1386" in my local
> walnascar branch. I will hold off merging to all stable branches till
> after walnascar release.
>
> Thanks for catching this!

People were discussing this bug wrt adding it to walnascar's RC2 bulid.
The conclusion is to merge it to walnascar *after* GA so it can soak for 
a bit
and to minimize risk to the GA release.

This is a customer issue and they have been happy with the fix since 
March 28th but
it was only merged to our product repo on April 2nd so it's probably a 
good idea
to let it soak.

I'm not sure if it'll help but below is Haixiao's summary from our 
internal bug tracker.
It's a verbose version of the master branch commit log.
One notable comment is:
    the issue can not reproduce on modern CPU (such as Zen3/Ryzen9),
    they are smart to handle the lock for single-thread case.

I wonder how many other 'modern' targets avoid the issue at build / run 
time.

If we want more info, please rely and Haixiao can spend some time on digging
a bit more.

../Randy



Summary:

This is a generic issue, not only found on wrLinux, also met on ubuntu 
22.04.

The degradation since from glibc-2.34, until the latest glibc-2.41 is 
affected.
The fix will be included in glibc-2.42.
I have filed https://sourceware.org/bugzilla/show_bug.cgi?id=32777 
<https://sourceware.org/bugzilla/show_bug.cgi?id=32777> to the upstream.

The first bad commit is 60d5e40ab200033a982a9fd7594a1f83dcdb94a0.

commit 60d5e40ab200033a982a9fd7594a1f83dcdb94a0
Author: Florian Weimer<fweimer@redhat.com>
Date:   Wed Apr 21 19:49:51 2021 +0200

     x86: Remove low-level lock optimization
     
     The current approach is to do this optimizations at a higher level,
     in generic code, so that single-threaded cases can be specifically
     targeted.
     
     Furthermore, using IS_IN (libc) as a compile-time indicator that
     all locks are private is no longer correct once process-shared lock
     implementations are moved into libc.
     
     The generic <lowlevellock.h> is not compatible with assembler code
     (obviously), so it's necessary to remove two long-unused #includes.
     
     Reviewed-by: Adhemerval Zanella<adhemerval.zanella@linaro.org>

This commit remove the optimization for single-threaded case, such as 
mallocrandfree.
 From the perf result, random function has been called significantly 
increased from 15.5% to 42.28%.
on glibc-2.33, disassemble the random function:

(gdb) disassemble random
Dump of assembler code for function __random:
    0x00007ffff7e3dc20 <+0>:     endbr64
    0x00007ffff7e3dc24 <+4>:     sub    $0x18,%rsp
    0x00007ffff7e3dc28 <+8>:     mov    %fs:0x18,%eax
    0x00007ffff7e3dc30 <+16>:    test   %eax,%eax
    0x00007ffff7e3dc32 <+18>:    jne    0x7ffff7e3dc70 <__random+80>
    0x00007ffff7e3dc34 <+20>:    mov    $0x1,%edx
    0x00007ffff7e3dc39 <+25>:    cmpxchg %edx,0x17e708(%rip)        # 0x7ffff7fbc348 <lock>
    0x00007ffff7e3dc40 <+32>:    lea    0xc(%rsp),%rsi
    0x00007ffff7e3dc45 <+37>:    lea    0x17bb14(%rip),%rdi        # 0x7ffff7fb9760 <unsafe_state>
    0x00007ffff7e3dc4c <+44>:    call   0x7ffff7e3e070 <__random_r>
    0x00007ffff7e3dc51 <+49>:    mov    %fs:0x18,%eax
    0x00007ffff7e3dc59 <+57>:    test   %eax,%eax
    0x00007ffff7e3dc5b <+59>:    jne    0x7ffff7e3dc90 <__random+112>
    0x00007ffff7e3dc5d <+61>:    subl   $0x1,0x17e6e4(%rip)        # 0x7ffff7fbc348 <lock>
    0x00007ffff7e3dc64 <+68>:    movslq 0xc(%rsp),%rax
    0x00007ffff7e3dc69 <+73>:    add    $0x18,%rsp
    0x00007ffff7e3dc6d <+77>:    ret
    0x00007ffff7e3dc6e <+78>:    xchg   %ax,%ax
    0x00007ffff7e3dc70 <+80>:    xor    %eax,%eax
    0x00007ffff7e3dc72 <+82>:    mov    $0x1,%edx
    0x00007ffff7e3dc77 <+87>:    lock cmpxchg %edx,0x17e6c9(%rip)        # 0x7ffff7fbc348 <lock>
    0x00007ffff7e3dc7f <+95>:    je     0x7ffff7e3dc40 <__random+32>
    0x00007ffff7e3dc81 <+97>:    lea    0x17e6c0(%rip),%rdi        # 0x7ffff7fbc348 <lock>
    0x00007ffff7e3dc88 <+104>:   call   0x7ffff7e81c00 <__lll_lock_wait_private>
    0x00007ffff7e3dc8d <+109>:   jmp    0x7ffff7e3dc40 <__random+32>
    0x00007ffff7e3dc8f <+111>:   nop
    0x00007ffff7e3dc90 <+112>:   xor    %eax,%eax
    0x00007ffff7e3dc92 <+114>:   xchg   %eax,0x17e6b0(%rip)        # 0x7ffff7fbc348 <lock>
    0x00007ffff7e3dc98 <+120>:   cmp    $0x1,%eax
    0x00007ffff7e3dc9b <+123>:   jle    0x7ffff7e3dc64 <__random+68>
    0x00007ffff7e3dc9d <+125>:   xor    %r10d,%r10d
    0x00007ffff7e3dca0 <+128>:   mov    $0x1,%edx
    0x00007ffff7e3dca5 <+133>:   mov    $0x81,%esi
    0x00007ffff7e3dcaa <+138>:   mov    $0xca,%eax
    0x00007ffff7e3dcaf <+143>:   lea    0x17e692(%rip),%rdi        # 0x7ffff7fbc348 <lock>
    0x00007ffff7e3dcb6 <+150>:   syscall
    0x00007ffff7e3dcb8 <+152>:   jmp    0x7ffff7e3dc64 <__random+68>
End of assembler dump.

test %eax,%eax will check the single-threaded condition, if in 
single-threaded environments, skips locking, calls __random_r directly, 
and returns the random number.
if in multi-threaded environments, acquires a lock, calls __random_r, 
releases the lock, handles contention, and returns the random number.

on glibc-2.34, disassemble the random function:

(gdb) disassemble random
Dump of assembler code for function __random:
    0x00007ffff7c44db0 <+0>:     endbr64
    0x00007ffff7c44db4 <+4>:     sub    $0x18,%rsp
    0x00007ffff7c44db8 <+8>:     xor    %eax,%eax
    0x00007ffff7c44dba <+10>:    mov    $0x1,%edx
    0x00007ffff7c44dbf <+15>:    lock cmpxchg %edx,0x1b0641(%rip)        # 0x7ffff7df5408 <lock>
    0x00007ffff7c44dc7 <+23>:    jne    0x7ffff7c44df8 <__random+72>
    0x00007ffff7c44dc9 <+25>:    lea    0xc(%rsp),%rsi
    0x00007ffff7c44dce <+30>:    lea    0x1ada6b(%rip),%rdi        # 0x7ffff7df2840 <unsafe_state>
    0x00007ffff7c44dd5 <+37>:    call   0x7ffff7c451e0 <__random_r>
    0x00007ffff7c44dda <+42>:    xor    %eax,%eax
    0x00007ffff7c44ddc <+44>:    xchg   %eax,0x1b0626(%rip)        # 0x7ffff7df5408 <lock>
    0x00007ffff7c44de2 <+50>:    cmp    $0x1,%eax
    0x00007ffff7c44de5 <+53>:    jg     0x7ffff7c44e10 <__random+96>
    0x00007ffff7c44de7 <+55>:    movslq 0xc(%rsp),%rax
    0x00007ffff7c44dec <+60>:    add    $0x18,%rsp
    0x00007ffff7c44df0 <+64>:    ret
    0x00007ffff7c44df1 <+65>:    nopl   0x0(%rax)
    0x00007ffff7c44df8 <+72>:    lea    0x1b0609(%rip),%rdi        # 0x7ffff7df5408 <lock>
    0x00007ffff7c44dff <+79>:    call   0x7ffff7c88680 <__GI___lll_lock_wait_private>
    0x00007ffff7c44e04 <+84>:    jmp    0x7ffff7c44dc9 <__random+25>
    0x00007ffff7c44e06 <+86>:    cs nopw 0x0(%rax,%rax,1)
    0x00007ffff7c44e10 <+96>:    lea    0x1b05f1(%rip),%rdi        # 0x7ffff7df5408 <lock>
    0x00007ffff7c44e17 <+103>:   call   0x7ffff7c88750 <__GI___lll_lock_wake_private>
    0x00007ffff7c44e1c <+108>:   movslq 0xc(%rsp),%rax
    0x00007ffff7c44e21 <+113>:   add    $0x18,%rsp
    0x00007ffff7c44e25 <+117>:   ret
End of assembler dump.

Both single-threaded and multi-threaded are using atomic operations and 
lock management.
So cause the performance of single-threaded cases degradation.

The fix has been included on master branch on Feb 24, 2025.
https://sourceware.org/git/?p=glibc.git;a=commit;h=be0cfd848d9ad7378800d6302bc11467cf2b514f 
<https://sourceware.org/git/?p=glibc.git;a=commit;h=be0cfd848d9ad7378800d6302bc11467cf2b514f>

after the fix, disassemble the random function:

(gdb) disassemble random
Dump of assembler code for function __random:
    0x00007ffff7e1d0b0 <+0>:     endbr64
    0x00007ffff7e1d0b4 <+4>:     sub    $0x18,%rsp
    0x00007ffff7e1d0b8 <+8>:     cmpb   $0x0,0x199221(%rip)        # 0x7ffff7fb62e0 <__libc_single_threaded_internal>
    0x00007ffff7e1d0bf <+15>:    jne    0x7ffff7e1d100 <__random+80>
    0x00007ffff7e1d0c1 <+17>:    xor    %eax,%eax
    0x00007ffff7e1d0c3 <+19>:    mov    $0x1,%edx
    0x00007ffff7e1d0c8 <+24>:    lock cmpxchg %edx,0x1935c0(%rip)        # 0x7ffff7fb0690 <lock>
    0x00007ffff7e1d0d0 <+32>:    jne    0x7ffff7e1d130 <__random+128>
    0x00007ffff7e1d0d2 <+34>:    lea    0xc(%rsp),%rsi
    0x00007ffff7e1d0d7 <+39>:    lea    0x1917a2(%rip),%rdi        # 0x7ffff7fae880 <unsafe_state>
    0x00007ffff7e1d0de <+46>:    call   0x7ffff7e1d500 <__random_r>
    0x00007ffff7e1d0e3 <+51>:    xor    %eax,%eax
    0x00007ffff7e1d0e5 <+53>:    xchg   %eax,0x1935a5(%rip)        # 0x7ffff7fb0690 <lock>
    0x00007ffff7e1d0eb <+59>:    cmp    $0x1,%eax
    0x00007ffff7e1d0ee <+62>:    jg     0x7ffff7e1d120 <__random+112>
    0x00007ffff7e1d0f0 <+64>:    movslq 0xc(%rsp),%rax
    0x00007ffff7e1d0f5 <+69>:    add    $0x18,%rsp
    0x00007ffff7e1d0f9 <+73>:    ret
    0x00007ffff7e1d0fa <+74>:    nopw   0x0(%rax,%rax,1)
    0x00007ffff7e1d100 <+80>:    lea    0xc(%rsp),%rsi
    0x00007ffff7e1d105 <+85>:    lea    0x191774(%rip),%rdi        # 0x7ffff7fae880 <unsafe_state>
    0x00007ffff7e1d10c <+92>:    call   0x7ffff7e1d500 <__random_r>
    0x00007ffff7e1d111 <+97>:    movslq 0xc(%rsp),%rax
    0x00007ffff7e1d116 <+102>:   add    $0x18,%rsp
    0x00007ffff7e1d11a <+106>:   ret
    0x00007ffff7e1d11b <+107>:   nopl   0x0(%rax,%rax,1)
    0x00007ffff7e1d120 <+112>:   lea    0x193569(%rip),%rdi        # 0x7ffff7fb0690 <lock>
    0x00007ffff7e1d127 <+119>:   call   0x7ffff7e60360 <__GI___lll_lock_wake_private>
    0x00007ffff7e1d12c <+124>:   jmp    0x7ffff7e1d0f0 <__random+64>
    0x00007ffff7e1d12e <+126>:   xchg   %ax,%ax
    0x00007ffff7e1d130 <+128>:   lea    0x193559(%rip),%rdi        # 0x7ffff7fb0690 <lock>
    0x00007ffff7e1d137 <+135>:   call   0x7ffff7e60290 <__GI___lll_lock_wait_private>
    0x00007ffff7e1d13c <+140>:   jmp    0x7ffff7e1d0d2 <__random+34>
End of assembler dump.

cmpb $0x0,0x199221(%rip) will check single-threaded mode, which avoids 
unnecessary locking.

BTW, the issue can not reproduce on modern CPU (such as Zen3/Ryzen9), 
they are smart to handle the lock for single-thread case.


>
> Steve
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#214675):https://lists.openembedded.org/g/openembedded-core/message/214675
> Mute This Topic:https://lists.openembedded.org/mt/112161542/3616765
> Group Owner:openembedded-core+owner@lists.openembedded.org
> Unsubscribe:https://lists.openembedded.org/g/openembedded-core/unsub [randy.macleod@windriver.com]
> -=-=-=-=-=-=-=-=-=-=-=-
>
diff mbox series

Patch

diff --git a/meta/recipes-core/glibc/glibc/0001-stdlib-Add-single-threaded-fast-path-to-rand.patch b/meta/recipes-core/glibc/glibc/0001-stdlib-Add-single-threaded-fast-path-to-rand.patch
new file mode 100644
index 0000000000..736fc51f38
--- /dev/null
+++ b/meta/recipes-core/glibc/glibc/0001-stdlib-Add-single-threaded-fast-path-to-rand.patch
@@ -0,0 +1,47 @@ 
+From 4f54b0dfc16dbe0df86afccb90e447df5f7f571e Mon Sep 17 00:00:00 2001
+From: Wilco Dijkstra <wilco.dijkstra@arm.com>
+Date: Mon, 18 Mar 2024 15:18:20 +0000
+Subject: [PATCH] stdlib: Add single-threaded fast path to rand()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Improve performance of rand() and __random() by adding a single-threaded
+fast path.  Bench-random-lock shows about 5x speedup on Neoverse V1.
+
+Upstream-Status: Backport [be0cfd848d9ad7378800d6302bc11467cf2b514f]
+
+Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
+Signed-off-by: Haixiao Yan <haixiao.yan.cn@windriver.com>
+---
+ stdlib/random.c | 7 +++++++
+ 1 file changed, 7 insertions(+)
+
+diff --git a/stdlib/random.c b/stdlib/random.c
+index 17cc61ba8f55..5d482a857065 100644
+--- a/stdlib/random.c
++++ b/stdlib/random.c
+@@ -51,6 +51,7 @@
+    SUCH DAMAGE.*/
+ 
+ #include <libc-lock.h>
++#include <sys/single_threaded.h>
+ #include <limits.h>
+ #include <stddef.h>
+ #include <stdlib.h>
+@@ -288,6 +289,12 @@ __random (void)
+ {
+   int32_t retval;
+ 
++  if (SINGLE_THREAD_P)
++    {
++      (void) __random_r (&unsafe_state, &retval);
++      return retval;
++    }
++
+   __libc_lock_lock (lock);
+ 
+   (void) __random_r (&unsafe_state, &retval);
+-- 
+2.34.1
+
diff --git a/meta/recipes-core/glibc/glibc_2.35.bb b/meta/recipes-core/glibc/glibc_2.35.bb
index d9cae79ac2..9073e04537 100644
--- a/meta/recipes-core/glibc/glibc_2.35.bb
+++ b/meta/recipes-core/glibc/glibc_2.35.bb
@@ -65,6 +65,7 @@  SRC_URI =  "${GLIBC_GIT_URI};branch=${SRCBRANCH};name=glibc \
            file://0001-Revert-Linux-Implement-a-useful-version-of-_startup_.patch \
            file://0002-get_nscd_addresses-Fix-subscript-typos-BZ-29605.patch \
            file://0003-sunrpc-suppress-gcc-os-warning-on-user2netname.patch \
+           file://0001-stdlib-Add-single-threaded-fast-path-to-rand.patch \
            "
 S = "${WORKDIR}/git"
 B = "${WORKDIR}/build-${TARGET_SYS}"