Message ID | 19fcc012160c6b8782be0e6fc54797c88bf084ba.1744145328.git.steve@sakoman.com |
---|---|
State | RFC |
Delegated to: | Steve Sakoman |
Headers | show |
Series | [kirkstone,01/10] curl: ignore CVE-2025-0725 | expand |
On Tue, 2025-04-08 at 13:51 -0700, Steve Sakoman via lists.openembedded.org wrote: > From: Haixiao Yan <haixiao.yan.cn@windriver.com> > > Backport a patch [1] to improve performance of rand() and > __random()[2] > by adding a single-threaded fast path. > > [1] > https://sourceware.org/git/?p=glibc.git;a=commit;h=be0cfd848d9ad7378800d6302bc11467cf2b514f > [2] https://sourceware.org/bugzilla/show_bug.cgi?id=32777 > Signed-off-by: Haixiao Yan <haixiao.yan.cn@windriver.com> > Signed-off-by: Steve Sakoman <steve@sakoman.com> > --- > ...dd-single-threaded-fast-path-to-rand.patch | 47 > +++++++++++++++++++ > meta/recipes-core/glibc/glibc_2.35.bb | 1 + > 2 files changed, 48 insertions(+) > create mode 100644 meta/recipes-core/glibc/glibc/0001-stdlib-Add- > single-threaded-fast-path-to-rand.patch This isn't in walnascar yet. Cheers, Richard
On Thu, Apr 10, 2025 at 4:33 AM Richard Purdie <richard.purdie@linuxfoundation.org> wrote: > > On Tue, 2025-04-08 at 13:51 -0700, Steve Sakoman via > lists.openembedded.org wrote: > > From: Haixiao Yan <haixiao.yan.cn@windriver.com> > > > > Backport a patch [1] to improve performance of rand() and > > __random()[2] > > by adding a single-threaded fast path. > > > > [1] > > https://sourceware.org/git/?p=glibc.git;a=commit;h=be0cfd848d9ad7378800d6302bc11467cf2b514f > > [2] https://sourceware.org/bugzilla/show_bug.cgi?id=32777 > > Signed-off-by: Haixiao Yan <haixiao.yan.cn@windriver.com> > > Signed-off-by: Steve Sakoman <steve@sakoman.com> > > --- > > ...dd-single-threaded-fast-path-to-rand.patch | 47 > > +++++++++++++++++++ > > meta/recipes-core/glibc/glibc_2.35.bb | 1 + > > 2 files changed, 48 insertions(+) > > create mode 100644 meta/recipes-core/glibc/glibc/0001-stdlib-Add- > > single-threaded-fast-path-to-rand.patch > > This isn't in walnascar yet. I've got it and "qemu 8.2.7: ignore CVE-2023-1386" in my local walnascar branch. I will hold off merging to all stable branches till after walnascar release. Thanks for catching this! Steve
On 2025-04-10 9:49 a.m., Steve Sakoman via lists.openembedded.org wrote: > On Thu, Apr 10, 2025 at 4:33 AM Richard Purdie > <richard.purdie@linuxfoundation.org> wrote: >> On Tue, 2025-04-08 at 13:51 -0700, Steve Sakoman via >> lists.openembedded.org wrote: >>> From: Haixiao Yan<haixiao.yan.cn@windriver.com> >>> >>> Backport a patch [1] to improve performance of rand() and >>> __random()[2] >>> by adding a single-threaded fast path. >>> >>> [1] >>> https://sourceware.org/git/?p=glibc.git;a=commit;h=be0cfd848d9ad7378800d6302bc11467cf2b514f >>> [2]https://sourceware.org/bugzilla/show_bug.cgi?id=32777 >>> Signed-off-by: Haixiao Yan<haixiao.yan.cn@windriver.com> >>> Signed-off-by: Steve Sakoman<steve@sakoman.com> >>> --- >>> ...dd-single-threaded-fast-path-to-rand.patch | 47 >>> +++++++++++++++++++ >>> meta/recipes-core/glibc/glibc_2.35.bb | 1 + >>> 2 files changed, 48 insertions(+) >>> create mode 100644 meta/recipes-core/glibc/glibc/0001-stdlib-Add- >>> single-threaded-fast-path-to-rand.patch >> This isn't in walnascar yet. > I've got it and "qemu 8.2.7: ignore CVE-2023-1386" in my local > walnascar branch. I will hold off merging to all stable branches till > after walnascar release. > > Thanks for catching this! People were discussing this bug wrt adding it to walnascar's RC2 bulid. The conclusion is to merge it to walnascar *after* GA so it can soak for a bit and to minimize risk to the GA release. This is a customer issue and they have been happy with the fix since March 28th but it was only merged to our product repo on April 2nd so it's probably a good idea to let it soak. I'm not sure if it'll help but below is Haixiao's summary from our internal bug tracker. It's a verbose version of the master branch commit log. One notable comment is: the issue can not reproduce on modern CPU (such as Zen3/Ryzen9), they are smart to handle the lock for single-thread case. I wonder how many other 'modern' targets avoid the issue at build / run time. If we want more info, please rely and Haixiao can spend some time on digging a bit more. ../Randy Summary: This is a generic issue, not only found on wrLinux, also met on ubuntu 22.04. The degradation since from glibc-2.34, until the latest glibc-2.41 is affected. The fix will be included in glibc-2.42. I have filed https://sourceware.org/bugzilla/show_bug.cgi?id=32777 <https://sourceware.org/bugzilla/show_bug.cgi?id=32777> to the upstream. The first bad commit is 60d5e40ab200033a982a9fd7594a1f83dcdb94a0. commit 60d5e40ab200033a982a9fd7594a1f83dcdb94a0 Author: Florian Weimer<fweimer@redhat.com> Date: Wed Apr 21 19:49:51 2021 +0200 x86: Remove low-level lock optimization The current approach is to do this optimizations at a higher level, in generic code, so that single-threaded cases can be specifically targeted. Furthermore, using IS_IN (libc) as a compile-time indicator that all locks are private is no longer correct once process-shared lock implementations are moved into libc. The generic <lowlevellock.h> is not compatible with assembler code (obviously), so it's necessary to remove two long-unused #includes. Reviewed-by: Adhemerval Zanella<adhemerval.zanella@linaro.org> This commit remove the optimization for single-threaded case, such as mallocrandfree. From the perf result, random function has been called significantly increased from 15.5% to 42.28%. on glibc-2.33, disassemble the random function: (gdb) disassemble random Dump of assembler code for function __random: 0x00007ffff7e3dc20 <+0>: endbr64 0x00007ffff7e3dc24 <+4>: sub $0x18,%rsp 0x00007ffff7e3dc28 <+8>: mov %fs:0x18,%eax 0x00007ffff7e3dc30 <+16>: test %eax,%eax 0x00007ffff7e3dc32 <+18>: jne 0x7ffff7e3dc70 <__random+80> 0x00007ffff7e3dc34 <+20>: mov $0x1,%edx 0x00007ffff7e3dc39 <+25>: cmpxchg %edx,0x17e708(%rip) # 0x7ffff7fbc348 <lock> 0x00007ffff7e3dc40 <+32>: lea 0xc(%rsp),%rsi 0x00007ffff7e3dc45 <+37>: lea 0x17bb14(%rip),%rdi # 0x7ffff7fb9760 <unsafe_state> 0x00007ffff7e3dc4c <+44>: call 0x7ffff7e3e070 <__random_r> 0x00007ffff7e3dc51 <+49>: mov %fs:0x18,%eax 0x00007ffff7e3dc59 <+57>: test %eax,%eax 0x00007ffff7e3dc5b <+59>: jne 0x7ffff7e3dc90 <__random+112> 0x00007ffff7e3dc5d <+61>: subl $0x1,0x17e6e4(%rip) # 0x7ffff7fbc348 <lock> 0x00007ffff7e3dc64 <+68>: movslq 0xc(%rsp),%rax 0x00007ffff7e3dc69 <+73>: add $0x18,%rsp 0x00007ffff7e3dc6d <+77>: ret 0x00007ffff7e3dc6e <+78>: xchg %ax,%ax 0x00007ffff7e3dc70 <+80>: xor %eax,%eax 0x00007ffff7e3dc72 <+82>: mov $0x1,%edx 0x00007ffff7e3dc77 <+87>: lock cmpxchg %edx,0x17e6c9(%rip) # 0x7ffff7fbc348 <lock> 0x00007ffff7e3dc7f <+95>: je 0x7ffff7e3dc40 <__random+32> 0x00007ffff7e3dc81 <+97>: lea 0x17e6c0(%rip),%rdi # 0x7ffff7fbc348 <lock> 0x00007ffff7e3dc88 <+104>: call 0x7ffff7e81c00 <__lll_lock_wait_private> 0x00007ffff7e3dc8d <+109>: jmp 0x7ffff7e3dc40 <__random+32> 0x00007ffff7e3dc8f <+111>: nop 0x00007ffff7e3dc90 <+112>: xor %eax,%eax 0x00007ffff7e3dc92 <+114>: xchg %eax,0x17e6b0(%rip) # 0x7ffff7fbc348 <lock> 0x00007ffff7e3dc98 <+120>: cmp $0x1,%eax 0x00007ffff7e3dc9b <+123>: jle 0x7ffff7e3dc64 <__random+68> 0x00007ffff7e3dc9d <+125>: xor %r10d,%r10d 0x00007ffff7e3dca0 <+128>: mov $0x1,%edx 0x00007ffff7e3dca5 <+133>: mov $0x81,%esi 0x00007ffff7e3dcaa <+138>: mov $0xca,%eax 0x00007ffff7e3dcaf <+143>: lea 0x17e692(%rip),%rdi # 0x7ffff7fbc348 <lock> 0x00007ffff7e3dcb6 <+150>: syscall 0x00007ffff7e3dcb8 <+152>: jmp 0x7ffff7e3dc64 <__random+68> End of assembler dump. test %eax,%eax will check the single-threaded condition, if in single-threaded environments, skips locking, calls __random_r directly, and returns the random number. if in multi-threaded environments, acquires a lock, calls __random_r, releases the lock, handles contention, and returns the random number. on glibc-2.34, disassemble the random function: (gdb) disassemble random Dump of assembler code for function __random: 0x00007ffff7c44db0 <+0>: endbr64 0x00007ffff7c44db4 <+4>: sub $0x18,%rsp 0x00007ffff7c44db8 <+8>: xor %eax,%eax 0x00007ffff7c44dba <+10>: mov $0x1,%edx 0x00007ffff7c44dbf <+15>: lock cmpxchg %edx,0x1b0641(%rip) # 0x7ffff7df5408 <lock> 0x00007ffff7c44dc7 <+23>: jne 0x7ffff7c44df8 <__random+72> 0x00007ffff7c44dc9 <+25>: lea 0xc(%rsp),%rsi 0x00007ffff7c44dce <+30>: lea 0x1ada6b(%rip),%rdi # 0x7ffff7df2840 <unsafe_state> 0x00007ffff7c44dd5 <+37>: call 0x7ffff7c451e0 <__random_r> 0x00007ffff7c44dda <+42>: xor %eax,%eax 0x00007ffff7c44ddc <+44>: xchg %eax,0x1b0626(%rip) # 0x7ffff7df5408 <lock> 0x00007ffff7c44de2 <+50>: cmp $0x1,%eax 0x00007ffff7c44de5 <+53>: jg 0x7ffff7c44e10 <__random+96> 0x00007ffff7c44de7 <+55>: movslq 0xc(%rsp),%rax 0x00007ffff7c44dec <+60>: add $0x18,%rsp 0x00007ffff7c44df0 <+64>: ret 0x00007ffff7c44df1 <+65>: nopl 0x0(%rax) 0x00007ffff7c44df8 <+72>: lea 0x1b0609(%rip),%rdi # 0x7ffff7df5408 <lock> 0x00007ffff7c44dff <+79>: call 0x7ffff7c88680 <__GI___lll_lock_wait_private> 0x00007ffff7c44e04 <+84>: jmp 0x7ffff7c44dc9 <__random+25> 0x00007ffff7c44e06 <+86>: cs nopw 0x0(%rax,%rax,1) 0x00007ffff7c44e10 <+96>: lea 0x1b05f1(%rip),%rdi # 0x7ffff7df5408 <lock> 0x00007ffff7c44e17 <+103>: call 0x7ffff7c88750 <__GI___lll_lock_wake_private> 0x00007ffff7c44e1c <+108>: movslq 0xc(%rsp),%rax 0x00007ffff7c44e21 <+113>: add $0x18,%rsp 0x00007ffff7c44e25 <+117>: ret End of assembler dump. Both single-threaded and multi-threaded are using atomic operations and lock management. So cause the performance of single-threaded cases degradation. The fix has been included on master branch on Feb 24, 2025. https://sourceware.org/git/?p=glibc.git;a=commit;h=be0cfd848d9ad7378800d6302bc11467cf2b514f <https://sourceware.org/git/?p=glibc.git;a=commit;h=be0cfd848d9ad7378800d6302bc11467cf2b514f> after the fix, disassemble the random function: (gdb) disassemble random Dump of assembler code for function __random: 0x00007ffff7e1d0b0 <+0>: endbr64 0x00007ffff7e1d0b4 <+4>: sub $0x18,%rsp 0x00007ffff7e1d0b8 <+8>: cmpb $0x0,0x199221(%rip) # 0x7ffff7fb62e0 <__libc_single_threaded_internal> 0x00007ffff7e1d0bf <+15>: jne 0x7ffff7e1d100 <__random+80> 0x00007ffff7e1d0c1 <+17>: xor %eax,%eax 0x00007ffff7e1d0c3 <+19>: mov $0x1,%edx 0x00007ffff7e1d0c8 <+24>: lock cmpxchg %edx,0x1935c0(%rip) # 0x7ffff7fb0690 <lock> 0x00007ffff7e1d0d0 <+32>: jne 0x7ffff7e1d130 <__random+128> 0x00007ffff7e1d0d2 <+34>: lea 0xc(%rsp),%rsi 0x00007ffff7e1d0d7 <+39>: lea 0x1917a2(%rip),%rdi # 0x7ffff7fae880 <unsafe_state> 0x00007ffff7e1d0de <+46>: call 0x7ffff7e1d500 <__random_r> 0x00007ffff7e1d0e3 <+51>: xor %eax,%eax 0x00007ffff7e1d0e5 <+53>: xchg %eax,0x1935a5(%rip) # 0x7ffff7fb0690 <lock> 0x00007ffff7e1d0eb <+59>: cmp $0x1,%eax 0x00007ffff7e1d0ee <+62>: jg 0x7ffff7e1d120 <__random+112> 0x00007ffff7e1d0f0 <+64>: movslq 0xc(%rsp),%rax 0x00007ffff7e1d0f5 <+69>: add $0x18,%rsp 0x00007ffff7e1d0f9 <+73>: ret 0x00007ffff7e1d0fa <+74>: nopw 0x0(%rax,%rax,1) 0x00007ffff7e1d100 <+80>: lea 0xc(%rsp),%rsi 0x00007ffff7e1d105 <+85>: lea 0x191774(%rip),%rdi # 0x7ffff7fae880 <unsafe_state> 0x00007ffff7e1d10c <+92>: call 0x7ffff7e1d500 <__random_r> 0x00007ffff7e1d111 <+97>: movslq 0xc(%rsp),%rax 0x00007ffff7e1d116 <+102>: add $0x18,%rsp 0x00007ffff7e1d11a <+106>: ret 0x00007ffff7e1d11b <+107>: nopl 0x0(%rax,%rax,1) 0x00007ffff7e1d120 <+112>: lea 0x193569(%rip),%rdi # 0x7ffff7fb0690 <lock> 0x00007ffff7e1d127 <+119>: call 0x7ffff7e60360 <__GI___lll_lock_wake_private> 0x00007ffff7e1d12c <+124>: jmp 0x7ffff7e1d0f0 <__random+64> 0x00007ffff7e1d12e <+126>: xchg %ax,%ax 0x00007ffff7e1d130 <+128>: lea 0x193559(%rip),%rdi # 0x7ffff7fb0690 <lock> 0x00007ffff7e1d137 <+135>: call 0x7ffff7e60290 <__GI___lll_lock_wait_private> 0x00007ffff7e1d13c <+140>: jmp 0x7ffff7e1d0d2 <__random+34> End of assembler dump. cmpb $0x0,0x199221(%rip) will check single-threaded mode, which avoids unnecessary locking. BTW, the issue can not reproduce on modern CPU (such as Zen3/Ryzen9), they are smart to handle the lock for single-thread case. > > Steve > > -=-=-=-=-=-=-=-=-=-=-=- > Links: You receive all messages sent to this group. > View/Reply Online (#214675):https://lists.openembedded.org/g/openembedded-core/message/214675 > Mute This Topic:https://lists.openembedded.org/mt/112161542/3616765 > Group Owner:openembedded-core+owner@lists.openembedded.org > Unsubscribe:https://lists.openembedded.org/g/openembedded-core/unsub [randy.macleod@windriver.com] > -=-=-=-=-=-=-=-=-=-=-=- >
diff --git a/meta/recipes-core/glibc/glibc/0001-stdlib-Add-single-threaded-fast-path-to-rand.patch b/meta/recipes-core/glibc/glibc/0001-stdlib-Add-single-threaded-fast-path-to-rand.patch new file mode 100644 index 0000000000..736fc51f38 --- /dev/null +++ b/meta/recipes-core/glibc/glibc/0001-stdlib-Add-single-threaded-fast-path-to-rand.patch @@ -0,0 +1,47 @@ +From 4f54b0dfc16dbe0df86afccb90e447df5f7f571e Mon Sep 17 00:00:00 2001 +From: Wilco Dijkstra <wilco.dijkstra@arm.com> +Date: Mon, 18 Mar 2024 15:18:20 +0000 +Subject: [PATCH] stdlib: Add single-threaded fast path to rand() +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +Improve performance of rand() and __random() by adding a single-threaded +fast path. Bench-random-lock shows about 5x speedup on Neoverse V1. + +Upstream-Status: Backport [be0cfd848d9ad7378800d6302bc11467cf2b514f] + +Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> +Signed-off-by: Haixiao Yan <haixiao.yan.cn@windriver.com> +--- + stdlib/random.c | 7 +++++++ + 1 file changed, 7 insertions(+) + +diff --git a/stdlib/random.c b/stdlib/random.c +index 17cc61ba8f55..5d482a857065 100644 +--- a/stdlib/random.c ++++ b/stdlib/random.c +@@ -51,6 +51,7 @@ + SUCH DAMAGE.*/ + + #include <libc-lock.h> ++#include <sys/single_threaded.h> + #include <limits.h> + #include <stddef.h> + #include <stdlib.h> +@@ -288,6 +289,12 @@ __random (void) + { + int32_t retval; + ++ if (SINGLE_THREAD_P) ++ { ++ (void) __random_r (&unsafe_state, &retval); ++ return retval; ++ } ++ + __libc_lock_lock (lock); + + (void) __random_r (&unsafe_state, &retval); +-- +2.34.1 + diff --git a/meta/recipes-core/glibc/glibc_2.35.bb b/meta/recipes-core/glibc/glibc_2.35.bb index d9cae79ac2..9073e04537 100644 --- a/meta/recipes-core/glibc/glibc_2.35.bb +++ b/meta/recipes-core/glibc/glibc_2.35.bb @@ -65,6 +65,7 @@ SRC_URI = "${GLIBC_GIT_URI};branch=${SRCBRANCH};name=glibc \ file://0001-Revert-Linux-Implement-a-useful-version-of-_startup_.patch \ file://0002-get_nscd_addresses-Fix-subscript-typos-BZ-29605.patch \ file://0003-sunrpc-suppress-gcc-os-warning-on-user2netname.patch \ + file://0001-stdlib-Add-single-threaded-fast-path-to-rand.patch \ " S = "${WORKDIR}/git" B = "${WORKDIR}/build-${TARGET_SYS}"