From patchwork Mon Mar 3 13:06:25 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Purdie
X-Patchwork-Id: 58193
From: Richard Purdie
To: bitbake-devel@lists.openembedded.org
Subject: [PATCH 1/4] utils: Print information about lock issue before exiting
Date: Mon, 3 Mar 2025 13:06:25 +0000
Message-ID: <20250303130628.1656131-1-richard.purdie@linuxfoundation.org>
X-Mailer: git-send-email 2.45.2
X-Groupsio-URL: https://lists.openembedded.org/g/bitbake-devel/message/17369

Signed-off-by: Richard Purdie
---
 lib/bb/utils.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/bb/utils.py b/lib/bb/utils.py
index 3cb59d8078..da8c20fe95 100644
--- a/lib/bb/utils.py
+++ b/lib/bb/utils.py
@@ -1884,6 +1884,7 @@ def lock_timeout(lock):
     held = lock.acquire(timeout=5*60)
     try:
         if not held:
+            bb.server.process.serverlog("Couldn't get the lock for 5 mins, timed out, exiting.\n%s" % traceback.format_stack())
             os._exit(1)
         yield held
     finally:

From patchwork Mon Mar 3 13:06:26 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Purdie
X-Patchwork-Id: 58192
From: Richard Purdie
To: bitbake-devel@lists.openembedded.org
Subject: [PATCH 2/4] utils: Tweak lock_timeout logic
Date: Mon, 3 Mar 2025 13:06:26 +0000
Message-ID: <20250303130628.1656131-2-richard.purdie@linuxfoundation.org>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20250303130628.1656131-1-richard.purdie@linuxfoundation.org>
References: <20250303130628.1656131-1-richard.purdie@linuxfoundation.org>
X-Groupsio-URL: https://lists.openembedded.org/g/bitbake-devel/message/17370

We should really try and take the lock in the try/finally block so that in some
rare cases such as badly timed interrupt/signal, we always release the lock.

Signed-off-by: Richard Purdie
---
 lib/bb/utils.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/bb/utils.py b/lib/bb/utils.py
index da8c20fe95..992e200641 100644
--- a/lib/bb/utils.py
+++ b/lib/bb/utils.py
@@ -1881,8 +1881,8 @@ def path_is_descendant(descendant, ancestor):
 # we exit at some point than hang. 5 minutes with no progress means we're probably deadlocked.
 @contextmanager
 def lock_timeout(lock):
-    held = lock.acquire(timeout=5*60)
     try:
+        held = lock.acquire(timeout=5*60)
         if not held:
             bb.server.process.serverlog("Couldn't get the lock for 5 mins, timed out, exiting.\n%s" % traceback.format_stack())
             os._exit(1)

From patchwork Mon Mar 3 13:06:27 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Purdie
X-Patchwork-Id: 58194
From: Richard Purdie
To: bitbake-devel@lists.openembedded.org
Subject: [PATCH 3/4] utils: Add signal blocking for lock_timeout
Date: Mon, 3 Mar 2025 13:06:27 +0000
Message-ID: <20250303130628.1656131-3-richard.purdie@linuxfoundation.org>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20250303130628.1656131-1-richard.purdie@linuxfoundation.org>
References: <20250303130628.1656131-1-richard.purdie@linuxfoundation.org>
X-Groupsio-URL: https://lists.openembedded.org/g/bitbake-devel/message/17371

We never want to exit whilst holding these locks as it deadlocks all python
threads. Add signal blocking around the lock critical part so a signal
shouldn't cause such an exit.

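As a rough illustration of the idea (a sketch only, not the code in this
patch): block signals for the calling thread before taking the lock and restore
the previous mask afterwards. The names signals_blocked, demo_lock and
critical_section are hypothetical; signal.pthread_sigmask() and
signal.valid_signals() are the standard library calls, available on Unix.

```python
import signal
import threading
from contextlib import contextmanager

@contextmanager
def signals_blocked():
    """Block every blockable signal for this thread, then restore the old mask."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, signal.valid_signals())
    try:
        yield old_mask
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)

demo_lock = threading.Lock()  # hypothetical lock standing in for the real one

def critical_section():
    with signals_blocked():
        with demo_lock:
            # Passing an empty set to SIG_BLOCK changes nothing and returns
            # the current mask, so this reports whether SIGTERM is blocked.
            current = signal.pthread_sigmask(signal.SIG_BLOCK, [])
            return signal.SIGTERM in current
```

Signals that arrive while the mask is in place stay pending and are delivered
once the previous mask is restored, so the lock is not held when a handler
runs in this thread.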
Signed-off-by: Richard Purdie
---
 lib/bb/utils.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/bb/utils.py b/lib/bb/utils.py
index 992e200641..3a4b29181e 100644
--- a/lib/bb/utils.py
+++ b/lib/bb/utils.py
@@ -1882,6 +1882,7 @@ def path_is_descendant(descendant, ancestor):
 @contextmanager
 def lock_timeout(lock):
     try:
+        s = signal.pthread_sigmask(signal.SIG_BLOCK, signal.valid_signals())
         held = lock.acquire(timeout=5*60)
         if not held:
             bb.server.process.serverlog("Couldn't get the lock for 5 mins, timed out, exiting.\n%s" % traceback.format_stack())
@@ -1889,3 +1890,4 @@ def lock_timeout(lock):
         yield held
     finally:
         lock.release()
+        signal.pthread_sigmask(signal.SIG_SETMASK, s)

From patchwork Mon Mar 3 13:06:28 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Purdie
X-Patchwork-Id: 58195
From: Richard Purdie
To: bitbake-devel@lists.openembedded.org
Subject: [PATCH 4/4] event/utils: Avoid deadlock from lock_timeout() and recursive events
Date: Mon, 3 Mar 2025 13:06:28 +0000
Message-ID: <20250303130628.1656131-4-richard.purdie@linuxfoundation.org>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20250303130628.1656131-1-richard.purdie@linuxfoundation.org>
References: <20250303130628.1656131-1-richard.purdie@linuxfoundation.org>
X-Groupsio-URL: https://lists.openembedded.org/g/bitbake-devel/message/17372

We've been seeing intermittent failures on Ubuntu 22.04 in oe-selftest which
were problematic to debug. The failure was inside lock_timeout and once that
was identified and the backtrace obtained, the problem becomes clearer:

  File "X/bitbake/lib/bb/server/process.py", line 466, in idle_thread_internal
    retval = function(self, data, False)
  File "X/bitbake/lib/bb/command.py", line 123, in runAsyncCommand
    self.cooker.updateCache()
  File "X/bitbake/lib/bb/cooker.py", line 1629, in updateCache
    self.parser = CookerParser(self, mcfilelist, total_masked)
  File "X/bitbake/lib/bb/cooker.py", line 2141, in __init__
    self.bb_caches = bb.cache.MulticonfigCache(self.cfgbuilder, self.cfghash, cooker.caches_array)
  File "X/bitbake/lib/bb/cache.py", line 772, in __init__
    loaded += c.prepare_cache(progress)
  File "X/bitbake/lib/bb/cache.py", line 435, in prepare_cache
    loaded = self.load_cachefile(progress)
  File "X/bitbake/lib/bb/cache.py", line 516, in load_cachefile
    progress(cachefile.tell() + previous_progress)
  File "X/bitbake/lib/bb/cache.py", line 751, in progress
    bb.event.fire(bb.event.CacheLoadProgress(current_progress, cachesize),
  File "X/bitbake/lib/bb/event.py", line 234, in fire
    fire_ui_handlers(event, d)
  File "X/bitbake/lib/bb/event.py", line 210, in fire_ui_handlers
    _ui_handlers[h].event.send(event)
  File "X/bitbake/lib/bb/cooker.py", line 117, in send
    str_event = codecs.encode(pickle.dumps(event), \'base64\').decode(\'utf-8\')
  File "/usr/lib/python3.10/asyncio/sslproto.py", line 320, in __del__
    _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
  File "/usr/lib/python3.10/warnings.py", line 109, in _showwarnmsg
    sw(msg.message, msg.category, msg.filename, msg.lineno,
  File "X/bitbake/lib/bb/main.py", line 113, in _showwarning
    warnlog.warning(s)
  File "/usr/lib/python3.10/logging/__init__.py", line 1489, in warning
    self._log(WARNING, msg, args, **kwargs)
  File "/usr/lib/python3.10/logging/__init__.py", line 1624, in _log
    self.handle(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1634, in handle
    self.callHandlers(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1696, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 968, in handle
    self.emit(record)
  File "X/bitbake/lib/bb/event.py", line 778, in emit
    fire(record, None)
  File "X/bitbake/lib/bb/event.py", line 234, in fire
    fire_ui_handlers(event, d)
  File "X/bitbake/lib/bb/event.py", line 197, in fire_ui_handlers
    with bb.utils.lock_timeout(_thread_lock):
  File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "X/bitbake/lib/bb/utils.py", line 1888, in lock_timeout
    bb.server.process.serverlog("Couldn\'t get the lock for 5 mins, timed out, exiting. %s" % traceback.format_stack())

or put in simpler terms, whilst sending an event(), an unrelated warning message
happens to be triggered from asyncio:

/usr/lib/python3.10/asyncio/sslproto.py:320: ResourceWarning: unclosed transport

which triggers a second event() which can't be sent as we're already in the
critical section and already hold the lock.

That warning is due to the version of asyncio used on Ubuntu 22.04 with python
3.10 and that, combined with timing issues, explains why we don't see it on
other python versions or distros.

We can't handle the second event as the lock is there to serialise the events.
Instead, we queue the event and then process the queue later. Add a new version
of lock_timeout which allows us to handle the situation more gracefully.

Signed-off-by: Richard Purdie
---
 lib/bb/event.py | 10 +++++++++-
 lib/bb/utils.py | 15 +++++++++++++++
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/lib/bb/event.py b/lib/bb/event.py
index e9834d01c7..da07a1dd64 100644
--- a/lib/bb/event.py
+++ b/lib/bb/event.py
@@ -194,7 +194,12 @@ def fire_ui_handlers(event, d):
         ui_queue.append(event)
         return
 
-    with bb.utils.lock_timeout(_thread_lock):
+    with bb.utils.lock_timeout_nocheck(_thread_lock) as lock:
+        if not lock:
+            # If we can't get the lock, we may be recursively called, queue and return
+            ui_queue.append(event)
+            return
+
         errors = []
         for h in _ui_handlers:
             #print "Sending event %s" % event
@@ -213,6 +218,9 @@ def fire_ui_handlers(event, d):
         for h in errors:
             del _ui_handlers[h]
 
+    while ui_queue:
+        fire_ui_handlers(ui_queue.pop(), d)
+
 def fire(event, d):
     """Fire off an Event"""
 
diff --git a/lib/bb/utils.py b/lib/bb/utils.py
index 3a4b29181e..5486f9599d 100644
--- a/lib/bb/utils.py
+++ b/lib/bb/utils.py
@@ -1879,6 +1879,9 @@ def path_is_descendant(descendant, ancestor):
 # If we don't have a timeout of some kind and a process/thread exits badly (for example
 # OOM killed) and held a lock, we'd just hang in the lock futex forever. It is better
 # we exit at some point than hang. 5 minutes with no progress means we're probably deadlocked.
+# This function can still deadlock python since it can't signal the other threads to exit
+# (signals are handled in the main thread) and even os._exit() will wait on non-daemon threads
+# to exit.
 @contextmanager
 def lock_timeout(lock):
     try:
@@ -1891,3 +1894,15 @@ def lock_timeout(lock):
     finally:
         lock.release()
         signal.pthread_sigmask(signal.SIG_SETMASK, s)
+
+# A version of lock_timeout without the check that the lock was locked and a shorter timeout
+@contextmanager
+def lock_timeout_nocheck(lock):
+    try:
+        s = signal.pthread_sigmask(signal.SIG_BLOCK, signal.valid_signals())
+        l = lock.acquire(timeout=10)
+        yield l
+    finally:
+        if l:
+            lock.release()
+        signal.pthread_sigmask(signal.SIG_SETMASK, s)