uihelper: Fix KeyError race condition in pidmap access

Message ID 20250702204758.2773339-1-ecordonnier@snap.com
State New
Series uihelper: Fix KeyError race condition in pidmap access

Commit Message

Etienne Cordonnier July 2, 2025, 8:47 p.m. UTC
From: Etienne Cordonnier <ecordonnier@snap.com>

I'm seeing random build errors on scarthgap 5.0.10 because the PID is missing from pidmap.
Fix the error by ensuring the PID exists in pidmap before accessing it.

Call-stack of the error:

```
Traceback (most recent call last):
  File "poky/bitbake/lib/bb/ui/knotty.py", line 681, in main
    helper.eventHandler(event)
  File "poky/bitbake/lib/bb/ui/uihelper.py", line 43, in eventHandler
    removetid(event.pid, tid)
  File "poky/bitbake/lib/bb/ui/uihelper.py", line 28, in removetid
    if self.pidmap[pid] == tid:
KeyError: 21041
WARNING: Exiting due to interrupt.
```
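
For illustration, a minimal sketch of the failure mode. It assumes a reduced stand-in for BBUIHelper: the class `MiniHelper`, its `start` method and the task name are hypothetical and not bitbake code. The pidmap entry is deleted by hand because the real cause of the missing entry is not yet known:

```python
class MiniHelper:
    """Hypothetical stand-in for BBUIHelper, reduced to the fields in the diff."""

    def __init__(self):
        self.running_pids = []
        self.running_tasks = {}
        self.pidmap = {}
        self.needUpdate = False

    def start(self, pid, tid):
        # Assumed bookkeeping on task start; not copied from bitbake.
        self.running_pids.append(tid)
        self.running_tasks[tid] = {"pid": pid}
        self.pidmap[pid] = tid

    def removetid(self, pid, tid):
        # Mirrors the unpatched lookup shown in the traceback.
        self.running_pids.remove(tid)
        del self.running_tasks[tid]
        if self.pidmap[pid] == tid:
            del self.pidmap[pid]
        self.needUpdate = True


def reproduce():
    """Return the missing key if the unguarded lookup raises KeyError."""
    helper = MiniHelper()
    helper.start(21041, "recipe.bb:do_compile")
    # Simulate the unexplained state: the pid entry is gone before removal.
    del helper.pidmap[21041]
    try:
        helper.removetid(21041, "recipe.bb:do_compile")
    except KeyError as exc:
        return exc.args[0]
    return None
```

Calling `reproduce()` exercises the same `self.pidmap[pid]` lookup that fails in the traceback above.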

Signed-off-by: Etienne Cordonnier <ecordonnier@snap.com>
---
 lib/bb/ui/uihelper.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Richard Purdie July 3, 2025, 6:42 a.m. UTC | #1
On Wed, 2025-07-02 at 22:47 +0200, Etienne Cordonnier via lists.openembedded.org wrote:
> From: Etienne Cordonnier <ecordonnier@snap.com>
> 
> I'm seeing random build errors on scarthgap 5.0.10 because the pid is not contained in pidmap.
> Fix the error by ensuring the PID exists in pidmap before accessing it.
> 
> Call-stack of the error:
> 
> ```
> Traceback (most recent call last):
>   File "poky/bitbake/lib/bb/ui/knotty.py", line 681, in main
>     helper.eventHandler(event)
>   File "poky/bitbake/lib/bb/ui/uihelper.py", line 43, in eventHandler
>     removetid(event.pid, tid)
>   File "poky/bitbake/lib/bb/ui/uihelper.py", line 28, in removetid
>     if self.pidmap[pid] == tid:
> KeyError: 21041
> WARNING: Exiting due to interrupt.
> ```

The code was designed such that this should not happen. If it is
happening, it suggests there is some other issue going on with the task
reference handling and the patch is just hiding some other underlying
problem.
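
One way to surface such an underlying problem rather than mask it would be to report the unexpected state when it occurs. A minimal sketch, assuming a hypothetical helper `remove_pid_entry` and a hypothetical logger name, neither of which is part of bitbake:

```python
import logging

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("uihelper-sketch")


def remove_pid_entry(pidmap, pid, tid):
    """Drop the pid -> tid mapping; return True only if an entry was removed."""
    if pid not in pidmap:
        # Unexpected state: make it visible instead of silently ignoring it.
        logger.warning("pid %s missing from pidmap while removing %s", pid, tid)
        return False
    if pidmap[pid] == tid:
        del pidmap[pid]
        return True
    # The pid is already mapped to a different task; leave that entry alone.
    return False
```

A warning like this would not fix anything by itself, but it could help correlate the missing entry with interrupted or failing builds.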

Is there any way to reproduce the issue? Have you other patches applied
to bitbake?

Cheers,

Richard
Etienne Cordonnier July 3, 2025, 8:03 a.m. UTC | #2
Hi Richard,

I have not yet found a way to reproduce the issue. I'm seeing it in
2 out of 50 builds after updating from 5.0.9 to 5.0.10, on a build server
with 60 cores, so that's a failure rate of about 4%. I never saw the error
before the update. There are no other patches applied to bitbake. However,
many other layers are in use, so it may not be reproducible on
stand-alone poky. I'll monitor and try to find a way to reproduce it.

Étienne

Richard Purdie July 3, 2025, 8:52 a.m. UTC | #3
On Thu, 2025-07-03 at 10:03 +0200, Etienne Cordonnier wrote:
> Hi Richard,
> 
> I have not yet found a way to reproduce the issue. I'm seeing the
> issue in 2 out of 50 builds after updating from 5.0.9 to 5.0.10, on a
> build server with 60 cores, so that's a failure rate of about 4%. I've
> never seen the error before the update. There are no other patches
> applied to bitbake, however there are many other layers used so it
> may not be reproducible on stand-alone poky. I'll monitor and try to
> find a way to reproduce.

Is there any pattern, such as only interrupted builds (Ctrl+C?) or builds
with failing tasks? Any particular kinds of failures? Just trying to
narrow down the possible causes and give ideas...

Cheers,

Richard
Patch

diff --git a/lib/bb/ui/uihelper.py b/lib/bb/ui/uihelper.py
index e6983bd55..0bc526ca1 100644
--- a/lib/bb/ui/uihelper.py
+++ b/lib/bb/ui/uihelper.py
@@ -25,7 +25,7 @@  class BBUIHelper:
         def removetid(pid, tid):
             self.running_pids.remove(tid)
             del self.running_tasks[tid]
-            if self.pidmap[pid] == tid:
+            if pid in self.pidmap and self.pidmap[pid] == tid:
                 del self.pidmap[pid]
             self.needUpdate = True
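
The behaviour of the patched condition can be sketched in isolation. The helper `drop_if_owner` and the task names below are hypothetical, and the sketch operates on a plain dict rather than on BBUIHelper:

```python
def drop_if_owner(pidmap, pid, tid):
    """Delete pidmap[pid] only if it exists and still points at tid."""
    # Same condition as the patched line in removetid.
    if pid in pidmap and pidmap[pid] == tid:
        del pidmap[pid]
    return pidmap


# pid missing: nothing to delete, and no KeyError is raised.
missing = drop_if_owner({}, 21041, "a.bb:do_compile")

# pid present but owned by another task: the entry is kept.
reused = drop_if_owner({21041: "b.bb:do_fetch"}, 21041, "a.bb:do_compile")

# pid present and owned by this task: the entry is removed.
owned = drop_if_owner({21041: "a.bb:do_compile"}, 21041, "a.bb:do_compile")
```

The guard only changes the first case; the other two behave as they did before the patch.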