Message ID | 20250702204758.2773339-1-ecordonnier@snap.com |
---|---|
State | New |
Series | uihelper: Fix KeyError race condition in pidmap access |

On Wed, 2025-07-02 at 22:47 +0200, Etienne Cordonnier via lists.openembedded.org wrote:
> From: Etienne Cordonnier <ecordonnier@snap.com>
>
> I'm seeing random build errors on scarthgap 5.0.10 because the pid is not contained in pidmap.
> Fix the error by ensuring the PID exists in pidmap before accessing it.
>
> Call-stack of the error:
>
> ```
> Traceback (most recent call last):
>   File "poky/bitbake/lib/bb/ui/knotty.py", line 681, in main
>     helper.eventHandler(event)
>   File "poky/bitbake/lib/bb/ui/uihelper.py", line 43, in eventHandler
>     removetid(event.pid, tid)
>   File "poky/bitbake/lib/bb/ui/uihelper.py", line 28, in removetid
>     if self.pidmap[pid] == tid:
> KeyError: 21041
> WARNING: Exiting due to interrupt.
> ```

The code was designed such that this should not happen. If it is happening, it suggests there is some other issue going on with the task reference handling and the patch is just hiding some other underlying problem.

Is there any way to reproduce the issue? Have you other patches applied to bitbake?

Cheers,

Richard

Hi Richard,

I have not yet found a way to reproduce the issue. I'm seeing the issue in 2 out of 50 builds after updating from 5.0.9 to 5.0.10, on a build server with 60 cores, so that's about a 4% failure rate. I've never seen the error before the update. There are no other patches applied to bitbake; however, there are many other layers used, so it may not be reproducible on stand-alone poky. I'll monitor and try to find a way to reproduce.

Étienne

On Thu, Jul 3, 2025 at 8:42 AM Richard Purdie <richard.purdie@linuxfoundation.org> wrote:
> On Wed, 2025-07-02 at 22:47 +0200, Etienne Cordonnier via lists.openembedded.org wrote:
> > From: Etienne Cordonnier <ecordonnier@snap.com>
> >
> > I'm seeing random build errors on scarthgap 5.0.10 because the pid is not contained in pidmap.
> > Fix the error by ensuring the PID exists in pidmap before accessing it.
> >
> > Call-stack of the error:
> >
> > ```
> > Traceback (most recent call last):
> >   File "poky/bitbake/lib/bb/ui/knotty.py", line 681, in main
> >     helper.eventHandler(event)
> >   File "poky/bitbake/lib/bb/ui/uihelper.py", line 43, in eventHandler
> >     removetid(event.pid, tid)
> >   File "poky/bitbake/lib/bb/ui/uihelper.py", line 28, in removetid
> >     if self.pidmap[pid] == tid:
> > KeyError: 21041
> > WARNING: Exiting due to interrupt.
> > ```
>
> The code was designed such that this should not happen. If it is
> happening, it suggests there is some other issue going on with the task
> reference handling and the patch is just hiding some other underlying
> problem.
>
> Is there any way to reproduce the issue? Have you other patches applied
> to bitbake?
>
> Cheers,
>
> Richard

On Thu, 2025-07-03 at 10:03 +0200, Etienne Cordonnier wrote:
> Hi Richard,
>
> I have not yet found a way to reproduce the issue. I'm seeing the
> issue in 2 out of 50 builds after updating from 5.0.9 to 5.0.10, on a
> build server with 60 cores, so that's about a 4% failure rate. I've
> never seen the error before the update. There are no other patches
> applied to bitbake; however, there are many other layers used, so it
> may not be reproducible on stand-alone poky. I'll monitor and try to
> find a way to reproduce.

Is there any pattern, such as only interrupted builds (Ctrl+C?) or builds with failing tasks? Any particular kinds of failures?

Just trying to narrow down the possible causes and give ideas...

Cheers,

Richard

diff --git a/lib/bb/ui/uihelper.py b/lib/bb/ui/uihelper.py
index e6983bd55..0bc526ca1 100644
--- a/lib/bb/ui/uihelper.py
+++ b/lib/bb/ui/uihelper.py
@@ -25,7 +25,7 @@ class BBUIHelper:
         def removetid(pid, tid):
             self.running_pids.remove(tid)
             del self.running_tasks[tid]
-            if self.pidmap[pid] == tid:
+            if pid in self.pidmap and self.pidmap[pid] == tid:
                 del self.pidmap[pid]
             self.needUpdate = True
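
For illustration only, here is a minimal stand-alone sketch of the bookkeeping the hunk touches. It is not bitbake code: the `PidTracker` class, its `start` method, the `replay` helper and the task names are hypothetical, and the event order it replays (two tasks reporting the same PID, with the later one finishing first) is just one assumed sequence that would produce this `KeyError`. The thread does not establish that this is what happens in the failing builds.

```python
class PidTracker:
    """Toy stand-in for the pidmap bookkeeping (hypothetical, not bitbake's class)."""

    def __init__(self, guarded):
        self.guarded = guarded      # True: use the membership check from the patch
        self.running_tasks = {}
        self.running_pids = []
        self.pidmap = {}

    def start(self, pid, tid):
        # Assumed start-of-task bookkeeping: remember which task owns the PID.
        self.running_tasks[tid] = {"pid": pid}
        self.running_pids.append(tid)
        self.pidmap[pid] = tid

    def removetid(self, pid, tid):
        self.running_pids.remove(tid)
        del self.running_tasks[tid]
        if self.guarded:
            # Patched form: tolerate a PID that is no longer in the map.
            if pid in self.pidmap and self.pidmap[pid] == tid:
                del self.pidmap[pid]
        else:
            # Original form: raises KeyError if the PID entry is already gone.
            if self.pidmap[pid] == tid:
                del self.pidmap[pid]


def replay(guarded):
    """Replay one assumed event order and report what happens."""
    tracker = PidTracker(guarded)
    tracker.start(21041, "task-a")
    tracker.start(21041, "task-b")      # assumption: same PID reported again
    tracker.removetid(21041, "task-b")  # deletes the shared pidmap entry
    try:
        tracker.removetid(21041, "task-a")
    except KeyError as exc:
        return "KeyError: %s" % exc
    return "ok"


print(replay(guarded=False))
print(replay(guarded=True))
```

With the guard in place the second removal is silently accepted, which is the concern raised in the review: the check avoids the crash but does not explain how the entry went missing.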