
[scarthgap,2.8,1/3] runqueue: Fix performance of multiconfigs with large overlap

Message ID db083cfe9e33c9fd7ffeead7b8c6023a5d581976.1733344286.git.steve@sakoman.com
State New
Series [scarthgap,2.8,1/3] runqueue: Fix performance of multiconfigs with large overlap

Commit Message

Steve Sakoman Dec. 4, 2024, 8:33 p.m. UTC
From: Richard Purdie <richard.purdie@linuxfoundation.org>

There have been complaints about the performance of large multiconfig builds
for a while. The key missing data point was that the builds needed to have large
overlaps in sstate objects. This can be simulated by building the same things with
just different TMPDIRs. In runqueue/bitbake terms this equates to large numbers of
deferred tasks.

The issue is that the expensive checks in the setscene loop were evaluated on every
pass through runqueue's execute function, before the check for deferred tasks. This
starves normal task execution, since that only happens once per iteration.

Move the skip check earlier in the function, which speeds things up enormously
and should improve the performance of such builds for users.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit 9c6c506757f2b3e28c8b20513b45da6b4659c95f)
Signed-off-by: Steve Sakoman <steve@sakoman.com>
---
 lib/bb/runqueue.py | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
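The commit message says that building the same things with different TMPDIRs produces large numbers of deferred tasks. The toy model below sketches one way that can happen, assuming deferral is keyed purely on a shared sstate hash: the first task (in sorted order) with a given hash runs, and every other task with that hash waits on it so it can reuse the resulting sstate object. `build_deferred`, the task ids and the hash strings are hypothetical and are not bitbake's actual data structures.

```python
# Toy model of how identical sstate hashes across multiconfigs turn into
# deferred tasks. build_deferred and the task ids are hypothetical.

def build_deferred(task_hashes):
    """Map each task that shares a hash with an earlier task to that task.

    task_hashes: dict of task id -> sstate hash.
    Returns a dict of deferred task -> task it waits on.
    """
    first_with_hash = {}
    deferred = {}
    for tid in sorted(task_hashes):  # deterministic choice of the "owner" task
        h = task_hashes[tid]
        if h in first_with_hash:
            deferred[tid] = first_with_hash[h]  # reuse the owner's sstate later
        else:
            first_with_hash[h] = tid  # first task with this hash runs normally
    return deferred


# Three multiconfigs building the same recipe, differing only in TMPDIR,
# so each task has the same hash in every configuration.
configs = ["a", "b", "c"]
steps = ["do_package", "do_populate_sysroot"]
task_hashes = {f"mc:{mc}:foo:{s}": f"hash-{s}" for mc in configs for s in steps}

deferred = build_deferred(task_hashes)
print(len(deferred))                    # 4
print(deferred["mc:c:foo:do_package"])  # mc:a:foo:do_package
```

In this model, N configurations sharing T tasks yield (N - 1) * T deferred tasks, so the deferred set grows with both the number of multiconfigs and the amount of overlap between them.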

Comments

Jeroen Hofstee Dec. 23, 2024, 8:02 a.m. UTC | #1
Hi,

On 04-12-2024 at 21:33, Steve Sakoman wrote:
> From: Richard Purdie <richard.purdie@linuxfoundation.org>
>
> There have been complaints about the performance of large multiconfig builds
> for a while. The key missing data point was that the builds needed to have large
> overlaps in sstate objects. This can be simulated by building the same things with
> just different TMPDIRs. In runqueue/bitbake terms this equates to large numbers of
> deferred tasks.
>
[..]
>
> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> (cherry picked from commit 9c6c506757f2b3e28c8b20513b45da6b4659c95f)
> Signed-off-by: Steve Sakoman <steve@sakoman.com>

Our CI build uses a multiconfig setup to build for 10 machines with 3
different archs. On scarthgap with bitbake 2.8, that resulted in a
really slow build:

NOTE: Executing Tasks
Bitbake still alive (no events for 600s). Active tasks:
Bitbake still alive (no events for 1200s). Active tasks:
Bitbake still alive (no events for 1800s). Active tasks:
Bitbake still alive (no events for 2400s). Active tasks:
Bitbake still alive (no events for 3000s). Active tasks:
Bitbake still alive (no events for 3600s). Active tasks:
Bitbake still alive (no events for 4200s). Active tasks:
NOTE: Running task 1 of 79788 ...
[...]

After almost 14 hours it was at 'Running task 8881 of 79788', and I
canceled the job. The cooker was running at 100% cpu, but there were
hardly any bakers running.

With this patch cherry-picked, it builds successfully in 3h 20min.
(I did add INHERIT:remove = "create-spdx", since it triggers an sstate
error that causes the build to fail.)

So this makes a huge difference, at least for a setup like ours.

Regards,
Jeroen

Patch

diff --git a/lib/bb/runqueue.py b/lib/bb/runqueue.py
index 93079a977..744542b08 100644
--- a/lib/bb/runqueue.py
+++ b/lib/bb/runqueue.py
@@ -2195,6 +2195,9 @@  class RunQueueExecute:
             # Find the next setscene to run
             for nexttask in self.sorted_setscene_tids:
                 if nexttask in self.sq_buildable and nexttask not in self.sq_running and self.sqdata.stamps[nexttask] not in self.build_stamps.values() and nexttask not in self.sq_harddep_deferred:
+                    if nexttask in self.sq_deferred and self.sq_deferred[nexttask] not in self.runq_complete:
+                        # Skip deferred tasks quickly before the 'expensive' tests below - this is key to performant multiconfig builds
+                        continue
                     if nexttask not in self.sqdata.unskippable and self.sqdata.sq_revdeps[nexttask] and \
                             nexttask not in self.sq_needed_harddeps and \
                             self.sqdata.sq_revdeps[nexttask].issubset(self.scenequeue_covered) and \
@@ -2224,8 +2227,7 @@  class RunQueueExecute:
                         if t in self.runq_running and t not in self.runq_complete:
                             continue
                     if nexttask in self.sq_deferred:
-                        if self.sq_deferred[nexttask] not in self.runq_complete:
-                            continue
+                        # Deferred tasks that were still deferred were skipped above so we now need to process
                         logger.debug("Task %s no longer deferred" % nexttask)
                         del self.sq_deferred[nexttask]
                         valid = self.rq.validate_hashes(set([nexttask]), self.cooker.data, 0, False, summary=False)
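The effect of moving the deferral check ahead of the costly tests can be illustrated with a toy model of one pass over the sorted setscene tasks. This is a sketch, not bitbake's code: `pick_next`, `expensive_check` and the task names are hypothetical, the expensive test is assumed never to succeed, and the deferral test is reduced to the same `deferred[task] not in complete` lookup the patch uses. The model counts how many times the costly test runs before a runnable task is found.

```python
# Toy model of one pass of the setscene selection loop. All names here
# (pick_next, expensive_check, the task ids) are hypothetical, not bitbake's API.

def pick_next(tasks, deferred, complete, cheap_skip_first):
    """Return (chosen_task, expensive_calls) for one pass over `tasks`.

    `deferred` maps a task to the task it waits on; a deferred task may
    only be considered once its blocker is in `complete`.
    """
    expensive_calls = 0

    def expensive_check(task):
        # Stand-in for the costly revdeps/covered set tests in the real loop;
        # assumed never to succeed in this model.
        nonlocal expensive_calls
        expensive_calls += 1
        return False

    for task in tasks:
        still_deferred = task in deferred and deferred[task] not in complete
        if cheap_skip_first and still_deferred:
            continue  # patched order: cheap dict lookup before the costly test
        if expensive_check(task):
            continue
        if still_deferred:
            continue  # original order: deferral noticed only after the costly test
        return task, expensive_calls
    return None, expensive_calls


# 1000 tasks deferred on a producer that has not finished, plus one ready task.
tasks = [f"mc{i}:task" for i in range(1000)] + ["mc0:ready"]
deferred = {t: "mc0:producer" for t in tasks[:-1]}

print(pick_next(tasks, deferred, set(), cheap_skip_first=False))  # ('mc0:ready', 1001)
print(pick_next(tasks, deferred, set(), cheap_skip_first=True))   # ('mc0:ready', 1)
# Once the producer completes, the first formerly deferred task is picked.
print(pick_next(tasks, deferred, {"mc0:producer"}, cheap_skip_first=True))  # ('mc0:task', 1)
```

With many deferred tasks, the original ordering pays for the costly test once per still-deferred task on every pass, while the patched ordering pays only a dictionary lookup for each of them. Tasks whose blocker has completed still fall through to the later `if nexttask in self.sq_deferred:` branch, which is why the patch can drop the duplicate `runq_complete` check there.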