| Message ID | 20250718135437.247096-1-richard.purdie@linuxfoundation.org |
|---|---|
| State | Accepted, archived |
| Commit | ee5fce67ce35b025c68aa61e2e758903269ee346 |
| Series | [1/2] utils: Optimise signal/sigmask performance |
LGTM

Reviewed-by: Joshua Watt <JPEWhacker@gmail.com>

On Fri, Jul 18, 2025 at 7:54 AM Richard Purdie via lists.openembedded.org
<richard.purdie=linuxfoundation.org@lists.openembedded.org> wrote:
>
> Running "time bitbake -pP idle" with a valid cache shows around 800,000
> calls to enum creation from python's signal.py. We don't care about this
> overhead and it adversely affects cache load time quite badly.
>
> Try and use _signal directly, falling back to signal, which avoids
> this overhead we don't need and makes cache loading much faster.
>
> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> ---
>  lib/bb/utils.py | 17 +++++++++++++----
>  1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/lib/bb/utils.py b/lib/bb/utils.py
> index f688f7dd687..016036dbce4 100644
> --- a/lib/bb/utils.py
> +++ b/lib/bb/utils.py
> @@ -2226,6 +2226,15 @@ def path_is_descendant(descendant, ancestor):
>
>      return False
>
> +# Recomputing the sets in signal.py is expensive (bitbake -pP idle)
> +# so try and use _signal directly to avoid it
> +valid_signals = signal.valid_signals()
> +try:
> +    import _signal
> +    sigmask = _signal.pthread_sigmask
> +except ImportError:
> +    sigmask = signal.pthread_sigmask
> +
> # If we don't have a timeout of some kind and a process/thread exits badly (for example
> # OOM killed) and held a lock, we'd just hang in the lock futex forever. It is better
> # we exit at some point than hang. 5 minutes with no progress means we're probably deadlocked.
> @@ -2235,7 +2244,7 @@ def path_is_descendant(descendant, ancestor):
> @contextmanager
> def lock_timeout(lock):
>     try:
> -        s = signal.pthread_sigmask(signal.SIG_BLOCK, signal.valid_signals())
> +        s = sigmask(signal.SIG_BLOCK, valid_signals)
>         held = lock.acquire(timeout=5*60)
>         if not held:
>             bb.server.process.serverlog("Couldn't get the lock for 5 mins, timed out, exiting.\n%s" % traceback.format_stack())
> @@ -2243,17 +2252,17 @@ def lock_timeout(lock):
>         yield held
>     finally:
>         lock.release()
> -        signal.pthread_sigmask(signal.SIG_SETMASK, s)
> +        sigmask(signal.SIG_SETMASK, s)
>
> # A version of lock_timeout without the check that the lock was locked and a shorter timeout
> @contextmanager
> def lock_timeout_nocheck(lock):
>     l = False
>     try:
> -        s = signal.pthread_sigmask(signal.SIG_BLOCK, signal.valid_signals())
> +        s = sigmask(signal.SIG_BLOCK, valid_signals)
>         l = lock.acquire(timeout=10)
>         yield l
>     finally:
>         if l:
>             lock.release()
> -        signal.pthread_sigmask(signal.SIG_SETMASK, s)
> +        sigmask(signal.SIG_SETMASK, s)
Hi Richard,

> -----Original Message-----
> From: bitbake-devel@lists.openembedded.org <bitbake-devel@lists.openembedded.org>
> On Behalf Of Richard Purdie via lists.openembedded.org
> Sent: Friday, July 18, 2025 9:55 AM
> To: bitbake-devel@lists.openembedded.org
> Subject: [bitbake-devel] [PATCH 1/2] utils: Optimise signal/sigmask performance
>
> Running "time bitbake -pP idle" with a valid cache shows around 800,000 calls
> to enum creation from python's signal.py. We don't care about this overhead
> and it adversely affects cache load time quite badly.
>
> Try and use _signal directly, falling back to signal, which avoids this
> overhead we don't need and makes cache loading much faster.
>
> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>

Coincidentally, for the past couple months I have occasionally been playing
around with VizTracer (https://github.com/gaogaotiantian/viztracer) for
profiling BitBake. Just today (on Scarthgap) while profiling the cache parsing,
I also observed the issue with the sigmask. In case anyone is curious, here is
what the results look like: https://i.imgur.com/BGdzSe5.png

Two questions:

1. Would it be OK to backport this change to Scarthgap? (CC'd Steve as well)
2. Is there any interest in incorporating VizTracer into BitBake?

Thanks,
Chris
> > Running "time bitbake -pP idle" with a valid cache shows around
> > 800,000 calls to enum creation from python's signal.py. We don't care
> > about this overhead and it adversely affects cache load time quite badly.
> >
> > Try and use _signal directly, falling back to signal, which avoids
> > this overhead we don't need and makes cache loading much faster.
> >
> > Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
>
> Coincidentally, for the past couple months I have occasionally been playing
> around with VizTracer (https://github.com/gaogaotiantian/viztracer) for
> profiling BitBake. Just today (on Scarthgap) while profiling the cache
> parsing, I also observed the issue with the sigmask. In case anyone is
> curious, here is what the results look like: https://i.imgur.com/BGdzSe5.png

By the way, at least on Scarthgap, this patch turned what used to be a 30s
cache load (for a build with about 7 multiconfigs) into a near-instant
process. So thanks :)

Thanks,
Chris
On 28 Jul 2025, at 17:30, Chris Laplante via lists.openembedded.org <chris.laplante=agilent.com@lists.openembedded.org> wrote:
> Coincidentally, for the past couple months I have occasionally been playing around with VizTracer (https://github.com/gaogaotiantian/viztracer) for profiling BitBake. Just today (on Scarthgap) while profiling the cache parsing, I also observed the issue with the sigmask. In case anyone is curious, here is what the results look like: https://i.imgur.com/BGdzSe5.png
Me too!
Did you manage to get “viztracer bitbake …” to actually follow all of the forks and children correctly, or were you just wrapping the parse itself with the tracing?
Cheers,
Ross
Hi Ross,

> Me too!
>
> Did you manage to get "viztracer bitbake ..." to actually follow all of the
> forks and children correctly, or were you just wrapping the parse itself
> with the tracing?

Neat! :)

I have never gotten the command-line 'viztracer' to work well for bitbake. As
I'm guessing you also found, it just can't follow what BitBake is doing despite
all the documented support for concurrency. However, you can sort-of get it to
"see everything" by manually wrapping the individual parts with unique file
names, then post-processing to combine the results into a single file.

For example, in bin/bitbake-server, I just wrap the entire file with:

    import os
    from viztracer import VizTracer

    with VizTracer(output_file=f"server-{os.getpid()}.json") as tracer:

Then for bin/bitbake-worker:

    with VizTracer(output_file=f"worker-{os.getpid()}.json") as tracer:

etc. When bitbake finishes running, merge with:

    viztracer --combine *.json

My captures are sometimes multiple GB in size, which the web viewer chokes on.
For those cases, I use TraceProcessor to handle them:
https://perfetto.dev/docs/visualization/large-traces

By the way, prior to using this capture method, I tried using their 'generic
multi process support':
https://viztracer.readthedocs.io/en/latest/concurrency.html#generic-multi-process-support
It was not worth the effort, because there are so many processes/threads in
bitbake that it is difficult to propagate the init_kwargs through all the
layers.

Thanks,
Chris
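The per-process wrapping Chris describes can be packaged as a small helper. This is a hypothetical sketch (the `maybe_trace` name and no-op fallback are mine, not from the thread): it produces a per-PID trace file when viztracer is installed and degrades to a no-op context manager otherwise, so the wrapped script still runs on machines without the profiler.

```python
import contextlib
import os

def maybe_trace(role, output_dir="."):
    """Return a context manager that records a VizTracer trace for this
    process when viztracer is installed, else a do-nothing context.
    The per-PID file name mirrors the wrapping scheme described above,
    so traces from many processes can later be merged with
    'viztracer --combine *.json'."""
    try:
        from viztracer import VizTracer
        return VizTracer(
            output_file=os.path.join(output_dir, f"{role}-{os.getpid()}.json"))
    except ImportError:
        return contextlib.nullcontext()

# Usage, e.g. near the top of bin/bitbake-server:
# with maybe_trace("server"):
#     ...rest of the script...
```

Because the fallback is `contextlib.nullcontext()`, the `with` statement is safe either way; only the presence of the viztracer package decides whether a trace file appears.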
On 29 Jul 2025, at 14:05, Chris Laplante via lists.openembedded.org <chris.laplante=agilent.com@lists.openembedded.org> wrote:
> I have never gotten the command-line 'viztracer' to work well for bitbake. As I'm guessing you also found, it just can't follow what BitBake is doing despite all the documented support for concurrency. However, you can sort-of get it to "see everything" by manually wrapping the individual parts with unique file names, then post-processing to combine the results into a single file.
I added the code to pass the state to the bitbake-workers (see poky-contrib:ross/viztracer, absolutely not mergeable!) but it won't follow into the parser pool; I'd not figured out why it wasn't doing that yet.
Ross
> On 29 Jul 2025, at 14:05, Chris Laplante via lists.openembedded.org
> <chris.laplante=agilent.com@lists.openembedded.org> wrote:
> > I have never gotten the command-line 'viztracer' to work well for bitbake.
> > As I'm guessing you also found, it just can't follow what BitBake is doing
> > despite all the documented support for concurrency. However, you can
> > sort-of get it to "see everything" by manually wrapping the individual
> > parts with unique file names, then post-processing to combine the results
> > into a single file.
>
> I added the code to pass the state to the bitbake-workers (see
> poky-contrib:ross/viztracer, absolutely not mergeable!) but it won't follow
> into the parser pool; I'd not figured out why it wasn't doing that yet.
>
> Ross

I went searching for bug reports about multiprocessing.Process on viztracer's
github and I found your issue report lol. Easiest thing to do here is pass
down the tracer args through all the layers manually. Give the below patch a
try.

Thanks,
Chris

From e368baedb329d3a6f11f13bf930553faa6e692d2 Mon Sep 17 00:00:00 2001
From: Chris Laplante <chris.laplante@agilent.com>
Date: Tue, 29 Jul 2025 11:54:33 -0400
Subject: [PATCH] propogate viztracer traceargs down to the Parse processes

---
 bitbake/bin/bitbake-server       |  2 +-
 bitbake/lib/bb/cooker.py         | 20 +++++++++++++++-----
 bitbake/lib/bb/server/process.py |  4 ++--
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/bitbake/bin/bitbake-server b/bitbake/bin/bitbake-server
index 8e430e81ea0..f5c643b2834 100755
--- a/bitbake/bin/bitbake-server
+++ b/bitbake/bin/bitbake-server
@@ -63,5 +63,5 @@ logger = logging.getLogger("BitBake")
 handler = bb.event.LogHandler()
 logger.addHandler(handler)
 
-bb.server.process.execServer(lockfd, readypipeinfd, lockname, sockname, timeout, xmlrpcinterface, profile)
+bb.server.process.execServer(lockfd, readypipeinfd, lockname, sockname, timeout, xmlrpcinterface, profile, tracerargs)
diff --git a/bitbake/lib/bb/cooker.py b/bitbake/lib/bb/cooker.py
index 2eb64ef2372..edb4018cb46 100644
--- a/bitbake/lib/bb/cooker.py
+++ b/bitbake/lib/bb/cooker.py
@@ -129,7 +129,7 @@ class BBCooker:
     Manages one bitbake build run
     """
 
-    def __init__(self, featureSet=None, server=None):
+    def __init__(self, featureSet=None, server=None, tracerargs=None):
         self.recipecaches = None
         self.baseconfig_valid = False
         self.parsecache_valid = False
@@ -190,6 +190,7 @@ class BBCooker:
         signal.signal(signal.SIGHUP, self.sigterm_exception)
 
         bb.debug(1, "BBCooker startup complete %s" % time.time())
+        self.tracerargs = tracerargs
 
     def init_configdata(self):
         if not hasattr(self, "data"):
@@ -1631,7 +1632,7 @@ You can also remove the BB_HASHSERVE_UPSTREAM setting, but this may result in si
         for dirent in searchdirs:
             self.add_filewatch([(dirent, bb.parse.cached_mtime_noerror(dirent))])
 
-        self.parser = CookerParser(self, mcfilelist, total_masked)
+        self.parser = CookerParser(self, mcfilelist, total_masked, self.tracerargs)
         self._parsecache_set(True)
 
         self.state = State.PARSING
@@ -1999,7 +2000,7 @@ class ParsingFailure(Exception):
         Exception.__init__(self, realexception, recipe)
 
 class Parser(multiprocessing.Process):
-    def __init__(self, jobs, next_job_id, results, quit, profile):
+    def __init__(self, jobs, next_job_id, results, quit, profile, tracer_args):
         self.jobs = jobs
         self.next_job_id = next_job_id
         self.results = results
@@ -2012,6 +2013,7 @@ class Parser(multiprocessing.Process):
         self.signal_received = []
         self.signal_threadlock = threading.Lock()
         self.exit = False
+        self.tracer_args = tracer_args
 
     def catch_sig(self, signum, frame):
         if self.queue_signals:
@@ -2027,6 +2029,12 @@ class Parser(multiprocessing.Process):
         self.exit = True
 
     def run(self):
+        if self.tracer_args:
+            import viztracer
+            init_kwargs = json.loads(self.tracer_args)
+            tracer = viztracer.VizTracer(**init_kwargs)
+            tracer.register_exit()
+            tracer.start()
         bb.utils.profile_function("parsing" in self.profile, self.realrun, "profile-parse-%s.log" % multiprocessing.current_process().name,
                                   process=False)
 
     def realrun(self):
@@ -2105,7 +2113,7 @@ class Parser(multiprocessing.Process):
             bb.event.LogHandler.filter = origfilter
 
 class CookerParser(object):
-    def __init__(self, cooker, mcfilelist, masked):
+    def __init__(self, cooker, mcfilelist, masked, tracerargs):
         self.mcfilelist = mcfilelist
         self.cooker = cooker
         self.cfgdata = cooker.data
@@ -2144,6 +2152,8 @@ class CookerParser(object):
                                     multiprocessing.cpu_count()), self.toparse)
         bb.cache.SiggenRecipeInfo.reset()
 
+        self.tracerargs = tracerargs
+
         self.start()
         self.haveshutdown = False
         self.syncthread = None
@@ -2162,7 +2172,7 @@ class CookerParser(object):
         # Have to pass in willparse at fork time so all parsing processes have the unpickleable data
         # then access it by index from the parse queue.
         for i in range(0, self.num_processes):
-            parser = Parser(self.willparse, next_job_id, self.result_queue, self.parser_quit, self.cooker.configuration.profile)
+            parser = Parser(self.willparse, next_job_id, self.result_queue, self.parser_quit, self.cooker.configuration.profile, self.tracerargs)
             parser.start()
             self.process_names.append(parser.name)
             self.processes.append(parser)
diff --git a/bitbake/lib/bb/server/process.py b/bitbake/lib/bb/server/process.py
index a5ef30cb88b..2f5d9436a63 100644
--- a/bitbake/lib/bb/server/process.py
+++ b/bitbake/lib/bb/server/process.py
@@ -613,7 +613,7 @@ class BitBakeServer(object):
             os.execl(sys.executable, sys.executable, serverscript, "decafbad", str(self.bitbake_lock.fileno()), str(self.readypipein), self.logfile, self.bitbake_lock.name, self.sockname, str(self.server_timeout or 0), str(list(self.profile)), str(self.xmlrpcinterface[0]), str(self.xmlrpcinterface[1]), tracer_args)
 
-def execServer(lockfd, readypipeinfd, lockname, sockname, server_timeout, xmlrpcinterface, profile):
+def execServer(lockfd, readypipeinfd, lockname, sockname, server_timeout, xmlrpcinterface, profile, tracerargs):
 
     import bb.cookerdata
     import bb.cooker
@@ -642,7 +642,7 @@ def execServer(lockfd, readypipeinfd, lockname, sockname, server_timeout, xmlrpc
     writer = ConnectionWriter(readypipeinfd)
     try:
         featureset = []
-        cooker = bb.cooker.BBCooker(featureset, server)
+        cooker = bb.cooker.BBCooker(featureset, server, tracerargs)
         cooker.configuration.profile = profile
     except bb.BBHandledException:
         return None
--
2.43.0
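The mechanism Chris's patch relies on is simple: the parent serializes the tracer's init kwargs to a JSON string once, and each child process (however it was spawned) rebuilds its own tracer from that string. A minimal sketch of the round trip, with illustrative kwargs values rather than anything taken from the patch:

```python
import json

# Parent side: serialize the kwargs the child should use to build its
# tracer. A plain string survives exec/fork layers that a live tracer
# object (or an unpicklable callback) would not.
init_kwargs = {"output_file": "worker.json", "tracer_entries": 1000000}
tracer_args = json.dumps(init_kwargs)

# Child side (cf. Parser.run in the patch): rebuild the kwargs and
# construct the tracer from them, e.g. VizTracer(**json.loads(tracer_args)).
restored = json.loads(tracer_args)
assert restored == init_kwargs
```

Passing the string down explicitly through each constructor is tedious, but it avoids depending on viztracer's own multiprocessing hooks, which the thread reports do not follow BitBake's parser pool.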
diff --git a/lib/bb/utils.py b/lib/bb/utils.py
index f688f7dd687..016036dbce4 100644
--- a/lib/bb/utils.py
+++ b/lib/bb/utils.py
@@ -2226,6 +2226,15 @@ def path_is_descendant(descendant, ancestor):
 
     return False
 
+# Recomputing the sets in signal.py is expensive (bitbake -pP idle)
+# so try and use _signal directly to avoid it
+valid_signals = signal.valid_signals()
+try:
+    import _signal
+    sigmask = _signal.pthread_sigmask
+except ImportError:
+    sigmask = signal.pthread_sigmask
+
 # If we don't have a timeout of some kind and a process/thread exits badly (for example
 # OOM killed) and held a lock, we'd just hang in the lock futex forever. It is better
 # we exit at some point than hang. 5 minutes with no progress means we're probably deadlocked.
@@ -2235,7 +2244,7 @@ def path_is_descendant(descendant, ancestor):
 @contextmanager
 def lock_timeout(lock):
     try:
-        s = signal.pthread_sigmask(signal.SIG_BLOCK, signal.valid_signals())
+        s = sigmask(signal.SIG_BLOCK, valid_signals)
         held = lock.acquire(timeout=5*60)
         if not held:
             bb.server.process.serverlog("Couldn't get the lock for 5 mins, timed out, exiting.\n%s" % traceback.format_stack())
@@ -2243,17 +2252,17 @@ def lock_timeout(lock):
         yield held
     finally:
         lock.release()
-        signal.pthread_sigmask(signal.SIG_SETMASK, s)
+        sigmask(signal.SIG_SETMASK, s)
 
 # A version of lock_timeout without the check that the lock was locked and a shorter timeout
 @contextmanager
 def lock_timeout_nocheck(lock):
     l = False
     try:
-        s = signal.pthread_sigmask(signal.SIG_BLOCK, signal.valid_signals())
+        s = sigmask(signal.SIG_BLOCK, valid_signals)
         l = lock.acquire(timeout=10)
         yield l
     finally:
         if l:
             lock.release()
-        signal.pthread_sigmask(signal.SIG_SETMASK, s)
+        sigmask(signal.SIG_SETMASK, s)
Running "time bitbake -pP idle" with a valid cache shows around 800,000
calls to enum creation from python's signal.py. We don't care about this
overhead and it adversely affects cache load time quite badly.

Try and use _signal directly, falling back to signal, which avoids
this overhead we don't need and makes cache loading much faster.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
---
 lib/bb/utils.py | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)
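The patch boils down to two standard-library tricks that can be tried in isolation: compute `signal.valid_signals()` once instead of per call, and bind `pthread_sigmask` from the C-level `_signal` module when it is importable, skipping signal.py's per-call int-to-enum conversion of the returned mask. A standalone sketch of the same pattern (POSIX only; not BitBake code):

```python
import signal

# Compute the full signal set once; signal.valid_signals() rebuilds an
# enum-backed set on every call, which is what showed up in the profile.
valid_signals = signal.valid_signals()

# Prefer the C-level _signal module when available: its pthread_sigmask
# returns raw signal numbers instead of converting each one to an enum.
try:
    import _signal
    sigmask = _signal.pthread_sigmask
except ImportError:
    sigmask = signal.pthread_sigmask

# Block everything around a critical section, then restore the old mask.
# (SIGKILL/SIGSTOP in the set are silently ignored by the kernel.)
old = sigmask(signal.SIG_BLOCK, valid_signals)
try:
    pass  # e.g. acquire a lock without being interrupted by signals
finally:
    sigmask(signal.SIG_SETMASK, old)
```

Since `_signal` is a private CPython module, the `except ImportError` fallback to the public `signal` wrapper keeps the code working on interpreters that lay things out differently; both callables take the same `(how, mask)` arguments.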