Message ID | 20250523145205.3542264-1-mac@mcrowe.com |
---|---|
State | New |
Series | runqueue: Allow pressure state change notifications to be disabled |
On Fri, 2025-05-23 at 15:52 +0100, Mike Crowe via lists.openembedded.org wrote:
> Allow setting BB_PRESSURE_NOTE_CHANGE = "0" to disable NOTE messages
> being emitted every time the pressure state changes. The previous
> default behaviour is not changed.
>
> Signed-off-by: Mike Crowe <mac@mcrowe.com>
> Reviewed-by: Jack Mitchell <jack@embed.me.uk>
> ---
>  .../bitbake-user-manual-ref-variables.rst | 7 +++++++
>  lib/bb/runqueue.py                        | 3 ++-
>  2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> index 477443e22..cd81a586e 100644
> --- a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> +++ b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> @@ -574,6 +574,13 @@ overview of their function and contents.
>        might be useful as a last resort to prevent OOM errors if they are
>        occurring during builds.
>
> +   :term:`BB_PRESSURE_NOTE_CHANGE`
> +
> +      By default Bitbake emits a note message each time the scheduler
> +      decides to stop or start scheduling new tasks due to pressure
> +      changes. Setting :term:`BB_PRESSURE_NOTE_CHANGE` to "0" stops these
> +      messages.
> +
>     :term:`BB_RUNFMT`
>        Specifies the name of the executable script files (i.e. run files)
>        saved into ``${``\ :term:`T`\ ``}``. By default, the
> diff --git a/lib/bb/runqueue.py b/lib/bb/runqueue.py
> index 8fadc8338..306814fd6 100644
> --- a/lib/bb/runqueue.py
> +++ b/lib/bb/runqueue.py
> @@ -218,7 +218,7 @@ class RunQueueScheduler(object):
>
>              pressure_state = (exceeds_cpu_pressure, exceeds_io_pressure, exceeds_memory_pressure)
>              pressure_values = (round(cpu_pressure,1), self.rq.max_cpu_pressure, round(io_pressure,1), self.rq.max_io_pressure, round(memory_pressure,1), self.rq.max_memory_pressure)
> -            if hasattr(self, "pressure_state") and pressure_state != self.pressure_state:
> +            if self.rq.note_pressure_state_change and hasattr(self, "pressure_state") and pressure_state != self.pressure_state:
>                  bb.note("Pressure status changed to CPU: %s, IO: %s, Mem: %s (CPU: %s/%s, IO: %s/%s, Mem: %s/%s) - using %s/%s bitbake threads" % (pressure_state + pressure_values + (len(self.rq.runq_running.difference(self.rq.runq_complete)), self.rq.number_tasks)))
>              self.pressure_state = pressure_state
>              return (exceeds_cpu_pressure or exceeds_io_pressure or exceeds_memory_pressure)
> @@ -1864,6 +1864,7 @@ class RunQueueExecute:
>          self.max_io_pressure = self.cfgData.getVar("BB_PRESSURE_MAX_IO")
>          self.max_memory_pressure = self.cfgData.getVar("BB_PRESSURE_MAX_MEMORY")
>          self.max_loadfactor = self.cfgData.getVar("BB_LOADFACTOR_MAX")
> +        self.note_pressure_state_change = (self.cfgData.getVar("BB_PRESSURE_NOTE_CHANGE") or "1") != "0"
>
>          self.sq_buildable = set()
>          self.sq_running = set()

I've resisted taking a change like this in the past since firstly, we'd
otherwise get complaints about "why is bitbake not executing anything?"
since it becomes unclear why bitbake isn't doing anything. The second
issue is that if you're seeing large numbers of pressure messages, your
configuration probably isn't right.

I'm also never that keen on single use variables like this.

I'm therefore torn on the patch. I can see why you're asking for it,
that doesn't mean it is the right thing to do though...

Cheers,

Richard
On Tuesday 27 May 2025 at 09:11:03 +0100, Richard Purdie wrote:
> On Fri, 2025-05-23 at 15:52 +0100, Mike Crowe via lists.openembedded.org wrote:
> > Allow setting BB_PRESSURE_NOTE_CHANGE = "0" to disable NOTE messages
> > being emitted every time the pressure state changes. The previous
> > default behaviour is not changed.
> >
> > Signed-off-by: Mike Crowe <mac@mcrowe.com>
> > Reviewed-by: Jack Mitchell <jack@embed.me.uk>
> > ---
> >  .../bitbake-user-manual-ref-variables.rst | 7 +++++++
> >  lib/bb/runqueue.py                        | 3 ++-
> >  2 files changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> > index 477443e22..cd81a586e 100644
> > --- a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> > +++ b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> > @@ -574,6 +574,13 @@ overview of their function and contents.
> >        might be useful as a last resort to prevent OOM errors if they are
> >        occurring during builds.
> >
> > +   :term:`BB_PRESSURE_NOTE_CHANGE`
> > +
> > +      By default Bitbake emits a note message each time the scheduler
> > +      decides to stop or start scheduling new tasks due to pressure
> > +      changes. Setting :term:`BB_PRESSURE_NOTE_CHANGE` to "0" stops these
> > +      messages.
> > +
> >     :term:`BB_RUNFMT`
> >        Specifies the name of the executable script files (i.e. run files)
> >        saved into ``${``\ :term:`T`\ ``}``. By default, the

[snip]

Thanks for considering the patch.

> I've resisted taking a change like this in the past since firstly, we'd
> otherwise get complaints about "why is bitbake not executing anything?"

The comment in next_buildable_task implies that at least one task is always
running (or shortly will be). Perhaps you didn't mean the complaint to be
taken completely literally though? :)

> since it becomes unclear why bitbake isn't doing anything. The second
> issue is that if you're seeing large numbers of pressure messages, your
> configuration probably isn't right.

I get hundreds of lines of pressure monitor changes when building even a
single large recipe that can take about an hour. It's always the CPU
pressure that has changed.

The last time I looked I couldn't find much guidance for setting
BB_NUMBER_THREADS, BB_NUMBER_PARSE_THREADS, PARALLEL_MAKE and
BB_PRESSURE_*.

We currently set PARALLEL_MAKE, BB_NUMBER_THREADS and
BB_NUMBER_PARSE_THREADS to the number of CPUs on the host. (Our build
machines tend to have about 32 logical CPUs though we originally set these
values when they only had eight.) We set BB_PRESSURE_MAX_* to 2000 (though
it seems that I had it overridden to 20000 in my personal configuration - I
may have got this from
https://wiki.yoctoproject.org/wiki/images/0/04/Yocto_Project_Autobuilder.pdf
or similar). All machines have NVMe storage.

In addition to stock oe-core stuff, we also build various
embarrassingly-parallel large C++ recipes. These are often blocking
dependencies so they run on their own. We end up doing quite a lot of
incremental builds too, where some stuff comes from sstate and when
compilation does happen ccache is often well populated.

This all means that we'd like an individual task to use as many CPUs as
possible but ideally we wouldn't run multiple such massively-parallel tasks
in parallel. Of course, Bitbake, Make & Ninja don't know about each other
so there's no good solution.

With the above settings I tend to see Bitbake kicking off many tasks all at
once, which pushes the pressure too high and it backs off whilst those
tasks finish and then the same happens again. It makes the knotty output
on console look rather like a spring bouncing up and down. This is clearly
not ideal. Luckily there aren't too many really huge tasks that get to run
in parallel, so the smaller ones finish and free up resources.

> I'm also never that keen on single use variables like this.
>
> I'm therefore torn on the patch. I can see why you're asking for it,
> that doesn't mean it is the right thing to do though...

Your position is understandable. We can keep this as a local change until
we can fathom out a better solution.

Thanks.

Mike.
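For readers unfamiliar with where numbers like 2000 and 20000 come from, here is a minimal sketch of how a pressure reading might be compared with a BB_PRESSURE_MAX_* limit. It assumes the limit is compared against the growth per second of the total= field in a Linux /proc/pressure/cpu "some" line; that reading of BitBake's behaviour is an assumption, and the helper names and sample strings are hypothetical, not BitBake code.

```python
# Hypothetical helpers, not BitBake's implementation.

def parse_psi_total(line):
    """Return the cumulative stall time (the total= field) from one PSI line."""
    _kind, *pairs = line.split()
    fields = dict(pair.split("=", 1) for pair in pairs)
    return int(fields["total"])


def pressure_rate(prev_total, curr_total, elapsed_seconds):
    """Stall time accumulated per second between two samples."""
    return (curr_total - prev_total) / elapsed_seconds


def exceeds(rate, limit):
    """True when the measured rate is above the configured limit."""
    return rate > limit


# Two made-up samples taken one second apart.
sample_a = "some avg10=1.50 avg60=0.80 avg300=0.20 total=1000000"
sample_b = "some avg10=2.10 avg60=0.90 avg300=0.25 total=1005000"
rate = pressure_rate(parse_psi_total(sample_a), parse_psi_total(sample_b), 1.0)

print(rate, exceeds(rate, 2000), exceeds(rate, 20000))
```

Under these assumptions the same reading trips a limit of 2000 but not one of 20000, which is consistent with a low limit producing many more state-change messages.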
On Tue, 2025-05-27 at 16:23 +0100, Mike Crowe wrote:
> On Tuesday 27 May 2025 at 09:11:03 +0100, Richard Purdie wrote:
> > On Fri, 2025-05-23 at 15:52 +0100, Mike Crowe via lists.openembedded.org wrote:
> > > Allow setting BB_PRESSURE_NOTE_CHANGE = "0" to disable NOTE messages
> > > being emitted every time the pressure state changes. The previous
> > > default behaviour is not changed.
> > >
> > > Signed-off-by: Mike Crowe <mac@mcrowe.com>
> > > Reviewed-by: Jack Mitchell <jack@embed.me.uk>
> > > ---
> > >  .../bitbake-user-manual-ref-variables.rst | 7 +++++++
> > >  lib/bb/runqueue.py                        | 3 ++-
> > >  2 files changed, 9 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> > > index 477443e22..cd81a586e 100644
> > > --- a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> > > +++ b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> > > @@ -574,6 +574,13 @@ overview of their function and contents.
> > >        might be useful as a last resort to prevent OOM errors if they are
> > >        occurring during builds.
> > >
> > > +   :term:`BB_PRESSURE_NOTE_CHANGE`
> > > +
> > > +      By default Bitbake emits a note message each time the scheduler
> > > +      decides to stop or start scheduling new tasks due to pressure
> > > +      changes. Setting :term:`BB_PRESSURE_NOTE_CHANGE` to "0" stops these
> > > +      messages.
> > > +
> > >     :term:`BB_RUNFMT`
> > >        Specifies the name of the executable script files (i.e. run files)
> > >        saved into ``${``\ :term:`T`\ ``}``. By default, the
>
> [snip]
>
> Thanks for considering the patch.
>
> > I've resisted taking a change like this in the past since firstly, we'd
> > otherwise get complaints about "why is bitbake not executing anything?"
>
> The comment in next_buildable_task implies that at least one task is always
> running (or shortly will be). Perhaps you didn't mean the complaint to be
> taken completely literally though? :)

Being specific, "why is bitbake only executing one task" :)

> > since it becomes unclear why bitbake isn't doing anything. The second
> > issue is that if you're seeing large numbers of pressure messages, your
> > configuration probably isn't right.
>
> I get hundreds of lines of pressure monitor changes when building even a
> single large recipe that can take about an hour. It's always the CPU
> pressure that has changed.
>
> The last time I looked I couldn't find much guidance for setting
> BB_NUMBER_THREADS, BB_NUMBER_PARSE_THREADS, PARALLEL_MAKE and
> BB_PRESSURE_*.
>
> We currently set PARALLEL_MAKE, BB_NUMBER_THREADS and
> BB_NUMBER_PARSE_THREADS to the number of CPUs on the host. (Our build
> machines tend to have about 32 logical CPUs though we originally set these
> values when they only had eight.)

That is usually about the right amount until you get to larger numbers
of cores. On the autobuilder which has 96 cores, we use:

BB_NUMBER_THREADS = '16'
BB_NUMBER_PARSE_THREADS = '16'
PARALLEL_MAKE = '-j 16 -l 75'
BB_PRESSURE_MAX_CPU = '20000'
BB_PRESSURE_MAX_IO = '20000'
BB_LOADFACTOR_MAX = '1.5'

(see meta/conf/fragments/yocto-autobuilder/autobuilder-resource-constraints.conf)

> We set BB_PRESSURE_MAX_* to 2000 (though
> it seems that I had it overridden to 20000 in my personal configuration - I
> may have got this from
> https://wiki.yoctoproject.org/wiki/images/0/04/Yocto_Project_Autobuilder.pdf
> or similar). All machines have NVMe storage.
>
> In addition to stock oe-core stuff, we also build various
> embarrassingly-parallel large C++ recipes. These are often blocking
> dependencies so they run on their own. We end up doing quite a lot of
> incremental builds too, where some stuff comes from sstate and when
> compilation does happen ccache is often well populated.
>
> This all means that we'd like an individual task to use as many CPUs as
> possible but ideally we wouldn't run multiple such massively-parallel tasks
> in parallel. Of course, Bitbake, Make & Ninja don't know about each other
> so there's no good solution.

That is something we might be able to change with a shared job pool.
Ninja is now able to share make's job pool as of the last week or two
and once that happens, sharing with bitbake could be possible too.

> With the above settings I tend to see Bitbake kicking off many tasks all at
> once, which pushes the pressure too high and it backs off whilst those
> tasks finish and then the same happens again. It makes the knotty output
> on console look rather like a spring bouncing up and down. This is clearly
> not ideal. Luckily there aren't too many really huge tasks that get to run
> in parallel, so the smaller ones finish and free up resources.

It is tough, bitbake could perhaps have a startup/backoff algorithm to
try and avoid it.

That said, the idea has been that it should be able to run
BB_NUMBER_THREADS in parallel without issue and then the pressure
regulation kicks in if one of those jobs is heavy and using system
resources and avoids bitbake starting any new jobs until it is done.

I still suspect your pressure numbers may be low. How to work them out
is trial and error though, I wish there was a better way.

Cheers,

Richard
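The intended interplay between BB_NUMBER_THREADS and pressure regulation described above can be modelled roughly as follows. This is a simplified sketch built only from what is said in this thread (tasks start up to the thread limit, new starts are held back while pressure is exceeded, and at least one task stays active); the function name and parameters are hypothetical and it is not the runqueue code.

```python
# Hypothetical model of the start/hold decision, not BitBake's scheduler.

def may_start_task(active_tasks, max_threads, pressure_exceeded):
    """Decide whether one more task may be started right now."""
    if active_tasks >= max_threads:
        # The configured thread limit is already in use.
        return False
    if active_tasks > 0 and pressure_exceeded:
        # Hold back new work until the heavy job finishes or pressure drops.
        return False
    # Either there is headroom, or nothing is running and at least one
    # task has to be kept active.
    return True


print(may_start_task(active_tasks=4, max_threads=16, pressure_exceeded=False))
print(may_start_task(active_tasks=4, max_threads=16, pressure_exceeded=True))
print(may_start_task(active_tasks=0, max_threads=16, pressure_exceeded=True))
```

In this model a burst of task starts followed by a pause is exactly what a limit set too low would produce, since the gate closes as soon as the first few tasks push the reading over the threshold.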
[snip]

On Tuesday 27 May 2025 at 17:07:45 +0100, Richard Purdie wrote:
> On Tue, 2025-05-27 at 16:23 +0100, Mike Crowe wrote:
> > I get hundreds of lines of pressure monitor changes when building even a
> > single large recipe that can take about an hour. It's always the CPU
> > pressure that has changed.
> >
> > The last time I looked I couldn't find much guidance for setting
> > BB_NUMBER_THREADS, BB_NUMBER_PARSE_THREADS, PARALLEL_MAKE and
> > BB_PRESSURE_*.
> >
> > We currently set PARALLEL_MAKE, BB_NUMBER_THREADS and
> > BB_NUMBER_PARSE_THREADS to the number of CPUs on the host. (Our build
> > machines tend to have about 32 logical CPUs though we originally set these
> > values when they only had eight.)
>
> That is usually about the right amount until you get to larger numbers
> of cores. On the autobuilder which has 96 cores, we use:
>
> BB_NUMBER_THREADS = '16'
> BB_NUMBER_PARSE_THREADS = '16'
> PARALLEL_MAKE = '-j 16 -l 75'
> BB_PRESSURE_MAX_CPU = '20000'
> BB_PRESSURE_MAX_IO = '20000'
> BB_LOADFACTOR_MAX = '1.5'
>
> (see meta/conf/fragments/yocto-autobuilder/autobuilder-resource-constraints.conf)

I would have expected that value for PARALLEL_MAKE to be stopping a single
instance of make from using anywhere near enough CPUs to cause CPU pressure
with that configuration. With 96 cores you'd need to be running six copies
of make in parallel, and even then the load average limit stands a good
chance of kicking in before the CPU pressure will rise too high.

> > We set BB_PRESSURE_MAX_* to 2000 (though
> > it seems that I had it overridden to 20000 in my personal configuration - I
> > may have got this from
> > https://wiki.yoctoproject.org/wiki/images/0/04/Yocto_Project_Autobuilder.pdf
> > or similar). All machines have NVMe storage.
> >
> > In addition to stock oe-core stuff, we also build various
> > embarrassingly-parallel large C++ recipes. These are often blocking
> > dependencies so they run on their own. We end up doing quite a lot of
> > incremental builds too, where some stuff comes from sstate and when
> > compilation does happen ccache is often well populated.
> >
> > This all means that we'd like an individual task to use as many CPUs as
> > possible but ideally we wouldn't run multiple such massively-parallel tasks
> > in parallel. Of course, Bitbake, Make & Ninja don't know about each other
> > so there's no good solution.
>
> That is something we might be able to change with a shared job pool.
> Ninja is now able to share make's job pool as of the last week or two
> and once that happens, sharing with bitbake could be possible too.

That's good progress then. Last time I looked the Ninja people were pushing
back a bit.

> > With the above settings I tend to see Bitbake kicking off many tasks all at
> > once, which pushes the pressure too high and it backs off whilst those
> > tasks finish and then the same happens again. It makes the knotty output
> > on console look rather like a spring bouncing up and down. This is clearly
> > not ideal. Luckily there aren't too many really huge tasks that get to run
> > in parallel, so the smaller ones finish and free up resources.
>
> It is tough, bitbake could perhaps have a startup/backoff algorithm to
> try and avoid it.

Yes, I did wonder whether it would make sense to wait for a little while
before launching another task to give the first one a chance to get going.
That's bound to make things worse in some situations though.

> That said, the idea has been that it should be able to run
> BB_NUMBER_THREADS in parallel without issue and then the pressure
> regulation kicks in if one of those jobs is heavy and using system
> resources and avoids bitbake starting any new jobs until it is done.
>
> I still suspect your pressure numbers may be low. How to work them out
> is trial and error though, I wish there was a better way.

I'll certainly try much higher pressure numbers. I could certainly try
reducing BB_NUMBER_THREADS, but reducing PARALLEL_MAKE's -j would slow down
the larger projects we build when they are the only task running. It's
probably worth experimenting with -l too though.

Thanks for your comments and advice.

Mike.
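The "six copies of make" figure above is just the core count divided by the -j value. A small sketch of that arithmetic follows; the function name is hypothetical, and it deliberately ignores the -l load limit and anything else running on the machine.

```python
import math

# Hypothetical helper for the arithmetic only.

def make_instances_to_fill(cores, jobs_per_make):
    """How many concurrent make invocations at -j N are needed to occupy every core."""
    return math.ceil(cores / jobs_per_make)


# Autobuilder figures quoted in the thread: 96 cores, -j 16.
print(make_instances_to_fill(96, 16))
# A 32-CPU machine with -j set to the CPU count, as described earlier.
print(make_instances_to_fill(32, 32))
```

With -j equal to the CPU count, a single large recipe can already occupy the whole machine, so any second parallel task is immediately competing for CPU.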
On Wed, 2025-05-28 at 13:22 +0100, Mike Crowe wrote:
> [snip]
>
> On Tuesday 27 May 2025 at 17:07:45 +0100, Richard Purdie wrote:
> > On Tue, 2025-05-27 at 16:23 +0100, Mike Crowe wrote:
> > > I get hundreds of lines of pressure monitor changes when building even a
> > > single large recipe that can take about an hour. It's always the CPU
> > > pressure that has changed.
> > >
> > > The last time I looked I couldn't find much guidance for setting
> > > BB_NUMBER_THREADS, BB_NUMBER_PARSE_THREADS, PARALLEL_MAKE and
> > > BB_PRESSURE_*.
> > >
> > > We currently set PARALLEL_MAKE, BB_NUMBER_THREADS and
> > > BB_NUMBER_PARSE_THREADS to the number of CPUs on the host. (Our build
> > > machines tend to have about 32 logical CPUs though we originally set these
> > > values when they only had eight.)
> >
> > That is usually about the right amount until you get to larger numbers
> > of cores. On the autobuilder which has 96 cores, we use:
> >
> > BB_NUMBER_THREADS = '16'
> > BB_NUMBER_PARSE_THREADS = '16'
> > PARALLEL_MAKE = '-j 16 -l 75'
> > BB_PRESSURE_MAX_CPU = '20000'
> > BB_PRESSURE_MAX_IO = '20000'
> > BB_LOADFACTOR_MAX = '1.5'
> >
> > (see meta/conf/fragments/yocto-autobuilder/autobuilder-resource-constraints.conf)
>
> I would have expected that value for PARALLEL_MAKE to be stopping a single
> instance of make from using anywhere near enough CPUs to cause CPU pressure
> with that configuration. With 96 cores you'd need to be running six copies
> of make in parallel, and even then the load average limit stands a good
> chance of kicking in before the CPU pressure will rise too high.

For clarity, we run three autobuilder targets in parallel on most
workers and that can include oe-selftest -j 12, so our builds can scale
differently/badly and this was found to be the best combination of
options for us. I mention it mainly to illustrate a combination we've
been happy with.

>
> > > With the above settings I tend to see Bitbake kicking off many tasks all at
> > > once, which pushes the pressure too high and it backs off whilst those
> > > tasks finish and then the same happens again. It makes the knotty output
> > > on console look rather like a spring bouncing up and down. This is clearly
> > > not ideal. Luckily there aren't too many really huge tasks that get to run
> > > in parallel, so the smaller ones finish and free up resources.
> >
> > It is tough, bitbake could perhaps have a startup/backoff algorithm to
> > try and avoid it.
>
> Yes, I did wonder whether it would make sense to wait for a little while
> before launching another task to give the first one a chance to get going.
> That's bound to make things worse in some situations though.

It would certainly limit the build speed in some scenarios and you'd
have the challenge on how long it takes for load to increase. Some
parallel make jobs probably need a short while to build common
dependencies before hitting max parallelism.

> > That said, the idea has been that it should be able to run
> > BB_NUMBER_THREADS in parallel without issue and then the pressure
> > regulation kicks in if one of those jobs is heavy and using system
> > resources and avoids bitbake starting any new jobs until it is done.
> >
> > I still suspect your pressure numbers may be low. How to work them out
> > is trial and error though, I wish there was a better way.
>
> I'll certainly try much higher pressure numbers. I could certainly try
> reducing BB_NUMBER_THREADS, but reducing PARALLEL_MAKE's -j would slow down
> the larger projects we build when they are the only task running. It's
> probably worth experimenting with -l too though.

I'd say the -l option is by far the more useful of the two FWIW.

Cheers,

Richard
diff --git a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
index 477443e22..cd81a586e 100644
--- a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
+++ b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
@@ -574,6 +574,13 @@ overview of their function and contents.
       might be useful as a last resort to prevent OOM errors if they are
       occurring during builds.
 
+   :term:`BB_PRESSURE_NOTE_CHANGE`
+
+      By default Bitbake emits a note message each time the scheduler
+      decides to stop or start scheduling new tasks due to pressure
+      changes. Setting :term:`BB_PRESSURE_NOTE_CHANGE` to "0" stops these
+      messages.
+
    :term:`BB_RUNFMT`
       Specifies the name of the executable script files (i.e. run files)
       saved into ``${``\ :term:`T`\ ``}``. By default, the
diff --git a/lib/bb/runqueue.py b/lib/bb/runqueue.py
index 8fadc8338..306814fd6 100644
--- a/lib/bb/runqueue.py
+++ b/lib/bb/runqueue.py
@@ -218,7 +218,7 @@ class RunQueueScheduler(object):
 
             pressure_state = (exceeds_cpu_pressure, exceeds_io_pressure, exceeds_memory_pressure)
             pressure_values = (round(cpu_pressure,1), self.rq.max_cpu_pressure, round(io_pressure,1), self.rq.max_io_pressure, round(memory_pressure,1), self.rq.max_memory_pressure)
-            if hasattr(self, "pressure_state") and pressure_state != self.pressure_state:
+            if self.rq.note_pressure_state_change and hasattr(self, "pressure_state") and pressure_state != self.pressure_state:
                 bb.note("Pressure status changed to CPU: %s, IO: %s, Mem: %s (CPU: %s/%s, IO: %s/%s, Mem: %s/%s) - using %s/%s bitbake threads" % (pressure_state + pressure_values + (len(self.rq.runq_running.difference(self.rq.runq_complete)), self.rq.number_tasks)))
             self.pressure_state = pressure_state
             return (exceeds_cpu_pressure or exceeds_io_pressure or exceeds_memory_pressure)
@@ -1864,6 +1864,7 @@ class RunQueueExecute:
         self.max_io_pressure = self.cfgData.getVar("BB_PRESSURE_MAX_IO")
         self.max_memory_pressure = self.cfgData.getVar("BB_PRESSURE_MAX_MEMORY")
         self.max_loadfactor = self.cfgData.getVar("BB_LOADFACTOR_MAX")
+        self.note_pressure_state_change = (self.cfgData.getVar("BB_PRESSURE_NOTE_CHANGE") or "1") != "0"
 
         self.sq_buildable = set()
         self.sq_running = set()
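The expression added to RunQueueExecute treats an unset or empty variable as "1" and disables the note only for the exact string "0". A minimal standalone sketch of that behaviour, with a hypothetical function name standing in for the getVar() lookup result:

```python
# Hypothetical wrapper around the expression used in the patch.

def note_pressure_changes(value):
    """Mirror `(value or "1") != "0"`: None and "" count as enabled."""
    return (value or "1") != "0"


for candidate in (None, "", "1", "0", "false", " 0"):
    print(repr(candidate), note_pressure_changes(candidate))
```

Note that values such as "false" or " 0" (with a leading space) leave the messages enabled, because only an exact "0" compares equal.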