Message ID | 20230803051916.1640841-1-Qi.Chen@windriver.com |
---|---|
State | New |
Headers | show |
Series | [bitbake-devel,V3] runqueue.py: fix PSI check logic | expand |
On Thu, 2023-08-03 at 13:19 +0800, Chen Qi via lists.openembedded.org wrote: > From: Chen Qi <Qi.Chen@windriver.com> > > The current logic is not correct because if the time interval > between the current check and the last check is very small, the PSI > checker is not likely to block things even if the system is heavy > loaded. > > It's not good to calculate the value too often. So we change to a 1s > check. As a build will usually take at least minutes, using the 1s value > seems reasonable. > > Signed-off-by: Chen Qi <Qi.Chen@windriver.com> > --- > bitbake/lib/bb/runqueue.py | 29 ++++++++++++++--------------- > 1 file changed, 14 insertions(+), 15 deletions(-) I think the challenge with the 1s difference is that it will cause two different behaviours. The first is that when it limits pressure, it will keep limiting for at least one second afterwards. That might not be too bad, it would potentially just delay a build slightly. The second issue is that when bitbake starts executing tasks, even if the pressure of the system spikes, it won't be seen for one second and bitbake will likely have executed BB_NUMBER_THREAD processes in that time, likely pushing the pressure on the system well above threshold. I suspect we're going to have to come up with something different to handle these issues... Cheers, Richard
Thanks Richard. You're right. I'll send out V4 to remove the 1s window, this will make bitbake do the check every time it tries to spawn a new task. Regards, Qi -----Original Message----- From: Richard Purdie <richard.purdie@linuxfoundation.org> Sent: Thursday, August 3, 2023 2:19 PM To: Chen, Qi <Qi.Chen@windriver.com>; bitbake-devel@lists.openembedded.org Cc: MacLeod, Randy <Randy.MacLeod@windriver.com> Subject: Re: [bitbake-devel][PATCH V3] runqueue.py: fix PSI check logic On Thu, 2023-08-03 at 13:19 +0800, Chen Qi via lists.openembedded.org wrote: > From: Chen Qi <Qi.Chen@windriver.com> > > The current logic is not correct because if the time interval between > the current check and the last check is very small, the PSI checker is > not likely to block things even if the system is heavy loaded. > > It's not good to calculate the value too often. So we change to a 1s > check. As a build will usually take at least minutes, using the 1s > value seems reasonable. > > Signed-off-by: Chen Qi <Qi.Chen@windriver.com> > --- > bitbake/lib/bb/runqueue.py | 29 ++++++++++++++--------------- > 1 file changed, 14 insertions(+), 15 deletions(-) I think the challenge with the 1s difference is that it will cause two different behaviours. The first is that when it limits pressure, it will keep limiting for at least one second afterwards. That might not be too bad, it would potentially just delay a build slightly. The second issue is that when bitbake starts executing tasks, even if the pressure of the system spikes, it won't be seen for one second and bitbake will likely have executed BB_NUMBER_THREAD processes in that time, likely pushing the pressure on the system well above threshold. I suspect we're going to have to come up with something different to handle these issues... Cheers, Richard
After more investigation, I found the problem is a little complicated. If the time interval is very small, the result might be 0, resulting in any small number having no effect. This means, even if we set BB_PRESSURE_MAX_IO/CPU/MEMORY all to "1", there will be a few tasks running together throughout the whole build process. In theory, there should only be 1 task running under such setting. So I've sent out V4, which blocks task spawning for at least 1s while checking the pressure every time if the task spawning is not blocked. This would improve the situation. Regards, Qi -----Original Message----- From: bitbake-devel@lists.openembedded.org <bitbake-devel@lists.openembedded.org> On Behalf Of Chen Qi via lists.openembedded.org Sent: Thursday, August 3, 2023 2:38 PM To: Richard Purdie <richard.purdie@linuxfoundation.org>; bitbake-devel@lists.openembedded.org Cc: MacLeod, Randy <Randy.MacLeod@windriver.com> Subject: Re: [bitbake-devel][PATCH V3] runqueue.py: fix PSI check logic Thanks Richard. You're right. I'll send out V4 to remove the 1s window, this will make bitbake do the check every time it tries to spawn a new task. Regards, Qi -----Original Message----- From: Richard Purdie <richard.purdie@linuxfoundation.org> Sent: Thursday, August 3, 2023 2:19 PM To: Chen, Qi <Qi.Chen@windriver.com>; bitbake-devel@lists.openembedded.org Cc: MacLeod, Randy <Randy.MacLeod@windriver.com> Subject: Re: [bitbake-devel][PATCH V3] runqueue.py: fix PSI check logic On Thu, 2023-08-03 at 13:19 +0800, Chen Qi via lists.openembedded.org wrote: > From: Chen Qi <Qi.Chen@windriver.com> > > The current logic is not correct because if the time interval between > the current check and the last check is very small, the PSI checker is > not likely to block things even if the system is heavy loaded. > > It's not good to calculate the value too often. So we change to a 1s > check. As a build will usually take at least minutes, using the 1s > value seems reasonable. > > Signed-off-by: Chen Qi <Qi.Chen@windriver.com> > --- > bitbake/lib/bb/runqueue.py | 29 ++++++++++++++--------------- > 1 file changed, 14 insertions(+), 15 deletions(-) I think the challenge with the 1s difference is that it will cause two different behaviours. The first is that when it limits pressure, it will keep limiting for at least one second afterwards. That might not be too bad, it would potentially just delay a build slightly. The second issue is that when bitbake starts executing tasks, even if the pressure of the system spikes, it won't be seen for one second and bitbake will likely have executed BB_NUMBER_THREAD processes in that time, likely pushing the pressure on the system well above threshold. I suspect we're going to have to come up with something different to handle these issues... Cheers, Richard
diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py index 48788f4aa6..769709f2d1 100644 --- a/bitbake/lib/bb/runqueue.py +++ b/bitbake/lib/bb/runqueue.py @@ -179,6 +179,7 @@ class RunQueueScheduler(object): self.prev_memory_pressure = memory_pressure_fds.readline().split()[4].split("=")[1] self.prev_pressure_time = time.time() self.check_pressure = True + self.psi_exceeded = False except: bb.note("The /proc/pressure files can't be read. Continuing build without monitoring pressure") self.check_pressure = False @@ -191,6 +192,10 @@ class RunQueueScheduler(object): BB_PRESSURE_MAX_{CPU|IO|MEMORY} are set, return True if above threshold. """ if self.check_pressure: + now = time.time() + tdiff = now - self.prev_pressure_time + if tdiff < 1.0: + return self.psi_exceeded with open("/proc/pressure/cpu") as cpu_pressure_fds, \ open("/proc/pressure/io") as io_pressure_fds, \ open("/proc/pressure/memory") as memory_pressure_fds: @@ -198,25 +203,19 @@ class RunQueueScheduler(object): curr_cpu_pressure = cpu_pressure_fds.readline().split()[4].split("=")[1] curr_io_pressure = io_pressure_fds.readline().split()[4].split("=")[1] curr_memory_pressure = memory_pressure_fds.readline().split()[4].split("=")[1] - now = time.time() - tdiff = now - self.prev_pressure_time - if tdiff > 1.0: - exceeds_cpu_pressure = self.rq.max_cpu_pressure and (float(curr_cpu_pressure) - float(self.prev_cpu_pressure)) / tdiff > self.rq.max_cpu_pressure - exceeds_io_pressure = self.rq.max_io_pressure and (float(curr_io_pressure) - float(self.prev_io_pressure)) / tdiff > self.rq.max_io_pressure - exceeds_memory_pressure = self.rq.max_memory_pressure and (float(curr_memory_pressure) - float(self.prev_memory_pressure)) / tdiff > self.rq.max_memory_pressure - self.prev_cpu_pressure = curr_cpu_pressure - self.prev_io_pressure = curr_io_pressure - self.prev_memory_pressure = curr_memory_pressure - self.prev_pressure_time = now - else: - exceeds_cpu_pressure = self.rq.max_cpu_pressure and (float(curr_cpu_pressure) - float(self.prev_cpu_pressure)) > self.rq.max_cpu_pressure - exceeds_io_pressure = self.rq.max_io_pressure and (float(curr_io_pressure) - float(self.prev_io_pressure)) > self.rq.max_io_pressure - exceeds_memory_pressure = self.rq.max_memory_pressure and (float(curr_memory_pressure) - float(self.prev_memory_pressure)) > self.rq.max_memory_pressure + exceeds_cpu_pressure = self.rq.max_cpu_pressure and (float(curr_cpu_pressure) - float(self.prev_cpu_pressure)) / tdiff > self.rq.max_cpu_pressure + exceeds_io_pressure = self.rq.max_io_pressure and (float(curr_io_pressure) - float(self.prev_io_pressure)) / tdiff > self.rq.max_io_pressure + exceeds_memory_pressure = self.rq.max_memory_pressure and (float(curr_memory_pressure) - float(self.prev_memory_pressure)) / tdiff > self.rq.max_memory_pressure + self.prev_cpu_pressure = curr_cpu_pressure + self.prev_io_pressure = curr_io_pressure + self.prev_memory_pressure = curr_memory_pressure + self.prev_pressure_time = now + self.psi_exceeded = exceeds_cpu_pressure or exceeds_io_pressure or exceeds_memory_pressure pressure_state = (exceeds_cpu_pressure, exceeds_io_pressure, exceeds_memory_pressure) if hasattr(self, "pressure_state") and pressure_state != self.pressure_state: bb.note("Pressure status changed to CPU: %s, IO: %s, Mem: %s" % pressure_state) self.pressure_state = pressure_state - return (exceeds_cpu_pressure or exceeds_io_pressure or exceeds_memory_pressure) + return self.psi_exceeded return False def next_buildable_task(self):