diff mbox series

[3/3] testimage.bbclass: capture RuntimeError too

Message ID 20241111131604.364308-3-mikko.rapeli@linaro.org
State New
Headers show
Series [1/3] uki.bbclass: fix debug print logging level | expand

Commit Message

Mikko Rapeli Nov. 11, 2024, 1:16 p.m. UTC
runqemu can fail with RuntimeError exception. Non-cought exception
causes cooker process leaks which bind to successive bitbake command
line calls and that can cause really odd errors to users, e.g. when
build/tmp is wiped and cooker processes expect files to be there.

Signed-off-by: Mikko Rapeli <mikko.rapeli@linaro.org>
---
 meta/classes-recipe/testimage.bbclass | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Richard Purdie Nov. 12, 2024, 11:25 a.m. UTC | #1
On Mon, 2024-11-11 at 13:16 +0000, Mikko Rapeli via lists.openembedded.org wrote:
> runqemu can fail with RuntimeError exception. Non-cought exception
> causes cooker process leaks which bind to successive bitbake command
> line calls and that can cause really odd errors to users, e.g. when
> build/tmp is wiped and cooker processes expect files to be there.
> 
> Signed-off-by: Mikko Rapeli <mikko.rapeli@linaro.org>
> ---
>  meta/classes-recipe/testimage.bbclass | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/meta/classes-recipe/testimage.bbclass b/meta/classes-recipe/testimage.bbclass
> index 19075ce1f3..a9b031093a 100644
> --- a/meta/classes-recipe/testimage.bbclass
> +++ b/meta/classes-recipe/testimage.bbclass
> @@ -371,7 +371,7 @@ def testimage_main(d):
>          complete = True
>          if results.hasAnyFailingTest():
>              run_failed_tests_post_actions(d, tc)
> -    except (KeyboardInterrupt, BlockingIOError) as err:
> +    except (KeyboardInterrupt, BlockingIOError, RuntimeError) as err:
>          if isinstance(err, KeyboardInterrupt):
>              bb.error('testimage interrupted, shutting down...')
>          else:
> 

During review it is hard to understand what the real issue is from this
description. I don't like the sound of processes leaking and if that is
happening, adding another exception to this list doesn't feel correct.
I was going to ask for a better explanation but looking at the code,
perhaps this error handling path just needs rewriting/improving with
more of the code in the finally, conditionally?

I just want to make sure we fix the real bug here.

Cheers,

Richard
Mikko Rapeli Nov. 18, 2024, 8 a.m. UTC | #2
Hi,

On Tue, Nov 12, 2024 at 11:25:51AM +0000, Richard Purdie wrote:
> On Mon, 2024-11-11 at 13:16 +0000, Mikko Rapeli via lists.openembedded.org wrote:
> > runqemu can fail with RuntimeError exception. Non-cought exception
> > causes cooker process leaks which bind to successive bitbake command
> > line calls and that can cause really odd errors to users, e.g. when
> > build/tmp is wiped and cooker processes expect files to be there.
> > 
> > Signed-off-by: Mikko Rapeli <mikko.rapeli@linaro.org>
> > ---
> > �meta/classes-recipe/testimage.bbclass | 2 +-
> > �1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/meta/classes-recipe/testimage.bbclass b/meta/classes-recipe/testimage.bbclass
> > index 19075ce1f3..a9b031093a 100644
> > --- a/meta/classes-recipe/testimage.bbclass
> > +++ b/meta/classes-recipe/testimage.bbclass
> > @@ -371,7 +371,7 @@ def testimage_main(d):
> > �������� complete = True
> > �������� if results.hasAnyFailingTest():
> > ������������ run_failed_tests_post_actions(d, tc)
> > -��� except (KeyboardInterrupt, BlockingIOError) as err:
> > +��� except (KeyboardInterrupt, BlockingIOError, RuntimeError) as err:
> > �������� if isinstance(err, KeyboardInterrupt):
> > ������������ bb.error('testimage interrupted, shutting down...')
> > �������� else:
> > 
> 
> During review it is hard to understand what the real issue is from this
> description. I don't like the sound of processes leaking and if that is
> happening, adding another exception to this list doesn't feel correct.
> I was going to ask for a better explanation but looking at the code,
> perhaps this error handling path just needs rewriting/improving with
> more of the code in the finally, conditionally?
> 
> I just want to make sure we fix the real bug here.

Sorry for being unclear. I thought the backtrace would be too verbose.

The bug happens when runqemu startup fails:

poky/meta/lib/oeqa/targetcontrol.py:            raise RuntimeError("%s - FAILED to start qemu - check the task log and the boot log" % self.pn)

cooker processes do leak when the exceptions are not cought.
Maybe these are not strictly related but it happens for me. It
can be that cleanup happens but just slowly, and when I run
other bitbake commands right after failure they connect to these
leaked cooker processes which then behave badly, for example when
build/tmp was already wiped.

Here is a failure log from one of my builds from meta-arm with other
changes on top which are the root cause for missing rootfs file:

ERROR: core-image-base-1.0-r0 do_testimage: Invalid rootfs /home/builder/src/base/meta-arm/build/tmp/deploy/images/qemuarm64-secureboot/core-image-base-qemuarm64-secureboot.rootfs.wic.qcow2
ERROR: core-image-base-1.0-r0 do_testimage: Error executing a python function in exec_func_python() autogenerated:

The stack trace of python calls that resulted in this exception/failure was:
File: 'exec_func_python() autogenerated', lineno: 2, function: <module>
     0001:
 *** 0002:do_testimage(d)
     0003:
File: '/home/builder/src/base/meta-arm/build/../poky/meta/classes-recipe/testimage.bbclass', lineno: 122, function: do_testimage
     0118:    dump-guest-memory {"paging":false,"protocol":"file:%s.img"}
     0119:}
     0120:
     0121:python do_testimage() {
 *** 0122:    testimage_main(d)
     0123:}
     0124:
     0125:addtask testimage
     0126:do_testimage[nostamp] = "1"
File: '/home/builder/src/base/meta-arm/build/../poky/meta/classes-recipe/testimage.bbclass', lineno: 364, function: testimage_main
     0360:    orig_sigterm_handler = signal.signal(signal.SIGTERM, sigterm_exception)
     0361:    try:
     0362:        # We need to check if runqemu ends unexpectedly
     0363:        # or if the worker send us a SIGTERM
 *** 0364:        tc.target.start(params=d.getVar("TEST_QEMUPARAMS"), runqemuparams=d.getVar("TEST_RUNQEMUPARAMS"))
     0365:        import threading
     0366:        try:
     0367:            threading.Timer(int(d.getVar("TEST_OVERALL_TIMEOUT")), handle_test_timeout, (int(d.getVar("TEST_OVERALL_TIMEOUT")),)).start()
     0368:        except ValueError:
File: '/home/builder/src/base/meta-arm/build/../poky/meta/lib/oeqa/core/target/qemu.py', lineno: 91, function: start
     0087:            except (subprocess.CalledProcessError, subprocess.TimeoutExpired, FileNotFoundError) as err:
     0088:                msg += "Error running command: %s\n%s\n" % (blcmd, err)
     0089:            msg += "\n\n===== end: snippet =====\n"
     0090:
 *** 0091:            raise RuntimeError("FAILED to start qemu - check the task log and the boot log %s" % (msg))
     0092:    
     0093:    def stop(self):
     0094:        self.runner.stop()
Exception: RuntimeError: FAILED to start qemu - check the task log and the boot log 

===== start: snippet =====



===== end: snippet =====

===== start: snippet =====

Error running command: ['tail', '-20', '/home/builder/src/base/meta-arm/build/tmp/work/qemuarm64_secureboot-poky-linux/core-image-base/1.0/testimage/qemu_boot_log.20241031142321']
Command '['tail', '-20', '/home/builder/src/base/meta-arm/build/tmp/work/qemuarm64_secureboot-poky-linux/core-image-base/1.0/testimage/qemu_boot_log.20241031142321']' returned non-zero exit status 1.


===== end: snippet =====


ERROR: Logfile of failure stored in: /home/builder/src/base/meta-arm/build/tmp/work/qemuarm64_secureboot-poky-linux/core-image-base/1.0/temp/log.do_testimage.1779338
NOTE: recipe core-image-base-1.0-r0: task do_testimage: Failed
ERROR: Task (/home/builder/src/base/meta-arm/build/../poky/meta/recipes-core/images/core-image-base.bb:do_testimage) failed with exit code '1'
NOTE: Tasks Summary: Attempted 7084 tasks of which 7054 didn't need to be rerun and 1 failed.

Summary: 1 task failed:
  /home/builder/src/base/meta-arm/build/../poky/meta/recipes-core/images/core-image-base.bb:do_testimage

Cheers,

-Mikko
Richard Purdie Jan. 28, 2025, 1:04 p.m. UTC | #3
On Mon, 2024-11-18 at 10:00 +0200, Mikko Rapeli wrote:
> On Tue, Nov 12, 2024 at 11:25:51AM +0000, Richard Purdie wrote:
> > On Mon, 2024-11-11 at 13:16 +0000, Mikko Rapeli via lists.openembedded.org wrote:
> > > runqemu can fail with RuntimeError exception. Non-cought exception
> > > causes cooker process leaks which bind to successive bitbake command
> > > line calls and that can cause really odd errors to users, e.g. when
> > > build/tmp is wiped and cooker processes expect files to be there.
> > > 
> > > Signed-off-by: Mikko Rapeli <mikko.rapeli@linaro.org>
> > > ---
> > >  meta/classes-recipe/testimage.bbclass | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/meta/classes-recipe/testimage.bbclass b/meta/classes-recipe/testimage.bbclass
> > > index 19075ce1f3..a9b031093a 100644
> > > --- a/meta/classes-recipe/testimage.bbclass
> > > +++ b/meta/classes-recipe/testimage.bbclass
> > > @@ -371,7 +371,7 @@ def testimage_main(d):
> > >          complete = True
> > >          if results.hasAnyFailingTest():
> > >              run_failed_tests_post_actions(d, tc)
> > > -    except (KeyboardInterrupt, BlockingIOError) as err:
> > > +    except (KeyboardInterrupt, BlockingIOError, RuntimeError) as err:
> > >          if isinstance(err, KeyboardInterrupt):
> > >              bb.error('testimage interrupted, shutting down...')
> > >          else:
> > > 
> > 
> > During review it is hard to understand what the real issue is from this
> > description. I don't like the sound of processes leaking and if that is
> > happening, adding another exception to this list doesn't feel correct.
> > I was going to ask for a better explanation but looking at the code,
> > perhaps this error handling path just needs rewriting/improving with
> > more of the code in the finally, conditionally?
> > 
> > I just want to make sure we fix the real bug here.
> 
> Sorry for being unclear. I thought the backtrace would be too verbose.
> 
> The bug happens when runqemu startup fails:
> 
> poky/meta/lib/oeqa/targetcontrol.py:            raise RuntimeError("%s - FAILED to start qemu - check the task log and the boot log" % self.pn)
> 
> cooker processes do leak when the exceptions are not cought.
> Maybe these are not strictly related but it happens for me. It
> can be that cleanup happens but just slowly, and when I run
> other bitbake commands right after failure they connect to these
> leaked cooker processes which then behave badly, for example when
> build/tmp was already wiped.
> 

Sorry for the delay in looking at this patch. I'm a bit worried about
there being a leaked processes and wanted to understand if there was
other cleanup we should be doing.

Instead of this path, would it make sense to move the results.stop()
inside the finally? I'm worried that other forms of exception would
also leak processes.

Cheers,

Richard
Richard Purdie Jan. 28, 2025, 1:49 p.m. UTC | #4
On Tue, 2025-01-28 at 13:04 +0000, Richard Purdie via
lists.openembedded.org wrote:
> On Mon, 2024-11-18 at 10:00 +0200, Mikko Rapeli wrote:
> > On Tue, Nov 12, 2024 at 11:25:51AM +0000, Richard Purdie wrote:
> > > On Mon, 2024-11-11 at 13:16 +0000, Mikko Rapeli via
> > > lists.openembedded.org wrote:
> > > > runqemu can fail with RuntimeError exception. Non-cought
> > > > exception
> > > > causes cooker process leaks which bind to successive bitbake
> > > > command
> > > > line calls and that can cause really odd errors to users, e.g.
> > > > when
> > > > build/tmp is wiped and cooker processes expect files to be
> > > > there.
> > > > 
> > > > Signed-off-by: Mikko Rapeli <mikko.rapeli@linaro.org>
> > > > ---
> > > >  meta/classes-recipe/testimage.bbclass | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/meta/classes-recipe/testimage.bbclass
> > > > b/meta/classes-recipe/testimage.bbclass
> > > > index 19075ce1f3..a9b031093a 100644
> > > > --- a/meta/classes-recipe/testimage.bbclass
> > > > +++ b/meta/classes-recipe/testimage.bbclass
> > > > @@ -371,7 +371,7 @@ def testimage_main(d):
> > > >          complete = True
> > > >          if results.hasAnyFailingTest():
> > > >              run_failed_tests_post_actions(d, tc)
> > > > -    except (KeyboardInterrupt, BlockingIOError) as err:
> > > > +    except (KeyboardInterrupt, BlockingIOError, RuntimeError)
> > > > as err:
> > > >          if isinstance(err, KeyboardInterrupt):
> > > >              bb.error('testimage interrupted, shutting
> > > > down...')
> > > >          else:
> > > > 
> > > 
> > > During review it is hard to understand what the real issue is
> > > from this
> > > description. I don't like the sound of processes leaking and if
> > > that is
> > > happening, adding another exception to this list doesn't feel
> > > correct.
> > > I was going to ask for a better explanation but looking at the
> > > code,
> > > perhaps this error handling path just needs rewriting/improving
> > > with
> > > more of the code in the finally, conditionally?
> > > 
> > > I just want to make sure we fix the real bug here.
> > 
> > Sorry for being unclear. I thought the backtrace would be too
> > verbose.
> > 
> > The bug happens when runqemu startup fails:
> > 
> > poky/meta/lib/oeqa/targetcontrol.py:            raise
> > RuntimeError("%s - FAILED to start qemu - check the task log and
> > the boot log" % self.pn)
> > 
> > cooker processes do leak when the exceptions are not cought.
> > Maybe these are not strictly related but it happens for me. It
> > can be that cleanup happens but just slowly, and when I run
> > other bitbake commands right after failure they connect to these
> > leaked cooker processes which then behave badly, for example when
> > build/tmp was already wiped.
> > 
> 
> Sorry for the delay in looking at this patch. I'm a bit worried about
> there being a leaked processes and wanted to understand if there was
> other cleanup we should be doing.
> 
> Instead of this path, would it make sense to move the results.stop()
> inside the finally? I'm worried that other forms of exception would
> also leak processes.

I've looked more at this and I don't understand how this patch works.

By adding the exception to the exception clause, it will trigger a
bb.error() call. The results.stop() call won't happen as results isn't
set in this failure.

The only significant difference would therefore be the results =
tc.results assignment. Does that really stop processes? If so, should
we always be doing that?

I'd like to understand what we need to do to "fix" things so we can
handle other exceptions.

Cheers,

Richard
Mikko Rapeli Jan. 28, 2025, 2:37 p.m. UTC | #5
Hi,

On Tue, Jan 28, 2025 at 01:49:00PM +0000, Richard Purdie wrote:
> On Tue, 2025-01-28 at 13:04 +0000, Richard Purdie via
> lists.openembedded.org wrote:
> > On Mon, 2024-11-18 at 10:00 +0200, Mikko Rapeli wrote:
> > > On Tue, Nov 12, 2024 at 11:25:51AM +0000, Richard Purdie wrote:
> > > > On Mon, 2024-11-11 at 13:16 +0000, Mikko Rapeli via
> > > > lists.openembedded.org wrote:
> > > > > runqemu can fail with RuntimeError exception. Non-cought
> > > > > exception
> > > > > causes cooker process leaks which bind to successive bitbake
> > > > > command
> > > > > line calls and that can cause really odd errors to users, e.g.
> > > > > when
> > > > > build/tmp is wiped and cooker processes expect files to be
> > > > > there.
> > > > > 
> > > > > Signed-off-by: Mikko Rapeli <mikko.rapeli@linaro.org>
> > > > > ---
> > > > > �meta/classes-recipe/testimage.bbclass | 2 +-
> > > > > �1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/meta/classes-recipe/testimage.bbclass
> > > > > b/meta/classes-recipe/testimage.bbclass
> > > > > index 19075ce1f3..a9b031093a 100644
> > > > > --- a/meta/classes-recipe/testimage.bbclass
> > > > > +++ b/meta/classes-recipe/testimage.bbclass
> > > > > @@ -371,7 +371,7 @@ def testimage_main(d):
> > > > > �������� complete = True
> > > > > �������� if results.hasAnyFailingTest():
> > > > > ������������ run_failed_tests_post_actions(d, tc)
> > > > > -��� except (KeyboardInterrupt, BlockingIOError) as err:
> > > > > +��� except (KeyboardInterrupt, BlockingIOError, RuntimeError)
> > > > > as err:
> > > > > �������� if isinstance(err, KeyboardInterrupt):
> > > > > ������������ bb.error('testimage interrupted, shutting
> > > > > down...')
> > > > > �������� else:
> > > > > 
> > > > 
> > > > During review it is hard to understand what the real issue is
> > > > from this
> > > > description. I don't like the sound of processes leaking and if
> > > > that is
> > > > happening, adding another exception to this list doesn't feel
> > > > correct.
> > > > I was going to ask for a better explanation but looking at the
> > > > code,
> > > > perhaps this error handling path just needs rewriting/improving
> > > > with
> > > > more of the code in the finally, conditionally?
> > > > 
> > > > I just want to make sure we fix the real bug here.
> > > 
> > > Sorry for being unclear. I thought the backtrace would be too
> > > verbose.
> > > 
> > > The bug happens when runqemu startup fails:
> > > 
> > > poky/meta/lib/oeqa/targetcontrol.py:����������� raise
> > > RuntimeError("%s - FAILED to start qemu - check the task log and
> > > the boot log" % self.pn)
> > > 
> > > cooker processes do leak when the exceptions are not cought.
> > > Maybe these are not strictly related but it happens for me. It
> > > can be that cleanup happens but just slowly, and when I run
> > > other bitbake commands right after failure they connect to these
> > > leaked cooker processes which then behave badly, for example when
> > > build/tmp was already wiped.
> > > 
> > 
> > Sorry for the delay in looking at this patch. I'm a bit worried about
> > there being a leaked processes and wanted to understand if there was
> > other cleanup we should be doing.
> > 
> > Instead of this path, would it make sense to move the results.stop()
> > inside the finally? I'm worried that other forms of exception would
> > also leak processes.
> 
> I've looked more at this and I don't understand how this patch works.
> 
> By adding the exception to the exception clause, it will trigger a
> bb.error() call. The results.stop() call won't happen as results isn't
> set in this failure.
> 
> The only significant difference would therefore be the results =
> tc.results assignment. Does that really stop processes? If so, should
> we always be doing that?
> 
> I'd like to understand what we need to do to "fix" things so we can
> handle other exceptions.

If I remember correctly, I saw this when building meta-arm qemuarm64-secureboot
kas config which runs oeqa runtime tests with testimage.bbclass during build.

Without fix 4b3b37156cdf70230bc74c647784da8514c284cf in poky a single failing test
leaked Cooker processes. Then a rebuild of core-image-base connected to that Cooker
process and somehow failed to re-generate rootfs which became an empty directory
inside .wic image. This resulted qemu boot hanging which again throws RuntimeError
from meta/lib/oeqa/core/target/qemu.py line 52.

Example with poky from October 2024 (from old log files which I still had):

NOTE: Reconnecting to bitbake server...
...
NOTE: recipe core-image-base-1.0-r0: task do_create_image_spdx: Succeeded
QMP Available for connection at /home/builder/src/base/meta-arm/build/tmp/.ndozl5yn
QMP connected to QEMU at 10/31/24 13:55:56 and took 0.56 seconds
QMP released QEMU at 10/31/24 13:55:56 and took 0.19 seconds from connect
Bitbake still alive (no events for 600s). Active tasks:
/home/builder/src/base/meta-arm/build/../poky/meta/recipes-core/images/core-image-base.bb:do_testimage
WARNING: core-image-base-1.0-r0 do_testimage: Target didn't reach login banner in 1000 seconds (10/31/24 14:12:36)
WARNING: core-image-base-1.0-r0 do_testimage: Last 25 lines of login console (22962):
[    4.345255] Key type fscrypt-provisioning registered
[    4.363120] Btrfs loaded, zoned=no, fsverity=no
[    4.522255] input: QEMU QEMU USB Tablet as /devices/platform/4010000000.pcie/pci0000:00/0000:00:03.0/usb1/1-1/1-1:1.0/0003:0627:0001.0001/input
/input0
[    4.528765] hid-generic 0003:0627:0001.0001: input: USB HID v0.01 Mouse [QEMU QEMU USB Tablet] on usb-0000:00:03.0-1/input0
[    4.636796] Key type encrypted registered
[    4.644819] printk: legacy console [netcon0] enabled
[    4.645638] netconsole: network logging started
[    4.651065] clk: Disabling unused clocks
[    4.651780] PM: genpd: Disabling unused power domains
[    4.669045] usb 1-2: new high-speed USB device number 3 using xhci_hcd
[    4.794967] Freeing unused kernel memory: 1856K
[    4.803684] Run /init as init process
[    4.847358] input: QEMU QEMU USB Keyboard as /devices/platform/4010000000.pcie/pci0000:00/0000:00:03.0/usb1/1-2/1-2:1.0/0003:0627:0001.0002/inp
ut/input1
[    4.919633] hid-generic 0003:0627:0001.0002: input: USB HID v1.11 Keyboard [QEMU QEMU USB Keyboard] on usb-0000:00:03.0-2/input0
Starting systemd-udevd version 256.6

[    7.407191] virtio_net virtio0 enp0s1: renamed from eth0
[   12.633172] ext3: Unknown parameter 'umask'
[   12.634318] ext2: Unknown parameter 'umask'
[   12.634924] ext4: Unknown parameter 'umask'
[   12.676391] EXT4-fs (vda2): mounted filesystem 41dc66e6-d693-46b3-a5d6-a46998500609 r/w with ordered data mode. Quota mode: disabled.
[   13.080423] /dev/disk/by-label/root: Can't open blockdev
[   13.081201] /dev/disk/by-label/root: Can't open blockdev
ERROR: There's no '/dev' on rootfs.

WARNING: core-image-base-1.0-r0 do_testimage: Check full boot log: /home/builder/src/base/meta-arm/build/tmp/work/qemuarm64_secureboot-poky-l
inux/core-image-base/1.0/testimage/qemu_boot_log.20241031135427
Output from runqemu:
runqemu - INFO - Received signal: 15
runqemu - INFO - Cleaning up
runqemu - INFO - Host uptime: 14098967.33

tput: No value for $TERM and no -T specified
WARNING: core-image-base-1.0-r0 do_testimage: Extra log data read: 

Output from runqemu:
runqemu - INFO - Received signal: 15
runqemu - INFO - Cleaning up
runqemu - INFO - Host uptime: 14098967.33

tput: No value for $TERM and no -T specified

Output from runqemu:
runqemu - INFO - Received signal: 15
runqemu - INFO - Cleaning up
runqemu - INFO - Host uptime: 14098967.33

tput: No value for $TERM and no -T specified

ERROR: core-image-base-1.0-r0 do_testimage: Error executing a python function in exec_func_python() autogenerated:

The stack trace of python calls that resulted in this exception/failure was:
File: 'exec_func_python() autogenerated', lineno: 2, function: <module>
     0001:
 *** 0002:do_testimage(d)
     0003:
File: '/home/builder/src/base/meta-arm/build/../poky/meta/classes-recipe/testimage.bbclass', lineno: 122, function: do_testimage
     0118:    dump-guest-memory {"paging":false,"protocol":"file:%s.img"}
     0119:}
     0120:
     0121:python do_testimage() {
 *** 0122:    testimage_main(d)
     0123:}
     0124:
     0125:addtask testimage
     0126:do_testimage[nostamp] = "1"
File: '/home/builder/src/base/meta-arm/build/../poky/meta/classes-recipe/testimage.bbclass', lineno: 364, function: testimage_main
     0360:    orig_sigterm_handler = signal.signal(signal.SIGTERM, sigterm_exception)
     0361:    try:
     0362:        # We need to check if runqemu ends unexpectedly
     0363:        # or if the worker send us a SIGTERM
 *** 0364:        tc.target.start(params=d.getVar("TEST_QEMUPARAMS"), runqemuparams=d.getVar("TEST_RUNQEMUPARAMS"))
     0365:        import threading
     0366:        try:
     0367:            threading.Timer(int(d.getVar("TEST_OVERALL_TIMEOUT")), handle_test_timeout, (int(d.getVar("TEST_OVERALL_TIMEOUT")),)).start(
)
     0368:        except ValueError:
File: '/home/builder/src/base/meta-arm/build/../poky/meta/lib/oeqa/core/target/qemu.py', lineno: 91, function: start
     0087:            except (subprocess.CalledProcessError, subprocess.TimeoutExpired, FileNotFoundError) as err:
     0088:                msg += "Error running command: %s\n%s\n" % (blcmd, err)
     0089:            msg += "\n\n===== end: snippet =====\n"
     0090:
 *** 0091:            raise RuntimeError("FAILED to start qemu - check the task log and the boot log %s" % (msg))
     0092:
     0093:    def stop(self):
     0094:        self.runner.stop()
Exception: RuntimeError: FAILED to start qemu - check the task log and the boot log 

===== start: snippet =====



===== end: snippet =====

===== start: snippet =====

Error running command: ['tail', '-20', '/home/builder/src/base/meta-arm/build/tmp/work/qemuarm64_secureboot-poky-linux/core-image-base/1.0/te
stimage/qemu_boot_log.20241031135427']
Command '['tail', '-20', '/home/builder/src/base/meta-arm/build/tmp/work/qemuarm64_secureboot-poky-linux/core-image-base/1.0/testimage/qemu_b
oot_log.20241031135427']' returned non-zero exit status 1.

===== end: snippet =====


ERROR: Logfile of failure stored in: /home/builder/src/base/meta-arm/build/tmp/work/qemuarm64_secureboot-poky-linux/core-image-base/1.0/temp/
log.do_testimage.1578012
NOTE: recipe core-image-base-1.0-r0: task do_testimage: Failed
ERROR: Task (/home/builder/src/base/meta-arm/build/../poky/meta/recipes-core/images/core-image-base.bb:do_testimage) failed with exit code '1
'
NOTE: Tasks Summary: Attempted 7116 tasks of which 7081 didn't need to be rerun and 1 failed.

Summary: 1 task failed:
  /home/builder/src/base/meta-arm/build/../poky/meta/recipes-core/images/core-image-base.bb:do_testimage


Since this exit happens without catching RuntimeError, there is no cleanup
and Cooker, qemu-system-aarch64 etc processes get leaked again.

If the RuntimeError is captured, then clean up is done and qemu-system-aarch64
and Cooker processes get killed. Together with
4b3b37156cdf70230bc74c647784da8514c284cf in poky
the failing oeqa runtime test under qemu was handled correctly
and rebuilds always ended up with same failure, specifically
from parselogs which is a bit greedy when detecting errors.

Does this help? If not, I can try to replicate the setup and failures
again, and try different fixes.

Cheers,

-Mikko
Richard Purdie Jan. 28, 2025, 3:10 p.m. UTC | #6
On Tue, 2025-01-28 at 16:37 +0200, Mikko Rapeli wrote:
> On Tue, Jan 28, 2025 at 01:49:00PM +0000, Richard Purdie wrote:
> > On Tue, 2025-01-28 at 13:04 +0000, Richard Purdie via
> > lists.openembedded.org wrote:
> > > 
> > > Sorry for the delay in looking at this patch. I'm a bit worried about
> > > there being a leaked processes and wanted to understand if there was
> > > other cleanup we should be doing.
> > > 
> > > Instead of this path, would it make sense to move the results.stop()
> > > inside the finally? I'm worried that other forms of exception would
> > > also leak processes.
> > 
> > I've looked more at this and I don't understand how this patch works.
> > 
> > By adding the exception to the exception clause, it will trigger a
> > bb.error() call. The results.stop() call won't happen as results isn't
> > set in this failure.
> > 
> > The only significant difference would therefore be the results =
> > tc.results assignment. Does that really stop processes? If so, should
> > we always be doing that?
> > 
> > I'd like to understand what we need to do to "fix" things so we can
> > handle other exceptions.
> 
> If I remember correctly, I saw this when building meta-arm qemuarm64-secureboot
> kas config which runs oeqa runtime tests with testimage.bbclass during build.
> 
> Without fix 4b3b37156cdf70230bc74c647784da8514c284cf in poky a single failing test
> leaked Cooker processes. Then a rebuild of core-image-base connected to that Cooker
> process and somehow failed to re-generate rootfs which became an empty directory
> inside .wic image. This resulted qemu boot hanging which again throws RuntimeError
> from meta/lib/oeqa/core/target/qemu.py line 52.

Just to be clear, is memory resident bitbake set or not
(BB_SERVER_TIMEOUT) ?


> Example with poky from October 2024 (from old log files which I still had):
> 
> NOTE: Reconnecting to bitbake server...
> ...
> NOTE: recipe core-image-base-1.0-r0: task do_create_image_spdx: Succeeded
> QMP Available for connection at /home/builder/src/base/meta-arm/build/tmp/.ndozl5yn
> QMP connected to QEMU at 10/31/24 13:55:56 and took 0.56 seconds
> QMP released QEMU at 10/31/24 13:55:56 and took 0.19 seconds from connect
> Bitbake still alive (no events for 600s). Active tasks:
> /home/builder/src/base/meta-arm/build/../poky/meta/recipes-core/images/core-image-base.bb:do_testimage
> WARNING: core-image-base-1.0-r0 do_testimage: Target didn't reach login banner in 1000 seconds (10/31/24 14:12:36)
> WARNING: core-image-base-1.0-r0 do_testimage: Last 25 lines of login console (22962):
> [    4.345255] Key type fscrypt-provisioning registered
> [    4.363120] Btrfs loaded, zoned=no, fsverity=no
> [    4.522255] input: QEMU QEMU USB Tablet as /devices/platform/4010000000.pcie/pci0000:00/0000:00:03.0/usb1/1-1/1-1:1.0/0003:0627:0001.0001/input
> /input0
> [    4.528765] hid-generic 0003:0627:0001.0001: input: USB HID v0.01 Mouse [QEMU QEMU USB Tablet] on usb-0000:00:03.0-1/input0
> [    4.636796] Key type encrypted registered
> [    4.644819] printk: legacy console [netcon0] enabled
> [    4.645638] netconsole: network logging started
> [    4.651065] clk: Disabling unused clocks
> [    4.651780] PM: genpd: Disabling unused power domains
> [    4.669045] usb 1-2: new high-speed USB device number 3 using xhci_hcd
> [    4.794967] Freeing unused kernel memory: 1856K
> [    4.803684] Run /init as init process
> [    4.847358] input: QEMU QEMU USB Keyboard as /devices/platform/4010000000.pcie/pci0000:00/0000:00:03.0/usb1/1-2/1-2:1.0/0003:0627:0001.0002/inp
> ut/input1
> [    4.919633] hid-generic 0003:0627:0001.0002: input: USB HID v1.11 Keyboard [QEMU QEMU USB Keyboard] on usb-0000:00:03.0-2/input0
> Starting systemd-udevd version 256.6
> 
> [    7.407191] virtio_net virtio0 enp0s1: renamed from eth0
> [   12.633172] ext3: Unknown parameter 'umask'
> [   12.634318] ext2: Unknown parameter 'umask'
> [   12.634924] ext4: Unknown parameter 'umask'
> [   12.676391] EXT4-fs (vda2): mounted filesystem 41dc66e6-d693-46b3-a5d6-a46998500609 r/w with ordered data mode. Quota mode: disabled.
> [   13.080423] /dev/disk/by-label/root: Can't open blockdev
> [   13.081201] /dev/disk/by-label/root: Can't open blockdev
> ERROR: There's no '/dev' on rootfs.
> 
> WARNING: core-image-base-1.0-r0 do_testimage: Check full boot log: /home/builder/src/base/meta-arm/build/tmp/work/qemuarm64_secureboot-poky-l
> inux/core-image-base/1.0/testimage/qemu_boot_log.20241031135427
> Output from runqemu:
> runqemu - INFO - Received signal: 15
> runqemu - INFO - Cleaning up
> runqemu - INFO - Host uptime: 14098967.33
> 
> tput: No value for $TERM and no -T specified
> WARNING: core-image-base-1.0-r0 do_testimage: Extra log data read: 
> 
> Output from runqemu:
> runqemu - INFO - Received signal: 15
> runqemu - INFO - Cleaning up
> runqemu - INFO - Host uptime: 14098967.33
> 
> tput: No value for $TERM and no -T specified
> 
> Output from runqemu:
> runqemu - INFO - Received signal: 15
> runqemu - INFO - Cleaning up
> runqemu - INFO - Host uptime: 14098967.33
> 
> tput: No value for $TERM and no -T specified
> 
> ERROR: core-image-base-1.0-r0 do_testimage: Error executing a python function in exec_func_python() autogenerated:
> 
> The stack trace of python calls that resulted in this exception/failure was:
> File: 'exec_func_python() autogenerated', lineno: 2, function: <module>
>      0001:
>  *** 0002:do_testimage(d)
>      0003:
> File: '/home/builder/src/base/meta-arm/build/../poky/meta/classes-recipe/testimage.bbclass', lineno: 122, function: do_testimage
>      0118:    dump-guest-memory {"paging":false,"protocol":"file:%s.img"}
>      0119:}
>      0120:
>      0121:python do_testimage() {
>  *** 0122:    testimage_main(d)
>      0123:}
>      0124:
>      0125:addtask testimage
>      0126:do_testimage[nostamp] = "1"
> File: '/home/builder/src/base/meta-arm/build/../poky/meta/classes-recipe/testimage.bbclass', lineno: 364, function: testimage_main
>      0360:    orig_sigterm_handler = signal.signal(signal.SIGTERM, sigterm_exception)
>      0361:    try:
>      0362:        # We need to check if runqemu ends unexpectedly
>      0363:        # or if the worker send us a SIGTERM
>  *** 0364:        tc.target.start(params=d.getVar("TEST_QEMUPARAMS"), runqemuparams=d.getVar("TEST_RUNQEMUPARAMS"))
>      0365:        import threading
>      0366:        try:
>      0367:            threading.Timer(int(d.getVar("TEST_OVERALL_TIMEOUT")), handle_test_timeout, (int(d.getVar("TEST_OVERALL_TIMEOUT")),)).start(
> )
>      0368:        except ValueError:
> File: '/home/builder/src/base/meta-arm/build/../poky/meta/lib/oeqa/core/target/qemu.py', lineno: 91, function: start
>      0087:            except (subprocess.CalledProcessError, subprocess.TimeoutExpired, FileNotFoundError) as err:
>      0088:                msg += "Error running command: %s\n%s\n" % (blcmd, err)
>      0089:            msg += "\n\n===== end: snippet =====\n"
>      0090:
>  *** 0091:            raise RuntimeError("FAILED to start qemu - check the task log and the boot log %s" % (msg))
>      0092:
>      0093:    def stop(self):
>      0094:        self.runner.stop()
> Exception: RuntimeError: FAILED to start qemu - check the task log and the boot log 
> 
> ===== start: snippet =====
> 
> 
> 
> ===== end: snippet =====
> 
> ===== start: snippet =====
> 
> Error running command: ['tail', '-20', '/home/builder/src/base/meta-arm/build/tmp/work/qemuarm64_secureboot-poky-linux/core-image-base/1.0/te
> stimage/qemu_boot_log.20241031135427']
> Command '['tail', '-20', '/home/builder/src/base/meta-arm/build/tmp/work/qemuarm64_secureboot-poky-linux/core-image-base/1.0/testimage/qemu_b
> oot_log.20241031135427']' returned non-zero exit status 1.
> 
> ===== end: snippet =====
> 
> 
> ERROR: Logfile of failure stored in: /home/builder/src/base/meta-arm/build/tmp/work/qemuarm64_secureboot-poky-linux/core-image-base/1.0/temp/
> log.do_testimage.1578012
> NOTE: recipe core-image-base-1.0-r0: task do_testimage: Failed
> ERROR: Task (/home/builder/src/base/meta-arm/build/../poky/meta/recipes-core/images/core-image-base.bb:do_testimage) failed with exit code '1
> '
> NOTE: Tasks Summary: Attempted 7116 tasks of which 7081 didn't need to be rerun and 1 failed.
> 
> Summary: 1 task failed:
>   /home/builder/src/base/meta-arm/build/../poky/meta/recipes-core/images/core-image-base.bb:do_testimage
> 
> 
> Since this exit happens without catching RuntimeError, there is no cleanup
> and Cooker, qemu-system-aarch64 etc processes get leaked again.

Can you point at the "no cleanup"? The code in question has a
try/except/finally clause and the finally should always run. I'm
therefore struggling to work out which cleanup code isn't running.

> If the RuntimeError is captured, then clean up is done and qemu-system-aarch64
> and Cooker processes get killed. Together with
> 4b3b37156cdf70230bc74c647784da8514c284cf in poky
> the failing oeqa runtime test under qemu was handled correctly
> and rebuilds always ended up with same failure, specifically
> from parselogs which is a bit greedy when detecting errors.
> 
> Does this help? If not, I can try to replicate the setup and failures
> again, and try different fixes.

I can't really sed which cleanup code we're missing and I'd like to
understand this before we take the patch as something isn't adding up.

Cheers,

Richard
Mikko Rapeli Jan. 28, 2025, 3:23 p.m. UTC | #7
Hi,

On Tue, Jan 28, 2025 at 03:10:34PM +0000, Richard Purdie wrote:
> On Tue, 2025-01-28 at 16:37 +0200, Mikko Rapeli wrote:
> > On Tue, Jan 28, 2025 at 01:49:00PM +0000, Richard Purdie wrote:
> > > On Tue, 2025-01-28 at 13:04 +0000, Richard Purdie via
> > > lists.openembedded.org wrote:
> > > > 
> > > > Sorry for the delay in looking at this patch. I'm a bit worried about
> > > > there being a leaked processes and wanted to understand if there was
> > > > other cleanup we should be doing.
> > > > 
> > > > Instead of this path, would it make sense to move the results.stop()
> > > > inside the finally? I'm worried that other forms of exception would
> > > > also leak processes.
> > > 
> > > I've looked more at this and I don't understand how this patch works.
> > > 
> > > By adding the exception to the exception clause, it will trigger a
> > > bb.error() call. The results.stop() call won't happen as results isn't
> > > set in this failure.
> > > 
> > > The only significant difference would therefore be the results =
> > > tc.results assignment. Does that really stop processes? If so, should
> > > we always be doing that?
> > > 
> > > I'd like to understand what we need to do to "fix" things so we can
> > > handle other exceptions.
> > 
> > If I remember correctly, I saw this when building meta-arm qemuarm64-secureboot
> > kas config which runs oeqa runtime tests with testimage.bbclass during build.
> > 
> > Without fix 4b3b37156cdf70230bc74c647784da8514c284cf in poky a single failing test
> > leaked Cooker processes. Then a rebuild of core-image-base connected to that Cooker
> > process and somehow failed to re-generate rootfs which became an empty directory
> > inside .wic image. This resulted qemu boot hanging which again throws RuntimeError
> > from meta/lib/oeqa/core/target/qemu.py line 52.
> 
> Just to be clear, is memory resident bitbake set or not
> (BB_SERVER_TIMEOUT) ?

meta-arm builds set:

ci/base.yml:    BB_SERVER_TIMEOUT = "300"

I was building in kas shell.

> > Example with poky from October 2024 (from old log files which I still had):
> > 
> > NOTE: Reconnecting to bitbake server...
> > ...
> > NOTE: recipe core-image-base-1.0-r0: task do_create_image_spdx: Succeeded
> > QMP Available for connection at /home/builder/src/base/meta-arm/build/tmp/.ndozl5yn
> > QMP connected to QEMU at 10/31/24 13:55:56 and took 0.56 seconds
> > QMP released QEMU at 10/31/24 13:55:56 and took 0.19 seconds from connect
> > Bitbake still alive (no events for 600s). Active tasks:
> > /home/builder/src/base/meta-arm/build/../poky/meta/recipes-core/images/core-image-base.bb:do_testimage
> > WARNING: core-image-base-1.0-r0 do_testimage: Target didn't reach login banner in 1000 seconds (10/31/24 14:12:36)
> > WARNING: core-image-base-1.0-r0 do_testimage: Last 25 lines of login console (22962):
> > [��� 4.345255] Key type fscrypt-provisioning registered
> > [��� 4.363120] Btrfs loaded, zoned=no, fsverity=no
> > [��� 4.522255] input: QEMU QEMU USB Tablet as /devices/platform/4010000000.pcie/pci0000:00/0000:00:03.0/usb1/1-1/1-1:1.0/0003:0627:0001.0001/input
> > /input0
> > [��� 4.528765] hid-generic 0003:0627:0001.0001: input: USB HID v0.01 Mouse [QEMU QEMU USB Tablet] on usb-0000:00:03.0-1/input0
> > [��� 4.636796] Key type encrypted registered
> > [��� 4.644819] printk: legacy console [netcon0] enabled
> > [��� 4.645638] netconsole: network logging started
> > [��� 4.651065] clk: Disabling unused clocks
> > [��� 4.651780] PM: genpd: Disabling unused power domains
> > [��� 4.669045] usb 1-2: new high-speed USB device number 3 using xhci_hcd
> > [��� 4.794967] Freeing unused kernel memory: 1856K
> > [��� 4.803684] Run /init as init process
> > [��� 4.847358] input: QEMU QEMU USB Keyboard as /devices/platform/4010000000.pcie/pci0000:00/0000:00:03.0/usb1/1-2/1-2:1.0/0003:0627:0001.0002/inp
> > ut/input1
> > [��� 4.919633] hid-generic 0003:0627:0001.0002: input: USB HID v1.11 Keyboard [QEMU QEMU USB Keyboard] on usb-0000:00:03.0-2/input0
> > Starting systemd-udevd version 256.6
> > 
> > [��� 7.407191] virtio_net virtio0 enp0s1: renamed from eth0
> > [�� 12.633172] ext3: Unknown parameter 'umask'
> > [�� 12.634318] ext2: Unknown parameter 'umask'
> > [�� 12.634924] ext4: Unknown parameter 'umask'
> > [�� 12.676391] EXT4-fs (vda2): mounted filesystem 41dc66e6-d693-46b3-a5d6-a46998500609 r/w with ordered data mode. Quota mode: disabled.
> > [�� 13.080423] /dev/disk/by-label/root: Can't open blockdev
> > [�� 13.081201] /dev/disk/by-label/root: Can't open blockdev
> > ERROR: There's no '/dev' on rootfs.
> > 
> > WARNING: core-image-base-1.0-r0 do_testimage: Check full boot log: /home/builder/src/base/meta-arm/build/tmp/work/qemuarm64_secureboot-poky-l
> > inux/core-image-base/1.0/testimage/qemu_boot_log.20241031135427
> > Output from runqemu:
> > runqemu - INFO - Received signal: 15
> > runqemu - INFO - Cleaning up
> > runqemu - INFO - Host uptime: 14098967.33
> > 
> > tput: No value for $TERM and no -T specified
> > WARNING: core-image-base-1.0-r0 do_testimage: Extra log data read: 
> > 
> > Output from runqemu:
> > runqemu - INFO - Received signal: 15
> > runqemu - INFO - Cleaning up
> > runqemu - INFO - Host uptime: 14098967.33
> > 
> > tput: No value for $TERM and no -T specified
> > 
> > Output from runqemu:
> > runqemu - INFO - Received signal: 15
> > runqemu - INFO - Cleaning up
> > runqemu - INFO - Host uptime: 14098967.33
> > 
> > tput: No value for $TERM and no -T specified
> > 
> > ERROR: core-image-base-1.0-r0 do_testimage: Error executing a python function in exec_func_python() autogenerated:
> > 
> > The stack trace of python calls that resulted in this exception/failure was:
> > File: 'exec_func_python() autogenerated', lineno: 2, function: <module>
> > ���� 0001:
> > �*** 0002:do_testimage(d)
> > ���� 0003:
> > File: '/home/builder/src/base/meta-arm/build/../poky/meta/classes-recipe/testimage.bbclass', lineno: 122, function: do_testimage
> > ���� 0118:��� dump-guest-memory {"paging":false,"protocol":"file:%s.img"}
> > ���� 0119:}
> > ���� 0120:
> > ���� 0121:python do_testimage() {
> > �*** 0122:��� testimage_main(d)
> > ���� 0123:}
> > ���� 0124:
> > ���� 0125:addtask testimage
> > ���� 0126:do_testimage[nostamp] = "1"
> > File: '/home/builder/src/base/meta-arm/build/../poky/meta/classes-recipe/testimage.bbclass', lineno: 364, function: testimage_main
> > ���� 0360:��� orig_sigterm_handler = signal.signal(signal.SIGTERM, sigterm_exception)
> > ���� 0361:��� try:
> > ���� 0362:������� # We need to check if runqemu ends unexpectedly
> > ���� 0363:������� # or if the worker send us a SIGTERM
> > �*** 0364:������� tc.target.start(params=d.getVar("TEST_QEMUPARAMS"), runqemuparams=d.getVar("TEST_RUNQEMUPARAMS"))
> > ���� 0365:������� import threading
> > ���� 0366:������� try:
> > ���� 0367:����������� threading.Timer(int(d.getVar("TEST_OVERALL_TIMEOUT")), handle_test_timeout, (int(d.getVar("TEST_OVERALL_TIMEOUT")),)).start(
> > )
> > ���� 0368:������� except ValueError:
> > File: '/home/builder/src/base/meta-arm/build/../poky/meta/lib/oeqa/core/target/qemu.py', lineno: 91, function: start
> > ���� 0087:����������� except (subprocess.CalledProcessError, subprocess.TimeoutExpired, FileNotFoundError) as err:
> > ���� 0088:��������������� msg += "Error running command: %s\n%s\n" % (blcmd, err)
> > ���� 0089:����������� msg += "\n\n===== end: snippet =====\n"
> > ���� 0090:
> > �*** 0091:����������� raise RuntimeError("FAILED to start qemu - check the task log and the boot log %s" % (msg))
> > ���� 0092:
> > ���� 0093:��� def stop(self):
> > ���� 0094:������� self.runner.stop()
> > Exception: RuntimeError: FAILED to start qemu - check the task log and the boot log 
> > 
> > ===== start: snippet =====
> > 
> > 
> > 
> > ===== end: snippet =====
> > 
> > ===== start: snippet =====
> > 
> > Error running command: ['tail', '-20', '/home/builder/src/base/meta-arm/build/tmp/work/qemuarm64_secureboot-poky-linux/core-image-base/1.0/te
> > stimage/qemu_boot_log.20241031135427']
> > Command '['tail', '-20', '/home/builder/src/base/meta-arm/build/tmp/work/qemuarm64_secureboot-poky-linux/core-image-base/1.0/testimage/qemu_b
> > oot_log.20241031135427']' returned non-zero exit status 1.
> > 
> > ===== end: snippet =====
> > 
> > 
> > ERROR: Logfile of failure stored in: /home/builder/src/base/meta-arm/build/tmp/work/qemuarm64_secureboot-poky-linux/core-image-base/1.0/temp/
> > log.do_testimage.1578012
> > NOTE: recipe core-image-base-1.0-r0: task do_testimage: Failed
> > ERROR: Task (/home/builder/src/base/meta-arm/build/../poky/meta/recipes-core/images/core-image-base.bb:do_testimage) failed with exit code '1
> > '
> > NOTE: Tasks Summary: Attempted 7116 tasks of which 7081 didn't need to be rerun and 1 failed.
> > 
> > Summary: 1 task failed:
> > � /home/builder/src/base/meta-arm/build/../poky/meta/recipes-core/images/core-image-base.bb:do_testimage
> > 
> > 
> > Since this exit happens without catching RuntimeError, there is no cleanup
> > and Cooker, qemu-system-aarch64 etc processes get leaked again.
> 
> Can you point at the "no cleanup"? The code in question has a
> try/except/finally clause and the finally should always run. I'm
> therefore struggling to work out which cleanup code isn't running.
> 
> > If the RuntimeError is captured, then clean up is done and qemu-system-aarch64
> > and Cooker processes get killed. Together with
> > 4b3b37156cdf70230bc74c647784da8514c284cf in poky
> > the failing oeqa runtime test under qemu was handled correctly
> > and rebuilds always ended up with same failure, specifically
> > from parselogs which is a bit greedy when detecting errors.
> > 
> > Does this help? If not, I can try to replicate the setup and failures
> > again, and try different fixes.
> 
> I can't really sed which cleanup code we're missing and I'd like to
> understand this before we take the patch as something isn't adding up.

With TESTIMAGE_AUTO = "1" in local.conf, I have seen a lot of failures to kill
Cooker and qemu-system-aarch64 processes. Running tests with slirp networking
over ssh and even pressing ctrl-c multiple times on the build leaves those into
the background. Manual cleanup is needed.

I will setup the builds again to test these out.

Cheers,

-Mikko
diff mbox series

Patch

diff --git a/meta/classes-recipe/testimage.bbclass b/meta/classes-recipe/testimage.bbclass
index 19075ce1f3..a9b031093a 100644
--- a/meta/classes-recipe/testimage.bbclass
+++ b/meta/classes-recipe/testimage.bbclass
@@ -371,7 +371,7 @@  def testimage_main(d):
         complete = True
         if results.hasAnyFailingTest():
             run_failed_tests_post_actions(d, tc)
-    except (KeyboardInterrupt, BlockingIOError) as err:
+    except (KeyboardInterrupt, BlockingIOError, RuntimeError) as err:
         if isinstance(err, KeyboardInterrupt):
             bb.error('testimage interrupted, shutting down...')
         else: