Message ID | 20240704114148.3218721-1-richard.purdie@linuxfoundation.org |
---|---|
State | New |
Series | oeqa/runtime/ssh: In case of failure, show exit code and handle -15 |
On Thu, 4 Jul 2024 at 12:41, Richard Purdie via lists.openembedded.org <richard.purdie=linuxfoundation.org@lists.openembedded.org> wrote:
>
> Ensure we show the failing exit code in case of failures.
>
> We're seeing autobuilder failures with -15 which is probably from slow
> boot/init. Retry in these cases for now.
>
> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> ---
>  meta/lib/oeqa/runtime/cases/ssh.py | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/meta/lib/oeqa/runtime/cases/ssh.py b/meta/lib/oeqa/runtime/cases/ssh.py
> index ae92bb34cd9..9295ed14787 100644
> --- a/meta/lib/oeqa/runtime/cases/ssh.py
> +++ b/meta/lib/oeqa/runtime/cases/ssh.py
> @@ -19,16 +19,18 @@ class SSHTest(OERuntimeTestCase):
>              status, output = self.target.run("uname -a", timeout=5)
>              if status == 0:
>                  break
> -            elif status == 255:
> +            elif status == 255 or status == -15:

According to the subprocess.returncode documentation, negative values
here indicate that the process was killed by a signal. Thus, this is
likely a SIGTERM. If so, you could add clarity here by using
-signal.SIGTERM or similar.

>                  # ssh returns 255 only if a ssh error occurs. This could
>                  # be an issue with "Connection refused" because the port
>                  # isn't open yet, and this could check explicitly for that
>                  # here. However, let's keep it simple and just retry for
>                  # all errors a limited amount of times with a sleep to
>                  # give it time for the port to open.
> +                # We sometimes see -15 on slow emulation machines too, likely
> +                # from boot/init not being 100% complete, retry for these too.

I was curious to understand why you get a SIGTERM, but your comment
here seems reasonable.

>                  time.sleep(5)
>                  continue
>              else:
> -                self.fail("uname failed with \"%s\"" %output)
> +                self.fail("uname failed with \"%s\" (exit code %s)" % (output, status))
>          if status == 255:
>              self.fail("ssh error %s" %output)
>

-15 or SIGTERM, feel free to add by:

Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk>

Thanks,

Andrew Murray
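The review point about negative return codes can be checked with a small standalone script. This is only a sketch and not part of the patch: it assumes a POSIX host, and the helper name `returncode_after_sigterm` is made up for illustration. It starts a sleeping child process, terminates it, and compares the resulting return code with `-signal.SIGTERM`.

```python
import signal
import subprocess
import sys


def returncode_after_sigterm():
    """Start a sleeping child, send it SIGTERM, and return its returncode.

    Hypothetical helper for illustration only; assumes a POSIX host,
    where Popen.terminate() delivers SIGTERM to the child.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    child.terminate()
    # wait() returns the child's returncode; a negative value -N means
    # the child was killed by signal N.
    return child.wait(timeout=10)


rc = returncode_after_sigterm()
print("returncode:", rc)
print("equals -signal.SIGTERM:", rc == -signal.SIGTERM)
```

On Linux, SIGTERM is signal 15, so `-signal.SIGTERM` is the same value as the literal `-15` used in the patch while stating the intent more directly.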
diff --git a/meta/lib/oeqa/runtime/cases/ssh.py b/meta/lib/oeqa/runtime/cases/ssh.py
index ae92bb34cd9..9295ed14787 100644
--- a/meta/lib/oeqa/runtime/cases/ssh.py
+++ b/meta/lib/oeqa/runtime/cases/ssh.py
@@ -19,16 +19,18 @@ class SSHTest(OERuntimeTestCase):
             status, output = self.target.run("uname -a", timeout=5)
             if status == 0:
                 break
-            elif status == 255:
+            elif status == 255 or status == -15:
                 # ssh returns 255 only if a ssh error occurs. This could
                 # be an issue with "Connection refused" because the port
                 # isn't open yet, and this could check explicitly for that
                 # here. However, let's keep it simple and just retry for
                 # all errors a limited amount of times with a sleep to
                 # give it time for the port to open.
+                # We sometimes see -15 on slow emulation machines too, likely
+                # from boot/init not being 100% complete, retry for these too.
                 time.sleep(5)
                 continue
             else:
-                self.fail("uname failed with \"%s\"" %output)
+                self.fail("uname failed with \"%s\" (exit code %s)" % (output, status))
         if status == 255:
             self.fail("ssh error %s" %output)
Ensure we show the failing exit code in case of failures.

We're seeing autobuilder failures with -15 which is probably from slow
boot/init. Retry in these cases for now.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
---
 meta/lib/oeqa/runtime/cases/ssh.py | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
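For readers without an OE test environment, the retry behaviour described in the commit message can be approximated outside the test class. The following is a rough sketch under stated assumptions, not the code in ssh.py: `run_with_retry`, `RETRYABLE`, and the `run` and `sleep` parameters are hypothetical names, with `run` standing in for `self.target.run("uname -a", timeout=5)` and an `AssertionError` standing in for `self.fail()`.

```python
import signal
import time

# Statuses treated as transient: 255 (ssh-level error) and -SIGTERM
# (the command was killed by SIGTERM, i.e. -15 on Linux).
RETRYABLE = {255, -signal.SIGTERM}


def run_with_retry(run, attempts=5, delay=5.0, sleep=time.sleep):
    """Call run() until it succeeds, retrying only on RETRYABLE statuses.

    `run` is a hypothetical callable returning (status, output).
    Returns the last (status, output) pair; raises AssertionError on a
    non-retryable failure, including the exit code in the message.
    """
    status, output = None, ""
    for _ in range(attempts):
        status, output = run()
        if status == 0:
            break
        if status in RETRYABLE:
            sleep(delay)
            continue
        raise AssertionError(
            'uname failed with "%s" (exit code %s)' % (output, status)
        )
    return status, output


# Usage with a fake target that fails twice before succeeding.
responses = iter([(-15, ""), (255, "Connection refused"), (0, "Linux")])
result = run_with_retry(lambda: next(responses), sleep=lambda _s: None)
print(result)
```

Keeping the retryable statuses in one set means the SIGTERM case is named once instead of appearing as a bare `-15` in the condition, which is in the spirit of the review suggestion.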