
oeqa/runtime/ssh: In case of failure, show exit code and handle -15

Message ID 20240704114148.3218721-1-richard.purdie@linuxfoundation.org
State New
Series oeqa/runtime/ssh: In case of failure, show exit code and handle -15

Commit Message

Richard Purdie July 4, 2024, 11:41 a.m. UTC
Ensure we show the failing exit code in case of failures.

We're seeing autobuilder failures with -15 which is probably from slow
boot/init. Retry in these cases for now.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
---
 meta/lib/oeqa/runtime/cases/ssh.py | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Andrew Murray July 4, 2024, 12:10 p.m. UTC | #1
On Thu, 4 Jul 2024 at 12:41, Richard Purdie via lists.openembedded.org
<richard.purdie=linuxfoundation.org@lists.openembedded.org> wrote:
>
> Ensure we show the failing exit code in case of failures.
>
> We're seeing autobuilder failures with -15 which is probably from slow
> boot/init. Retry in these cases for now.
>
> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> ---
>  meta/lib/oeqa/runtime/cases/ssh.py | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/meta/lib/oeqa/runtime/cases/ssh.py b/meta/lib/oeqa/runtime/cases/ssh.py
> index ae92bb34cd9..9295ed14787 100644
> --- a/meta/lib/oeqa/runtime/cases/ssh.py
> +++ b/meta/lib/oeqa/runtime/cases/ssh.py
> @@ -19,16 +19,18 @@ class SSHTest(OERuntimeTestCase):
>            status, output = self.target.run("uname -a", timeout=5)
>            if status == 0:
>                break
> -          elif status == 255:
> +          elif status == 255 or status == -15:

According to the subprocess documentation for Popen.returncode, a
negative value -N indicates that the child was terminated by signal N
(POSIX only). So -15 is most likely a SIGTERM. If so, you could add
clarity here by comparing against -signal.SIGTERM or similar.
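
To illustrate the convention, here is a minimal sketch, assuming a POSIX
host. It uses plain subprocess rather than the oeqa target.run() helper,
whose internals are not shown in this thread, so it only demonstrates the
returncode behaviour, not the cause of the autobuilder failures.

```python
import signal
import subprocess
import sys

# Start a long-running child process (a Python interpreter that just sleeps).
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])

# On POSIX, terminate() sends SIGTERM to the child.
proc.terminate()
proc.wait()

# A child killed by signal N reports returncode == -N on POSIX.
killed_by_sigterm = (proc.returncode == -signal.SIGTERM)
print(proc.returncode, killed_by_sigterm)
```

Comparing against -signal.SIGTERM instead of the literal -15 documents
the intent and avoids hard-coding the signal number.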

>                # ssh returns 255 only if a ssh error occurs.  This could
>                # be an issue with "Connection refused" because the port
>                # isn't open yet, and this could check explicitly for that
>                # here.  However, let's keep it simple and just retry for
>                # all errors a limited amount of times with a sleep to
>                # give it time for the port to open.
> +              # We sometimes see -15 on slow emulation machines too, likely
> +              # from boot/init not being 100% complete, retry for these too.

I was curious to understand why you get a SIGTERM, but your comment
here seems reasonable.

>                time.sleep(5)
>                continue
>            else:
> -              self.fail("uname failed with \"%s\"" %output)
> +              self.fail("uname failed with \"%s\" (exit code %s)" % (output, status))
>          if status == 255:
>              self.fail("ssh error %s" %output)
>

Whether you keep -15 or switch to SIGTERM, feel free to add my:

Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk>

Thanks,

Andrew Murray

Patch

diff --git a/meta/lib/oeqa/runtime/cases/ssh.py b/meta/lib/oeqa/runtime/cases/ssh.py
index ae92bb34cd9..9295ed14787 100644
--- a/meta/lib/oeqa/runtime/cases/ssh.py
+++ b/meta/lib/oeqa/runtime/cases/ssh.py
@@ -19,16 +19,18 @@  class SSHTest(OERuntimeTestCase):
           status, output = self.target.run("uname -a", timeout=5)
           if status == 0:
               break
-          elif status == 255:
+          elif status == 255 or status == -15:
               # ssh returns 255 only if a ssh error occurs.  This could
               # be an issue with "Connection refused" because the port
               # isn't open yet, and this could check explicitly for that
               # here.  However, let's keep it simple and just retry for
               # all errors a limited amount of times with a sleep to
               # give it time for the port to open.
+              # We sometimes see -15 on slow emulation machines too, likely
+              # from boot/init not being 100% complete, retry for these too.
               time.sleep(5)
               continue
           else:
-              self.fail("uname failed with \"%s\"" %output)
+              self.fail("uname failed with \"%s\" (exit code %s)" % (output, status))
         if status == 255:
             self.fail("ssh error %s" %output)
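
The retry logic in the patch can be sketched in isolation as follows. This
is a hedged illustration, not the oeqa implementation: run_with_retry, its
attempts and delay parameters, and the fake target responses are all
hypothetical names and data introduced here. It also applies the reviewer's
-signal.SIGTERM suggestion and, unlike the patch, reports exhausted retries
for -15 as well as 255, which is an assumption about the desired behaviour.

```python
import signal
import time

# Statuses worth retrying: 255 is ssh's own error code; -SIGTERM means the
# command was killed by SIGTERM (the -15 seen in the patch).
RETRY_STATUSES = (255, -signal.SIGTERM)


def run_with_retry(run, attempts=5, delay=5):
    """Call run() until it returns status 0, retrying on RETRY_STATUSES.

    `run` is a hypothetical stand-in for
    self.target.run("uname -a", timeout=5) and must return (status, output).
    """
    status, output = None, ""
    for _ in range(attempts):
        status, output = run()
        if status == 0:
            break
        if status in RETRY_STATUSES:
            # Give the target time to finish booting / open the ssh port.
            time.sleep(delay)
            continue
        # Any other status is a real failure: report output and exit code.
        raise AssertionError(
            'uname failed with "%s" (exit code %s)' % (output, status))
    if status in RETRY_STATUSES:
        # All attempts were used up without a successful run.
        raise AssertionError(
            "ssh error %s (exit code %s)" % (output, status))
    return status, output


# Usage with made-up responses: SIGTERM, then an ssh error, then success.
responses = iter([(-signal.SIGTERM, ""), (255, "Connection refused"), (0, "Linux")])
result = run_with_retry(lambda: next(responses), delay=0)
print(result)
```

Keeping the retryable statuses in one tuple means the in-loop check and the
post-loop check cannot drift apart when another status is added later.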