diff mbox series

oeqa/sshcontrol: Handle empty reads

Message ID 20250125113326.2475346-1-richard.purdie@linuxfoundation.org
State Accepted, archived
Commit 9d91a5674c515a43ae76d8615f72e5e2dc16c961
Headers show
Series oeqa/sshcontrol: Handle empty reads | expand

Commit Message

Richard Purdie Jan. 25, 2025, 11:33 a.m. UTC
Looking at some of the autobuilder failures, it seems that somehow empty
reads might be possible despite not being EOF. Tweak the code to be a little
more robust in handling this.

In theory this shouldn't be possible but python does handle signals a bit
differently (e.g. transparrently retrying syscalls for EINTR) so adding this
check and a bit of code safety at least rules out this problem.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
---
 meta/lib/oeqa/utils/sshcontrol.py | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Alexander Kanavin June 5, 2025, 8:46 p.m. UTC | #1
On Sat, 25 Jan 2025 at 12:33, Richard Purdie via
lists.openembedded.org
<richard.purdie=linuxfoundation.org@lists.openembedded.org> wrote:
>
> Looking at some of the autobuilder failures, it seems that somehow empty
> reads might be possible despite not being EOF. Tweak the code to be a little
> more robust in handling this.
>
> In theory this shouldn't be possible but python does handle signals a bit
> differently (e.g. transparrently retrying syscalls for EINTR) so adding this
> check and a bit of code safety at least rules out this problem.
>
> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> ---
>  meta/lib/oeqa/utils/sshcontrol.py | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/meta/lib/oeqa/utils/sshcontrol.py b/meta/lib/oeqa/utils/sshcontrol.py
> index 36c2ecb3dba..6c5648779a5 100644
> --- a/meta/lib/oeqa/utils/sshcontrol.py
> +++ b/meta/lib/oeqa/utils/sshcontrol.py
> @@ -57,8 +57,10 @@ class SSHProcess(object):
>                      if select.select([self.process.stdout], [], [], 5)[0] != []:
>                          data = os.read(self.process.stdout.fileno(), 1024)
>                          if not data:
> -                            self.process.stdout.close()
> -                            eof = True
> +                            self.process.poll()
> +                            if self.process.returncode is None:
> +                                self.process.stdout.close()
> +                                eof = True
>                          else:
>                              data = data.decode("utf-8")
>                              output += data

This badly regresses selftest execution times when a a selftest is
issuing many commands over ssh on a target image. I've been wondering
why e.g. devtool.DevtoolIdeSdkTests.test_devtool_ide_sdk_none_qemu
takes almost an hour to complete (on fully populated sstate), and
noticed that every qemu.run() call takes 300 seconds to return from
(which means the loop exits on timeout, and not on the intended
poll()-based code path added above), even if the target command itself
instantly finishes. Reverting the commit confirms this, the test time
goes down to 12 minutes :-(

I have to close the laptop and stop now, but wanted to get this notice
out, as it's significant.

Alex
Alexander Kanavin June 6, 2025, 9 a.m. UTC | #2
On Thu, 5 Jun 2025 at 22:46, Alexander Kanavin via
lists.openembedded.org <alex.kanavin=gmail.com@lists.openembedded.org>
wrote:
> This badly regresses selftest execution times when a a selftest is
> issuing many commands over ssh on a target image. I've been wondering
> why e.g. devtool.DevtoolIdeSdkTests.test_devtool_ide_sdk_none_qemu
> takes almost an hour to complete (on fully populated sstate), and
> noticed that every qemu.run() call takes 300 seconds to return from
> (which means the loop exits on timeout, and not on the intended
> poll()-based code path added above), even if the target command itself
> instantly finishes. Reverting the commit confirms this, the test time
> goes down to 12 minutes :-(

I sent a patch for this. It's not a revert :)

Alex
Richard Purdie June 6, 2025, 9:22 a.m. UTC | #3
On Fri, 2025-06-06 at 11:00 +0200, Alexander Kanavin wrote:
> On Thu, 5 Jun 2025 at 22:46, Alexander Kanavin via
> lists.openembedded.org
> <alex.kanavin=gmail.com@lists.openembedded.org>
> wrote:
> > This badly regresses selftest execution times when a a selftest is
> > issuing many commands over ssh on a target image. I've been
> > wondering
> > why e.g. devtool.DevtoolIdeSdkTests.test_devtool_ide_sdk_none_qemu
> > takes almost an hour to complete (on fully populated sstate), and
> > noticed that every qemu.run() call takes 300 seconds to return from
> > (which means the loop exits on timeout, and not on the intended
> > poll()-based code path added above), even if the target command
> > itself
> > instantly finishes. Reverting the commit confirms this, the test
> > time
> > goes down to 12 minutes :-(
> 
> I sent a patch for this. It's not a revert :)

Well spotted, thanks!

Cheers,

Richard
diff mbox series

Patch

diff --git a/meta/lib/oeqa/utils/sshcontrol.py b/meta/lib/oeqa/utils/sshcontrol.py
index 36c2ecb3dba..6c5648779a5 100644
--- a/meta/lib/oeqa/utils/sshcontrol.py
+++ b/meta/lib/oeqa/utils/sshcontrol.py
@@ -57,8 +57,10 @@  class SSHProcess(object):
                     if select.select([self.process.stdout], [], [], 5)[0] != []:
                         data = os.read(self.process.stdout.fileno(), 1024)
                         if not data:
-                            self.process.stdout.close()
-                            eof = True
+                            self.process.poll()
+                            if self.process.returncode is None:
+                                self.process.stdout.close()
+                                eof = True
                         else:
                             data = data.decode("utf-8")
                             output += data