oeqa/sshcontrol: Handle empty reads

Message ID	20250125113326.2475346-1-richard.purdie@linuxfoundation.org
State	Accepted, archived
Commit	9d91a5674c515a43ae76d8615f72e5e2dc16c961
Headers	show Return-Path: <richard.purdie@linuxfoundation.org> ip: 209.85.128.51, mailfrom: richard.purdie@linuxfoundation.org) From: Richard Purdie <richard.purdie@linuxfoundation.org> To: openembedded-core@lists.openembedded.org Subject: [PATCH] oeqa/sshcontrol: Handle empty reads Date: Sat, 25 Jan 2025 11:33:26 +0000 Message-ID: <20250125113326.2475346-1-richard.purdie@linuxfoundation.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	oeqa/sshcontrol: Handle empty reads \| expand oeqa/sshcontrol: Handle empty reads

Message ID

20250125113326.2475346-1-richard.purdie@linuxfoundation.org

State

Accepted, archived

Commit

9d91a5674c515a43ae76d8615f72e5e2dc16c961

Headers

From: Richard Purdie <richard.purdie@linuxfoundation.org>
To: openembedded-core@lists.openembedded.org
Subject: [PATCH] oeqa/sshcontrol: Handle empty reads
Date: Sat, 25 Jan 2025 11:33:26 +0000
Message-ID: <20250125113326.2475346-1-richard.purdie@linuxfoundation.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

oeqa/sshcontrol: Handle empty reads | expand

Commit Message

Richard Purdie Jan. 25, 2025, 11:33 a.m. UTC

Looking at some of the autobuilder failures, it seems that somehow empty
reads might be possible despite not being EOF. Tweak the code to be a little
more robust in handling this.

In theory this shouldn't be possible but python does handle signals a bit
differently (e.g. transparrently retrying syscalls for EINTR) so adding this
check and a bit of code safety at least rules out this problem.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
---
 meta/lib/oeqa/utils/sshcontrol.py | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Alexander Kanavin June 5, 2025, 8:46 p.m. UTC | #1

On Sat, 25 Jan 2025 at 12:33, Richard Purdie via
lists.openembedded.org
<richard.purdie=linuxfoundation.org@lists.openembedded.org> wrote:
>
> Looking at some of the autobuilder failures, it seems that somehow empty
> reads might be possible despite not being EOF. Tweak the code to be a little
> more robust in handling this.
>
> In theory this shouldn't be possible but python does handle signals a bit
> differently (e.g. transparrently retrying syscalls for EINTR) so adding this
> check and a bit of code safety at least rules out this problem.
>
> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> ---
>  meta/lib/oeqa/utils/sshcontrol.py | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/meta/lib/oeqa/utils/sshcontrol.py b/meta/lib/oeqa/utils/sshcontrol.py
> index 36c2ecb3dba..6c5648779a5 100644
> --- a/meta/lib/oeqa/utils/sshcontrol.py
> +++ b/meta/lib/oeqa/utils/sshcontrol.py
> @@ -57,8 +57,10 @@ class SSHProcess(object):
>                      if select.select([self.process.stdout], [], [], 5)[0] != []:
>                          data = os.read(self.process.stdout.fileno(), 1024)
>                          if not data:
> -                            self.process.stdout.close()
> -                            eof = True
> +                            self.process.poll()
> +                            if self.process.returncode is None:
> +                                self.process.stdout.close()
> +                                eof = True
>                          else:
>                              data = data.decode("utf-8")
>                              output += data

This badly regresses selftest execution times when a a selftest is
issuing many commands over ssh on a target image. I've been wondering
why e.g. devtool.DevtoolIdeSdkTests.test_devtool_ide_sdk_none_qemu
takes almost an hour to complete (on fully populated sstate), and
noticed that every qemu.run() call takes 300 seconds to return from
(which means the loop exits on timeout, and not on the intended
poll()-based code path added above), even if the target command itself
instantly finishes. Reverting the commit confirms this, the test time
goes down to 12 minutes :-(

I have to close the laptop and stop now, but wanted to get this notice
out, as it's significant.

Alex

Alexander Kanavin June 6, 2025, 9 a.m. UTC | #2

On Thu, 5 Jun 2025 at 22:46, Alexander Kanavin via
lists.openembedded.org <alex.kanavin=gmail.com@lists.openembedded.org>
wrote:
> This badly regresses selftest execution times when a a selftest is
> issuing many commands over ssh on a target image. I've been wondering
> why e.g. devtool.DevtoolIdeSdkTests.test_devtool_ide_sdk_none_qemu
> takes almost an hour to complete (on fully populated sstate), and
> noticed that every qemu.run() call takes 300 seconds to return from
> (which means the loop exits on timeout, and not on the intended
> poll()-based code path added above), even if the target command itself
> instantly finishes. Reverting the commit confirms this, the test time
> goes down to 12 minutes :-(

I sent a patch for this. It's not a revert :)

Alex

Richard Purdie June 6, 2025, 9:22 a.m. UTC | #3

On Fri, 2025-06-06 at 11:00 +0200, Alexander Kanavin wrote:
> On Thu, 5 Jun 2025 at 22:46, Alexander Kanavin via
> lists.openembedded.org
> <alex.kanavin=gmail.com@lists.openembedded.org>
> wrote:
> > This badly regresses selftest execution times when a a selftest is
> > issuing many commands over ssh on a target image. I've been
> > wondering
> > why e.g. devtool.DevtoolIdeSdkTests.test_devtool_ide_sdk_none_qemu
> > takes almost an hour to complete (on fully populated sstate), and
> > noticed that every qemu.run() call takes 300 seconds to return from
> > (which means the loop exits on timeout, and not on the intended
> > poll()-based code path added above), even if the target command
> > itself
> > instantly finishes. Reverting the commit confirms this, the test
> > time
> > goes down to 12 minutes :-(
> 
> I sent a patch for this. It's not a revert :)

Well spotted, thanks!

Cheers,

Richard

diff --git a/meta/lib/oeqa/utils/sshcontrol.py b/meta/lib/oeqa/utils/sshcontrol.py
index 36c2ecb3dba..6c5648779a5 100644
--- a/meta/lib/oeqa/utils/sshcontrol.py
+++ b/meta/lib/oeqa/utils/sshcontrol.py
@@ -57,8 +57,10 @@  class SSHProcess(object):
                     if select.select([self.process.stdout], [], [], 5)[0] != []:
                         data = os.read(self.process.stdout.fileno(), 1024)
                         if not data:
-                            self.process.stdout.close()
-                            eof = True
+                            self.process.poll()
+                            if self.process.returncode is None:
+                                self.process.stdout.close()
+                                eof = True
                         else:
                             data = data.decode("utf-8")
                             output += data

oeqa/sshcontrol: Handle empty reads

Commit Message

Comments

Patch