| Message ID | 20250125113326.2475346-1-richard.purdie@linuxfoundation.org |
|---|---|
| State | Accepted, archived |
| Commit | 9d91a5674c515a43ae76d8615f72e5e2dc16c961 |
| Headers | show |
| Series | oeqa/sshcontrol: Handle empty reads | expand |
On Sat, 25 Jan 2025 at 12:33, Richard Purdie via lists.openembedded.org <richard.purdie=linuxfoundation.org@lists.openembedded.org> wrote: > > Looking at some of the autobuilder failures, it seems that somehow empty > reads might be possible despite not being EOF. Tweak the code to be a little > more robust in handling this. > > In theory this shouldn't be possible but python does handle signals a bit > differently (e.g. transparrently retrying syscalls for EINTR) so adding this > check and a bit of code safety at least rules out this problem. > > Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org> > --- > meta/lib/oeqa/utils/sshcontrol.py | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/meta/lib/oeqa/utils/sshcontrol.py b/meta/lib/oeqa/utils/sshcontrol.py > index 36c2ecb3dba..6c5648779a5 100644 > --- a/meta/lib/oeqa/utils/sshcontrol.py > +++ b/meta/lib/oeqa/utils/sshcontrol.py > @@ -57,8 +57,10 @@ class SSHProcess(object): > if select.select([self.process.stdout], [], [], 5)[0] != []: > data = os.read(self.process.stdout.fileno(), 1024) > if not data: > - self.process.stdout.close() > - eof = True > + self.process.poll() > + if self.process.returncode is None: > + self.process.stdout.close() > + eof = True > else: > data = data.decode("utf-8") > output += data This badly regresses selftest execution times when a a selftest is issuing many commands over ssh on a target image. I've been wondering why e.g. devtool.DevtoolIdeSdkTests.test_devtool_ide_sdk_none_qemu takes almost an hour to complete (on fully populated sstate), and noticed that every qemu.run() call takes 300 seconds to return from (which means the loop exits on timeout, and not on the intended poll()-based code path added above), even if the target command itself instantly finishes. Reverting the commit confirms this, the test time goes down to 12 minutes :-( I have to close the laptop and stop now, but wanted to get this notice out, as it's significant. Alex
On Thu, 5 Jun 2025 at 22:46, Alexander Kanavin via lists.openembedded.org <alex.kanavin=gmail.com@lists.openembedded.org> wrote: > This badly regresses selftest execution times when a a selftest is > issuing many commands over ssh on a target image. I've been wondering > why e.g. devtool.DevtoolIdeSdkTests.test_devtool_ide_sdk_none_qemu > takes almost an hour to complete (on fully populated sstate), and > noticed that every qemu.run() call takes 300 seconds to return from > (which means the loop exits on timeout, and not on the intended > poll()-based code path added above), even if the target command itself > instantly finishes. Reverting the commit confirms this, the test time > goes down to 12 minutes :-( I sent a patch for this. It's not a revert :) Alex
On Fri, 2025-06-06 at 11:00 +0200, Alexander Kanavin wrote: > On Thu, 5 Jun 2025 at 22:46, Alexander Kanavin via > lists.openembedded.org > <alex.kanavin=gmail.com@lists.openembedded.org> > wrote: > > This badly regresses selftest execution times when a a selftest is > > issuing many commands over ssh on a target image. I've been > > wondering > > why e.g. devtool.DevtoolIdeSdkTests.test_devtool_ide_sdk_none_qemu > > takes almost an hour to complete (on fully populated sstate), and > > noticed that every qemu.run() call takes 300 seconds to return from > > (which means the loop exits on timeout, and not on the intended > > poll()-based code path added above), even if the target command > > itself > > instantly finishes. Reverting the commit confirms this, the test > > time > > goes down to 12 minutes :-( > > I sent a patch for this. It's not a revert :) Well spotted, thanks! Cheers, Richard
diff --git a/meta/lib/oeqa/utils/sshcontrol.py b/meta/lib/oeqa/utils/sshcontrol.py index 36c2ecb3dba..6c5648779a5 100644 --- a/meta/lib/oeqa/utils/sshcontrol.py +++ b/meta/lib/oeqa/utils/sshcontrol.py @@ -57,8 +57,10 @@ class SSHProcess(object): if select.select([self.process.stdout], [], [], 5)[0] != []: data = os.read(self.process.stdout.fileno(), 1024) if not data: - self.process.stdout.close() - eof = True + self.process.poll() + if self.process.returncode is None: + self.process.stdout.close() + eof = True else: data = data.decode("utf-8") output += data
Looking at some of the autobuilder failures, it seems that somehow empty reads might be possible despite not being EOF. Tweak the code to be a little more robust in handling this. In theory this shouldn't be possible but python does handle signals a bit differently (e.g. transparrently retrying syscalls for EINTR) so adding this check and a bit of code safety at least rules out this problem. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org> --- meta/lib/oeqa/utils/sshcontrol.py | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)