From patchwork Thu Feb 9 08:09:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mikko Rapeli X-Patchwork-Id: 19264 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 82B4BC64EC7 for ; Thu, 9 Feb 2023 08:10:00 +0000 (UTC) Received: from mail.kapsi.fi (mail.kapsi.fi [91.232.154.25]) by mx.groups.io with SMTP id smtpd.web11.9260.1675930190051697966 for ; Thu, 09 Feb 2023 00:09:50 -0800 Authentication-Results: mx.groups.io; dkim=missing; spf=none, err=permanent DNS error (domain: lakka.kapsi.fi, ip: 91.232.154.25, mailfrom: mcfrisk@lakka.kapsi.fi) Received: from kapsi.fi ([2001:67c:1be8::11] helo=lakka.kapsi.fi) by mail.kapsi.fi with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pQ204-00B9so-3q; Thu, 09 Feb 2023 10:09:48 +0200 Received: from mcfrisk by lakka.kapsi.fi with local (Exim 4.94.2) (envelope-from ) id 1pQ203-000ckB-QU; Thu, 09 Feb 2023 10:09:47 +0200 From: Mikko Rapeli To: openembedded-core@lists.openembedded.org Cc: Mikko Rapeli Subject: [PATCH v2 1/8] oeqa ssh.py: move output prints to new line Date: Thu, 9 Feb 2023 10:09:29 +0200 Message-Id: <20230209080936.148489-2-mikko.rapeli@linaro.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20230209080936.148489-1-mikko.rapeli@linaro.org> References: <20230209080936.148489-1-mikko.rapeli@linaro.org> MIME-Version: 1.0 X-Rspam-Score: -1.2 (-) X-Rspam-Report: Action: no action Symbol: RCVD_TLS_LAST(0.00) Symbol: ARC_NA(0.00) Symbol: DMARC_POLICY_SOFTFAIL(0.10) Symbol: FROM_HAS_DN(0.00) Symbol: TO_DN_SOME(0.00) Symbol: R_MISSING_CHARSET(0.50) Symbol: TO_MATCH_ENVRCPT_ALL(0.00) Symbol: MIME_GOOD(-0.10) Symbol: RCPT_COUNT_TWO(0.00) Symbol: MID_CONTAINS_FROM(1.00) Symbol: R_SPF_NA(0.00) Symbol: FORGED_SENDER(0.30) Symbol: R_DKIM_NA(0.00) Symbol: MIME_TRACE(0.00) Symbol: ASN(0.00) Symbol: FROM_NEQ_ENVFROM(0.00) Symbol: BAYES_HAM(-3.00) Symbol: RCVD_COUNT_TWO(0.00) Message-ID: 20230209080936.148489-2-mikko.rapeli@linaro.org X-SA-Exim-Connect-IP: 2001:67c:1be8::11 X-SA-Exim-Mail-From: mcfrisk@lakka.kapsi.fi X-SA-Exim-Scanned: No (on mail.kapsi.fi); SAEximRunCond expanded to false List-Id: X-Webhook-Received: from li982-79.members.linode.com [45.33.32.79] by aws-us-west-2-korg-lkml-1.web.codeaurora.org with HTTPS for ; Thu, 09 Feb 2023 08:10:00 -0000 X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/176920 The output from is garbled otherwise and it's not easy to remove debug output form real command output on target. Signed-off-by: Mikko Rapeli --- meta/lib/oeqa/core/target/ssh.py | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/meta/lib/oeqa/core/target/ssh.py b/meta/lib/oeqa/core/target/ssh.py index 7a10ba9763..13fd5b2a49 100644 --- a/meta/lib/oeqa/core/target/ssh.py +++ b/meta/lib/oeqa/core/target/ssh.py @@ -240,7 +240,7 @@ def SSHCall(command, logger, timeout=None, **opts): eof = True else: output += data - logger.debug('Partial data from SSH call: %s' % data) + logger.debug('Partial data from SSH call:\n%s' % data) endtime = time.time() + timeout except InterruptedError: continue @@ -256,12 +256,12 @@ def SSHCall(command, logger, timeout=None, **opts): endtime = time.time() - starttime lastline = ("\nProcess killed - no output for %d seconds. Total" " running time: %d seconds." % (timeout, endtime)) - logger.debug('Received data from SSH call %s ' % lastline) + logger.debug('Received data from SSH call:\n%s ' % lastline) output += lastline else: output = process.communicate()[0].decode('utf-8', errors='ignore') - logger.debug('Data from SSH call: %s' % output.rstrip()) + logger.debug('Data from SSH call:\n%s' % output.rstrip()) options = { "stdout": subprocess.PIPE, From patchwork Thu Feb 9 08:09:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mikko Rapeli X-Patchwork-Id: 19260 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 659FFC61DA4 for ; Thu, 9 Feb 2023 08:10:00 +0000 (UTC) Received: from mail.kapsi.fi (mail.kapsi.fi [91.232.154.25]) by mx.groups.io with SMTP id smtpd.web11.9261.1675930190900695518 for ; Thu, 09 Feb 2023 00:09:51 -0800 Authentication-Results: mx.groups.io; dkim=missing; spf=none, err=permanent DNS error (domain: lakka.kapsi.fi, ip: 91.232.154.25, mailfrom: mcfrisk@lakka.kapsi.fi) Received: from kapsi.fi ([2001:67c:1be8::11] helo=lakka.kapsi.fi) by mail.kapsi.fi with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pQ205-00B9tB-23; Thu, 09 Feb 2023 10:09:49 +0200 Received: from mcfrisk by lakka.kapsi.fi with local (Exim 4.94.2) (envelope-from ) id 1pQ204-000cl1-O2; Thu, 09 Feb 2023 10:09:48 +0200 From: Mikko Rapeli To: openembedded-core@lists.openembedded.org Cc: Mikko Rapeli Subject: [PATCH v2 2/8] oeqa ssh.py: fix hangs in run() Date: Thu, 9 Feb 2023 10:09:30 +0200 Message-Id: <20230209080936.148489-3-mikko.rapeli@linaro.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20230209080936.148489-1-mikko.rapeli@linaro.org> References: <20230209080936.148489-1-mikko.rapeli@linaro.org> MIME-Version: 1.0 X-Rspam-Score: -1.2 (-) X-Rspam-Report: Action: no action Symbol: RCVD_TLS_LAST(0.00) Symbol: ARC_NA(0.00) Symbol: DMARC_POLICY_SOFTFAIL(0.10) Symbol: FROM_HAS_DN(0.00) Symbol: TO_DN_SOME(0.00) Symbol: R_MISSING_CHARSET(0.50) Symbol: TO_MATCH_ENVRCPT_ALL(0.00) Symbol: MIME_GOOD(-0.10) Symbol: RCPT_COUNT_TWO(0.00) Symbol: MID_CONTAINS_FROM(1.00) Symbol: R_SPF_NA(0.00) Symbol: FORGED_SENDER(0.30) Symbol: R_DKIM_NA(0.00) Symbol: MIME_TRACE(0.00) Symbol: ASN(0.00) Symbol: FROM_NEQ_ENVFROM(0.00) Symbol: BAYES_HAM(-3.00) Symbol: RCVD_COUNT_TWO(0.00) Message-ID: 20230209080936.148489-3-mikko.rapeli@linaro.org X-SA-Exim-Connect-IP: 2001:67c:1be8::11 X-SA-Exim-Mail-From: mcfrisk@lakka.kapsi.fi X-SA-Exim-Scanned: No (on mail.kapsi.fi); SAEximRunCond expanded to false List-Id: X-Webhook-Received: from li982-79.members.linode.com [45.33.32.79] by aws-us-west-2-korg-lkml-1.web.codeaurora.org with HTTPS for ; Thu, 09 Feb 2023 08:10:00 -0000 X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/176921 When qemu machine hangs, the ssh commands done by tests are not timing out. do_testimage() task has last logs like this: DEBUG: time: 1673531086.3155053, endtime: 1673531686.315502 The test process is stuck for hours, or for ever if the executing command or test case did not set a timeout correctly. The default 300 second timeout is not working when target hangs. Note that timeout is really a "inactive timeout" since data returned by the process will reset the timeout. Make the process stdout non-blocking so read() will always return right away using os.set_blocking() available in python 3.5 and later. Then change from python codec reader to plain read() and make the ssh subprocess stdout non-blocking. Even with select() making sure the file had input to be read, the codec reader was trying to find more stuff and blocking for ever when process hangs. While at it, add a small timeout to read data in larger chunks if possible. This avoids reading data one or few characters at a time and makes the debug logs more readable. close() the stdout file in all cases after read loop is complete. Then make sure to wait or kill the ssh subprocess in all cases. Just reading the output stream and receiving EOF there does not mean that the process exited, and wait() needs a timeout if the process is hanging. In the end kill the process and return the return value and captured output utf-8 encoded, just like before these changes. This fixes ssh run() related deadlocks when a qemu target hangs completely. Signed-off-by: Mikko Rapeli --- meta/lib/oeqa/core/target/ssh.py | 33 +++++++++++++++++++++++++------- 1 file changed, 26 insertions(+), 7 deletions(-) diff --git a/meta/lib/oeqa/core/target/ssh.py b/meta/lib/oeqa/core/target/ssh.py index 13fd5b2a49..466a795eb4 100644 --- a/meta/lib/oeqa/core/target/ssh.py +++ b/meta/lib/oeqa/core/target/ssh.py @@ -229,22 +229,28 @@ def SSHCall(command, logger, timeout=None, **opts): if timeout: endtime = starttime + timeout eof = False + os.set_blocking(process.stdout.fileno(), False) while time.time() < endtime and not eof: - logger.debug('time: %s, endtime: %s' % (time.time(), endtime)) try: + logger.debug('Waiting for process output: time: %s, endtime: %s' % (time.time(), endtime)) if select.select([process.stdout], [], [], 5)[0] != []: - reader = codecs.getreader('utf-8')(process.stdout, 'ignore') - data = reader.read(1024, 4096) + # wait a bit for more data, tries to avoid reading single characters + time.sleep(0.2) + data = process.stdout.read() if not data: - process.stdout.close() eof = True else: - output += data - logger.debug('Partial data from SSH call:\n%s' % data) + # ignore errors to capture as much as possible + string = data.decode('utf-8', errors='ignore') + output += string + logger.debug('Partial data from SSH call:\n%s' % string) endtime = time.time() + timeout except InterruptedError: + logger.debug('InterruptedError') continue + process.stdout.close() + # process hasn't returned yet if not eof: process.terminate() @@ -252,6 +258,7 @@ def SSHCall(command, logger, timeout=None, **opts): try: process.kill() except OSError: + logger.debug('OSError when killing process') pass endtime = time.time() - starttime lastline = ("\nProcess killed - no output for %d seconds. Total" @@ -263,6 +270,17 @@ def SSHCall(command, logger, timeout=None, **opts): output = process.communicate()[0].decode('utf-8', errors='ignore') logger.debug('Data from SSH call:\n%s' % output.rstrip()) + # timout or not, make sure process exits and is not hanging + if process.returncode == None: + try: + process.wait(timeout=5) + except TimeoutExpired: + try: + process.kill() + except OSError: + logger.debug('OSError') + pass + options = { "stdout": subprocess.PIPE, "stderr": subprocess.STDOUT, @@ -290,4 +308,5 @@ def SSHCall(command, logger, timeout=None, **opts): process.kill() logger.debug('Something went wrong, killing SSH process') raise - return (process.wait(), output.rstrip()) + + return (process.returncode, output.rstrip()) From patchwork Thu Feb 9 08:09:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mikko Rapeli X-Patchwork-Id: 19261 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 73E8DC636D4 for ; Thu, 9 Feb 2023 08:10:00 +0000 (UTC) Received: from mail.kapsi.fi (mail.kapsi.fi [91.232.154.25]) by mx.groups.io with SMTP id smtpd.web11.9262.1675930191616214046 for ; Thu, 09 Feb 2023 00:09:51 -0800 Authentication-Results: mx.groups.io; dkim=missing; spf=none, err=permanent DNS error (domain: lakka.kapsi.fi, ip: 91.232.154.25, mailfrom: mcfrisk@lakka.kapsi.fi) Received: from kapsi.fi ([2001:67c:1be8::11] helo=lakka.kapsi.fi) by mail.kapsi.fi with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pQ205-00B9tO-Su; Thu, 09 Feb 2023 10:09:49 +0200 Received: from mcfrisk by lakka.kapsi.fi with local (Exim 4.94.2) (envelope-from ) id 1pQ205-000clT-Jg; Thu, 09 Feb 2023 10:09:49 +0200 From: Mikko Rapeli To: openembedded-core@lists.openembedded.org Cc: Mikko Rapeli Subject: [PATCH v2 3/8] oeqa ssh.py: add connection keep alive options to ssh client Date: Thu, 9 Feb 2023 10:09:31 +0200 Message-Id: <20230209080936.148489-4-mikko.rapeli@linaro.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20230209080936.148489-1-mikko.rapeli@linaro.org> References: <20230209080936.148489-1-mikko.rapeli@linaro.org> MIME-Version: 1.0 X-Rspam-Score: -1.2 (-) X-Rspam-Report: Action: no action Symbol: RCVD_TLS_LAST(0.00) Symbol: ARC_NA(0.00) Symbol: DMARC_POLICY_SOFTFAIL(0.10) Symbol: FROM_HAS_DN(0.00) Symbol: TO_DN_SOME(0.00) Symbol: R_MISSING_CHARSET(0.50) Symbol: TO_MATCH_ENVRCPT_ALL(0.00) Symbol: MIME_GOOD(-0.10) Symbol: RCPT_COUNT_TWO(0.00) Symbol: MID_CONTAINS_FROM(1.00) Symbol: R_SPF_NA(0.00) Symbol: FORGED_SENDER(0.30) Symbol: R_DKIM_NA(0.00) Symbol: MIME_TRACE(0.00) Symbol: ASN(0.00) Symbol: FROM_NEQ_ENVFROM(0.00) Symbol: BAYES_HAM(-3.00) Symbol: RCVD_COUNT_TWO(0.00) Message-ID: 20230209080936.148489-4-mikko.rapeli@linaro.org X-SA-Exim-Connect-IP: 2001:67c:1be8::11 X-SA-Exim-Mail-From: mcfrisk@lakka.kapsi.fi X-SA-Exim-Scanned: No (on mail.kapsi.fi); SAEximRunCond expanded to false List-Id: X-Webhook-Received: from li982-79.members.linode.com [45.33.32.79] by aws-us-west-2-korg-lkml-1.web.codeaurora.org with HTTPS for ; Thu, 09 Feb 2023 08:10:00 -0000 X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/176922 Configure ssh client to test that connection with server is up. If the server does not respond within a minute then the connection, target machine or sshd daemon are stuck and it's better to exit the command execution with errors. Some tests can execute a long time without returning stdout/stderror data and it's difficult to adjust timers for those cases if connection to target machine or the target machine itself hangs and output is not expected in minutes or even hours. Signed-off-by: Mikko Rapeli --- meta/lib/oeqa/core/target/ssh.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/meta/lib/oeqa/core/target/ssh.py b/meta/lib/oeqa/core/target/ssh.py index 466a795eb4..bb3f2e3dc3 100644 --- a/meta/lib/oeqa/core/target/ssh.py +++ b/meta/lib/oeqa/core/target/ssh.py @@ -34,6 +34,8 @@ class OESSHTarget(OETarget): self.timeout = timeout self.user = user ssh_options = [ + '-o', 'ServerAliveCountMax=2', + '-o', 'ServerAliveInterval=30', '-o', 'UserKnownHostsFile=/dev/null', '-o', 'StrictHostKeyChecking=no', '-o', 'LogLevel=ERROR' From patchwork Thu Feb 9 08:09:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mikko Rapeli X-Patchwork-Id: 19263 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 74EAAC6379F for ; Thu, 9 Feb 2023 08:10:00 +0000 (UTC) Received: from mail.kapsi.fi (mail.kapsi.fi [91.232.154.25]) by mx.groups.io with SMTP id smtpd.web10.9264.1675930192951082435 for ; Thu, 09 Feb 2023 00:09:53 -0800 Authentication-Results: mx.groups.io; dkim=missing; spf=none, err=permanent DNS error (domain: lakka.kapsi.fi, ip: 91.232.154.25, mailfrom: mcfrisk@lakka.kapsi.fi) Received: from kapsi.fi ([2001:67c:1be8::11] helo=lakka.kapsi.fi) by mail.kapsi.fi with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pQ206-00B9tl-Qb; Thu, 09 Feb 2023 10:09:50 +0200 Received: from mcfrisk by lakka.kapsi.fi with local (Exim 4.94.2) (envelope-from ) id 1pQ206-000cmx-Fr; Thu, 09 Feb 2023 10:09:50 +0200 From: Mikko Rapeli To: openembedded-core@lists.openembedded.org Cc: Mikko Rapeli Subject: [PATCH v2 4/8] oeqa dump.py: add error counter and stop after 5 failures Date: Thu, 9 Feb 2023 10:09:32 +0200 Message-Id: <20230209080936.148489-5-mikko.rapeli@linaro.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20230209080936.148489-1-mikko.rapeli@linaro.org> References: <20230209080936.148489-1-mikko.rapeli@linaro.org> MIME-Version: 1.0 X-Rspam-Score: -1.2 (-) X-Rspam-Report: Action: no action Symbol: RCVD_TLS_LAST(0.00) Symbol: ARC_NA(0.00) Symbol: DMARC_POLICY_SOFTFAIL(0.10) Symbol: FROM_HAS_DN(0.00) Symbol: TO_DN_SOME(0.00) Symbol: R_MISSING_CHARSET(0.50) Symbol: TO_MATCH_ENVRCPT_ALL(0.00) Symbol: MIME_GOOD(-0.10) Symbol: RCPT_COUNT_TWO(0.00) Symbol: MID_CONTAINS_FROM(1.00) Symbol: R_SPF_NA(0.00) Symbol: FORGED_SENDER(0.30) Symbol: R_DKIM_NA(0.00) Symbol: MIME_TRACE(0.00) Symbol: ASN(0.00) Symbol: FROM_NEQ_ENVFROM(0.00) Symbol: BAYES_HAM(-3.00) Symbol: RCVD_COUNT_TWO(0.00) Message-ID: 20230209080936.148489-5-mikko.rapeli@linaro.org X-SA-Exim-Connect-IP: 2001:67c:1be8::11 X-SA-Exim-Mail-From: mcfrisk@lakka.kapsi.fi X-SA-Exim-Scanned: No (on mail.kapsi.fi); SAEximRunCond expanded to false List-Id: X-Webhook-Received: from li982-79.members.linode.com [45.33.32.79] by aws-us-west-2-korg-lkml-1.web.codeaurora.org with HTTPS for ; Thu, 09 Feb 2023 08:10:00 -0000 X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/176923 If test target qemu machine hangs completely, dump_target() calls over serial console are taking a long time to time out, possibly for every failing ssh command execution and a lot of test cases, and same with dump_monitor(). Instead of trying for ever, count errors and after 5 stop trying to dump_target() and dump_monitor() completely. These help to end testing earlier when a test target is completely deadlocked and all ssh, serial and QMP communication with it are failing. Signed-off-by: Mikko Rapeli --- meta/lib/oeqa/utils/dump.py | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/meta/lib/oeqa/utils/dump.py b/meta/lib/oeqa/utils/dump.py index bcee03b576..d420b497f9 100644 --- a/meta/lib/oeqa/utils/dump.py +++ b/meta/lib/oeqa/utils/dump.py @@ -93,37 +93,55 @@ class HostDumper(BaseDumper): self._write_dump(cmd.split()[0], result.output) class TargetDumper(BaseDumper): - """ Class to get dumps from target, it only works with QemuRunner """ + """ Class to get dumps from target, it only works with QemuRunner. + Will give up permanently after 5 errors from running commands over + serial console. This helps to end testing when target is really dead, hanging + or unresponsive. + """ def __init__(self, cmds, parent_dir, runner): super(TargetDumper, self).__init__(cmds, parent_dir) self.runner = runner + self.errors = 0 def dump_target(self, dump_dir=""): + if self.errors >= 5: + print("Too many errors when dumping data from target, assuming it is dead! Will not dump data anymore!") + return if dump_dir: self.dump_dir = dump_dir for cmd in self.cmds: # We can continue with the testing if serial commands fail try: (status, output) = self.runner.run_serial(cmd) + if status == 0: + self.errors = self.errors + 1 self._write_dump(cmd.split()[0], output) except: + self.errors = self.errors + 1 print("Tried to dump info from target but " "serial console failed") print("Failed CMD: %s" % (cmd)) class MonitorDumper(BaseDumper): - """ Class to get dumps via the Qemu Monitor, it only works with QemuRunner """ + """ Class to get dumps via the Qemu Monitor, it only works with QemuRunner + Will stop completely if there are more than 5 errors when dumping monitor data. + This helps to end testing when target is really dead, hanging or unresponsive. + """ def __init__(self, cmds, parent_dir, runner): super(MonitorDumper, self).__init__(cmds, parent_dir) self.runner = runner + self.errors = 0 def dump_monitor(self, dump_dir=""): if self.runner is None: return if dump_dir: self.dump_dir = dump_dir + if self.errors >= 5: + print("Too many errors when dumping data from qemu monitor, assuming it is dead! Will not dump data anymore!") + return for cmd in self.cmds: cmd_name = cmd.split()[0] try: @@ -137,4 +155,5 @@ class MonitorDumper(BaseDumper): output = self.runner.run_monitor(cmd_name) self._write_dump(cmd_name, output) except Exception as e: + self.errors = self.errors + 1 print("Failed to dump QMP CMD: %s with\nException: %s" % (cmd_name, e)) From patchwork Thu Feb 9 08:09:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mikko Rapeli X-Patchwork-Id: 19267 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 85F80C64ED6 for ; Thu, 9 Feb 2023 08:10:00 +0000 (UTC) Received: from mail.kapsi.fi (mail.kapsi.fi [91.232.154.25]) by mx.groups.io with SMTP id smtpd.web11.9263.1675930193276590446 for ; Thu, 09 Feb 2023 00:09:53 -0800 Authentication-Results: mx.groups.io; dkim=missing; spf=none, err=permanent DNS error (domain: lakka.kapsi.fi, ip: 91.232.154.25, mailfrom: mcfrisk@lakka.kapsi.fi) Received: from kapsi.fi ([2001:67c:1be8::11] helo=lakka.kapsi.fi) by mail.kapsi.fi with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pQ207-00B9tx-Er; Thu, 09 Feb 2023 10:09:51 +0200 Received: from mcfrisk by lakka.kapsi.fi with local (Exim 4.94.2) (envelope-from ) id 1pQ207-000cnI-5W; Thu, 09 Feb 2023 10:09:51 +0200 From: Mikko Rapeli To: openembedded-core@lists.openembedded.org Cc: Mikko Rapeli Subject: [PATCH v2 5/8] oeqa qemurunner: read more data at a time from serial Date: Thu, 9 Feb 2023 10:09:33 +0200 Message-Id: <20230209080936.148489-6-mikko.rapeli@linaro.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20230209080936.148489-1-mikko.rapeli@linaro.org> References: <20230209080936.148489-1-mikko.rapeli@linaro.org> MIME-Version: 1.0 X-Rspam-Score: -1.2 (-) X-Rspam-Report: Action: no action Symbol: RCVD_TLS_LAST(0.00) Symbol: ARC_NA(0.00) Symbol: DMARC_POLICY_SOFTFAIL(0.10) Symbol: FROM_HAS_DN(0.00) Symbol: TO_DN_SOME(0.00) Symbol: R_MISSING_CHARSET(0.50) Symbol: TO_MATCH_ENVRCPT_ALL(0.00) Symbol: MIME_GOOD(-0.10) Symbol: RCPT_COUNT_TWO(0.00) Symbol: MID_CONTAINS_FROM(1.00) Symbol: R_SPF_NA(0.00) Symbol: FORGED_SENDER(0.30) Symbol: R_DKIM_NA(0.00) Symbol: MIME_TRACE(0.00) Symbol: ASN(0.00) Symbol: FROM_NEQ_ENVFROM(0.00) Symbol: BAYES_HAM(-3.00) Symbol: RCVD_COUNT_TWO(0.00) Message-ID: 20230209080936.148489-6-mikko.rapeli@linaro.org X-SA-Exim-Connect-IP: 2001:67c:1be8::11 X-SA-Exim-Mail-From: mcfrisk@lakka.kapsi.fi X-SA-Exim-Scanned: No (on mail.kapsi.fi); SAEximRunCond expanded to false List-Id: X-Webhook-Received: from li982-79.members.linode.com [45.33.32.79] by aws-us-west-2-korg-lkml-1.web.codeaurora.org with HTTPS for ; Thu, 09 Feb 2023 08:10:00 -0000 X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/176924 Use a short sleep to bundle serial console reads so that we are not reading one character at a time which reduces busy looping. Signed-off-by: Mikko Rapeli --- meta/lib/oeqa/utils/qemurunner.py | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/meta/lib/oeqa/utils/qemurunner.py b/meta/lib/oeqa/utils/qemurunner.py index 8b893601d4..2e054447fc 100644 --- a/meta/lib/oeqa/utils/qemurunner.py +++ b/meta/lib/oeqa/utils/qemurunner.py @@ -202,7 +202,7 @@ class QemuRunner: qmp_file = "." + next(tempfile._get_candidate_names()) qmp_param = ' -S -qmp unix:./%s,server,wait' % (qmp_file) qmp_port = self.tmpdir + "/" + qmp_file - # Create a second socket connection for debugging use, + # Create a second socket connection for debugging use, # note this will NOT cause qemu to block waiting for the connection qmp_file2 = "." + next(tempfile._get_candidate_names()) qmp_param += ' -qmp unix:./%s,server,nowait' % (qmp_file2) @@ -468,6 +468,8 @@ class QemuRunner: socklist.remove(self.server_socket) self.logger.debug("Connection from %s:%s" % addr) else: + # try to avoid reading only a single character at a time + time.sleep(0.1) data = data + sock.recv(1024) if data: bootlog += data From patchwork Thu Feb 9 08:09:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mikko Rapeli X-Patchwork-Id: 19265 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 82670C64EC5 for ; Thu, 9 Feb 2023 08:10:00 +0000 (UTC) Received: from mail.kapsi.fi (mail.kapsi.fi [91.232.154.25]) by mx.groups.io with SMTP id smtpd.web11.9264.1675930194008656620 for ; Thu, 09 Feb 2023 00:09:54 -0800 Authentication-Results: mx.groups.io; dkim=missing; spf=none, err=permanent DNS error (domain: lakka.kapsi.fi, ip: 91.232.154.25, mailfrom: mcfrisk@lakka.kapsi.fi) Received: from kapsi.fi ([2001:67c:1be8::11] helo=lakka.kapsi.fi) by mail.kapsi.fi with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pQ208-00B9uF-3D; Thu, 09 Feb 2023 10:09:52 +0200 Received: from mcfrisk by lakka.kapsi.fi with local (Exim 4.94.2) (envelope-from ) id 1pQ207-000cnT-Q1; Thu, 09 Feb 2023 10:09:51 +0200 From: Mikko Rapeli To: openembedded-core@lists.openembedded.org Cc: Mikko Rapeli Subject: [PATCH v2 6/8] oeqa qemurunner.py: add timeout to QMP calls Date: Thu, 9 Feb 2023 10:09:34 +0200 Message-Id: <20230209080936.148489-7-mikko.rapeli@linaro.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20230209080936.148489-1-mikko.rapeli@linaro.org> References: <20230209080936.148489-1-mikko.rapeli@linaro.org> MIME-Version: 1.0 X-Rspam-Score: -1.2 (-) X-Rspam-Report: Action: no action Symbol: RCVD_TLS_LAST(0.00) Symbol: ARC_NA(0.00) Symbol: DMARC_POLICY_SOFTFAIL(0.10) Symbol: FROM_HAS_DN(0.00) Symbol: TO_DN_SOME(0.00) Symbol: R_MISSING_CHARSET(0.50) Symbol: TO_MATCH_ENVRCPT_ALL(0.00) Symbol: MIME_GOOD(-0.10) Symbol: RCPT_COUNT_TWO(0.00) Symbol: MID_CONTAINS_FROM(1.00) Symbol: R_SPF_NA(0.00) Symbol: FORGED_SENDER(0.30) Symbol: R_DKIM_NA(0.00) Symbol: MIME_TRACE(0.00) Symbol: ASN(0.00) Symbol: FROM_NEQ_ENVFROM(0.00) Symbol: BAYES_HAM(-3.00) Symbol: RCVD_COUNT_TWO(0.00) Message-ID: 20230209080936.148489-7-mikko.rapeli@linaro.org X-SA-Exim-Connect-IP: 2001:67c:1be8::11 X-SA-Exim-Mail-From: mcfrisk@lakka.kapsi.fi X-SA-Exim-Scanned: No (on mail.kapsi.fi); SAEximRunCond expanded to false List-Id: X-Webhook-Received: from li982-79.members.linode.com [45.33.32.79] by aws-us-west-2-korg-lkml-1.web.codeaurora.org with HTTPS for ; Thu, 09 Feb 2023 08:10:00 -0000 X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/176925 When a qemu machine hangs, the QMP calls can hang for ever too, and when this happens any failing test commands from ssh runner may be followed by dump_monitor() calls which then also hang. Hangs followed by hangs. Use runqemutime at setup and run_monitor() specific timeout for later calls. Signed-off-by: Mikko Rapeli --- meta/lib/oeqa/utils/qemurunner.py | 3 +++ 1 file changed, 3 insertions(+) diff --git a/meta/lib/oeqa/utils/qemurunner.py b/meta/lib/oeqa/utils/qemurunner.py index 2e054447fc..bce00c696a 100644 --- a/meta/lib/oeqa/utils/qemurunner.py +++ b/meta/lib/oeqa/utils/qemurunner.py @@ -350,6 +350,8 @@ class QemuRunner: return False try: + # set timeout value for all QMP calls + self.qmp.settimeout(self.runqemutime) self.qmp.connect() connect_time = time.time() self.logger.info("QMP connected to QEMU at %s and took %s seconds" % @@ -628,6 +630,7 @@ class QemuRunner: def run_monitor(self, command, args=None, timeout=60): if hasattr(self, 'qmp') and self.qmp: + self.qmp.settimeout(timeout) if args is not None: return self.qmp.cmd(command, args) else: From patchwork Thu Feb 9 08:09:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mikko Rapeli X-Patchwork-Id: 19262 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 66B49C05027 for ; Thu, 9 Feb 2023 08:10:00 +0000 (UTC) Received: from mail.kapsi.fi (mail.kapsi.fi [91.232.154.25]) by mx.groups.io with SMTP id smtpd.web10.9265.1675930194524580633 for ; Thu, 09 Feb 2023 00:09:54 -0800 Authentication-Results: mx.groups.io; dkim=missing; spf=none, err=permanent DNS error (domain: lakka.kapsi.fi, ip: 91.232.154.25, mailfrom: mcfrisk@lakka.kapsi.fi) Received: from kapsi.fi ([2001:67c:1be8::11] helo=lakka.kapsi.fi) by mail.kapsi.fi with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pQ208-00B9ue-NY; Thu, 09 Feb 2023 10:09:52 +0200 Received: from mcfrisk by lakka.kapsi.fi with local (Exim 4.94.2) (envelope-from ) id 1pQ208-000coC-Eq; Thu, 09 Feb 2023 10:09:52 +0200 From: Mikko Rapeli To: openembedded-core@lists.openembedded.org Cc: Mikko Rapeli Subject: [PATCH v2 7/8] oeqa qemurunner.py: kill qemu if it hangs Date: Thu, 9 Feb 2023 10:09:35 +0200 Message-Id: <20230209080936.148489-8-mikko.rapeli@linaro.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20230209080936.148489-1-mikko.rapeli@linaro.org> References: <20230209080936.148489-1-mikko.rapeli@linaro.org> MIME-Version: 1.0 X-Rspam-Score: -1.2 (-) X-Rspam-Report: Action: no action Symbol: RCVD_TLS_LAST(0.00) Symbol: ARC_NA(0.00) Symbol: DMARC_POLICY_SOFTFAIL(0.10) Symbol: FROM_HAS_DN(0.00) Symbol: TO_DN_SOME(0.00) Symbol: R_MISSING_CHARSET(0.50) Symbol: TO_MATCH_ENVRCPT_ALL(0.00) Symbol: MIME_GOOD(-0.10) Symbol: RCPT_COUNT_TWO(0.00) Symbol: MID_CONTAINS_FROM(1.00) Symbol: R_SPF_NA(0.00) Symbol: FORGED_SENDER(0.30) Symbol: R_DKIM_NA(0.00) Symbol: MIME_TRACE(0.00) Symbol: ASN(0.00) Symbol: FROM_NEQ_ENVFROM(0.00) Symbol: BAYES_HAM(-3.00) Symbol: RCVD_COUNT_TWO(0.00) Message-ID: 20230209080936.148489-8-mikko.rapeli@linaro.org X-SA-Exim-Connect-IP: 2001:67c:1be8::11 X-SA-Exim-Mail-From: mcfrisk@lakka.kapsi.fi X-SA-Exim-Scanned: No (on mail.kapsi.fi); SAEximRunCond expanded to false List-Id: X-Webhook-Received: from li982-79.members.linode.com [45.33.32.79] by aws-us-west-2-korg-lkml-1.web.codeaurora.org with HTTPS for ; Thu, 09 Feb 2023 08:10:00 -0000 X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/176926 qemu doesn't always behave well and can hang too. kill it with force if was still alive. Signed-off-by: Mikko Rapeli --- meta/lib/oeqa/utils/qemurunner.py | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/meta/lib/oeqa/utils/qemurunner.py b/meta/lib/oeqa/utils/qemurunner.py index bce00c696a..8e3484385d 100644 --- a/meta/lib/oeqa/utils/qemurunner.py +++ b/meta/lib/oeqa/utils/qemurunner.py @@ -589,6 +589,13 @@ class QemuRunner: os.kill(self.qemupid, signal.SIGTERM) except ProcessLookupError as e: self.logger.warning('qemu-system ended unexpectedly') + time.sleep(5) + try: + # qemu-system did not behave well + os.kill(self.qemupid, signal.SIGKILL) + except ProcessLookupError as e: + # already dead + pass def stop_thread(self): if self.thread and self.thread.is_alive(): From patchwork Thu Feb 9 08:09:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mikko Rapeli X-Patchwork-Id: 19266 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 97091C64ED8 for ; Thu, 9 Feb 2023 08:10:00 +0000 (UTC) Received: from mail.kapsi.fi (mail.kapsi.fi [91.232.154.25]) by mx.groups.io with SMTP id smtpd.web11.9265.1675930195127224521 for ; Thu, 09 Feb 2023 00:09:55 -0800 Authentication-Results: mx.groups.io; dkim=missing; spf=none, err=permanent DNS error (domain: lakka.kapsi.fi, ip: 91.232.154.25, mailfrom: mcfrisk@lakka.kapsi.fi) Received: from kapsi.fi ([2001:67c:1be8::11] helo=lakka.kapsi.fi) by mail.kapsi.fi with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pQ209-00B9un-D6; Thu, 09 Feb 2023 10:09:53 +0200 Received: from mcfrisk by lakka.kapsi.fi with local (Exim 4.94.2) (envelope-from ) id 1pQ209-000cob-4b; Thu, 09 Feb 2023 10:09:53 +0200 From: Mikko Rapeli To: openembedded-core@lists.openembedded.org Cc: Mikko Rapeli Subject: [PATCH v2 8/8] oeqa qemurunner.py: try to avoid reading one character at a time Date: Thu, 9 Feb 2023 10:09:36 +0200 Message-Id: <20230209080936.148489-9-mikko.rapeli@linaro.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20230209080936.148489-1-mikko.rapeli@linaro.org> References: <20230209080936.148489-1-mikko.rapeli@linaro.org> MIME-Version: 1.0 X-Rspam-Score: -1.2 (-) X-Rspam-Report: Action: no action Symbol: RCVD_TLS_LAST(0.00) Symbol: ARC_NA(0.00) Symbol: DMARC_POLICY_SOFTFAIL(0.10) Symbol: FROM_HAS_DN(0.00) Symbol: TO_DN_SOME(0.00) Symbol: R_MISSING_CHARSET(0.50) Symbol: TO_MATCH_ENVRCPT_ALL(0.00) Symbol: MIME_GOOD(-0.10) Symbol: RCPT_COUNT_TWO(0.00) Symbol: MID_CONTAINS_FROM(1.00) Symbol: R_SPF_NA(0.00) Symbol: FORGED_SENDER(0.30) Symbol: R_DKIM_NA(0.00) Symbol: MIME_TRACE(0.00) Symbol: ASN(0.00) Symbol: FROM_NEQ_ENVFROM(0.00) Symbol: BAYES_HAM(-3.00) Symbol: RCVD_COUNT_TWO(0.00) Message-ID: 20230209080936.148489-9-mikko.rapeli@linaro.org X-SA-Exim-Connect-IP: 2001:67c:1be8::11 X-SA-Exim-Mail-From: mcfrisk@lakka.kapsi.fi X-SA-Exim-Scanned: No (on mail.kapsi.fi); SAEximRunCond expanded to false List-Id: X-Webhook-Received: from li982-79.members.linode.com [45.33.32.79] by aws-us-west-2-korg-lkml-1.web.codeaurora.org with HTTPS for ; Thu, 09 Feb 2023 08:10:00 -0000 X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/176927 Read from serial console with a small delay to bundle data to e.g. full lines. Reading one character at a time is not needed and causes busy looping. Signed-off-by: Mikko Rapeli --- meta/lib/oeqa/utils/qemurunner.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/meta/lib/oeqa/utils/qemurunner.py b/meta/lib/oeqa/utils/qemurunner.py index 8e3484385d..0f4157d318 100644 --- a/meta/lib/oeqa/utils/qemurunner.py +++ b/meta/lib/oeqa/utils/qemurunner.py @@ -665,6 +665,8 @@ class QemuRunner: except InterruptedError: continue if sread: + # try to avoid reading single character at a time + time.sleep(0.1) answer = self.server_socket.recv(1024) if answer: data += answer.decode('utf-8')