
oeqa/runtime: fix race-condition in minidebuginfo test

Message ID 20240610123935.2102248-1-ecordonnier@snap.com
State New
Series oeqa/runtime: fix race-condition in minidebuginfo test

Commit Message

Etienne Cordonnier June 10, 2024, 12:39 p.m. UTC
From: Etienne Cordonnier <ecordonnier@snap.com>

Fix this error where 'coredumpctl info' warns that the coredump is still being
processed:

```
AssertionError: 1 != 0 : MiniDebugInfo Test failed: No match found.
-- Notice: 1 systemd-coredump@.service unit is running, output may be incomplete.
```

Signed-off-by: Etienne Cordonnier <ecordonnier@snap.com>
---
 meta/lib/oeqa/runtime/cases/systemd.py | 3 +++
 1 file changed, 3 insertions(+)

Comments

Richard Purdie June 10, 2024, 12:53 p.m. UTC | #1
On Mon, 2024-06-10 at 14:39 +0200, Etienne Cordonnier via lists.openembedded.org wrote:
> From: Etienne Cordonnier <ecordonnier@snap.com>
> 
> Fix this error where 'coredumpctl info' warns that the coredump is still being
> processed:
> 
> ```
> AssertionError: 1 != 0 : MiniDebugInfo Test failed: No match found.
> -- Notice: 1 systemd-coredump@.service unit is running, output may be incomplete.
> ```
> 
> Signed-off-by: Etienne Cordonnier <ecordonnier@snap.com>
> ---
>  meta/lib/oeqa/runtime/cases/systemd.py | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/meta/lib/oeqa/runtime/cases/systemd.py b/meta/lib/oeqa/runtime/cases/systemd.py
> index 80fdae240a..17fa660ace 100644
> --- a/meta/lib/oeqa/runtime/cases/systemd.py
> +++ b/meta/lib/oeqa/runtime/cases/systemd.py
> @@ -155,6 +155,9 @@ class SystemdServiceTests(SystemdTest):
>          self.target.run('kill -SEGV %s' % output)
>          self.assertEqual(status, 0, msg = 'Not able to find process that runs sleep, output : %s' % output)
>  
> +        # Give some time to systemd-coredump@.service to process the coredump
> +        time.sleep(1)
> +
>          (status, output) = self.target.run('coredumpctl info')
>          self.assertEqual(status, 0, msg='MiniDebugInfo Test failed: %s' % output)
>          self.assertEqual('sleep_for_duration (busybox.nosuid' in output or 'xnanosleep (sleep.coreutils' in output,
> 

Specific sleep values like this are a red flag: it all depends on how
much load the systems are under, and we've had a lot of problems with
these kinds of things.

Instead, you should probably detect "systemd-coredump@.service unit is
running" in the command output and retry if that is the case with an
overall timeout/error in case it never completes.

Cheers,

Richard
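Richard's suggestion (detect the notice in the output, retry, and fail after an overall timeout) can be sketched as a small helper. This is a minimal sketch, not the code from either version of the patch: the helper name `run_coredumpctl_info` and the `timeout`/`interval` defaults are hypothetical, and `run` is assumed to behave like oeqa's `self.target.run()`, taking a shell command and returning a `(status, output)` tuple.

```python
import time

# Substring of the notice coredumpctl prints while a
# systemd-coredump@.service instance is still processing a dump
# (taken from the failure log in the commit message).
PROCESSING_NOTICE = 'systemd-coredump@.service unit is running'


def run_coredumpctl_info(run, timeout=60.0, interval=1.0):
    """Hypothetical helper: run 'coredumpctl info' until its output is complete.

    `run` is assumed to behave like oeqa's self.target.run(): it takes a
    shell command and returns a (status, output) tuple. The command is
    retried while the processing notice is present; TimeoutError is
    raised if the notice is still there once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        status, output = run('coredumpctl info')
        if PROCESSING_NOTICE not in output:
            # Processing has finished, so the output should be complete.
            return status, output
        if time.monotonic() >= deadline:
            raise TimeoutError(
                'coredump still being processed after %.0fs:\n%s' % (timeout, output))
        # Back off briefly before asking again.
        time.sleep(interval)
```

In the test, the fixed `time.sleep(1)` plus the `coredumpctl info` call would then become `(status, output) = run_coredumpctl_info(self.target.run)`. Because the loop only ends once the notice is gone or the deadline passes, a slow, heavily loaded machine costs extra polling rather than a spurious failure.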
Etienne Cordonnier June 10, 2024, 4:11 p.m. UTC | #2
Hi Richard,
I have sent a second version of the patch, which shouldn't be racy. I
implemented it slightly differently from your suggestion because I think
that approach is also a bit racy: if we only start checking whether
systemd-coredump@.service is running in one thread after it has already
finished processing the coredump in the other thread, we'll be stuck in a
retry loop until the overall timeout.

Étienne


Patch

diff --git a/meta/lib/oeqa/runtime/cases/systemd.py b/meta/lib/oeqa/runtime/cases/systemd.py
index 80fdae240a..17fa660ace 100644
--- a/meta/lib/oeqa/runtime/cases/systemd.py
+++ b/meta/lib/oeqa/runtime/cases/systemd.py
@@ -155,6 +155,9 @@ class SystemdServiceTests(SystemdTest):
         self.target.run('kill -SEGV %s' % output)
         self.assertEqual(status, 0, msg = 'Not able to find process that runs sleep, output : %s' % output)
 
+        # Give some time to systemd-coredump@.service to process the coredump
+        time.sleep(1)
+
         (status, output) = self.target.run('coredumpctl info')
         self.assertEqual(status, 0, msg='MiniDebugInfo Test failed: %s' % output)
         self.assertEqual('sleep_for_duration (busybox.nosuid' in output or 'xnanosleep (sleep.coreutils' in output,