staging.bbclass: run prepare_recipe_sysroot after unpack, not fetch

Message ID 20230309103225.3110783-1-alex@linutronix.de
State New

Commit Message

Alexander Kanavin March 9, 2023, 10:32 a.m. UTC
Otherwise nasty races between unpack and prepare_recipe_sysroot can occur:

ERROR: Bitbake Fetcher Error: FetchError('Fetch command export PSEUDO_DISABLED=1; export PATH="/srv/work/alex/poky/scripts/native-intercept:/srv/storage/alex/yocto/build-64-alt/tmp/sysroots-uninative/x86_64-linux/usr/bin:/srv/storage/alex/yocto/build-64-alt/tmp/work/x86_64-linux/libslirp-native/4.7.0-r0/recipe-sysroot-native/usr/bin/python3-native:/srv/work/alex/poky/scripts:/srv/storage/alex/yocto/build-64-alt/tmp/work/x86_64-linux/libslirp-native/4.7.0-r0/recipe-sysroot-native/usr/bin/x86_64-linux:/srv/storage/alex/yocto/build-64-alt/tmp/work/x86_64-linux/libslirp-native/4.7.0-r0/recipe-sysroot-native/usr/bin:/srv/storage/alex/yocto/build-64-alt/tmp/work/x86_64-linux/libslirp-native/4.7.0-r0/recipe-sysroot-native/usr/sbin:/srv/storage/alex/yocto/build-64-alt/tmp/work/x86_64-linux/libslirp-native/4.7.0-r0/recipe-sysroot-native/usr/bin:/srv/storage/alex/yocto/build-64-alt/tmp/work/x86_64-linux/libslirp-native/4.7.0-r0/recipe-sysroot-native/sbin:/srv/storage/alex/yocto/build-64-alt/tmp/work/x86_64-linux/libslirp-native/4.7.0-r0/recipe-sysroot-native/bin:/srv/work/alex/poky/bitbake/bin:/srv/storage/alex/yocto/build-64-alt/tmp/hosttools"; export HOME="/home/alex"; git -c gc.autoDetach=false -c core.pager=cat checkout -B master 3ad1710a96678fe79066b1469cead4058713a1d9 failed with exit code 127, output:\npython3: error while loading shared libraries: libpython3.11.so.1.0: cannot open shared object file: No such file or directory\n', None)
DEBUG: Python function base_do_unpack finished

Signed-off-by: Alexander Kanavin <alex@linutronix.de>
---
 meta/classes-global/staging.bbclass | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Richard Purdie March 9, 2023, 10:39 a.m. UTC | #1
On Thu, 2023-03-09 at 11:32 +0100, Alexander Kanavin wrote:
> Otherwise nasty races between unpack and prepare_recipe_sysroot can occur:
> 
> ERROR: Bitbake Fetcher Error: FetchError('Fetch command export PSEUDO_DISABLED=1; export PATH="/srv/work/alex/poky/scripts/native-intercept:/srv/storage/alex/yocto/build-64-alt/tmp/sysroots-uninative/x86_64-linux/usr/bin:/srv/storage/alex/yocto/build-64-alt/tmp/work/x86_64-linux/libslirp-native/4.7.0-r0/recipe-sysroot-native/usr/bin/python3-native:/srv/work/alex/poky/scripts:/srv/storage/alex/yocto/build-64-alt/tmp/work/x86_64-linux/libslirp-native/4.7.0-r0/recipe-sysroot-native/usr/bin/x86_64-linux:/srv/storage/alex/yocto/build-64-alt/tmp/work/x86_64-linux/libslirp-native/4.7.0-r0/recipe-sysroot-native/usr/bin:/srv/storage/alex/yocto/build-64-alt/tmp/work/x86_64-linux/libslirp-native/4.7.0-r0/recipe-sysroot-native/usr/sbin:/srv/storage/alex/yocto/build-64-alt/tmp/work/x86_64-linux/libslirp-native/4.7.0-r0/recipe-sysroot-native/usr/bin:/srv/storage/alex/yocto/build-64-alt/tmp/work/x86_64-linux/libslirp-native/4.7.0-r0/recipe-sysroot-native/sbin:/srv/storage/alex/yocto/build-64-alt/tmp/work/x86_64-linux/libslirp-native/4.7.0-r0/recipe-sysroot-native/bin:/srv/work/alex/poky/bitbake/bin:/srv/storage/alex/yocto/build-64-alt/tmp/hosttools"; export HOME="/home/alex"; git -c gc.autoDetach=false -c core.pager=cat checkout -B master 3ad1710a96678fe79066b1469cead4058713a1d9 failed with exit code 127, output:\npython3: error while loading shared libraries: libpython3.11.so.1.0: cannot open shared object file: No such file or directory\n', None)
> DEBUG: Python function base_do_unpack finished
> 
> Signed-off-by: Alexander Kanavin <alex@linutronix.de>
> ---
>  meta/classes-global/staging.bbclass | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/meta/classes-global/staging.bbclass b/meta/classes-global/staging.bbclass
> index e6d0d1d55c..ade5c03cd9 100644
> --- a/meta/classes-global/staging.bbclass
> +++ b/meta/classes-global/staging.bbclass
> @@ -647,7 +647,7 @@ do_prepare_recipe_sysroot[deptask] = "do_populate_sysroot"
>  python do_prepare_recipe_sysroot () {
>      bb.build.exec_func("extend_recipe_sysroot", d)
>  }
> -addtask do_prepare_recipe_sysroot before do_configure after do_fetch
> +addtask do_prepare_recipe_sysroot before do_configure after do_unpack
>  
>  python staging_taskhandler() {
>      bbtasks = e.tasklist

Before we start changing core dependencies, we need to better
understand what happened there. Whilst this patch might fix this case,
we'd likely just leave races elsewhere as other tasks call git too.

Which recipe was this?
Why is git calling into python?
Why didn't the staging code to move libs into position before binaries
prevent this?

Cheers,

Richard

Alexander Kanavin March 9, 2023, 11:44 a.m. UTC | #2
On Thu, 9 Mar 2023 at 11:39, Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:

> Before we start changing core dependencies, we need to better
> understand what happened there. Whilst this patch might fix this case,
> we'd likely just leave races elsewhere as other tasks call git too.
>
> Which recipe was this?
> Why is git calling into python?
> Why didn't the staging code to move libs into position before binaries
> prevent this?

The recipe was libslirp-native (I don't think it matters). Git itself
is not calling into python; it's poky/scripts/git, which is a python
wrapper script around git.

I don't have a definite answer to the last question - I believe the
staging code also removes 'outdated' sysroot items before installing
current ones, and that's where the window of having python binary, but
not its library may have occurred?

Still, there seems to be a general problem here:
prepare_recipe_sysroot runs at the same time as tasks (such as unpack)
that have PATH pointing to directories being modified by it. So
there's no determinism in which python (native one or one from the
host) is going to be used for example; it may randomly change midway
through the task.
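
To make the non-determinism concrete, here is a minimal sketch of how a PATH lookup changes while a directory earlier in PATH is populated and depopulated. It only mimics the layout seen in the error above; the temporary directories and the helper names (make_executable, resolve) are made up for illustration and are not part of bitbake or OE-core.

```python
import os
import shutil
import stat
import tempfile


def make_executable(directory, name):
    """Create a small executable file called `name` in `directory`."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def resolve(name, search_path):
    """Return the path a PATH lookup would pick for `name`, or None."""
    return shutil.which(name, path=search_path)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-ins for recipe-sysroot-native/usr/bin and hosttools.
    sysroot_bin = os.path.join(tmp, "recipe-sysroot-native", "usr", "bin")
    hosttools = os.path.join(tmp, "hosttools")
    os.makedirs(sysroot_bin)
    os.makedirs(hosttools)
    host_python = make_executable(hosttools, "python3")

    # The sysroot directory comes before hosttools, as in the PATH above.
    search_path = os.pathsep.join([sysroot_bin, hosttools])

    before = resolve("python3", search_path)   # sysroot still empty
    native_python = make_executable(sysroot_bin, "python3")
    during = resolve("python3", search_path)   # sysroot populated meanwhile
    os.remove(native_python)
    after = resolve("python3", search_path)    # sysroot depopulated again
```

The same lookup yields the host python, then the native one, then the host one again, depending only on when it runs relative to the sysroot changes.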

Alex

Richard Purdie March 9, 2023, 12:03 p.m. UTC | #3
On Thu, 2023-03-09 at 12:44 +0100, Alexander Kanavin wrote:
> On Thu, 9 Mar 2023 at 11:39, Richard Purdie
> <richard.purdie@linuxfoundation.org> wrote:
> 
> > Before we start changing core dependencies, we need to better
> > understand what happened there. Whilst this patch might fix this case,
> > we'd likely just leave races elsewhere as other tasks call git too.
> > 
> > Which recipe was this?
> > Why is git calling into python?
> > Why didn't the staging code to move libs into position before binaries
> > prevent this?
> 
> The recipe was libslirp-native (I don't think it matters). Git itself
> is not calling into python; it's poky/scripts/git, which is a python
> wrapper script around git.
> 
> I don't have a definite answer to the last question - I believe the
> staging code also removes 'outdated' sysroot items before installing
> current ones, and that's where the window of having python binary, but
> not its library may have occurred?

It sounds like the removal is probably the thing that caused problems.

> Still, there seems to be a general problem here:
> prepare_recipe_sysroot runs at the same time as tasks (such as unpack)
> that have PATH pointing to directories being modified by it. So
> there's no determinism in which python (native one or one from the
> host) is going to be used for example; it may randomly change midway
> through the task.

This is a wider problem: the sysroots are often modified while things
are running against them, as we don't require task serialisation. Instead
the code tries to be careful about its manipulations. Things aren't
always installed/removed just from that task either; it can happen with
*any* task that tries to extend the sysroot (which fetch, unpack and
patch do as well).

Cheers,

Richard

Alexander Kanavin March 9, 2023, 1:04 p.m. UTC | #4
On Thu, 9 Mar 2023 at 13:03, Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:

> This is a wider problem: the sysroots are often modified while things
> are running against them, as we don't require task serialisation. Instead
> the code tries to be careful about its manipulations. Things aren't
> always installed/removed just from that task either; it can happen with
> *any* task that tries to extend the sysroot (which fetch, unpack and
> patch do as well).

Right, then I'm not sure what to do here (short of 'task-specific sysroots') :-D

We can lessen the chances by serializing prepare_recipe_sysroot into
fetch/unpack/patch/configure sequence. But it would not eliminate the
possibility of other tasks stepping on those or each other.
We can also make sysroot (de)population code more careful. But it
won't solve the non-determinism.
Making git wrapper run with python from hosttools/ solves a specific
issue I saw, but not the broader problem.
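
As a rough illustration of the first option, the sketch below models the in-recipe task chain as a small graph and lists which tasks have no ordering relative to prepare_recipe_sysroot, so they may run at the same time. The fetch/unpack/patch/configure chain is taken from the sequence named above; the helper names (reachable, unordered_with, graph) are hypothetical, and the model ignores inter-recipe dependencies and all other tasks.

```python
def reachable(edges, start):
    """Return every task that must run after `start` (transitively)."""
    seen, stack = set(), [start]
    while stack:
        for nxt in edges.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def unordered_with(edges, task):
    """Tasks with no ordering relative to `task`, so they may overlap with it."""
    tasks = set(edges) | {t for succ in edges.values() for t in succ}
    after = reachable(edges, task)
    before = {t for t in tasks if task in reachable(edges, t)}
    return sorted(tasks - after - before - {task})


def graph(prepare_after):
    """fetch -> unpack -> patch -> configure, plus prepare_recipe_sysroot.

    `prepare_after` names the task that prepare_recipe_sysroot is
    scheduled after; it always runs before configure.
    """
    edges = {
        "fetch": ["unpack"],
        "unpack": ["patch"],
        "patch": ["configure"],
        "configure": [],
        "prepare_recipe_sysroot": ["configure"],
    }
    edges[prepare_after] = edges[prepare_after] + ["prepare_recipe_sysroot"]
    return edges


after_fetch = unordered_with(graph("fetch"), "prepare_recipe_sysroot")
after_unpack = unordered_with(graph("unpack"), "prepare_recipe_sysroot")
after_patch = unordered_with(graph("patch"), "prepare_recipe_sysroot")
```

In this simplified model, moving the task from "after fetch" to "after unpack" removes unpack from the set of possibly concurrent tasks but leaves patch in it; only "after patch" fully serializes the chain within the recipe.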

It happened just once, so it does seem exceedingly rare.

Alex

Richard Purdie March 9, 2023, 1:27 p.m. UTC | #5
On Thu, 2023-03-09 at 14:04 +0100, Alexander Kanavin wrote:
> On Thu, 9 Mar 2023 at 13:03, Richard Purdie
> <richard.purdie@linuxfoundation.org> wrote:
> 
> > This is a wider problem: the sysroots are often modified while things
> > are running against them, as we don't require task serialisation. Instead
> > the code tries to be careful about its manipulations. Things aren't
> > always installed/removed just from that task either; it can happen with
> > *any* task that tries to extend the sysroot (which fetch, unpack and
> > patch do as well).
> 
> Right, then I'm not sure what to do here (short of 'task-specific sysroots') :-D
> 
> We can lessen the chances by serializing prepare_recipe_sysroot into
> fetch/unpack/patch/configure sequence. But it would not eliminate the
> possibility of other tasks stepping on those or each other.
> We can also make sysroot (de)population code more careful. But it
> won't solve the non-determinism.
> Making git wrapper run with python from hosttools/ solves a specific
> issue I saw, but not the broader problem.
> 
> It happened just once, so it does seem exceedingly rare.

I realised the "task specific sysroot" issue when implementing recipe
specific sysroots but I still don't know what to do about it. The
issues arising are very very rare.

The population code should avoid problems like you mentioned by
handling xxx/bin/* last. We should probably make the depopulation code
handle xxx/bin/* first which would avoid this there, too.
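
The ordering idea can be sketched as follows. This is not the staging.bbclass code, only an illustration assuming a flat list of sysroot-relative paths; the function names (is_binary_path, population_order, depopulation_order) and the sample file list are hypothetical.

```python
import posixpath


def is_binary_path(path):
    """True if `path` sits directly in a bin/ directory (e.g. usr/bin/python3)."""
    return posixpath.basename(posixpath.dirname(path)) == "bin"


def population_order(files):
    """Install order: everything else first, xxx/bin/* last."""
    # sorted() is stable and False sorts before True, so non-bin files
    # keep their relative order and come ahead of the executables.
    return sorted(files, key=is_binary_path)


def depopulation_order(files):
    """Removal order: xxx/bin/* first, then everything else."""
    return sorted(files, key=lambda p: not is_binary_path(p))


files = [
    "usr/bin/python3",
    "usr/lib/libpython3.11.so.1.0",
    "usr/include/python3.11/Python.h",
]

install = population_order(files)
remove = depopulation_order(files)
```

With this ordering an executable is only visible in PATH while the libraries it needs are in place, both when the sysroot is being filled and when it is being emptied. It narrows the window rather than closing it, since a process that has already resolved the binary can still lose the library underneath it.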

I've also been thinking we should have recipe specific hosttools for
quite some time but that isn't easy to get working.

For depopulation, I've wondered if bitbake should do more of this "up
front" with some changes to the taskgraph to handle it better. It would
need a new type of task which would complicate a lot of the mechanics
of the system though.

It comes back to a question of time/resources and whether anyone would
want to take on trying to make changes like this. I've slowly chipped
away at the issues but some of this is a much bigger scope of project.

https://git.yoctoproject.org/poky/commit/meta/classes/staging.bbclass?id=a819c88b875d2357751005df7594de290926ae64
was the fix for population FWIW.

Cheers,

Richard

Patch

diff --git a/meta/classes-global/staging.bbclass b/meta/classes-global/staging.bbclass
index e6d0d1d55c..ade5c03cd9 100644
--- a/meta/classes-global/staging.bbclass
+++ b/meta/classes-global/staging.bbclass
@@ -647,7 +647,7 @@  do_prepare_recipe_sysroot[deptask] = "do_populate_sysroot"
 python do_prepare_recipe_sysroot () {
     bb.build.exec_func("extend_recipe_sysroot", d)
 }
-addtask do_prepare_recipe_sysroot before do_configure after do_fetch
+addtask do_prepare_recipe_sysroot before do_configure after do_unpack
 
 python staging_taskhandler() {
     bbtasks = e.tasklist