[RFC,2/2] bitbake-setup: share sstate by default between builds

Message ID 20251215145418.2680311-2-alex.kanavin@gmail.com
State New
Series [RFC,1/2] cooker: use BB_HASHSERVE_DB_DIR for hash server database location

Commit Message

Alexander Kanavin Dec. 15, 2025, 2:54 p.m. UTC
From: Alexander Kanavin <alex@linutronix.de>

Nowadays sharing sstate must also include sharing the hash equivalency
information and thus, managing a hash equivalency server. There are
two ways to do it:

- starting/stopping the server outside the bitbake invocations, and
guaranteeing that it's available when bitbake is invoked.

- using bitbake's built-in start/stop code which launches a server
before a build starts and stops it when a build is finished; essentially
this is a private server, using a database private to a build directory
(by default).

I couldn't come up with a good way to do the first option in bitbake-setup:
it needs to be invisible to users, they should not have to run special commands
and they should not wonder why there is a mysterious background process.
It's not impossible to auto-start a shared server, but that will quickly
run into synchronization issues: if one server is being started, another
should not be started at the same time. If one server is shutting down
(e.g. after an inactivity timeout), another starting server should wait
until it frees the socket, and block all bitbake invocations on that.
Memory resident bitbake does this in lib/bb/server/process.py with a lot of
complexity, and I don't think it should be added to the hash server as well.
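
To illustrate only the first of those synchronization problems (two servers
being started at the same time), here is a minimal sketch using an advisory
lock file. The function names and the lock file are hypothetical and are not
part of bitbake or the hash server; the sketch assumes a POSIX system where
`fcntl.flock()` is available, and it does not address the shutdown/socket
hand-over case at all.

```python
import fcntl
import os


def try_acquire_start_lock(lock_path):
    """Try to become the one process allowed to start the server.

    Returns an open file descriptor holding the lock, or None if another
    process (or another open of the same file) already holds it.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # Exclusive, non-blocking: fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def release_start_lock(fd):
    """Release the lock so that another process may start a server."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

Even this small piece leaves open what the losing process should do next
(wait, retry, or connect to the winner), which is where the complexity
mentioned above starts to accumulate.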

On the other hand, the hash equivalency database is sqlite-driven, and the
sqlite documentation says that sharing a database file between several
simultaneous processes is okay, with nothing getting lost or corrupted:
https://sqlite.org/faq.html#q5

I've confirmed this by running simultaneous builds that way: nothing
unusual happened, and sstate was shared as it's supposed to.
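
A much smaller stand-in for that experiment can be sketched as follows. It is
not the hash server and does not use its schema: the `entries` table, the
writer script and the function name are made up for illustration. It only
shows several processes inserting into one sqlite file on a local filesystem,
relying on sqlite's file locking plus a busy timeout.

```python
import sqlite3
import subprocess
import sys

# Script run by each writer process; argv: db path, writer id, row count.
WRITER = """
import sqlite3, sys
db_path, writer_id, rows = sys.argv[1], sys.argv[2], int(sys.argv[3])
# isolation_level=None: transactions are managed explicitly below.
conn = sqlite3.connect(db_path, timeout=30, isolation_level=None)
for i in range(rows):
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    conn.execute("INSERT INTO entries (writer, seq) VALUES (?, ?)", (writer_id, i))
    conn.execute("COMMIT")
conn.close()
"""


def run_concurrent_writers(db_path, writers=3, rows=20):
    """Start `writers` processes that each insert `rows` rows; return the row count."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS entries (writer TEXT, seq INTEGER)")
    conn.commit()
    conn.close()

    procs = [
        subprocess.Popen([sys.executable, "-c", WRITER, db_path, str(n), str(rows)])
        for n in range(writers)
    ]
    exit_codes = [p.wait() for p in procs]
    if any(code != 0 for code in exit_codes):
        raise RuntimeError("a writer process failed: %r" % exit_codes)

    conn = sqlite3.connect(db_path)
    (count,) = conn.execute("SELECT COUNT(*) FROM entries").fetchone()
    conn.close()
    return count
```

If every insert survives, the returned count equals `writers * rows`. This
says nothing about network filesystems, where the locking guarantees may not
hold.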

There's a new setting that turns off this behavior for situations
where the server and sstate are managed externally.

Signed-off-by: Alexander Kanavin <alex@linutronix.de>
---
 bin/bitbake-setup | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Paul Barker Dec. 18, 2025, 7:49 p.m. UTC | #1
On Mon, 2025-12-15 at 15:54 +0100, Alexander Kanavin via
lists.openembedded.org wrote:
> From: Alexander Kanavin <alex@linutronix.de>
> 
> Nowadays sharing sstate must also include sharing the hash equivalency
> information and thus, managing a hash equivalency server. There are
> two ways to do it:
> 
> - starting/stopping the server outside the bitbake invocations, and
> guaranteeing that it's available when bitbake is invoked.
> 
> - using bitbake's built-in start/stop code which launches a server
> before a build starts and stops it when a build is finished; essentially
> this is a private server, using a database private to a build directory
> (by default).
> 
> I couldn't come up with a good way to do the first option in bitbake-setup:
> it needs to be invisible to users, they should not have to run special commands
> and they should not wonder why there is a mysterious background process.
> It's not impossible to auto-start a shared server, but that will quickly
> run into synchronization issues: if one server is being started, another
> should not be started at the same time. If one server is shutting down
> (e.g. after an inactivity timeout), another starting server should wait
> until it frees the socket, and block all bitbake invocations on that.
> Memory resident bitbake does this in lib/bb/server/process.py with a lot of
> complexity, and I don't think it should be added to the hash server as well.
> 
> On the other hand, hash equivalency database is sqlite-driven, and sqlite
> documentation reassures that sharing it between different simultaneous
> processes is okay: nothing will get lost or corrupted: https://sqlite.org/faq.html#q5
> 
> I've confirmed this by running simultaneous builds that way: nothing
> unusual happened, and sstate was shared as it's supposed to.
> 
> There's a new setting that turns off this behavior for situations
> where the server and sstate are managed externally.

To capture the concerns discussed on the patch review call this morning,
since Christmas & New Year are about to happen and we may all forget
things otherwise...

The sqlite faq you linked (https://sqlite.org/faq.html#q5) says:

  But use caution: this locking mechanism might not work correctly if
  the database file is kept on an NFS filesystem. This is because
  fcntl() file locking is broken on many NFS implementations. You should
  avoid putting SQLite database files on NFS if multiple processes might
  try to access the file at the same time.

We do promote the sharing of the sstate directory over NFS as safe, so
we should be very careful about placing the hashequiv db file in the
sstate directory in light of the above.

Thanks,
Richard Purdie Dec. 18, 2025, 10:39 p.m. UTC | #2
On Thu, 2025-12-18 at 19:49 +0000, Paul Barker wrote:
> On Mon, 2025-12-15 at 15:54 +0100, Alexander Kanavin via
> lists.openembedded.org wrote:
> > From: Alexander Kanavin <alex@linutronix.de>
> > 
> > Nowadays sharing sstate must also include sharing the hash equivalency
> > information and thus, managing a hash equivalency server. There are
> > two ways to do it:
> > 
> > - starting/stopping the server outside the bitbake invocations, and
> > guaranteeing that it's available when bitbake is invoked.
> > 
> > - using bitbake's built-in start/stop code which launches a server
> > before a build starts and stops it when a build is finished; essentially
> > this is a private server, using a database private to a build directory
> > (by default).
> > 
> > I couldn't come up with a good way to do the first option in bitbake-setup:
> > it needs to be invisible to users, they should not have to run special commands
> > and they should not wonder why there is a mysterious background process.
> > It's not impossible to auto-start a shared server, but that will quickly
> > run into synchronization issues: if one server is being started, another
> > should not be started at the same time. If one server is shutting down
> > (e.g. after an inactivity timeout), another starting server should wait
> > until it frees the socket, and block all bitbake invocations on that.
> > Memory resident bitbake does this in lib/bb/server/process.py with a lot of
> > complexity, and I don't think it should be added to the hash server as well.
> > 
> > On the other hand, hash equivalency database is sqlite-driven, and sqlite
> > documentation reassures that sharing it between different simultaneous
> > processes is okay: nothing will get lost or corrupted: https://sqlite.org/faq.html#q5
> > 
> > I've confirmed this by running simultaneous builds that way: nothing
> > unusual happened, and sstate was shared as it's supposed to.
> > 
> > There's a new setting that turns off this behavior for situations
> > where the server and sstate are managed externally.
> 
> To capture the concerns discussed on the patch review call this morning,
> since Christmas & New Year are about to happen and we may all forget
> things otherwise...
> 
> The sqlite faq you linked (https://sqlite.org/faq.html#q5) says:
> 
>   But use caution: this locking mechanism might not work correctly if
>   the database file is kept on an NFS filesystem. This is because
>   fcntl() file locking is broken on many NFS implementations. You should
>   avoid putting SQLite database files on NFS if multiple processes might
>   try to access the file at the same time.
> 
> We do promote the sharing of the sstate directory over NFS as safe, so
> we should be very careful about placing the hashequiv db file in the
> sstate directory in light of the above.

We've also been telling people they need a shared hashequiv instance if
they have a shared sstate. The number of people who will therefore try
and put this file on an NFS share is probably comparatively high even
if we don't show any example of that.

At the very least we need some kind of test of the filesystem it is on
to ensure it is not NFS as I don't think we want to support that.
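
One possible shape for such a test, as a sketch only: look up the mount that
contains the directory in `/proc/mounts` and refuse filesystem types starting
with `nfs`. This is not the check from any bitbake patch; the function names
are hypothetical, the approach is Linux-specific, and it ignores the octal
escaping `/proc/mounts` uses for unusual characters in mount points.

```python
import os


def filesystem_type(path, mounts_text):
    """Return the filesystem type of the mount containing `path`.

    `mounts_text` is text in /proc/mounts format:
    device, mount point, filesystem type, options, ...
    """
    path = os.path.realpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        # A mount contains the path if it is the path itself or a parent of it.
        prefix = mount_point.rstrip("/") + "/"
        # Prefer the longest mount point; on ties, later entries shadow earlier ones.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fstype
    return best_type


def is_on_nfs(path, mounts_text=None):
    """Return True if `path` lives on a mount whose type starts with "nfs"."""
    if mounts_text is None:
        with open("/proc/mounts") as f:
            mounts_text = f.read()
    fstype = filesystem_type(path, mounts_text)
    return fstype is not None and fstype.startswith("nfs")
```

A caller could turn a `True` result into a hard error before the database is
opened.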

Cheers,

Richard
Antonin Godard Dec. 19, 2025, 8:44 a.m. UTC | #3
Hi,

On Mon Dec 15, 2025 at 3:54 PM CET, Alexander Kanavin via lists.openembedded.org wrote:
> From: Alexander Kanavin <alex@linutronix.de>
>
> Nowadays sharing sstate must also include sharing the hash equivalency
> information and thus, managing a hash equivalency server. There are
> two ways to do it:
>
> - starting/stopping the server outside the bitbake invocations, and
> guaranteeing that it's available when bitbake is invoked.
>
> - using bitbake's built-in start/stop code which launches a server
> before a build starts and stops it when a build is finished; essentially
> this is a private server, using a database private to a build directory
> (by default).
>
> I couldn't come up with a good way to do the first option in bitbake-setup:
> it needs to be invisible to users, they should not have to run special commands
> and they should not wonder why there is a mysterious background process.
> It's not impossible to auto-start a shared server, but that will quickly
> run into synchronization issues: if one server is being started, another
> should not be started at the same time. If one server is shutting down
> (e.g. after an inactivity timeout), another starting server should wait
> until it frees the socket, and block all bitbake invocations on that.
> Memory resident bitbake does this in lib/bb/server/process.py with a lot of
> complexity, and I don't think it should be added to the hash server as well.
>
> On the other hand, hash equivalency database is sqlite-driven, and sqlite
> documentation reassures that sharing it between different simultaneous
> processes is okay: nothing will get lost or corrupted: https://sqlite.org/faq.html#q5
>
> I've confirmed this by running simultaneous builds that way: nothing
> unusual happened, and sstate was shared as it's supposed to.
>
> There's a new setting that turns off this behavior for situations
> where the server and sstate are managed externally.
>
> Signed-off-by: Alexander Kanavin <alex@linutronix.de>
> ---
>  bin/bitbake-setup | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/bin/bitbake-setup b/bin/bitbake-setup
> index 6f7dba16c..f382f6eb0 100755
> --- a/bin/bitbake-setup
> +++ b/bin/bitbake-setup
> @@ -777,6 +777,12 @@ def create_siteconf(top_dir, non_interactive, settings):
>  
>          os.makedirs(top_dir, exist_ok=True)
>          with open(siteconfpath, 'w') as siteconffile:
> +            sstate_settings = textwrap.dedent(
> +                """
> +                SSTATE_DIR = "{sstate_dir}"
> +                BB_HASHSERVE_DB_DIR = "{sstate_dir}"

Like DL_DIR, it would be nice to add a comment to explain what these variables
do.

Maybe needs some rework but something like this (adapted from poky's template
files):

```
# Where to place shared-state files
#
# BitBake has the capability to accelerate builds based on previously built output.
# This is done using "shared state" files which can be thought of as cache objects
# and this option determines where those files are placed.
#
# You can wipe out TMPDIR leaving this directory intact and the build would regenerate
# from these files if no changes were made to the configuration. If changes were made
# to the configuration, only shared state files where the state was still valid would
# be used (done using checksums).
SSTATE_DIR = "{sstate_dir}"

# Hash Equivalence database location
#
# Hash equivalence improves reuse of sstate by detecting when a given sstate
# artifact can be reused as equivalent, even if the current task hash doesn't
# match the one that generated the artifact. This variable controls where the
# Hash Equivalence database ("hashserv.db") is stored and can be shared between
# concurrent builds.
BB_HASHSERVE_DB_DIR = "{sstate_dir}"
```

We would probably also need an sstate-dir setting, like dl-dir?

FWIW, I've tested this locally and it works as expected but I did not run
concurrent builds.

Antonin
Alexander Kanavin Jan. 12, 2026, 12:47 p.m. UTC | #4
On Fri, 19 Dec 2025 at 09:44, Antonin Godard <antonin.godard@bootlin.com> wrote:
> Like DL_DIR, it would be nice to add a comment to explain what these variables
> do.
>
> Maybe needs some rework but something like this (adapted from poky's template
> files):

Thanks, I've added these.

> We would probably also need an sstate-dir setting, like dl-dir?

I'm not sure there's a use case for it. If someone wants to move it
out of bitbake-setup's top dir, and place it on an NFS mount, or do
some other custom sstate setup, they're better off switching off the
common-sstate setting, starting a hash server separately, and writing
the needed tweaks into site.conf manually.

I'll leave it out for now, but we can discuss further.

Alex
Alexander Kanavin Jan. 12, 2026, 12:51 p.m. UTC | #5
On Thu, 18 Dec 2025 at 23:39, Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:

> > The sqlite faq you linked (https://sqlite.org/faq.html#q5) says:
> >
> >   But use caution: this locking mechanism might not work correctly if
> >   the database file is kept on an NFS filesystem. This is because
> >   fcntl() file locking is broken on many NFS implementations. You should
> >   avoid putting SQLite database files on NFS if multiple processes might
> >   try to access the file at the same time.
> >
> > We do promote the sharing of the sstate directory over NFS as safe, so
> > we should be very careful about placing the hashequiv db file in the
> > sstate directory in light of the above.
>
> We've also been telling people they need a shared hashequiv instance if
> they have a shared sstate. The number of people who will therefore try
> and put this file on an NFS share is probably comparatively high even
> if we don't show any example of that.
>
> At the very least we need some kind of test of the filesystem it is on
> to ensure it is not NFS as I don't think we want to support that.

I've added restrictions and notices about NFS, patches will be sent soon.

Bitbake-setup basically would give two options:
- common-sstate = yes (the default): sstate and hash equivalency
database will be placed under a top directory,
- common-sstate = no: bitbake-setup does not do anything about sstate
or hash equivalency; the user is expected to write their own custom
settings into site.conf.
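
As a standalone illustration of the two options, the sketch below mirrors the
logic of the RFC patch in a separate, hypothetical helper
(`sstate_siteconf_fragment` is not a function in bitbake-setup): with
`common-sstate` set to `yes` it returns the two site.conf lines pointing at
`.sstate-cache` under the top directory, otherwise it returns nothing.

```python
import os
import textwrap


def sstate_siteconf_fragment(top_dir, common_sstate):
    """Return the site.conf lines for shared sstate, or "" when disabled."""
    if common_sstate != "yes":
        # The user is expected to write their own settings into site.conf.
        return ""
    sstate_dir = os.path.join(top_dir, ".sstate-cache")
    return textwrap.dedent(
        """
        SSTATE_DIR = "{sstate_dir}"
        BB_HASHSERVE_DB_DIR = "{sstate_dir}"
        """
    ).format(sstate_dir=sstate_dir)
```

For example, `sstate_siteconf_fragment("/home/user/bitbake-builds", "yes")`
yields both variables set to `/home/user/bitbake-builds/.sstate-cache` (the
path here is only an example).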

In both situations bitbake checks that the hash equivalency database
location is not on an NFS mount, and raises a hard error if it is.

I think this should be good enough for a start?

Alex
Patch

diff --git a/bin/bitbake-setup b/bin/bitbake-setup
index 6f7dba16c..f382f6eb0 100755
--- a/bin/bitbake-setup
+++ b/bin/bitbake-setup
@@ -777,6 +777,12 @@  def create_siteconf(top_dir, non_interactive, settings):
 
         os.makedirs(top_dir, exist_ok=True)
         with open(siteconfpath, 'w') as siteconffile:
+            sstate_settings = textwrap.dedent(
+                """
+                SSTATE_DIR = "{sstate_dir}"
+                BB_HASHSERVE_DB_DIR = "{sstate_dir}"
+                """.format(sstate_dir=os.path.join(top_dir, ".sstate-cache"))
+            )
             siteconffile.write(
                 textwrap.dedent(
                     """\
@@ -794,7 +800,7 @@  def create_siteconf(top_dir, non_interactive, settings):
                     """.format(
                         dl_dir=settings["default"]["dl-dir"],
                     )
-                )
+                ) + (sstate_settings if settings["default"]["common-sstate"] == 'yes' else "")
             )
 
 
@@ -1001,6 +1007,7 @@  def main():
                          'top-dir-name':'bitbake-builds',
                          'registry':default_registry,
                          'use-full-setup-dir-name':'no',
+                         'common-sstate':'yes',
                          }
 
         global_settings = load_settings(global_settings_path(args))