[RFC] license.bbclass: only create hardlinks within $TMPDIR

Message ID 20260501183615.2728124-1-ravi@prevas.dk
State New

Commit Message

Rasmus Villemoes May 1, 2026, 6:36 p.m. UTC
From: Rasmus Villemoes <ravi@prevas.dk>

This is _not_ meant to be applied, but merely acts as a place to start
a discussion.

In our CI setup (which is gitlab-based, but I'm not sure that's too
relevant), we sometimes see spurious errors like

  touch: setting times of '[...]/rootfs/usr/share/common-licenses/go2rtc/COPYING.MIT': Operation not permitted

Often, the error disappears when the job is re-run, which I don't
really claim to understand. But digging into this, it seems to be
related to hard-linking from the common-licenses dir in the oe-core
repository into the build directory (later steps then create further
hard links).

While the test done here for whether a hard link can be made is mostly
correct (assuming the sysctl protected_hardlinks is 1, as it is in
most distros), it does not check what a subsequent utimensat(2) system
call with explicit timestamps requires: that essentially requires one
to be the owner of the file, not merely to have write access.
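
To make the difference concrete, here is a minimal sketch of a stricter
check. The helper names owns_file() and safe_to_link() are made up for
this illustration and are not part of license.bbclass, and treating
euid 0 as "privileged" is a simplification of the kernel's capability
check.

```python
import os

def owns_file(path):
    # utimensat(2) with explicit timestamps needs the caller's effective
    # UID to match the file owner, or a privileged caller. Checking for
    # euid 0 is a simplification of the capability check.
    euid = os.geteuid()
    return euid == 0 or os.stat(path).st_uid == euid

def safe_to_link(src, destdir):
    # Hypothetical helper: only hard link when a later change of the
    # link's timestamps can be expected to succeed as well.
    same_fs = os.stat(src).st_dev == os.stat(destdir).st_dev
    readable_writable = os.access(src, os.R_OK | os.W_OK)
    return same_fs and readable_writable and owns_file(src)
```

With a check like this, a file that is group-writable but owned by
another user would be copied rather than linked.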

We are not the only ones to have observed these spurious failures:
https://stackoverflow.com/questions/77926112/yocto-gitlab-ci-job-sometimes-causes-touch-to-give-operation-not-permitted

But apart from these concrete failures, at a higher level it seems
wrong for a build to cause the mtime of files in the layer
directories to change, and it seems even more wrong considering that
if one of those files is rewritten (via open(O_TRUNC)+write(), not
via "write new file and rename on top"), that affects what is already
in build directories. Also, one could imagine two parallel builds
using the same layer directories as source, but with different values
for REPRODUCIBLE_TIMESTAMP_ROOTFS, which would result in rather
random end results since both builds modify the mtimes of the same
inodes.
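
The shared-inode behaviour described above can be reproduced outside of
any build. This is a small standalone sketch; the file names are
placeholders standing in for a license file in a layer and its hard
link in a build directory.

```python
import os
import tempfile

def shared_inode_demo():
    with tempfile.TemporaryDirectory() as top:
        layer_file = os.path.join(top, "layer-COPYING")  # stands in for the layer file
        build_file = os.path.join(top, "build-COPYING")  # stands in for the build copy
        with open(layer_file, "w") as f:
            f.write("original\n")
        os.link(layer_file, build_file)

        # Setting the mtime through one name changes it for the other.
        os.utime(build_file, (0, 0))
        mtime_via_layer = os.stat(layer_file).st_mtime

        # An in-place rewrite (truncate and write) is visible through both names.
        with open(layer_file, "w") as f:
            f.write("rewritten\n")
        with open(build_file) as f:
            after_truncate = f.read()

        # Write-new-and-rename gives the layer name a fresh inode;
        # the build name keeps the old content.
        new_file = layer_file + ".new"
        with open(new_file, "w") as f:
            f.write("replaced\n")
        os.replace(new_file, layer_file)
        with open(build_file) as f:
            after_rename = f.read()

        return mtime_via_layer, after_truncate, after_rename
```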

I do understand that using hard links is quite valuable for saving
time and space (I see GPL-2.0-only having 135 links, so that alone is
a few MB), which is why I don't think this should be applied as-is.

Could we do something like create a $TMPDIR/license-pool/, and
whenever encountering a src not inside TMPDIR, copy that file to
$TMPDIR/license-pool/<basename>-<sha256 of source> [if it doesn't
already exist], and then use the latter as src in the subsequent "can
we hardlink" logic? That would then also improve the case where the
meta-layers are (bind)mounted R/O, and we thus always end up copying.
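
A rough sketch of what such a pool could look like, under a few
assumptions of mine: the sha256 is taken over the file content, the
helper names is_inside() and pooled_source() are hypothetical, and the
copy is written to a temporary name and renamed so that parallel tasks
never see a partial pool entry. None of this exists in license.bbclass
today.

```python
import hashlib
import os
import shutil
import tempfile

def is_inside(path, top):
    # Compare whole path components; a plain string-prefix test would
    # also match a sibling directory such as "<top>-other".
    path = os.path.realpath(path)
    top = os.path.realpath(top)
    return os.path.commonpath([path, top]) == top

def pooled_source(src, tmpdir):
    """Return a path inside tmpdir holding the same content as src."""
    if is_inside(src, tmpdir):
        return src
    with open(src, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    pool = os.path.join(tmpdir, "license-pool")
    os.makedirs(pool, exist_ok=True)
    pooled = os.path.join(pool, "%s-%s" % (os.path.basename(src), digest))
    if not os.path.exists(pooled):
        # Copy under a unique temporary name, then rename into place.
        fd, partial = tempfile.mkstemp(dir=pool)
        os.close(fd)
        shutil.copyfile(src, partial)
        os.chmod(partial, 0o644)  # mkstemp creates the file with mode 0600
        os.replace(partial, pooled)
    return pooled
```

copy_license_files() could then call something like pooled_source(src,
tmpdir) before evaluating canlink, so that every link target is a file
the build itself created and owns.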

Signed-off-by: Rasmus Villemoes <ravi@prevas.dk>
---
 meta/classes-global/license.bbclass | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
Patch

diff --git a/meta/classes-global/license.bbclass b/meta/classes-global/license.bbclass
index af5f1ed41d..61049d76fe 100644
--- a/meta/classes-global/license.bbclass
+++ b/meta/classes-global/license.bbclass
@@ -36,7 +36,7 @@  python do_populate_lic() {
 
     # The base directory we wrangle licenses to
     destdir = os.path.join(d.getVar('LICSSTATEDIR'), d.getVar('LICENSE_DEPLOY_PATHCOMPONENT'), d.getVar('PN'))
-    copy_license_files(lic_files_paths, destdir)
+    copy_license_files(lic_files_paths, destdir, d)
     info = get_recipe_info(d)
     with open(os.path.join(destdir, "recipeinfo"), "w") as f:
         for key in sorted(info.keys()):
@@ -52,7 +52,7 @@  python perform_packagecopy:prepend () {
 
         # LICENSE_FILES_DIRECTORY starts with '/' so os.path.join cannot be used to join D and LICENSE_FILES_DIRECTORY
         destdir = d.getVar('D') + os.path.join(d.getVar('LICENSE_FILES_DIRECTORY'), d.getVar('PN'))
-        copy_license_files(lic_files_paths, destdir)
+        copy_license_files(lic_files_paths, destdir, d)
         add_package_and_files(d)
 }
 perform_packagecopy[vardeps] += "LICENSE_CREATE_PACKAGE"
@@ -76,10 +76,11 @@  def add_package_and_files(d):
         d.setVar('PACKAGES', "%s %s" % (pn_lic, packages))
         d.setVar('FILES:' + pn_lic, files)
 
-def copy_license_files(lic_files_paths, destdir):
+def copy_license_files(lic_files_paths, destdir, d):
     import shutil
     import errno
 
+    tmpdir = d.getVar("TMPDIR")
     bb.utils.mkdirhier(destdir)
     for (basename, path, beginline, endline) in lic_files_paths:
         try:
@@ -89,7 +90,7 @@  def copy_license_files(lic_files_paths, destdir):
                 os.remove(dst)
             if os.path.islink(src):
                 src = os.path.realpath(src)
-            canlink = os.access(src, os.W_OK) and (os.stat(src).st_dev == os.stat(destdir).st_dev) and beginline is None and endline is None
+            canlink = src.startswith(tmpdir) and os.access(src, os.W_OK) and (os.stat(src).st_dev == os.stat(destdir).st_dev) and beginline is None and endline is None
             if canlink:
                 try:
                     os.link(src, dst)