
[1/3] sbom30.py: reduce redundant spdxid-hash symlinks to save inode on host

Message ID 20241110030741.4108407-1-hongxu.jia@windriver.com
State New
Series [1/3] sbom30.py: reduce redundant spdxid-hash symlinks to save inode on host

Commit Message

Hongxu Jia Nov. 10, 2024, 3:07 a.m. UTC
In order to support all in-scope SPDX data within a single
JSON-LD file for SPDX 3.0.1, Yocto's SBOM generation:
- in native/target/nativesdk recipes, creates an spdxid-hash symlink
  for each element, pointing to the JSON-LD file that contains the
  element's details;
- in image recipes, uses the spdxid-hash symlinks to collect element
  details from the various JSON-LD files.
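A minimal Python sketch of the symlink scheme described above, assuming each link is named `<sha256>.spdx.json` under `by-spdxid-hash/<first two hex digits>/`. The directory layout follows `jsonld_hash_path()` in the patch; the `.spdx.json` suffix is inferred from the `find` commands in the test results. The helpers `hash_link_path`, `link_element` and `resolve_element` are hypothetical, not part of oe.sbom30. In this scheme every element gets its own symlink, and therefore its own inode.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def hash_link_path(deploy_dir: Path, spdx_id: str) -> Path:
    """Map an spdxId to its link path: by-spdxid-hash/<h[:2]>/<h>.spdx.json."""
    h = hashlib.sha256(spdx_id.encode("utf-8")).hexdigest()
    return deploy_dir / "by-spdxid-hash" / h[:2] / f"{h}.spdx.json"


def link_element(deploy_dir: Path, spdx_id: str, doc: Path) -> Path:
    """Recipe side: create a relative symlink from the hashed path to the doc."""
    link = hash_link_path(deploy_dir, spdx_id)
    link.parent.mkdir(parents=True, exist_ok=True)
    link.symlink_to(os.path.relpath(doc, link.parent))
    return link


def resolve_element(deploy_dir: Path, spdx_id: str) -> Path:
    """Image side: find the JSON-LD document that holds this element."""
    return hash_link_path(deploy_dir, spdx_id).resolve()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        deploy = Path(tmp)
        doc = deploy / "recipes" / "recipe-shadow.spdx.json"
        doc.parent.mkdir(parents=True)
        doc.write_text("{}")
        ids = [
            f"http://spdx.org/spdxdocs/shadow-example/sourcefile/{i}"
            for i in range(1, 4)
        ]
        for spdx_id in ids:
            link_element(deploy, spdx_id, doc)
        # 3: one symlink per element
        print(len(list((deploy / "by-spdxid-hash").rglob("*.spdx.json"))))
        print(resolve_element(deploy, ids[0]) == doc.resolve())  # True
```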

When SPDX_INCLUDE_SOURCES = "1", source files are added to the JSON-LD
file and 2N+ spdxid-hash symlinks are created for N source files
(N for the software_File elements, N for the hasDeclaredLicense relationships).

With a large number of source files, each extra symlink -> real file
occupies one more inode (per file), and each inode needs a slot in
the OS's inode cache. In this situation disk performance slows down
and inodes are used up quickly.

When add_package_files adds source files to a JSON-LD file, the
spdxid-hash symlinks for those source files all point to the same
JSON-LD file. Consider the format of the spdxId:

- spdxId of source file:
http://spdx.org/spdxdocs/shadow-10e66933-65cf-5a2d-9a1d-99b12a405441/0838759b8d71923d250a0813dda7356ffd309576115bbf8ed7e266cf4aed86a5/sourcefile/1

Removing the count number ('/1') from the spdxId suffix before hashing
makes all source files in one recipe share one spdxid-hash symlink.
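A sketch of the proposed change under the same assumptions as above: the count suffix is stripped before hashing (using the regex from the patch), so every per-file spdxId in a recipe maps to the same link path, and a link that already points at the same document is reused instead of recreated. `hash_key` and `link_element` are hypothetical names. Unlike the patch, this sketch resolves both sides of the comparison, and a link pointing at a different document makes `symlink_to` raise `FileExistsError`.

```python
import hashlib
import os
import re
import tempfile
from pathlib import Path

# Regex taken from the patch: per-file spdxIds end in .../<kind>/<count>.
PER_FILE_RE = re.compile(r".*/(sourcefile|sysroot|file)/\w+$")


def hash_key(spdx_id: str) -> str:
    """Strip the trailing count from per-file ids, then hash the result."""
    if PER_FILE_RE.match(spdx_id):
        spdx_id = os.path.dirname(spdx_id)
    return hashlib.sha256(spdx_id.encode("utf-8")).hexdigest()


def link_element(deploy_dir: Path, spdx_id: str, doc: Path) -> Path:
    """Create (or reuse) the by-spdxid-hash symlink for spdx_id."""
    h = hash_key(spdx_id)
    link = deploy_dir / "by-spdxid-hash" / h[:2] / f"{h}.spdx.json"
    # Like the patch's early return: a link already pointing at doc is kept.
    if link.is_symlink() and link.resolve() == doc.resolve():
        return link
    link.parent.mkdir(parents=True, exist_ok=True)
    link.symlink_to(os.path.relpath(doc, link.parent))
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        deploy = Path(tmp)
        doc = deploy / "recipe-shadow.spdx.json"
        doc.write_text("{}")
        for i in range(1, 1001):
            spdx_id = f"http://spdx.org/spdxdocs/shadow-example/sourcefile/{i}"
            link_element(deploy, spdx_id, doc)
        # 1000 source files share a single symlink
        print(len(list((deploy / "by-spdxid-hash").rglob("*.spdx.json"))))
```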

The same reasoning applies to sysroot and package files:

- spdxId of sysroot file:
http://spdx.org/spdxdocs/shadow-10e66933-65cf-5a2d-9a1d-99b12a405441/0838759b8d71923d250a0813dda7356ffd309576115bbf8ed7e266cf4aed86a5/sysroot/1

- spdxId of package file:
http://spdx.org/spdxdocs/shadow-10e66933-65cf-5a2d-9a1d-99b12a405441/0838759b8d71923d250a0813dda7356ffd309576115bbf8ed7e266cf4aed86a5/package/shadow-src/file/1
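To check which ids the patch's regex affects, the snippet below applies it to the three example spdxIds above (`shared_id` is a hypothetical helper). Per-file ids lose their trailing count, while a package-level id such as `<prefix>/package/shadow-src` is left unchanged, because the regex only matches when the second-to-last path component is exactly `sourcefile`, `sysroot` or `file`.

```python
import os
import re

# The pattern added to jsonld_hash_path() by this patch.
PER_FILE_RE = re.compile(r".*/(sourcefile|sysroot|file)/\w+$")

# Common prefix of the three example spdxIds in the commit message.
PREFIX = (
    "http://spdx.org/spdxdocs/shadow-10e66933-65cf-5a2d-9a1d-99b12a405441/"
    "0838759b8d71923d250a0813dda7356ffd309576115bbf8ed7e266cf4aed86a5"
)


def shared_id(spdx_id: str) -> str:
    """Return the id that gets hashed: per-file ids lose their trailing count."""
    if PER_FILE_RE.match(spdx_id):
        return os.path.dirname(spdx_id)
    return spdx_id


if __name__ == "__main__":
    for suffix in ("sourcefile/1", "sysroot/1", "package/shadow-src/file/1",
                   "package/shadow-src"):
        # Print only the part after the common prefix.
        print(suffix, "->", shared_id(f"{PREFIX}/{suffix}")[len(PREFIX) + 1:])
```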

Build core-image-minimal with and without this commit and compare the number of spdxid-hash symlinks: 7,281,824 -> 70,508.

echo 'SPDX_INCLUDE_SOURCES = "1"' >> local.conf

With this commit:
$ time bitbake core-image-minimal
real    95m6.960s
user    0m22.832s
sys     0m4.087s

$ find tmp/deploy/spdx/3.0.1/*/by-spdxid-hash/ -name "*.spdx.json" |wc -l
70508

Without this commit:
$ time bitbake core-image-minimal
real    100m17.769s
user    0m24.516s
sys     0m4.334s

$ find tmp/deploy/spdx/3.0.1/*/by-spdxid-hash -name "*.json" |wc -l
7281824
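The reported numbers can be checked with a little arithmetic on the values copied from the logs above: for this pair of builds, the symlink count drops roughly 103-fold and the real (wall-clock) time by about 5%.

```python
# Figures copied from the build logs in the commit message.
LINKS_WITHOUT = 7_281_824
LINKS_WITH = 70_508


def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a `time` value such as 100m17.769s into seconds."""
    return minutes * 60 + seconds


REAL_WITHOUT = to_seconds(100, 17.769)
REAL_WITH = to_seconds(95, 6.960)

if __name__ == "__main__":
    ratio = LINKS_WITHOUT / LINKS_WITH
    reduction = 1 - LINKS_WITH / LINKS_WITHOUT
    saved = REAL_WITHOUT - REAL_WITH
    print(f"symlinks: {ratio:.1f}x fewer ({reduction:.2%} reduction)")
    # -> symlinks: 103.3x fewer (99.03% reduction)
    print(f"real time saved: {saved:.1f} s ({saved / REAL_WITHOUT:.1%})")
    # -> real time saved: 310.8 s (5.2%)
```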

Signed-off-by: Hongxu Jia <hongxu.jia@windriver.com>
---
 meta/lib/oe/sbom30.py | 9 +++++++++
 1 file changed, 9 insertions(+)

Patch

diff --git a/meta/lib/oe/sbom30.py b/meta/lib/oe/sbom30.py
index e3a9428668..4efeaae3a0 100644
--- a/meta/lib/oe/sbom30.py
+++ b/meta/lib/oe/sbom30.py
@@ -911,6 +911,10 @@  def jsonld_arch_path(d, arch, subdir, name, deploydir=None):
 
 
 def jsonld_hash_path(_id):
+    # For spdxIds added by add_package_files, remove the trailing count number
+    if re.match(r".*/(sourcefile|sysroot|file)/\w+$", _id):
+        _id = os.path.dirname(_id)
+
     h = hashlib.sha256(_id.encode("utf-8")).hexdigest()
 
     return Path("by-spdxid-hash") / h[:2], h
@@ -992,6 +996,11 @@  def write_recipe_jsonld_doc(
             *hash_path,
             deploydir=deploydir,
         )
+
+        # Return if expected symlink exists
+        if link_name.is_symlink() and link_name.resolve() == dest:
+            return hash_path[-1]
+
         try:
             link_name.parent.mkdir(exist_ok=True, parents=True)
             link_name.symlink_to(os.path.relpath(dest, link_name.parent))