From patchwork Thu May 29 20:27:54 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ross Burton X-Patchwork-Id: 63833 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 30058C54FB3 for ; Thu, 29 May 2025 20:28:39 +0000 (UTC) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mx.groups.io with SMTP id smtpd.web11.2839.1748550513278351396 for ; Thu, 29 May 2025 13:28:33 -0700 Authentication-Results: mx.groups.io; dkim=none (message not signed); spf=pass (domain: arm.com, ip: 217.140.110.172, mailfrom: ross.burton@arm.com) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 77FB01758 for ; Thu, 29 May 2025 13:27:56 -0700 (PDT) Received: from cesw-amp-gbt-1s-m12830-04.lab.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 8F5EC3F673 for ; Thu, 29 May 2025 13:28:12 -0700 (PDT) From: Ross Burton To: openembedded-core@lists.openembedded.org Subject: [PATCH 1/9] recipetool: create: Support creating extra files named after the recipe Date: Thu, 29 May 2025 21:27:54 +0100 Message-ID: <20250529202802.1198179-2-ross.burton@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250529202802.1198179-1-ross.burton@arm.com> References: <20250529202802.1198179-1-ross.burton@arm.com> MIME-Version: 1.0 List-Id: X-Webhook-Received: from li982-79.members.linode.com [45.33.32.79] by aws-us-west-2-korg-lkml-1.web.codeaurora.org with HTTPS for ; Thu, 29 May 2025 20:28:39 -0000 X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/217445 From: Peter Kjellerstedt Signed-off-by: Peter Kjellerstedt --- 
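Note for reviewers: a minimal sketch of the naming scheme this change permits, assuming only what the diff shows, namely that each destination name is passed through str.format() with pn and pv before the file is moved. The helper name expand_extrafile_name and the sample recipe name and version are illustrative and not part of recipetool.

```python
def expand_extrafile_name(destfn, pn, pv):
    """Expand {pn} and {pv} placeholders the way the patched loop does.

    Hypothetical helper for illustration; the patch calls
    destfn.format(pn=pn, pv=realpv) inline.
    """
    return destfn.format(pn=pn, pv=pv)


# A templated name picks up the recipe name.
print(expand_extrafile_name("{pn}-licenses.inc", "mytool", "1.2.3"))

# A name without placeholders passes through unchanged.
print(expand_extrafile_name("go.mod.patch", "mytool", "1.2.3"))
```

Because str.format() treats braces as special, a destination name that needs a literal brace would have to double it ("{{" or "}}").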
scripts/lib/recipetool/create.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/scripts/lib/recipetool/create.py b/scripts/lib/recipetool/create.py index ea2ef5be637..24315b95b08 100644 --- a/scripts/lib/recipetool/create.py +++ b/scripts/lib/recipetool/create.py @@ -824,7 +824,8 @@ def create_recipe(args): extraoutdir = os.path.join(os.path.dirname(outfile), pn) bb.utils.mkdirhier(extraoutdir) for destfn, extrafile in extrafiles.items(): - shutil.move(extrafile, os.path.join(extraoutdir, destfn)) + fn = destfn.format(pn=pn, pv=realpv) + shutil.move(extrafile, os.path.join(extraoutdir, fn)) lines = lines_before lines_before = [] From patchwork Thu May 29 20:27:55 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ross Burton X-Patchwork-Id: 63830 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2B6B4C5B553 for ; Thu, 29 May 2025 20:28:19 +0000 (UTC) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mx.groups.io with SMTP id smtpd.web10.2811.1748550493985627702 for ; Thu, 29 May 2025 13:28:14 -0700 Authentication-Results: mx.groups.io; dkim=none (message not signed); spf=pass (domain: arm.com, ip: 217.140.110.172, mailfrom: ross.burton@arm.com) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 28CF52454 for ; Thu, 29 May 2025 13:27:57 -0700 (PDT) Received: from cesw-amp-gbt-1s-m12830-04.lab.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 408103F673 for ; Thu, 29 May 2025 13:28:13 -0700 (PDT) From: Ross Burton To: openembedded-core@lists.openembedded.org Subject: [PATCH 2/9] recipetool: licenses.csv: 
Add mapping for BSD licenses Date: Thu, 29 May 2025 21:27:55 +0100 Message-ID: <20250529202802.1198179-3-ross.burton@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250529202802.1198179-1-ross.burton@arm.com> References: <20250529202802.1198179-1-ross.burton@arm.com> MIME-Version: 1.0 List-Id: X-Webhook-Received: from li982-79.members.linode.com [45.33.32.79] by aws-us-west-2-korg-lkml-1.web.codeaurora.org with HTTPS for ; Thu, 29 May 2025 20:28:19 -0000 X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/217439 From: Peter Kjellerstedt Add mapping for BSD license used by crucible. Add mapping for BSD license used by golang.org/x/crypto. Change-Id: Iea4c3575bc4a5e33b4dbd65a34439684e504ba82 Signed-off-by: Peter Kjellerstedt --- scripts/lib/recipetool/licenses.csv | 2 ++ 1 file changed, 2 insertions(+) diff --git a/scripts/lib/recipetool/licenses.csv b/scripts/lib/recipetool/licenses.csv index 80851111b31..ddc818f48c4 100644 --- a/scripts/lib/recipetool/licenses.csv +++ b/scripts/lib/recipetool/licenses.csv @@ -1,6 +1,7 @@ 0636e73ff0215e8d672dc4c32c317bb3,GPL-2.0-only 12f884d2ae1ff87c09e5b7ccc2c4ca7e,GPL-2.0-only 18810669f13b87348459e611d31ab760,GPL-2.0-only +201414b6610203caed355323b1ab3116,BSD-3-Clause 252890d9eee26aab7b432e8b8a616475,LGPL-2.0-only 2d5025d4aa3495befef8f17206a5b0a1,LGPL-2.1-only 3214f080875748938ba060314b4f727d,LGPL-2.0-only @@ -13,6 +14,7 @@ 54c7042be62e169199200bc6477f04d1,BSD-3-Clause 55ca817ccb7d5b5b66355690e9abc605,LGPL-2.0-only 59530bdf33659b29e73d4adb9f9f6552,GPL-2.0-only +5d4950ecb7b26d2c5e4e7b4e0dd74707,BSD-3-Clause 5f30f0716dfdd0d91eb439ebec522ec2,LGPL-2.0-only 6a6a8e020838b23406c81b19c1d46df6,LGPL-3.0-only 751419260aa954499f7abaabaa882bbe,GPL-2.0-only From patchwork Thu May 29 20:27:56 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ross Burton X-Patchwork-Id: 63832 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on 
aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 39603C54FB3 for ; Thu, 29 May 2025 20:28:19 +0000 (UTC) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mx.groups.io with SMTP id smtpd.web11.2829.1748550494931982748 for ; Thu, 29 May 2025 13:28:15 -0700 Authentication-Results: mx.groups.io; dkim=none (message not signed); spf=pass (domain: arm.com, ip: 217.140.110.172, mailfrom: ross.burton@arm.com) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id E65A62454 for ; Thu, 29 May 2025 13:27:57 -0700 (PDT) Received: from cesw-amp-gbt-1s-m12830-04.lab.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id E73673F673 for ; Thu, 29 May 2025 13:28:13 -0700 (PDT) From: Ross Burton To: openembedded-core@lists.openembedded.org Subject: [PATCH 3/9] recipetool: create_go: Use gomod fetcher instead of go mod vendor Date: Thu, 29 May 2025 21:27:56 +0100 Message-ID: <20250529202802.1198179-4-ross.burton@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250529202802.1198179-1-ross.burton@arm.com> References: <20250529202802.1198179-1-ross.burton@arm.com> MIME-Version: 1.0 List-Id: X-Webhook-Received: from li982-79.members.linode.com [45.33.32.79] by aws-us-west-2-korg-lkml-1.web.codeaurora.org with HTTPS for ; Thu, 29 May 2025 20:28:19 -0000 X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/217440 From: Christian Lindeberg Use the go-mod bbclass together with the gomod fetcher instead of the go-vendor bbclass. 
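As a rough illustration of the three small techniques the new code relies on, here is a standalone sketch. The function bodies mirror the expressions in the diff, but the public names (unescape_path, fold_uri, parse_go_list) and all sample data are assumptions made for this example; the real helpers are private methods of GoRecipeHandler.

```python
import json
import re


def unescape_path(path):
    # The Go module cache stores each capital letter as '!' plus the
    # lowercase letter; reverse that to recover the module path.
    return re.sub(r'!([a-z])', lambda m: m.group(1).upper(), path)


def fold_uri(uri):
    # Map ';' and '/' to ' ' and '!', which sort before letters, digits,
    # '-' and '.', so a module path sorts ahead of longer paths that
    # share its prefix.
    return uri.replace(';', ' ').replace('/', '!')


def parse_go_list(stdout):
    # 'go list -json' prints a stream of JSON objects rather than an
    # array; join the objects with commas and wrap them in brackets.
    return json.loads('[' + stdout.replace('}\n{', '},\n{') + ']')


# Made-up sample data, not real command output.
uris = [
    'gomod://a.b/c-d;version=v1.0.0',
    'gomod://a.b/c/v2;version=v2.0.0',
    'gomod://a.b/c;version=v1.0.0',
]
for uri in sorted(uris, key=fold_uri):
    print(uri)

print(unescape_path('github.com/!burnt!sushi/toml'))
print(len(parse_go_list('{\n\t"Dir": "/a"\n}\n{\n\t"Dir": "/b"\n}\n')))
```

With a plain sort the entry for a.b/c would land after a.b/c-d and a.b/c/v2, because ';' sorts after both '-' and '/'; the folding key puts it first.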
Signed-off-by: Christian Lindeberg --- scripts/lib/recipetool/create_go.py | 731 ++++------------------------ 1 file changed, 104 insertions(+), 627 deletions(-) diff --git a/scripts/lib/recipetool/create_go.py b/scripts/lib/recipetool/create_go.py index 5cc53931f00..3e9fc857842 100644 --- a/scripts/lib/recipetool/create_go.py +++ b/scripts/lib/recipetool/create_go.py @@ -10,48 +10,31 @@ # -from collections import namedtuple -from enum import Enum -from html.parser import HTMLParser from recipetool.create import RecipeHandler, handle_license_vars -from recipetool.create import find_licenses, tidy_licenses, fixup_license -from recipetool.create import determine_from_url -from urllib.error import URLError, HTTPError +from recipetool.create import find_licenses import bb.utils import json import logging import os import re -import subprocess import sys -import shutil import tempfile import urllib.parse import urllib.request -GoImport = namedtuple('GoImport', 'root vcs url suffix') logger = logging.getLogger('recipetool') -CodeRepo = namedtuple( - 'CodeRepo', 'path codeRoot codeDir pathMajor pathPrefix pseudoMajor') tinfoil = None -# Regular expression to parse pseudo semantic version -# see https://go.dev/ref/mod#pseudo-versions -re_pseudo_semver = re.compile( - r"^v[0-9]+\.(0\.0-|\d+\.\d+-([^+]*\.)?0\.)(?P\d{14})-(?P[A-Za-z0-9]+)(\+[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?$") -# Regular expression to parse semantic version -re_semver = re.compile( - r"^v(?P0|[1-9]\d*)\.(?P0|[1-9]\d*)\.(?P0|[1-9]\d*)(?:-(?P(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+(?P[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$") - def tinfoil_init(instance): global tinfoil tinfoil = instance + class GoRecipeHandler(RecipeHandler): """Class to handle the go recipe creation""" @@ -83,577 +66,96 @@ class GoRecipeHandler(RecipeHandler): return bindir - def __resolve_repository_static(self, modulepath): - """Resolve the repository in a static manner - - The method is 
based on the go implementation of - `repoRootFromVCSPaths` in - https://github.com/golang/go/blob/master/src/cmd/go/internal/vcs/vcs.go - """ - - url = urllib.parse.urlparse("https://" + modulepath) - req = urllib.request.Request(url.geturl()) - - try: - resp = urllib.request.urlopen(req) - # Some modulepath are just redirects to github (or some other vcs - # hoster). Therefore, we check if this modulepath redirects to - # somewhere else - if resp.geturl() != url.geturl(): - bb.debug(1, "%s is redirectred to %s" % - (url.geturl(), resp.geturl())) - url = urllib.parse.urlparse(resp.geturl()) - modulepath = url.netloc + url.path - - except URLError as url_err: - # This is probably because the module path - # contains the subdir and major path. Thus, - # we ignore this error for now - logger.debug( - 1, "Failed to fetch page from [%s]: %s" % (url, str(url_err))) - - host, _, _ = modulepath.partition('/') - - class vcs(Enum): - pathprefix = "pathprefix" - regexp = "regexp" - type = "type" - repo = "repo" - check = "check" - schemelessRepo = "schemelessRepo" - - # GitHub - vcsGitHub = {} - vcsGitHub[vcs.pathprefix] = "github.com" - vcsGitHub[vcs.regexp] = re.compile( - r'^(?Pgithub\.com/[A-Za-z0-9_.\-]+/[A-Za-z0-9_.\-]+)(/(?P[A-Za-z0-9_.\-]+))*$') - vcsGitHub[vcs.type] = "git" - vcsGitHub[vcs.repo] = "https://\\g" - - # Bitbucket - vcsBitbucket = {} - vcsBitbucket[vcs.pathprefix] = "bitbucket.org" - vcsBitbucket[vcs.regexp] = re.compile( - r'^(?Pbitbucket\.org/(?P[A-Za-z0-9_.\-]+/[A-Za-z0-9_.\-]+))(/(?P[A-Za-z0-9_.\-]+))*$') - vcsBitbucket[vcs.type] = "git" - vcsBitbucket[vcs.repo] = "https://\\g" - - # IBM DevOps Services (JazzHub) - vcsIBMDevOps = {} - vcsIBMDevOps[vcs.pathprefix] = "hub.jazz.net/git" - vcsIBMDevOps[vcs.regexp] = re.compile( - r'^(?Phub\.jazz\.net/git/[a-z0-9]+/[A-Za-z0-9_.\-]+)(/(?P[A-Za-z0-9_.\-]+))*$') - vcsIBMDevOps[vcs.type] = "git" - vcsIBMDevOps[vcs.repo] = "https://\\g" - - # Git at Apache - vcsApacheGit = {} - vcsApacheGit[vcs.pathprefix] = 
"git.apache.org" - vcsApacheGit[vcs.regexp] = re.compile( - r'^(?Pgit\.apache\.org/[a-z0-9_.\-]+\.git)(/(?P[A-Za-z0-9_.\-]+))*$') - vcsApacheGit[vcs.type] = "git" - vcsApacheGit[vcs.repo] = "https://\\g" - - # Git at OpenStack - vcsOpenStackGit = {} - vcsOpenStackGit[vcs.pathprefix] = "git.openstack.org" - vcsOpenStackGit[vcs.regexp] = re.compile( - r'^(?Pgit\.openstack\.org/[A-Za-z0-9_.\-]+/[A-Za-z0-9_.\-]+)(\.git)?(/(?P[A-Za-z0-9_.\-]+))*$') - vcsOpenStackGit[vcs.type] = "git" - vcsOpenStackGit[vcs.repo] = "https://\\g" - - # chiselapp.com for fossil - vcsChiselapp = {} - vcsChiselapp[vcs.pathprefix] = "chiselapp.com" - vcsChiselapp[vcs.regexp] = re.compile( - r'^(?Pchiselapp\.com/user/[A-Za-z0-9]+/repository/[A-Za-z0-9_.\-]+)$') - vcsChiselapp[vcs.type] = "fossil" - vcsChiselapp[vcs.repo] = "https://\\g" - - # General syntax for any server. - # Must be last. - vcsGeneralServer = {} - vcsGeneralServer[vcs.regexp] = re.compile( - "(?P(?P([a-z0-9.\\-]+\\.)+[a-z0-9.\\-]+(:[0-9]+)?(/~?[A-Za-z0-9_.\\-]+)+?)\\.(?Pbzr|fossil|git|hg|svn))(/~?(?P[A-Za-z0-9_.\\-]+))*$") - vcsGeneralServer[vcs.schemelessRepo] = True - - vcsPaths = [vcsGitHub, vcsBitbucket, vcsIBMDevOps, - vcsApacheGit, vcsOpenStackGit, vcsChiselapp, - vcsGeneralServer] - - if modulepath.startswith("example.net") or modulepath == "rsc.io": - logger.warning("Suspicious module path %s" % modulepath) - return None - if modulepath.startswith("http:") or modulepath.startswith("https:"): - logger.warning("Import path should not start with %s %s" % - ("http", "https")) - return None - - rootpath = None - vcstype = None - repourl = None - suffix = None - - for srv in vcsPaths: - m = srv[vcs.regexp].match(modulepath) - if vcs.pathprefix in srv: - if host == srv[vcs.pathprefix]: - rootpath = m.group('root') - vcstype = srv[vcs.type] - repourl = m.expand(srv[vcs.repo]) - suffix = m.group('suffix') - break - elif m and srv[vcs.schemelessRepo]: - rootpath = m.group('root') - vcstype = m[vcs.type] - repourl = m[vcs.repo] 
- suffix = m.group('suffix') - break - - return GoImport(rootpath, vcstype, repourl, suffix) - - def __resolve_repository_dynamic(self, modulepath): - """Resolve the repository root in a dynamic manner. - - The method is based on the go implementation of - `repoRootForImportDynamic` in - https://github.com/golang/go/blob/master/src/cmd/go/internal/vcs/vcs.go - """ - url = urllib.parse.urlparse("https://" + modulepath) - - class GoImportHTMLParser(HTMLParser): - - def __init__(self): - super().__init__() - self.__srv = {} - - def handle_starttag(self, tag, attrs): - if tag == 'meta' and list( - filter(lambda a: (a[0] == 'name' and a[1] == 'go-import'), attrs)): - content = list( - filter(lambda a: (a[0] == 'content'), attrs)) - if content: - srv = content[0][1].split() - self.__srv[srv[0]] = srv - - def go_import(self, modulepath): - if modulepath in self.__srv: - srv = self.__srv[modulepath] - return GoImport(srv[0], srv[1], srv[2], None) - return None - - url = url.geturl() + "?go-get=1" - req = urllib.request.Request(url) - - try: - body = urllib.request.urlopen(req).read() - except HTTPError as http_err: - logger.warning( - "Unclean status when fetching page from [%s]: %s", url, str(http_err)) - body = http_err.fp.read() - except URLError as url_err: - logger.warning( - "Failed to fetch page from [%s]: %s", url, str(url_err)) - return None - - parser = GoImportHTMLParser() - parser.feed(body.decode('utf-8')) - parser.close() - - return parser.go_import(modulepath) - - def __resolve_from_golang_proxy(self, modulepath, version): - """ - Resolves repository data from golang proxy - """ - url = urllib.parse.urlparse("https://proxy.golang.org/" - + modulepath - + "/@v/" - + version - + ".info") - - # Transform url to lower case, golang proxy doesn't like mixed case - req = urllib.request.Request(url.geturl().lower()) - - try: - resp = urllib.request.urlopen(req) - except URLError as url_err: - logger.warning( - "Failed to fetch page from [%s]: %s", url, str(url_err)) 
- return None - - golang_proxy_res = resp.read().decode('utf-8') - modinfo = json.loads(golang_proxy_res) - - if modinfo and 'Origin' in modinfo: - origin = modinfo['Origin'] - _root_url = urllib.parse.urlparse(origin['URL']) - - # We normalize the repo URL since we don't want the scheme in it - _subdir = origin['Subdir'] if 'Subdir' in origin else None - _root, _, _ = self.__split_path_version(modulepath) - if _subdir: - _root = _root[:-len(_subdir)].strip('/') - - _commit = origin['Hash'] - _vcs = origin['VCS'] - return (GoImport(_root, _vcs, _root_url.geturl(), None), _commit) - - return None - - def __resolve_repository(self, modulepath): - """ - Resolves src uri from go module-path - """ - repodata = self.__resolve_repository_static(modulepath) - if not repodata or not repodata.url: - repodata = self.__resolve_repository_dynamic(modulepath) - if not repodata or not repodata.url: - logger.error( - "Could not resolve repository for module path '%s'" % modulepath) - # There is no way to recover from this - sys.exit(14) - if repodata: - logger.debug(1, "Resolved download path for import '%s' => %s" % ( - modulepath, repodata.url)) - return repodata - - def __split_path_version(self, path): - i = len(path) - dot = False - for j in range(i, 0, -1): - if path[j - 1] < '0' or path[j - 1] > '9': - break - if path[j - 1] == '.': - dot = True - break - i = j - 1 - - if i <= 1 or i == len( - path) or path[i - 1] != 'v' or path[i - 2] != '/': - return path, "", True - - prefix, pathMajor = path[:i - 2], path[i - 2:] - if dot or len( - pathMajor) <= 2 or pathMajor[2] == '0' or pathMajor == "/v1": - return path, "", False - - return prefix, pathMajor, True - - def __get_path_major(self, pathMajor): - if not pathMajor: - return "" - - if pathMajor[0] != '/' and pathMajor[0] != '.': - logger.error( - "pathMajor suffix %s passed to PathMajorPrefix lacks separator", pathMajor) - - if pathMajor.startswith(".v") and pathMajor.endswith("-unstable"): - pathMajor = 
pathMajor[:len("-unstable") - 2] - - return pathMajor[1:] - - def __build_coderepo(self, repo, path): - codedir = "" - pathprefix, pathMajor, _ = self.__split_path_version(path) - if repo.root == path: - pathprefix = path - elif path.startswith(repo.root): - codedir = pathprefix[len(repo.root):].strip('/') - - pseudoMajor = self.__get_path_major(pathMajor) - - logger.debug("root='%s', codedir='%s', prefix='%s', pathMajor='%s', pseudoMajor='%s'", - repo.root, codedir, pathprefix, pathMajor, pseudoMajor) - - return CodeRepo(path, repo.root, codedir, - pathMajor, pathprefix, pseudoMajor) - - def __resolve_version(self, repo, path, version): - hash = None - coderoot = self.__build_coderepo(repo, path) - - def vcs_fetch_all(): - tmpdir = tempfile.mkdtemp() - clone_cmd = "%s clone --bare %s %s" % ('git', repo.url, tmpdir) - bb.process.run(clone_cmd) - log_cmd = "git log --all --pretty='%H %d' --decorate=short" - output, _ = bb.process.run( - log_cmd, shell=True, stderr=subprocess.PIPE, cwd=tmpdir) - bb.utils.prunedir(tmpdir) - return output.strip().split('\n') - - def vcs_fetch_remote(tag): - # add * to grab ^{} - refs = {} - ls_remote_cmd = "git ls-remote -q --tags {} {}*".format( - repo.url, tag) - output, _ = bb.process.run(ls_remote_cmd) - output = output.strip().split('\n') - for line in output: - f = line.split(maxsplit=1) - if len(f) != 2: - continue - - for prefix in ["HEAD", "refs/heads/", "refs/tags/"]: - if f[1].startswith(prefix): - refs[f[1][len(prefix):]] = f[0] - - for key, hash in refs.items(): - if key.endswith(r"^{}"): - refs[key.strip(r"^{}")] = hash - - return refs[tag] - - m_pseudo_semver = re_pseudo_semver.match(version) - - if m_pseudo_semver: - remote_refs = vcs_fetch_all() - short_commit = m_pseudo_semver.group('commithash') - for l in remote_refs: - r = l.split(maxsplit=1) - sha1 = r[0] if len(r) else None - if not sha1: - logger.error( - "Ups: could not resolve abbref commit for %s" % short_commit) - - elif sha1.startswith(short_commit): - hash 
= sha1 - break - else: - m_semver = re_semver.match(version) - if m_semver: - - def get_sha1_remote(re): - rsha1 = None - for line in remote_refs: - # Split lines of the following format: - # 22e90d9b964610628c10f673ca5f85b8c2a2ca9a (tag: sometag) - lineparts = line.split(maxsplit=1) - sha1 = lineparts[0] if len(lineparts) else None - refstring = lineparts[1] if len( - lineparts) == 2 else None - if refstring: - # Normalize tag string and split in case of multiple - # regs e.g. (tag: speech/v1.10.0, tag: orchestration/v1.5.0 ...) - refs = refstring.strip('(), ').split(',') - for ref in refs: - if re.match(ref.strip()): - rsha1 = sha1 - return rsha1 - - semver = "v" + m_semver.group('major') + "."\ - + m_semver.group('minor') + "."\ - + m_semver.group('patch') \ - + (("-" + m_semver.group('prerelease')) - if m_semver.group('prerelease') else "") - - tag = os.path.join( - coderoot.codeDir, semver) if coderoot.codeDir else semver - - # probe tag using 'ls-remote', which is faster than fetching - # complete history - hash = vcs_fetch_remote(tag) - if not hash: - # backup: fetch complete history - remote_refs = vcs_fetch_all() - hash = get_sha1_remote( - re.compile(fr"(tag:|HEAD ->) ({tag})")) - - logger.debug( - "Resolving commit for tag '%s' -> '%s'", tag, hash) - return hash - - def __generate_srcuri_inline_fcn(self, path, version, replaces=None): - """Generate SRC_URI functions for go imports""" - - logger.info("Resolving repository for module %s", path) - # First try to resolve repo and commit from golang proxy - # Most info is already there and we don't have to go through the - # repository or even perform the version resolve magic - golang_proxy_info = self.__resolve_from_golang_proxy(path, version) - if golang_proxy_info: - repo = golang_proxy_info[0] - commit = golang_proxy_info[1] - else: - # Fallback - # Resolve repository by 'hand' - repo = self.__resolve_repository(path) - commit = self.__resolve_version(repo, path, version) - - url = 
urllib.parse.urlparse(repo.url) - repo_url = url.netloc + url.path - - coderoot = self.__build_coderepo(repo, path) - - inline_fcn = "${@go_src_uri(" - inline_fcn += f"'{repo_url}','{version}'" - if repo_url != path: - inline_fcn += f",path='{path}'" - if coderoot.codeDir: - inline_fcn += f",subdir='{coderoot.codeDir}'" - if repo.vcs != 'git': - inline_fcn += f",vcs='{repo.vcs}'" - if replaces: - inline_fcn += f",replaces='{replaces}'" - if coderoot.pathMajor: - inline_fcn += f",pathmajor='{coderoot.pathMajor}'" - inline_fcn += ")}" - - return inline_fcn, commit - - def __go_handle_dependencies(self, go_mod, srctree, localfilesdir, extravalues, d): - - import re - src_uris = [] - src_revs = [] - - def generate_src_rev(path, version, commithash): - src_rev = f"# {path}@{version} => {commithash}\n" - # Ups...maybe someone manipulated the source repository and the - # version or commit could not be resolved. This is a sign of - # a) the supply chain was manipulated (bad) - # b) the implementation for the version resolving didn't work - # anymore (less bad) - if not commithash: - src_rev += f"#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n" - src_rev += f"#!!! Could not resolve version !!!\n" - src_rev += f"#!!! 
Possible supply chain attack !!!\n" - src_rev += f"#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n" - src_rev += f"SRCREV_{path.replace('/', '.')} = \"{commithash}\"" - - return src_rev - - # we first go over replacement list, because we are essentialy - # interested only in the replaced path - if go_mod['Replace']: - for replacement in go_mod['Replace']: - oldpath = replacement['Old']['Path'] - path = replacement['New']['Path'] - version = '' - if 'Version' in replacement['New']: - version = replacement['New']['Version'] - - if os.path.exists(os.path.join(srctree, path)): - # the module refers to the local path, remove it from requirement list - # because it's a local module - go_mod['Require'][:] = [v for v in go_mod['Require'] if v.get('Path') != oldpath] - else: - # Replace the path and the version, so we don't iterate replacement list anymore - for require in go_mod['Require']: - if require['Path'] == oldpath: - require.update({'Path': path, 'Version': version}) - break - - for require in go_mod['Require']: - path = require['Path'] - version = require['Version'] - - inline_fcn, commithash = self.__generate_srcuri_inline_fcn( - path, version) - src_uris.append(inline_fcn) - src_revs.append(generate_src_rev(path, version, commithash)) - - # strip version part from module URL /vXX - baseurl = re.sub(r'/v(\d+)$', '', go_mod['Module']['Path']) - pn, _ = determine_from_url(baseurl) - go_mods_basename = "%s-modules.inc" % pn - - go_mods_filename = os.path.join(localfilesdir, go_mods_basename) - with open(go_mods_filename, "w") as f: - # We introduce this indirection to make the tests a little easier - f.write("SRC_URI += \"${GO_DEPENDENCIES_SRC_URI}\"\n") - f.write("GO_DEPENDENCIES_SRC_URI = \"\\\n") - for uri in src_uris: - f.write(" " + uri + " \\\n") - f.write("\"\n\n") - for rev in src_revs: - f.write(rev + "\n") - - extravalues['extrafiles'][go_mods_basename] = go_mods_filename - - def __go_run_cmd(self, cmd, cwd, d): - return bb.process.run(cmd, env=dict(os.environ, 
PATH=d.getVar('PATH')), - shell=True, cwd=cwd) - - def __go_native_version(self, d): - stdout, _ = self.__go_run_cmd("go version", None, d) - m = re.match(r".*\sgo((\d+).(\d+).(\d+))\s([\w\/]*)", stdout) - major = int(m.group(2)) - minor = int(m.group(3)) - patch = int(m.group(4)) - - return major, minor, patch - - def __go_mod_patch(self, srctree, localfilesdir, extravalues, d): - - patchfilename = "go.mod.patch" - go_native_version_major, go_native_version_minor, _ = self.__go_native_version( - d) - self.__go_run_cmd("go mod tidy -go=%d.%d" % - (go_native_version_major, go_native_version_minor), srctree, d) - stdout, _ = self.__go_run_cmd("go mod edit -json", srctree, d) - - # Create patch in order to upgrade go version - self.__go_run_cmd("git diff go.mod > %s" % (patchfilename), srctree, d) - # Restore original state - self.__go_run_cmd("git checkout HEAD go.mod go.sum", srctree, d) - - go_mod = json.loads(stdout) - tmpfile = os.path.join(localfilesdir, patchfilename) - shutil.move(os.path.join(srctree, patchfilename), tmpfile) - - extravalues['extrafiles'][patchfilename] = tmpfile - - return go_mod, patchfilename - - def __go_mod_vendor(self, go_mod, srctree, localfilesdir, extravalues, d): - # Perform vendoring to retrieve the correct modules.txt - tmp_vendor_dir = tempfile.mkdtemp() - - # -v causes to go to print modules.txt to stderr - _, stderr = self.__go_run_cmd( - "go mod vendor -v -o %s" % (tmp_vendor_dir), srctree, d) - - modules_txt_basename = "modules.txt" - modules_txt_filename = os.path.join(localfilesdir, modules_txt_basename) - with open(modules_txt_filename, "w") as f: - f.write(stderr) - - extravalues['extrafiles'][modules_txt_basename] = modules_txt_filename - - licenses = [] + @staticmethod + def __unescape_path(path): + """Unescape capital letters using exclamation points.""" + return re.sub(r'!([a-z])', lambda m: m.group(1).upper(), path) + + @staticmethod + def __fold_uri(uri): + """Fold URI for sorting shorter module paths before 
longer.""" + return uri.replace(';', ' ').replace('/', '!') + + @staticmethod + def __go_run_cmd(cmd, cwd, d): + env = dict(os.environ, PATH=d.getVar('PATH'), GOMODCACHE=d.getVar('GOMODCACHE')) + return bb.process.run(cmd, env=env, shell=True, cwd=cwd) + + def __go_mod(self, go_mod, srctree, localfilesdir, extravalues, d): + moddir = d.getVar('GOMODCACHE') + + # List main packages and their dependencies with the go list command. + stdout, _ = self.__go_run_cmd(f"go list -json=Dir,Module -deps {go_mod['Module']['Path']}/...", srctree, d) + pkgs = json.loads('[' + stdout.replace('}\n{', '},\n{') + ']') + + # Collect licenses for the dependencies. + licenses = set() lic_files_chksum = [] - licvalues = find_licenses(tmp_vendor_dir, d) - shutil.rmtree(tmp_vendor_dir) + lic_files = {} + for pkg in pkgs: + # TODO: If the package is in a subdirectory with its own license + # files then report those instead of the license files found in the + # module root directory. + mod = pkg.get('Module', None) + if not mod or mod.get('Main', False): + continue + path = os.path.relpath(mod['Dir'], moddir) + for lic in find_licenses(mod['Dir'], d): + lic_files[os.path.join(path, lic[1])] = (lic[0], lic[2]) - if licvalues: - for licvalue in licvalues: - license = licvalue[0] - lics = tidy_licenses(fixup_license(license)) - lics = [lic for lic in lics if lic not in licenses] - if len(lics): - licenses.extend(lics) - lic_files_chksum.append( - 'file://src/${GO_IMPORT}/vendor/%s;md5=%s' % (licvalue[1], licvalue[2])) + for lic_file in lic_files: + licenses.add(lic_files[lic_file][0]) + lic_files_chksum.append( + f'file://pkg/mod/{lic_file};md5={lic_files[lic_file][1]}') - # strip version part from module URL /vXX - baseurl = re.sub(r'/v(\d+)$', '', go_mod['Module']['Path']) - pn, _ = determine_from_url(baseurl) - licenses_basename = "%s-licenses.inc" % pn + # Collect the module cache files downloaded by the go list command as + # the go list command knows best what the go list command needs 
and it + # needs more files in the module cache than the go install command as + # it doesn't do the dependency pruning mentioned in the Go module + # reference, https://go.dev/ref/mod, for go 1.17 or higher. + src_uris = [] + downloaddir = os.path.join(moddir, 'cache', 'download') + for dirpath, _, filenames in os.walk(downloaddir): + path, base = os.path.split(os.path.relpath(dirpath, downloaddir)) + if base != '@v': + continue + path = self.__unescape_path(path) + zipver = None + for name in filenames: + ver, ext = os.path.splitext(name) + if ext == '.zip': + chksum = bb.utils.sha256_file(os.path.join(dirpath, name)) + src_uris.append(f'gomod://{path};version={ver};sha256sum={chksum}') + zipver = ver + break + for name in filenames: + ver, ext = os.path.splitext(name) + if ext == '.mod' and ver != zipver: + chksum = bb.utils.sha256_file(os.path.join(dirpath, name)) + src_uris.append(f'gomod://{path};version={ver};mod=1;sha256sum={chksum}') + self.__go_run_cmd("go clean -modcache", srctree, d) + + licenses_basename = "{pn}-licenses.inc" licenses_filename = os.path.join(localfilesdir, licenses_basename) with open(licenses_filename, "w") as f: - f.write("GO_MOD_LICENSES = \"%s\"\n\n" % - ' & '.join(sorted(licenses, key=str.casefold))) - # We introduce this indirection to make the tests a little easier - f.write("LIC_FILES_CHKSUM += \"${VENDORED_LIC_FILES_CHKSUM}\"\n") - f.write("VENDORED_LIC_FILES_CHKSUM = \"\\\n") - for lic in lic_files_chksum: - f.write(" " + lic + " \\\n") - f.write("\"\n") + f.write(f'GO_MOD_LICENSES = "{" & ".join(sorted(licenses))}"\n\n') + f.write('LIC_FILES_CHKSUM += "\\\n') + for lic in sorted(lic_files_chksum, key=self.__fold_uri): + f.write(' ' + lic + ' \\\n') + f.write('"\n') - extravalues['extrafiles'][licenses_basename] = licenses_filename + extravalues['extrafiles'][f"../{licenses_basename}"] = licenses_filename + + go_mods_basename = "{pn}-go-mods.inc" + go_mods_filename = os.path.join(localfilesdir, go_mods_basename) + with 
open(go_mods_filename, "w") as f: + f.write('SRC_URI += "\\\n') + for uri in sorted(src_uris, key=self.__fold_uri): + f.write(' ' + uri + ' \\\n') + f.write('"\n') + + extravalues['extrafiles'][f"../{go_mods_basename}"] = go_mods_filename def process(self, srctree, classes, lines_before, lines_after, handled, extravalues): @@ -672,56 +174,30 @@ class GoRecipeHandler(RecipeHandler): d.prependVar('PATH', '%s:' % go_bindir) handled.append('buildsystem') - classes.append("go-vendor") + classes.append("go-mod") + + tmp_mod_dir = tempfile.mkdtemp(prefix='go-mod-') + d.setVar('GOMODCACHE', tmp_mod_dir) stdout, _ = self.__go_run_cmd("go mod edit -json", srctree, d) - go_mod = json.loads(stdout) - go_import = go_mod['Module']['Path'] - go_version_match = re.match("([0-9]+).([0-9]+)", go_mod['Go']) - go_version_major = int(go_version_match.group(1)) - go_version_minor = int(go_version_match.group(2)) - src_uris = [] + go_import = re.sub(r'/v([0-9]+)$', '', go_mod['Module']['Path']) localfilesdir = tempfile.mkdtemp(prefix='recipetool-go-') extravalues.setdefault('extrafiles', {}) - # Use an explicit name determined from the module name because it - # might differ from the actual URL for replaced modules - # strip version part from module URL /vXX - baseurl = re.sub(r'/v(\d+)$', '', go_mod['Module']['Path']) - pn, _ = determine_from_url(baseurl) - - # go.mod files with version < 1.17 may not include all indirect - # dependencies. Thus, we have to upgrade the go version. - if go_version_major == 1 and go_version_minor < 17: - logger.warning( - "go.mod files generated by Go < 1.17 might have incomplete indirect dependencies.") - go_mod, patchfilename = self.__go_mod_patch(srctree, localfilesdir, - extravalues, d) - src_uris.append( - "file://%s;patchdir=src/${GO_IMPORT}" % (patchfilename)) - - # Check whether the module is vendored. If so, we have nothing to do. 
- # Otherwise we gather all dependencies and add them to the recipe - if not os.path.exists(os.path.join(srctree, "vendor")): - - # Write additional $BPN-modules.inc file - self.__go_mod_vendor(go_mod, srctree, localfilesdir, extravalues, d) - lines_before.append("LICENSE += \" & ${GO_MOD_LICENSES}\"") - lines_before.append("require %s-licenses.inc" % (pn)) - - self.__rewrite_src_uri(lines_before, ["file://modules.txt"]) - - self.__go_handle_dependencies(go_mod, srctree, localfilesdir, extravalues, d) - lines_before.append("require %s-modules.inc" % (pn)) + # Write the ${BPN}-licenses.inc and ${BPN}-go-mods.inc files + self.__go_mod(go_mod, srctree, localfilesdir, extravalues, d) # Do generic license handling handle_license_vars(srctree, lines_before, handled, extravalues, d) - self.__rewrite_lic_uri(lines_before) + self.__rewrite_lic_vars(lines_before) - lines_before.append("GO_IMPORT = \"{}\"".format(baseurl)) - lines_before.append("SRCREV_FORMAT = \"${BPN}\"") + self.__rewrite_src_uri(lines_before) + + lines_before.append('require ${BPN}-licenses.inc') + lines_before.append('require ${BPN}-go-mods.inc') + lines_before.append(f'GO_IMPORT = "{go_import}"') def __update_lines_before(self, updated, newlines, lines_before): if updated: @@ -733,9 +209,11 @@ class GoRecipeHandler(RecipeHandler): lines_before.append(line) return updated - def __rewrite_lic_uri(self, lines_before): + def __rewrite_lic_vars(self, lines_before): def varfunc(varname, origvalue, op, newlines): + if varname == 'LICENSE': + return ' & '.join((origvalue, '${GO_MOD_LICENSES}')), None, -1, True if varname == 'LIC_FILES_CHKSUM': new_licenses = [] licenses = origvalue.split('\\') @@ -757,15 +235,14 @@ class GoRecipeHandler(RecipeHandler): return origvalue, None, 0, True updated, newlines = bb.utils.edit_metadata( - lines_before, ['LIC_FILES_CHKSUM'], varfunc) + lines_before, ['LICENSE', 'LIC_FILES_CHKSUM'], varfunc) return self.__update_lines_before(updated, newlines, lines_before) - def 
__rewrite_src_uri(self, lines_before, additional_uris = []): + def __rewrite_src_uri(self, lines_before): def varfunc(varname, origvalue, op, newlines): if varname == 'SRC_URI': - src_uri = ["git://${GO_IMPORT};destsuffix=git/src/${GO_IMPORT};nobranch=1;name=${BPN};protocol=https"] - src_uri.extend(additional_uris) + src_uri = ['git://${GO_IMPORT};protocol=https;nobranch=1;destsuffix=${GO_SRCURI_DESTSUFFIX}'] return src_uri, None, -1, True return origvalue, None, 0, True From patchwork Thu May 29 20:27:57 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ross Burton X-Patchwork-Id: 63831 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 423DBC5B555 for ; Thu, 29 May 2025 20:28:19 +0000 (UTC) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mx.groups.io with SMTP id smtpd.web10.2812.1748550495363949487 for ; Thu, 29 May 2025 13:28:15 -0700 Authentication-Results: mx.groups.io; dkim=none (message not signed); spf=pass (domain: arm.com, ip: 217.140.110.172, mailfrom: ross.burton@arm.com) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 8E59E2574 for ; Thu, 29 May 2025 13:27:58 -0700 (PDT) Received: from cesw-amp-gbt-1s-m12830-04.lab.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id AF20B3F673 for ; Thu, 29 May 2025 13:28:14 -0700 (PDT) From: Ross Burton To: openembedded-core@lists.openembedded.org Subject: [PATCH 4/9] scripts/recipetool/licenses.csv: add more licenses Date: Thu, 29 May 2025 21:27:57 +0100 Message-ID: <20250529202802.1198179-5-ross.burton@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: 
<20250529202802.1198179-1-ross.burton@arm.com>
References: <20250529202802.1198179-1-ross.burton@arm.com>
MIME-Version: 1.0
List-Id:
X-Webhook-Received: from li982-79.members.linode.com [45.33.32.79] by aws-us-west-2-korg-lkml-1.web.codeaurora.org with HTTPS for ; Thu, 29 May 2025 20:28:19 -0000
X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/217441

Some more hashes for some Go modules that I found in the wild.

This method of license identification is not scaling...

Ross
---
 scripts/lib/recipetool/licenses.csv | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/scripts/lib/recipetool/licenses.csv b/scripts/lib/recipetool/licenses.csv
index ddc818f48c4..16397e85546 100644
--- a/scripts/lib/recipetool/licenses.csv
+++ b/scripts/lib/recipetool/licenses.csv
@@ -1,6 +1,9 @@
+02d4002e9171d41a8fad93aa7faf3956,BSD-3-Clause
 0636e73ff0215e8d672dc4c32c317bb3,GPL-2.0-only
+0dd48ae8103725bd7b401261520cdfbb,BSD-3-Clause
 12f884d2ae1ff87c09e5b7ccc2c4ca7e,GPL-2.0-only
 18810669f13b87348459e611d31ab760,GPL-2.0-only
+19cbd64715b51267a47bf3750cc6a8a5,Apache-2.0
 201414b6610203caed355323b1ab3116,BSD-3-Clause
 252890d9eee26aab7b432e8b8a616475,LGPL-2.0-only
 2d5025d4aa3495befef8f17206a5b0a1,LGPL-2.1-only
@@ -10,6 +13,7 @@
 3b83ef96387f14655fc854ddc3c6bd57,Apache-2.0
 3bf50002aefd002f49e7bb854063f7e7,LGPL-2.0-only
 4325afd396febcb659c36b49533135d4,GPL-2.0-only
+4ee4feb2b545c2231749e5c54ace343e,BSD-3-Clause
 4fbd65380cdd255951079008b364516c,LGPL-2.1-only
 54c7042be62e169199200bc6477f04d1,BSD-3-Clause
 55ca817ccb7d5b5b66355690e9abc605,LGPL-2.0-only
@@ -20,14 +24,17 @@
 751419260aa954499f7abaabaa882bbe,GPL-2.0-only
 7fbc338309ac38fefcd64b04bb903e34,LGPL-2.1-only
 8ca43cbc842c2336e835926c2166c28b,GPL-2.0-only
+939cce1ec101726fa754e698ac871622,BSD-3-Clause
 94d55d512a9ba36caa9b7df079bae19f,GPL-2.0-only
 9ac2e7cff1ddaf48b6eab6028f23ef88,GPL-2.0-only
 9f604d8a4f8e74f4f5140845a21b6674,LGPL-2.0-only
+a651bb3d8b1c412632e28823bb432b40,BSD-3-Clause
 a6f89e2100d9b6cdffcea4f398e37343,LGPL-2.1-only
 b234ee4d69f5fce4486a80fdaf4a4263,GPL-2.0-only
 bbb461211a33b134d42ed5ee802b37ff,LGPL-2.1-only
 bfe1f75d606912a4111c90743d6c7325,MPL-1.1-only
 c93c0550bd3173f4504b2cbd8991e50b,GPL-2.0-only
+d0b68be4a2dc957aaf09144970bc6696,MIT
 d32239bcb673463ab874e80d47fae504,GPL-3.0-only
 d7810fab7487fb0aad327b76f1be7cd7,GPL-2.0-only
 d8045f3b8f929c1cb29a1e3fd737b499,LGPL-2.1-only

From patchwork Thu May 29 20:27:58 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ross Burton
X-Patchwork-Id: 63829
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1B11AC5AD49 for ; Thu, 29 May 2025 20:28:19 +0000 (UTC)
Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mx.groups.io with SMTP id smtpd.web11.2829.1748550494931982748 for ; Thu, 29 May 2025 13:28:16 -0700
Authentication-Results: mx.groups.io; dkim=none (message not signed); spf=pass (domain: arm.com, ip: 217.140.110.172, mailfrom: ross.burton@arm.com)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 402252454 for ; Thu, 29 May 2025 13:27:59 -0700 (PDT)
Received: from cesw-amp-gbt-1s-m12830-04.lab.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 573DB3F673 for ; Thu, 29 May 2025 13:28:15 -0700 (PDT)
From: Ross Burton
To: openembedded-core@lists.openembedded.org
Subject: [PATCH 5/9] recipetool/create: show more of the license path when it can't be identified
Date: Thu, 29 May 2025 21:27:58 +0100
Message-ID: <20250529202802.1198179-6-ross.burton@arm.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20250529202802.1198179-1-ross.burton@arm.com>
References: <20250529202802.1198179-1-ross.burton@arm.com>
MIME-Version: 1.0
List-Id:
X-Webhook-Received: from li982-79.members.linode.com [45.33.32.79] by aws-us-west-2-korg-lkml-1.web.codeaurora.org with HTTPS for ; Thu, 29 May 2025 20:28:19 -0000
X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/217442

If there are multiple source trees in a project (incredibly common with
go-mod, for example) then the relative path of the LICENSE file from the
source tree could just be "LICENSE", which is not useful when there are
tens of files across the recipe with that name.

Show the parent directory name too, to clarify which file is unknown.

Signed-off-by: Ross Burton
---
 scripts/lib/recipetool/create.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/lib/recipetool/create.py b/scripts/lib/recipetool/create.py
index 24315b95b08..390cc37db43 100644
--- a/scripts/lib/recipetool/create.py
+++ b/scripts/lib/recipetool/create.py
@@ -1251,7 +1251,7 @@ def match_licenses(licfiles, srctree, d):
                 license = 'Unknown'
                 logger.info("Please add the following line for '%s' to a 'lib/recipetool/licenses.csv' " \
                     "and replace `Unknown` with the license:\n" \
-                    "%s,Unknown" % (os.path.relpath(licfile, srctree), md5value))
+                    "%s,Unknown" % (os.path.relpath(licfile, srctree + "/.."), md5value))
         if license:
             licenses.append((license, os.path.relpath(licfile, srctree), md5value))
 

From patchwork Thu May 29 20:27:59 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ross Burton
X-Patchwork-Id: 63827
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 20286C5B552 for ; Thu, 29 May 2025 20:28:19 +0000 (UTC)
Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by
mx.groups.io with SMTP id smtpd.web10.2812.1748550495363949487 for ; Thu, 29 May 2025 13:28:16 -0700 Authentication-Results: mx.groups.io; dkim=none (message not signed); spf=pass (domain: arm.com, ip: 217.140.110.172, mailfrom: ross.burton@arm.com) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id E521A2454 for ; Thu, 29 May 2025 13:27:59 -0700 (PDT) Received: from cesw-amp-gbt-1s-m12830-04.lab.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 08BCC3F673 for ; Thu, 29 May 2025 13:28:15 -0700 (PDT) From: Ross Burton To: openembedded-core@lists.openembedded.org Subject: [PATCH 6/9] scripts/scriptutils: silence warning about S not existing in emptysrc Date: Thu, 29 May 2025 21:27:59 +0100 Message-ID: <20250529202802.1198179-7-ross.burton@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250529202802.1198179-1-ross.burton@arm.com> References: <20250529202802.1198179-1-ross.burton@arm.com> MIME-Version: 1.0 List-Id: X-Webhook-Received: from li982-79.members.linode.com [45.33.32.79] by aws-us-west-2-korg-lkml-1.web.codeaurora.org with HTTPS for ; Thu, 29 May 2025 20:28:19 -0000 X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/217443 This function creates an emptysrc recipe, but S points to a directory that doesn't exist and bitbake warns about this. As it is under the temporary working directory which will be deleted later, create it to silence the warning. 
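
The change described above can be sketched outside of BitBake as follows. This is only an illustration under stated assumptions: `emptysrc_setting` is a hypothetical helper name, and `os.makedirs(..., exist_ok=True)` stands in for `bb.utils.mkdirhier()`, which is what the patch itself calls.

```python
import os
import tempfile


def emptysrc_setting(tmpdir):
    """Create the placeholder source directory and build the S assignment.

    Hypothetical helper for illustration; the real code writes the line
    straight into the temporary recipe file.
    """
    s_dir = os.path.join(tmpdir, "emptysrc")
    # Stand-in for bb.utils.mkdirhier(): create the directory (and any
    # missing parents) and do nothing if it already exists.
    os.makedirs(s_dir, exist_ok=True)
    return s_dir, 'S = "%s"\n' % s_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        s_dir, line = emptysrc_setting(tmpdir)
        # The directory exists before anything reads the S assignment.
        print(os.path.isdir(s_dir), line, end="")
```

Because the directory sits under the temporary working directory, it is removed along with everything else there and needs no separate cleanup.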
Signed-off-by: Ross Burton --- scripts/lib/scriptutils.py | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/scripts/lib/scriptutils.py b/scripts/lib/scriptutils.py index 81f0b01fa53..32e749dbb1a 100644 --- a/scripts/lib/scriptutils.py +++ b/scripts/lib/scriptutils.py @@ -182,7 +182,10 @@ def fetch_url(tinfoil, srcuri, srcrev, destdir, logger, preserve_tmp=False, mirr f.write('UNPACKDIR = "%s"\n' % destdir) # Set S out of the way so it doesn't get created under the workdir - f.write('S = "%s"\n' % os.path.join(tmpdir, 'emptysrc')) + s_dir = os.path.join(tmpdir, 'emptysrc') + bb.utils.mkdirhier(s_dir) + f.write('S = "%s"\n' % s_dir) + if not mirrors: # We do not need PREMIRRORS since we are almost certainly # fetching new source rather than something that has already From patchwork Thu May 29 20:28:00 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ross Burton X-Patchwork-Id: 63834 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 857FFC54FB3 for ; Thu, 29 May 2025 20:33:59 +0000 (UTC) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mx.groups.io with SMTP id smtpd.web10.2928.1748550838533589467 for ; Thu, 29 May 2025 13:33:58 -0700 Authentication-Results: mx.groups.io; dkim=none (message not signed); spf=pass (domain: arm.com, ip: 217.140.110.172, mailfrom: ross.burton@arm.com) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 8D3A92454 for ; Thu, 29 May 2025 13:28:00 -0700 (PDT) Received: from cesw-amp-gbt-1s-m12830-04.lab.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id AE02D3F673 for ; Thu, 29 
May 2025 13:28:16 -0700 (PDT)
From: Ross Burton
To: openembedded-core@lists.openembedded.org
Subject: [PATCH 7/9] lib/oeqa/subprocesstweak: clean up __str__()
Date: Thu, 29 May 2025 21:28:00 +0100
Message-ID: <20250529202802.1198179-8-ross.burton@arm.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20250529202802.1198179-1-ross.burton@arm.com>
References: <20250529202802.1198179-1-ross.burton@arm.com>
MIME-Version: 1.0
List-Id:
X-Webhook-Received: from li982-79.members.linode.com [45.33.32.79] by aws-us-west-2-korg-lkml-1.web.codeaurora.org with HTTPS for ; Thu, 29 May 2025 20:33:59 -0000
X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/217446

Call super().__str__() to get the bulk of the string representation. We
don't need to guard on output/stderr existing, as they are always set.

Signed-off-by: Ross Burton
---
 meta/lib/oeqa/utils/subprocesstweak.py | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/meta/lib/oeqa/utils/subprocesstweak.py b/meta/lib/oeqa/utils/subprocesstweak.py
index 3e43ed547bd..1774513023b 100644
--- a/meta/lib/oeqa/utils/subprocesstweak.py
+++ b/meta/lib/oeqa/utils/subprocesstweak.py
@@ -8,16 +8,11 @@ import subprocess
 class OETestCalledProcessError(subprocess.CalledProcessError):
     def __str__(self):
         def strify(o):
-            if isinstance(o, bytes):
-                return o.decode("utf-8", errors="replace")
-            else:
-                return o
+            return o.decode("utf-8", errors="replace") if isinstance(o, bytes) else o
 
-        s = "Command '%s' returned non-zero exit status %d" % (self.cmd, self.returncode)
-        if hasattr(self, "output") and self.output:
-            s = s + "\nStandard Output: " + strify(self.output)
-        if hasattr(self, "stderr") and self.stderr:
-            s = s + "\nStandard Error: " + strify(self.stderr)
+        s = super().__str__()
+        s = s + "\nStandard Output: " + strify(self.output)
+        s = s + "\nStandard Error: " + strify(self.stderr)
         return s
 
 def errors_have_output():

From patchwork Thu May 29 20:28:01 2025
Content-Type: text/plain;
charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ross Burton X-Patchwork-Id: 63835 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 59B7CC54FB3 for ; Thu, 29 May 2025 20:38:39 +0000 (UTC) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mx.groups.io with SMTP id smtpd.web11.3046.1748551117921997177 for ; Thu, 29 May 2025 13:38:38 -0700 Authentication-Results: mx.groups.io; dkim=none (message not signed); spf=pass (domain: arm.com, ip: 217.140.110.172, mailfrom: ross.burton@arm.com) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 562E22574 for ; Thu, 29 May 2025 13:28:11 -0700 (PDT) Received: from cesw-amp-gbt-1s-m12830-04.lab.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 575093F673 for ; Thu, 29 May 2025 13:28:17 -0700 (PDT) From: Ross Burton To: openembedded-core@lists.openembedded.org Subject: [PATCH 8/9] lib/oe/license_finder: extract license-finding code from recipetool Date: Thu, 29 May 2025 21:28:01 +0100 Message-ID: <20250529202802.1198179-9-ross.burton@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250529202802.1198179-1-ross.burton@arm.com> References: <20250529202802.1198179-1-ross.burton@arm.com> MIME-Version: 1.0 List-Id: X-Webhook-Received: from li982-79.members.linode.com [45.33.32.79] by aws-us-west-2-korg-lkml-1.web.codeaurora.org with HTTPS for ; Thu, 29 May 2025 20:38:39 -0000 X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/217447 Move the find-and-detect-licenses code from recipetool into lib/oe, so that it can be used outside of recipetool. 
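
As a rough illustration of what the relocated code does at its core, the sketch below looks up a license by the MD5 of its text in a table using the same "md5sum,license" layout as license-hashes.csv. It is not the oe.license_finder API: `md5_of`, `load_hashes` and `identify` are hypothetical names, the real code reads files via `bb.utils.md5_file()` and also handles line-number columns, and the sample hash is computed from the sample text rather than taken from the real table.

```python
import csv
import hashlib
import io


def md5_of(text):
    """MD5 hex digest of a license text (the real code hashes the file)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def load_hashes(csv_text):
    """Parse 'md5sum,license' rows into a dict keyed by md5sum."""
    reader = csv.DictReader(io.StringIO(csv_text),
                            fieldnames=["md5sum", "license"])
    return {row["md5sum"]: row["license"] for row in reader}


def identify(text, table):
    """Return the license name for a known hash, else 'Unknown'."""
    return table.get(md5_of(text), "Unknown")


if __name__ == "__main__":
    sample = "Permission is hereby granted, free of charge, ...\n"
    table = load_hashes("%s,MIT\n" % md5_of(sample))
    print(identify(sample, table))        # exact text matches its hash
    print(identify(sample + " ", table))  # any byte change falls through
```

An exact-hash lookup misses as soon as a single byte of the text differs, which is why the moved code falls back to a "crunched" checksum of the normalised text before reporting a license as Unknown.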
Signed-off-by: Ross Burton --- .../files/license-hashes.csv | 2 + meta/lib/oe/license_finder.py | 226 ++++++++++++++++++ scripts/lib/recipetool/create.py | 226 +----------------- 3 files changed, 230 insertions(+), 224 deletions(-) rename scripts/lib/recipetool/licenses.csv => meta/files/license-hashes.csv (95%) create mode 100644 meta/lib/oe/license_finder.py diff --git a/scripts/lib/recipetool/licenses.csv b/meta/files/license-hashes.csv similarity index 95% rename from scripts/lib/recipetool/licenses.csv rename to meta/files/license-hashes.csv index 16397e85546..5729a7314bb 100644 --- a/scripts/lib/recipetool/licenses.csv +++ b/meta/files/license-hashes.csv @@ -1,5 +1,6 @@ 02d4002e9171d41a8fad93aa7faf3956,BSD-3-Clause 0636e73ff0215e8d672dc4c32c317bb3,GPL-2.0-only +0ceb9ff3b27d3a8cf451ca3785d73c71,BSD-3-Clause & MIT 0dd48ae8103725bd7b401261520cdfbb,BSD-3-Clause 12f884d2ae1ff87c09e5b7ccc2c4ca7e,GPL-2.0-only 18810669f13b87348459e611d31ab760,GPL-2.0-only @@ -22,6 +23,7 @@ 5f30f0716dfdd0d91eb439ebec522ec2,LGPL-2.0-only 6a6a8e020838b23406c81b19c1d46df6,LGPL-3.0-only 751419260aa954499f7abaabaa882bbe,GPL-2.0-only +7998cb338f82d15c0eff93b7004d272a,BSD-3-Clause 7fbc338309ac38fefcd64b04bb903e34,LGPL-2.1-only 8ca43cbc842c2336e835926c2166c28b,GPL-2.0-only 939cce1ec101726fa754e698ac871622,BSD-3-Clause diff --git a/meta/lib/oe/license_finder.py b/meta/lib/oe/license_finder.py new file mode 100644 index 00000000000..189a39cb68a --- /dev/null +++ b/meta/lib/oe/license_finder.py @@ -0,0 +1,226 @@ +import fnmatch +import hashlib +import os +import re + +def get_license_md5sums(d, static_only=False, linenumbers=False): + import bb.utils + import csv + md5sums = {} + if not static_only and not linenumbers: + # Gather md5sums of license files in common license dir + commonlicdir = d.getVar('COMMON_LICENSE_DIR') + for fn in os.listdir(commonlicdir): + md5value = bb.utils.md5_file(os.path.join(commonlicdir, fn)) + md5sums[md5value] = fn + + # The following were extracted from common 
values in various recipes + # (double checking the license against the license file itself, not just + # the LICENSE value in the recipe) + + # Read license md5sums from csv file + for path in d.getVar('BBPATH').split(':'): + csv_path = os.path.join(path, 'files', 'license-hashes.csv') + if os.path.isfile(csv_path): + with open(csv_path, newline='') as csv_file: + fieldnames = ['md5sum', 'license', 'beginline', 'endline', 'md5'] + reader = csv.DictReader(csv_file, delimiter=',', fieldnames=fieldnames) + for row in reader: + if linenumbers: + md5sums[row['md5sum']] = ( + row['license'], row['beginline'], row['endline'], row['md5']) + else: + md5sums[row['md5sum']] = row['license'] + + return md5sums + +def crunch_known_licenses(d): + ''' + Calculate the MD5 checksums for the crunched versions of all common + licenses. Also add additional known checksums. + ''' + + crunched_md5sums = {} + + # common licenses + crunched_md5sums['ad4e9d34a2e966dfe9837f18de03266d'] = 'GFDL-1.1-only' + crunched_md5sums['d014fb11a34eb67dc717fdcfc97e60ed'] = 'GFDL-1.2-only' + crunched_md5sums['e020ca655b06c112def28e597ab844f1'] = 'GFDL-1.3-only' + + # The following two were gleaned from the "forever" npm package + crunched_md5sums['0a97f8e4cbaf889d6fa51f84b89a79f6'] = 'ISC' + # https://github.com/waffle-gl/waffle/blob/master/LICENSE.txt + crunched_md5sums['50fab24ce589d69af8964fdbfe414c60'] = 'BSD-2-Clause' + # https://github.com/spigwitmer/fakeds1963s/blob/master/LICENSE + crunched_md5sums['88a4355858a1433fea99fae34a44da88'] = 'GPL-2.0-only' + # http://www.gnu.org/licenses/old-licenses/gpl-2.0.txt + crunched_md5sums['063b5c3ebb5f3aa4c85a2ed18a31fbe7'] = 'GPL-2.0-only' + # https://github.com/FFmpeg/FFmpeg/blob/master/COPYING.LGPLv2.1 + crunched_md5sums['7f5202f4d44ed15dcd4915f5210417d8'] = 'LGPL-2.1-only' + # unixODBC-2.3.4 COPYING + crunched_md5sums['3debde09238a8c8e1f6a847e1ec9055b'] = 'LGPL-2.1-only' + # https://github.com/FFmpeg/FFmpeg/blob/master/COPYING.LGPLv3 + 
crunched_md5sums['f90c613c51aa35da4d79dd55fc724ceb'] = 'LGPL-3.0-only' + # https://raw.githubusercontent.com/eclipse/mosquitto/v1.4.14/epl-v10 + crunched_md5sums['efe2cb9a35826992b9df68224e3c2628'] = 'EPL-1.0' + + # https://raw.githubusercontent.com/jquery/esprima/3.1.3/LICENSE.BSD + crunched_md5sums['80fa7b56a28e8c902e6af194003220a5'] = 'BSD-2-Clause' + # https://raw.githubusercontent.com/npm/npm-install-checks/master/LICENSE + crunched_md5sums['e659f77bfd9002659e112d0d3d59b2c1'] = 'BSD-2-Clause' + # https://raw.githubusercontent.com/silverwind/default-gateway/4.2.0/LICENSE + crunched_md5sums['4c641f2d995c47f5cb08bdb4b5b6ea05'] = 'BSD-2-Clause' + # https://raw.githubusercontent.com/tad-lispy/node-damerau-levenshtein/v1.0.5/LICENSE + crunched_md5sums['2b8c039b2b9a25f0feb4410c4542d346'] = 'BSD-2-Clause' + # https://raw.githubusercontent.com/terser/terser/v3.17.0/LICENSE + crunched_md5sums['8bd23871802951c9ad63855151204c2c'] = 'BSD-2-Clause' + # https://raw.githubusercontent.com/alexei/sprintf.js/1.0.3/LICENSE + crunched_md5sums['008c22318c8ea65928bf730ddd0273e3'] = 'BSD-3-Clause' + # https://raw.githubusercontent.com/Caligatio/jsSHA/v3.2.0/LICENSE + crunched_md5sums['0e46634a01bfef056892949acaea85b1'] = 'BSD-3-Clause' + # https://raw.githubusercontent.com/d3/d3-path/v1.0.9/LICENSE + crunched_md5sums['b5f72aef53d3b2b432702c30b0215666'] = 'BSD-3-Clause' + # https://raw.githubusercontent.com/feross/ieee754/v1.1.13/LICENSE + crunched_md5sums['a39327c997c20da0937955192d86232d'] = 'BSD-3-Clause' + # https://raw.githubusercontent.com/joyent/node-extsprintf/v1.3.0/LICENSE + crunched_md5sums['721f23a96ff4161ca3a5f071bbe18108'] = 'MIT' + # https://raw.githubusercontent.com/pvorb/clone/v0.2.0/LICENSE + crunched_md5sums['b376d29a53c9573006b9970709231431'] = 'MIT' + # https://raw.githubusercontent.com/andris9/encoding/v0.1.12/LICENSE + crunched_md5sums['85d8a977ee9d7c5ab4ac03c9b95431c4'] = 'MIT-0' + # https://raw.githubusercontent.com/faye/websocket-driver-node/0.7.3/LICENSE.md 
+ crunched_md5sums['b66384e7137e41a9b1904ef4d39703b6'] = 'Apache-2.0' + # https://raw.githubusercontent.com/less/less.js/v4.1.1/LICENSE + crunched_md5sums['b27575459e02221ccef97ec0bfd457ae'] = 'Apache-2.0' + # https://raw.githubusercontent.com/microsoft/TypeScript/v3.5.3/LICENSE.txt + crunched_md5sums['a54a1a6a39e7f9dbb4a23a42f5c7fd1c'] = 'Apache-2.0' + # https://raw.githubusercontent.com/request/request/v2.87.0/LICENSE + crunched_md5sums['1034431802e57486b393d00c5d262b8a'] = 'Apache-2.0' + # https://raw.githubusercontent.com/dchest/tweetnacl-js/v0.14.5/LICENSE + crunched_md5sums['75605e6bdd564791ab698fca65c94a4f'] = 'Unlicense' + # https://raw.githubusercontent.com/stackgl/gl-mat3/v2.0.0/LICENSE.md + crunched_md5sums['75512892d6f59dddb6d1c7e191957e9c'] = 'Zlib' + + commonlicdir = d.getVar('COMMON_LICENSE_DIR') + for fn in sorted(os.listdir(commonlicdir)): + md5value, lictext = crunch_license(os.path.join(commonlicdir, fn)) + if md5value not in crunched_md5sums: + crunched_md5sums[md5value] = fn + elif fn != crunched_md5sums[md5value]: + bb.debug(2, "crunched_md5sums['%s'] is already set to '%s' rather than '%s'" % (md5value, crunched_md5sums[md5value], fn)) + else: + bb.debug(2, "crunched_md5sums['%s'] is already set to '%s'" % (md5value, crunched_md5sums[md5value])) + + return crunched_md5sums + +def crunch_license(licfile): + ''' + Remove non-material text from a license file and then calculate its + md5sum. This works well for licenses that contain a copyright statement, + but is also a useful way to handle people's insistence upon reformatting + the license text slightly (with no material difference to the text of the + license). + ''' + + import oe.utils + + # Note: these are carefully constructed! + license_title_re = re.compile(r'^#*\(? *(This is )?([Tt]he )?.{0,15} ?[Ll]icen[sc]e( \(.{1,10}\))?\)?[:\.]? ?#*$') + license_statement_re = re.compile(r'^((This (project|software)|.{1,10}) is( free software)? 
(released|licen[sc]ed)|(Released|Licen[cs]ed)) under the .{1,10} [Ll]icen[sc]e:?$') + copyright_re = re.compile(r'^ *[#\*]* *(Modified work |MIT LICENSED )?Copyright ?(\([cC]\))? .*$') + disclaimer_re = re.compile(r'^ *\*? ?All [Rr]ights [Rr]eserved\.$') + email_re = re.compile(r'^.*<[\w\.-]*@[\w\.\-]*>$') + header_re = re.compile(r'^(\/\**!?)? ?[\-=\*]* ?(\*\/)?$') + tag_re = re.compile(r'^ *@?\(?([Ll]icense|MIT)\)?$') + url_re = re.compile(r'^ *[#\*]* *https?:\/\/[\w\.\/\-]+$') + + lictext = [] + with open(licfile, 'r', errors='surrogateescape') as f: + for line in f: + # Drop opening statements + if copyright_re.match(line): + continue + elif disclaimer_re.match(line): + continue + elif email_re.match(line): + continue + elif header_re.match(line): + continue + elif tag_re.match(line): + continue + elif url_re.match(line): + continue + elif license_title_re.match(line): + continue + elif license_statement_re.match(line): + continue + # Strip comment symbols + line = line.replace('*', '') \ + .replace('#', '') + # Unify spelling + line = line.replace('sub-license', 'sublicense') + # Squash spaces + line = oe.utils.squashspaces(line.strip()) + # Replace smart quotes, double quotes and backticks with single quotes + line = line.replace(u"\u2018", "'").replace(u"\u2019", "'").replace(u"\u201c","'").replace(u"\u201d", "'").replace('"', '\'').replace('`', '\'') + # Unify brackets + line = line.replace("{", "[").replace("}", "]") + if line: + lictext.append(line) + + m = hashlib.md5() + try: + m.update(' '.join(lictext).encode('utf-8')) + md5val = m.hexdigest() + except UnicodeEncodeError: + md5val = None + lictext = '' + return md5val, lictext + +def find_license_files(srctree): + licspecs = ['*LICEN[CS]E*', 'COPYING*', '*[Ll]icense*', 'LEGAL*', '[Ll]egal*', '*GPL*', 'README.lic*', 'COPYRIGHT*', '[Cc]opyright*', 'e[dp]l-v10'] + skip_extensions = (".html", ".js", ".json", ".svg", ".ts", ".go") + licfiles = [] + for root, dirs, files in os.walk(srctree): + for fn in 
files: + if fn.endswith(skip_extensions): + continue + for spec in licspecs: + if fnmatch.fnmatch(fn, spec): + fullpath = os.path.join(root, fn) + if not fullpath in licfiles: + licfiles.append(fullpath) + + return licfiles + +def match_licenses(licfiles, srctree, d): + import bb + md5sums = get_license_md5sums(d) + + crunched_md5sums = crunch_known_licenses(d) + + licenses = [] + for licfile in sorted(licfiles): + resolved_licfile = d.expand(licfile) + md5value = bb.utils.md5_file(resolved_licfile) + license = md5sums.get(md5value, None) + if not license: + crunched_md5, lictext = crunch_license(resolved_licfile) + license = crunched_md5sums.get(crunched_md5, None) + if lictext and not license: + license = 'Unknown' + bb.warn("Please add the following line for '%s' to a 'lib/recipetool/licenses.csv' " \ + "and replace `Unknown` with the license:\n" \ + "%s,Unknown" % (os.path.relpath(licfile, srctree + "/.."), md5value)) + if license: + licenses.append((license, os.path.relpath(licfile, srctree), md5value)) + + return licenses + +def find_licenses(srctree, d): + licfiles = find_license_files(srctree) + licenses = match_licenses(licfiles, srctree, d) + + # FIXME should we grab at least one source file with a license header and add that too? 
+
+    return licenses
diff --git a/scripts/lib/recipetool/create.py b/scripts/lib/recipetool/create.py
index 390cc37db43..4900bfdbb42 100644
--- a/scripts/lib/recipetool/create.py
+++ b/scripts/lib/recipetool/create.py
@@ -956,6 +956,8 @@ def tidy_licenses(value):
     return sorted(list(set(flattened_licenses(value, _choose))), key=str.casefold)
 
 def handle_license_vars(srctree, lines_before, handled, extravalues, d):
+    from oe.license_finder import find_licenses
+
     lichandled = [x for x in handled if x[0] == 'license']
     if lichandled:
         # Someone else has already handled the license vars, just return their value
@@ -1041,230 +1043,6 @@ def handle_license_vars(srctree, lines_before, handled, extravalues, d):
     handled.append(('license', licvalues))
     return licvalues
 
-def get_license_md5sums(d, static_only=False, linenumbers=False):
-    import bb.utils
-    import csv
-    md5sums = {}
-    if not static_only and not linenumbers:
-        # Gather md5sums of license files in common license dir
-        commonlicdir = d.getVar('COMMON_LICENSE_DIR')
-        for fn in os.listdir(commonlicdir):
-            md5value = bb.utils.md5_file(os.path.join(commonlicdir, fn))
-            md5sums[md5value] = fn
-
-    # The following were extracted from common values in various recipes
-    # (double checking the license against the license file itself, not just
-    # the LICENSE value in the recipe)
-
-    # Read license md5sums from csv file
-    scripts_path = os.path.dirname(os.path.realpath(__file__))
-    for path in (d.getVar('BBPATH').split(':')
-                + [os.path.join(scripts_path, '..', '..')]):
-        csv_path = os.path.join(path, 'lib', 'recipetool', 'licenses.csv')
-        if os.path.isfile(csv_path):
-            with open(csv_path, newline='') as csv_file:
-                fieldnames = ['md5sum', 'license', 'beginline', 'endline', 'md5']
-                reader = csv.DictReader(csv_file, delimiter=',', fieldnames=fieldnames)
-                for row in reader:
-                    if linenumbers:
-                        md5sums[row['md5sum']] = (
-                            row['license'], row['beginline'], row['endline'], row['md5'])
-                    else:
-                        md5sums[row['md5sum']] = row['license']
-
-    return md5sums
-
-def crunch_known_licenses(d):
-    '''
-    Calculate the MD5 checksums for the crunched versions of all common
-    licenses. Also add additional known checksums.
-    '''
-
-    crunched_md5sums = {}
-
-    # common licenses
-    crunched_md5sums['ad4e9d34a2e966dfe9837f18de03266d'] = 'GFDL-1.1-only'
-    crunched_md5sums['d014fb11a34eb67dc717fdcfc97e60ed'] = 'GFDL-1.2-only'
-    crunched_md5sums['e020ca655b06c112def28e597ab844f1'] = 'GFDL-1.3-only'
-
-    # The following two were gleaned from the "forever" npm package
-    crunched_md5sums['0a97f8e4cbaf889d6fa51f84b89a79f6'] = 'ISC'
-    # https://github.com/waffle-gl/waffle/blob/master/LICENSE.txt
-    crunched_md5sums['50fab24ce589d69af8964fdbfe414c60'] = 'BSD-2-Clause'
-    # https://github.com/spigwitmer/fakeds1963s/blob/master/LICENSE
-    crunched_md5sums['88a4355858a1433fea99fae34a44da88'] = 'GPL-2.0-only'
-    # http://www.gnu.org/licenses/old-licenses/gpl-2.0.txt
-    crunched_md5sums['063b5c3ebb5f3aa4c85a2ed18a31fbe7'] = 'GPL-2.0-only'
-    # https://github.com/FFmpeg/FFmpeg/blob/master/COPYING.LGPLv2.1
-    crunched_md5sums['7f5202f4d44ed15dcd4915f5210417d8'] = 'LGPL-2.1-only'
-    # unixODBC-2.3.4 COPYING
-    crunched_md5sums['3debde09238a8c8e1f6a847e1ec9055b'] = 'LGPL-2.1-only'
-    # https://github.com/FFmpeg/FFmpeg/blob/master/COPYING.LGPLv3
-    crunched_md5sums['f90c613c51aa35da4d79dd55fc724ceb'] = 'LGPL-3.0-only'
-    # https://raw.githubusercontent.com/eclipse/mosquitto/v1.4.14/epl-v10
-    crunched_md5sums['efe2cb9a35826992b9df68224e3c2628'] = 'EPL-1.0'
-
-    # https://raw.githubusercontent.com/jquery/esprima/3.1.3/LICENSE.BSD
-    crunched_md5sums['80fa7b56a28e8c902e6af194003220a5'] = 'BSD-2-Clause'
-    # https://raw.githubusercontent.com/npm/npm-install-checks/master/LICENSE
-    crunched_md5sums['e659f77bfd9002659e112d0d3d59b2c1'] = 'BSD-2-Clause'
-    # https://raw.githubusercontent.com/silverwind/default-gateway/4.2.0/LICENSE
-    crunched_md5sums['4c641f2d995c47f5cb08bdb4b5b6ea05'] = 'BSD-2-Clause'
-    # https://raw.githubusercontent.com/tad-lispy/node-damerau-levenshtein/v1.0.5/LICENSE
-    crunched_md5sums['2b8c039b2b9a25f0feb4410c4542d346'] = 'BSD-2-Clause'
-    # https://raw.githubusercontent.com/terser/terser/v3.17.0/LICENSE
-    crunched_md5sums['8bd23871802951c9ad63855151204c2c'] = 'BSD-2-Clause'
-    # https://raw.githubusercontent.com/alexei/sprintf.js/1.0.3/LICENSE
-    crunched_md5sums['008c22318c8ea65928bf730ddd0273e3'] = 'BSD-3-Clause'
-    # https://raw.githubusercontent.com/Caligatio/jsSHA/v3.2.0/LICENSE
-    crunched_md5sums['0e46634a01bfef056892949acaea85b1'] = 'BSD-3-Clause'
-    # https://raw.githubusercontent.com/d3/d3-path/v1.0.9/LICENSE
-    crunched_md5sums['b5f72aef53d3b2b432702c30b0215666'] = 'BSD-3-Clause'
-    # https://raw.githubusercontent.com/feross/ieee754/v1.1.13/LICENSE
-    crunched_md5sums['a39327c997c20da0937955192d86232d'] = 'BSD-3-Clause'
-    # https://raw.githubusercontent.com/joyent/node-extsprintf/v1.3.0/LICENSE
-    crunched_md5sums['721f23a96ff4161ca3a5f071bbe18108'] = 'MIT'
-    # https://raw.githubusercontent.com/pvorb/clone/v0.2.0/LICENSE
-    crunched_md5sums['b376d29a53c9573006b9970709231431'] = 'MIT'
-    # https://raw.githubusercontent.com/andris9/encoding/v0.1.12/LICENSE
-    crunched_md5sums['85d8a977ee9d7c5ab4ac03c9b95431c4'] = 'MIT-0'
-    # https://raw.githubusercontent.com/faye/websocket-driver-node/0.7.3/LICENSE.md
-    crunched_md5sums['b66384e7137e41a9b1904ef4d39703b6'] = 'Apache-2.0'
-    # https://raw.githubusercontent.com/less/less.js/v4.1.1/LICENSE
-    crunched_md5sums['b27575459e02221ccef97ec0bfd457ae'] = 'Apache-2.0'
-    # https://raw.githubusercontent.com/microsoft/TypeScript/v3.5.3/LICENSE.txt
-    crunched_md5sums['a54a1a6a39e7f9dbb4a23a42f5c7fd1c'] = 'Apache-2.0'
-    # https://raw.githubusercontent.com/request/request/v2.87.0/LICENSE
-    crunched_md5sums['1034431802e57486b393d00c5d262b8a'] = 'Apache-2.0'
-    # https://raw.githubusercontent.com/dchest/tweetnacl-js/v0.14.5/LICENSE
-    crunched_md5sums['75605e6bdd564791ab698fca65c94a4f'] = 'Unlicense'
-    # https://raw.githubusercontent.com/stackgl/gl-mat3/v2.0.0/LICENSE.md
-    crunched_md5sums['75512892d6f59dddb6d1c7e191957e9c'] = 'Zlib'
-
-    commonlicdir = d.getVar('COMMON_LICENSE_DIR')
-    for fn in sorted(os.listdir(commonlicdir)):
-        md5value, lictext = crunch_license(os.path.join(commonlicdir, fn))
-        if md5value not in crunched_md5sums:
-            crunched_md5sums[md5value] = fn
-        elif fn != crunched_md5sums[md5value]:
-            bb.debug(2, "crunched_md5sums['%s'] is already set to '%s' rather than '%s'" % (md5value, crunched_md5sums[md5value], fn))
-        else:
-            bb.debug(2, "crunched_md5sums['%s'] is already set to '%s'" % (md5value, crunched_md5sums[md5value]))
-
-    return crunched_md5sums
-
-def crunch_license(licfile):
-    '''
-    Remove non-material text from a license file and then calculate its
-    md5sum. This works well for licenses that contain a copyright statement,
-    but is also a useful way to handle people's insistence upon reformatting
-    the license text slightly (with no material difference to the text of the
-    license).
-    '''
-
-    import oe.utils
-
-    # Note: these are carefully constructed!
-    license_title_re = re.compile(r'^#*\(? *(This is )?([Tt]he )?.{0,15} ?[Ll]icen[sc]e( \(.{1,10}\))?\)?[:\.]? ?#*$')
-    license_statement_re = re.compile(r'^((This (project|software)|.{1,10}) is( free software)? (released|licen[sc]ed)|(Released|Licen[cs]ed)) under the .{1,10} [Ll]icen[sc]e:?$')
-    copyright_re = re.compile(r'^ *[#\*]* *(Modified work |MIT LICENSED )?Copyright ?(\([cC]\))? .*$')
-    disclaimer_re = re.compile(r'^ *\*? ?All [Rr]ights [Rr]eserved\.$')
-    email_re = re.compile(r'^.*<[\w\.-]*@[\w\.\-]*>$')
-    header_re = re.compile(r'^(\/\**!?)? ?[\-=\*]* ?(\*\/)?$')
-    tag_re = re.compile(r'^ *@?\(?([Ll]icense|MIT)\)?$')
-    url_re = re.compile(r'^ *[#\*]* *https?:\/\/[\w\.\/\-]+$')
-
-    lictext = []
-    with open(licfile, 'r', errors='surrogateescape') as f:
-        for line in f:
-            # Drop opening statements
-            if copyright_re.match(line):
-                continue
-            elif disclaimer_re.match(line):
-                continue
-            elif email_re.match(line):
-                continue
-            elif header_re.match(line):
-                continue
-            elif tag_re.match(line):
-                continue
-            elif url_re.match(line):
-                continue
-            elif license_title_re.match(line):
-                continue
-            elif license_statement_re.match(line):
-                continue
-            # Strip comment symbols
-            line = line.replace('*', '') \
-                       .replace('#', '')
-            # Unify spelling
-            line = line.replace('sub-license', 'sublicense')
-            # Squash spaces
-            line = oe.utils.squashspaces(line.strip())
-            # Replace smart quotes, double quotes and backticks with single quotes
-            line = line.replace(u"\u2018", "'").replace(u"\u2019", "'").replace(u"\u201c","'").replace(u"\u201d", "'").replace('"', '\'').replace('`', '\'')
-            # Unify brackets
-            line = line.replace("{", "[").replace("}", "]")
-            if line:
-                lictext.append(line)
-
-    m = hashlib.md5()
-    try:
-        m.update(' '.join(lictext).encode('utf-8'))
-        md5val = m.hexdigest()
-    except UnicodeEncodeError:
-        md5val = None
-        lictext = ''
-    return md5val, lictext
-
-def find_license_files(srctree):
-    licspecs = ['*LICEN[CS]E*', 'COPYING*', '*[Ll]icense*', 'LEGAL*', '[Ll]egal*', '*GPL*', 'README.lic*', 'COPYRIGHT*', '[Cc]opyright*', 'e[dp]l-v10']
-    skip_extensions = (".html", ".js", ".json", ".svg", ".ts", ".go")
-    licfiles = []
-    for root, dirs, files in os.walk(srctree):
-        for fn in files:
-            if fn.endswith(skip_extensions):
-                continue
-            for spec in licspecs:
-                if fnmatch.fnmatch(fn, spec):
-                    fullpath = os.path.join(root, fn)
-                    if not fullpath in licfiles:
-                        licfiles.append(fullpath)
-
-    return licfiles
-
-def match_licenses(licfiles, srctree, d):
-    import bb
-    md5sums = get_license_md5sums(d)
-
-    crunched_md5sums = crunch_known_licenses(d)
-
-    licenses = []
-    for licfile in sorted(licfiles):
-        resolved_licfile = d.expand(licfile)
-        md5value = bb.utils.md5_file(resolved_licfile)
-        license = md5sums.get(md5value, None)
-        if not license:
-            crunched_md5, lictext = crunch_license(resolved_licfile)
-            license = crunched_md5sums.get(crunched_md5, None)
-            if lictext and not license:
-                license = 'Unknown'
-                logger.info("Please add the following line for '%s' to a 'lib/recipetool/licenses.csv' " \
-                    "and replace `Unknown` with the license:\n" \
-                    "%s,Unknown" % (os.path.relpath(licfile, srctree + "/.."), md5value))
-        if license:
-            licenses.append((license, os.path.relpath(licfile, srctree), md5value))
-
-    return licenses
-
-def find_licenses(srctree, d):
-    licfiles = find_license_files(srctree)
-    licenses = match_licenses(licfiles, srctree, d)
-
-    # FIXME should we grab at least one source file with a license header and add that too?
-
-    return licenses
-
 def split_pkg_licenses(licvalues, packages, outlines, fallback_licenses=None, pn='${PN}'):
     """
     Given a list of (license, path, md5sum) as returned by match_licenses(),

From patchwork Thu May 29 20:28:02 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ross Burton
X-Patchwork-Id: 63828
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2B735C5B554 for ; Thu, 29 May 2025 20:28:19 +0000 (UTC)
Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mx.groups.io with SMTP id smtpd.web10.2814.1748550498770598512 for ; Thu, 29 May 2025 13:28:18 -0700
Authentication-Results: mx.groups.io; dkim=none (message not signed); spf=pass (domain: arm.com, ip: 217.140.110.172, mailfrom: ross.burton@arm.com)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown
[10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id F35752574 for ; Thu, 29 May 2025 13:28:01 -0700 (PDT)
Received: from cesw-amp-gbt-1s-m12830-04.lab.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 1EC663F792 for ; Thu, 29 May 2025 13:28:17 -0700 (PDT)
From: Ross Burton
To: openembedded-core@lists.openembedded.org
Subject: [PATCH 9/9] Prototype go-mod-update-modules class
Date: Thu, 29 May 2025 21:28:02 +0100
Message-ID: <20250529202802.1198179-10-ross.burton@arm.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20250529202802.1198179-1-ross.burton@arm.com>
References: <20250529202802.1198179-1-ross.burton@arm.com>
MIME-Version: 1.0
List-Id:
X-Webhook-Received: from li982-79.members.linode.com [45.33.32.79] by aws-us-west-2-korg-lkml-1.web.codeaurora.org with HTTPS for ; Thu, 29 May 2025 20:28:19 -0000
X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/217444

Almost entirely based on the create_go.py module for recipetool by
Christian Lindeberg, this instead has the logic inside a class that can
be used to update the list of Go modules that are used, both SRC_URI and
LICENSE.

My test case is crucible: simply inherit this class and run the task, and
it will rewrite the include files.

There's still plenty to be done:

- Verify that the module list is the set of modules needed to _build_ and
  not the longer set of modules needed to run all the tests for all
  dependencies.

- Test behaviour when used as part of 'devtool upgrade'.

- Determine how to integrate this with 'recipetool create': should the
  code be centralised into lib/oe and called in both places, or should
  recipetool write the skeleton of the recipe and then call the update
  task?

- Port more recipes. Crucible works; etcd is proving tricky as I don't
  really understand Go very well.

Signed-off-by: Ross Burton
---
 .../go-mod-update-modules.bbclass | 130 ++++++++++++++++++
 1 file changed, 130 insertions(+)
 create mode 100644 meta/classes-recipe/go-mod-update-modules.bbclass

diff --git a/meta/classes-recipe/go-mod-update-modules.bbclass b/meta/classes-recipe/go-mod-update-modules.bbclass
new file mode 100644
index 00000000000..3253f7f0a26
--- /dev/null
+++ b/meta/classes-recipe/go-mod-update-modules.bbclass
@@ -0,0 +1,130 @@
+addtask do_update_modules after do_configure
+do_update_modules[nostamp] = "1"
+do_update_modules[network] = "1"
+
+python do_update_modules() {
+    import subprocess, tempfile, json, re
+    from oe.license_finder import find_licenses
+
+    def unescape_path(path):
+        """Unescape capital letters using exclamation points."""
+        return re.sub(r'!([a-z])', lambda m: m.group(1).upper(), path)
+
+    def fold_uri(uri):
+        """Fold URI for sorting shorter module paths before longer."""
+        return uri.replace(';', ' ').replace('/', '!')
+
+    # TODO duplicated in recipetools
+    def tidy_licenses(value):
+        """Flat, split and sort licenses"""
+        from oe.license import flattened_licenses
+        def _choose(a, b):
+            str_a, str_b = sorted((" & ".join(a), " & ".join(b)), key=str.casefold)
+            return ["(%s | %s)" % (str_a, str_b)]
+        if not isinstance(value, str):
+            value = " & ".join(value)
+        return sorted(list(set(flattened_licenses(value, _choose))), key=str.casefold)
+
+    bpn = d.getVar("BPN")
+    thisdir = d.getVar("THISDIR")
+
+    mod_dir = tempfile.mkdtemp(prefix='go-mod-')
+    bb.warn("using tmp mod %s" % mod_dir)
+    #d.setVar('GOMODCACHE', mod_dir)
+    env = dict(os.environ, GOMODCACHE=mod_dir)
+
+    # TODO this feels magic
+    source = d.expand("${WORKDIR}/${GO_SRCURI_DESTSUFFIX}")
+
+    # TODO is this needed in the refresh case?
+    output = subprocess.check_output(("go", "mod", "edit", "-json"), cwd=source, env=env, text=True)
+    go_mod = json.loads(output)
+
+    output = subprocess.check_output(("go", "list", "-json=Dir,Module", "-deps", f"{go_mod['Module']['Path']}/..."), cwd=source, env=env, text=True)
+
+    #
+    # Licenses
+    #
+
+    # The output of this isn't actually valid JSON, but a series of dicts.
+    # Wrap in [] and join the dicts with ,
+    # Very frustrating that the json parser in python can't repeatedly
+    # parse from a stream.
+    pkgs = json.loads('[' + output.replace('}\n{', '},\n{') + ']')
+    # Collect licenses for the dependencies.
+    licenses = set()
+    lic_files_chksum = []
+    lic_files = {}
+    for pkg in pkgs:
+        # TODO: If the package is in a subdirectory with its own license
+        # files then report those instead of the license files found in the
+        # module root directory.
+        mod = pkg.get('Module', None)
+        if not mod or mod.get('Main', False):
+            continue
+        path = os.path.relpath(mod['Dir'], mod_dir)
+        for license_name, license_file, license_md5 in find_licenses(mod['Dir'], d):
+            lic_files[os.path.join(path, license_file)] = (license_name, license_md5)
+
+    for lic_file in lic_files:
+        license_name, license_md5 = lic_files[lic_file]
+        if license_name == "Unknown":
+            bb.warn(f"Unknown license: {lic_file} {license_md5}")
+
+        licenses.add(lic_files[lic_file][0])
+        lic_files_chksum.append(
+            f'file://pkg/mod/{lic_file};md5={license_md5}')
+
+    licenses_filename = os.path.join(thisdir, f"{bpn}-licenses.inc")
+    with open(licenses_filename, "w") as f:
+        f.write(f'LICENSE += "& {" & ".join(tidy_licenses(licenses))}"\n\n')
+        f.write('LIC_FILES_CHKSUM += "\\\n')
+        for lic in sorted(lic_files_chksum, key=fold_uri):
+            f.write(' ' + lic + ' \\\n')
+        f.write('"\n')
+
+    #
+    # Sources
+    #
+
+    # Collect the module cache files downloaded by the go list command as
+    # the go list command knows best what the go list command needs and it
+    # needs more files in the module cache than the go install command as
+    # it doesn't do the dependency pruning mentioned in the Go module
+    # reference, https://go.dev/ref/mod, for go 1.17 or higher.
+    src_uris = []
+    downloaddir = os.path.join(mod_dir, 'cache', 'download')
+    for dirpath, _, filenames in os.walk(downloaddir):
+        # We want to process files under @v directories
+        path, base = os.path.split(os.path.relpath(dirpath, downloaddir))
+        if base != '@v':
+            continue
+
+        path = unescape_path(path)
+        zipver = None
+        for name in filenames:
+            ver, ext = os.path.splitext(name)
+            if ext == '.zip':
+                chksum = bb.utils.sha256_file(os.path.join(dirpath, name))
+                src_uris.append(f'gomod://{path};version={ver};sha256sum={chksum}')
+                zipver = ver
+                break
+        for name in filenames:
+            ver, ext = os.path.splitext(name)
+            if ext == '.mod' and ver != zipver:
+                chksum = bb.utils.sha256_file(os.path.join(dirpath, name))
+                src_uris.append(f'gomod://{path};version={ver};mod=1;sha256sum={chksum}')
+
+
+    go_mods_filename = os.path.join(thisdir, f"{bpn}-go-mods.inc")
+    with open(go_mods_filename, "w") as f:
+        f.write('SRC_URI += "\\\n')
+        for uri in sorted(src_uris, key=fold_uri):
+            f.write(' ' + uri + ' \\\n')
+        f.write('"\n')
+
+    subprocess.check_output(("go", "clean", "-modcache"), cwd=source, env=env, text=True)
+}
+
+# This doesn't work as we need to wipe the inc files first so we don't try looking for LICENSE files that don't yet exist
+# RECIPE_UPGRADE_EXTRA_TASKS += "do_update_modules"
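
A note on the "can't repeatedly parse from a stream" comment in do_update_modules: the standard library's json.JSONDecoder.raw_decode() returns the decoded value together with the index where decoding stopped, so a run of concatenated objects such as the `go list -json` output can be read one at a time without rewriting the text first. The following is only a minimal sketch of that alternative, not code from this series; iter_json_objects is a hypothetical helper name and the sample data is made up for illustration.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated values.

    Hypothetical helper, not part of the patch.
    """
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode() does not skip leading whitespace itself, so step
        # over the whitespace between (and after) the objects here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Made-up sample in the shape that "go list -json=Dir,Module" prints.
sample = '''{
    "Dir": "/tmp/mod/example.com/a@v1.0.0",
    "Module": {"Path": "example.com/a", "Dir": "/tmp/mod/example.com/a@v1.0.0"}
}
{
    "Dir": "/src/main",
    "Module": {"Path": "example.com/main", "Main": true, "Dir": "/src/main"}
}
'''

pkgs = list(iter_json_objects(sample))
```

In the class this could stand in for the json.loads('[' + ... + ']') line as pkgs = list(iter_json_objects(output)), and it would not depend on the exact whitespace that separates the objects.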