From patchwork Fri Jun 13 13:16:10 2025
From: Ross Burton
To: openembedded-core@lists.openembedded.org
Subject: [PATCH 01/10] default-distrovars: set an empty default for LICENSE_PATH
Date: Fri, 13 Jun 2025 14:16:10 +0100
Message-ID: <20250613131620.221912-1-ross.burton@arm.com>
X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/218601

This variable is a list of paths that contain extra license texts. It doesn't have a default, so it can be unset.
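A toy model shows why an empty weak default helps. In BitBake, `??=` assigns a weak default that only takes effect if nothing else assigns the variable. The `FakeDatastore` class below is hypothetical and only approximates that behaviour, and the paths are made up. The point it illustrates: with no default, `getVar()` returns `None` and splitting it fails, while with `LICENSE_PATH ??= ""` the caller gets an empty list.

```python
class FakeDatastore:
    """Hypothetical, simplified stand-in for a BitBake datastore."""

    def __init__(self):
        self._values = {}
        self._weak_defaults = {}

    def set_weak_default(self, name, value):
        # Models `NAME ??= "value"`: used only if nothing else sets NAME.
        self._weak_defaults[name] = value

    def setVar(self, name, value):
        # Models a plain `NAME = "value"` assignment.
        self._values[name] = value

    def getVar(self, name):
        if name in self._values:
            return self._values[name]
        return self._weak_defaults.get(name)  # None if never assigned


def license_paths(d):
    # Without any default, getVar() returns None and .split() raises AttributeError.
    return d.getVar("LICENSE_PATH").split()


d = FakeDatastore()
d.set_weak_default("LICENSE_PATH", "")
print(license_paths(d))  # []

d.setVar("LICENSE_PATH", "/srv/licenses /opt/extra-licenses")
print(license_paths(d))  # ['/srv/licenses', '/opt/extra-licenses']
```

A real assignment in a layer or `local.conf` still wins over the weak default, so setting the default does not change behaviour for anyone who already sets `LICENSE_PATH`.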
Signed-off-by: Ross Burton
---
 meta/conf/distro/include/default-distrovars.inc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/meta/conf/distro/include/default-distrovars.inc b/meta/conf/distro/include/default-distrovars.inc
index 85835c4c617..9ea3b5414c7 100644
--- a/meta/conf/distro/include/default-distrovars.inc
+++ b/meta/conf/distro/include/default-distrovars.inc
@@ -36,6 +36,7 @@ COMMERCIAL_VIDEO_PLUGINS ?= ""
 # COMMERCIAL_VIDEO_PLUGINS ?= "gst-plugins-ugly-mpeg2dec gst-plugins-ugly-mpegstream gst-plugins-bad-mpegvideoparse"
 # Set of common licenses used for license.bbclass
 COMMON_LICENSE_DIR ??= "${COREBASE}/meta/files/common-licenses"
+LICENSE_PATH ??= ""
 BB_GENERATE_MIRROR_TARBALLS ??= "0"

From patchwork Fri Jun 13 13:16:11 2025
From: Ross Burton
To: openembedded-core@lists.openembedded.org
Subject: [PATCH 02/10] lib/oe/license_finder: extract license finding code from recipetool
Date: Fri, 13 Jun 2025 14:16:11 +0100
Message-ID: <20250613131620.221912-2-ross.burton@arm.com>
In-Reply-To: <20250613131620.221912-1-ross.burton@arm.com>
X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/218602

This code is 99% identical to the original code in recipetool/create.py, with two minor changes:

- The implicit recipetool logger is changed to an explicit logger.
- The CSV of license hashes is moved to meta/files/ and renamed to license-hashes.csv.

Signed-off-by: Ross Burton
---
 meta/files/license-hashes.csv |  37 ++++++
 meta/lib/oe/license_finder.py | 242 ++++++++++++++++++++++++++++++++++
 2 files changed, 279 insertions(+)
 create mode 100644 meta/files/license-hashes.csv
 create mode 100644 meta/lib/oe/license_finder.py

diff --git a/meta/files/license-hashes.csv b/meta/files/license-hashes.csv
new file mode 100644
index 00000000000..80851111b31
--- /dev/null
+++ b/meta/files/license-hashes.csv
@@ -0,0 +1,37 @@
+0636e73ff0215e8d672dc4c32c317bb3,GPL-2.0-only
+12f884d2ae1ff87c09e5b7ccc2c4ca7e,GPL-2.0-only
+18810669f13b87348459e611d31ab760,GPL-2.0-only
+252890d9eee26aab7b432e8b8a616475,LGPL-2.0-only
+2d5025d4aa3495befef8f17206a5b0a1,LGPL-2.1-only
+3214f080875748938ba060314b4f727d,LGPL-2.0-only
+385c55653886acac3821999a3ccd17b3,Artistic-1.0 | GPL-2.0-only
+393a5ca445f6965873eca0259a17f833,GPL-2.0-only
+3b83ef96387f14655fc854ddc3c6bd57,Apache-2.0
+3bf50002aefd002f49e7bb854063f7e7,LGPL-2.0-only
+4325afd396febcb659c36b49533135d4,GPL-2.0-only
+4fbd65380cdd255951079008b364516c,LGPL-2.1-only
+54c7042be62e169199200bc6477f04d1,BSD-3-Clause
+55ca817ccb7d5b5b66355690e9abc605,LGPL-2.0-only
+59530bdf33659b29e73d4adb9f9f6552,GPL-2.0-only
+5f30f0716dfdd0d91eb439ebec522ec2,LGPL-2.0-only
+6a6a8e020838b23406c81b19c1d46df6,LGPL-3.0-only
+751419260aa954499f7abaabaa882bbe,GPL-2.0-only
+7fbc338309ac38fefcd64b04bb903e34,LGPL-2.1-only
+8ca43cbc842c2336e835926c2166c28b,GPL-2.0-only
+94d55d512a9ba36caa9b7df079bae19f,GPL-2.0-only
+9ac2e7cff1ddaf48b6eab6028f23ef88,GPL-2.0-only
+9f604d8a4f8e74f4f5140845a21b6674,LGPL-2.0-only
+a6f89e2100d9b6cdffcea4f398e37343,LGPL-2.1-only
+b234ee4d69f5fce4486a80fdaf4a4263,GPL-2.0-only
+bbb461211a33b134d42ed5ee802b37ff,LGPL-2.1-only
+bfe1f75d606912a4111c90743d6c7325,MPL-1.1-only
+c93c0550bd3173f4504b2cbd8991e50b,GPL-2.0-only
+d32239bcb673463ab874e80d47fae504,GPL-3.0-only
+d7810fab7487fb0aad327b76f1be7cd7,GPL-2.0-only
+d8045f3b8f929c1cb29a1e3fd737b499,LGPL-2.1-only
+db979804f025cf55aabec7129cb671ed,LGPL-2.0-only
+eb723b61539feef013de476e68b5c50a,GPL-2.0-only
+ebb5c50ab7cab4baeffba14977030c07,GPL-2.0-only
+f27defe1e96c2e1ecd4e0c9be8967949,GPL-3.0-only
+fad9b3332be894bab9bc501572864b29,LGPL-2.1-only
+fbc093901857fcd118f065f900982c24,LGPL-2.1-only
diff --git a/meta/lib/oe/license_finder.py b/meta/lib/oe/license_finder.py
new file mode 100644
index 00000000000..5b09059576e
--- /dev/null
+++ b/meta/lib/oe/license_finder.py
@@ -0,0 +1,242 @@
+#
+# Copyright OpenEmbedded Contributors
+#
+# SPDX-License-Identifier: GPL-2.0-only
+#
+
+import fnmatch
+import hashlib
+import logging
+import os
+import re
+
+import bb
+
+logger = logging.getLogger("BitBake.OE.LicenseFinder")
+
+def get_license_md5sums(d, static_only=False, linenumbers=False):
+    import bb.utils
+    import csv
+    md5sums = {}
+    if not static_only and not linenumbers:
+        # Gather md5sums of license files in common license dir
+        commonlicdir = d.getVar('COMMON_LICENSE_DIR')
+        for fn in os.listdir(commonlicdir):
+            md5value = bb.utils.md5_file(os.path.join(commonlicdir, fn))
+            md5sums[md5value] = fn
+
+    # The following were extracted from common values in various recipes
+    # (double checking the license against the license file itself, not just
+    # the LICENSE value in the recipe)
+
+    # Read license md5sums from csv file
+    for path in d.getVar('BBPATH').split(':'):
+        csv_path = os.path.join(path, 'files', 'license-hashes.csv')
+        if os.path.isfile(csv_path):
+            with open(csv_path, newline='') as csv_file:
+                fieldnames = ['md5sum', 'license', 'beginline', 'endline', 'md5']
+                reader = csv.DictReader(csv_file, delimiter=',', fieldnames=fieldnames)
+                for row in reader:
+                    if linenumbers:
+                        md5sums[row['md5sum']] = (
+                            row['license'], row['beginline'], row['endline'], row['md5'])
+                    else:
+                        md5sums[row['md5sum']] = row['license']
+
+    return md5sums
+
+
+def crunch_known_licenses(d):
+    '''
+    Calculate the MD5 checksums for the crunched versions of all common
+    licenses. Also add additional known checksums.
+    '''
+
+    crunched_md5sums = {}
+
+    # common licenses
+    crunched_md5sums['ad4e9d34a2e966dfe9837f18de03266d'] = 'GFDL-1.1-only'
+    crunched_md5sums['d014fb11a34eb67dc717fdcfc97e60ed'] = 'GFDL-1.2-only'
+    crunched_md5sums['e020ca655b06c112def28e597ab844f1'] = 'GFDL-1.3-only'
+
+    # The following two were gleaned from the "forever" npm package
+    crunched_md5sums['0a97f8e4cbaf889d6fa51f84b89a79f6'] = 'ISC'
+    # https://github.com/waffle-gl/waffle/blob/master/LICENSE.txt
+    crunched_md5sums['50fab24ce589d69af8964fdbfe414c60'] = 'BSD-2-Clause'
+    # https://github.com/spigwitmer/fakeds1963s/blob/master/LICENSE
+    crunched_md5sums['88a4355858a1433fea99fae34a44da88'] = 'GPL-2.0-only'
+    # http://www.gnu.org/licenses/old-licenses/gpl-2.0.txt
+    crunched_md5sums['063b5c3ebb5f3aa4c85a2ed18a31fbe7'] = 'GPL-2.0-only'
+    # https://github.com/FFmpeg/FFmpeg/blob/master/COPYING.LGPLv2.1
+    crunched_md5sums['7f5202f4d44ed15dcd4915f5210417d8'] = 'LGPL-2.1-only'
+    # unixODBC-2.3.4 COPYING
+    crunched_md5sums['3debde09238a8c8e1f6a847e1ec9055b'] = 'LGPL-2.1-only'
+    # https://github.com/FFmpeg/FFmpeg/blob/master/COPYING.LGPLv3
+    crunched_md5sums['f90c613c51aa35da4d79dd55fc724ceb'] = 'LGPL-3.0-only'
+    # https://raw.githubusercontent.com/eclipse/mosquitto/v1.4.14/epl-v10
+    crunched_md5sums['efe2cb9a35826992b9df68224e3c2628'] = 'EPL-1.0'
+
+    # https://raw.githubusercontent.com/jquery/esprima/3.1.3/LICENSE.BSD
+    crunched_md5sums['80fa7b56a28e8c902e6af194003220a5'] = 'BSD-2-Clause'
+    # https://raw.githubusercontent.com/npm/npm-install-checks/master/LICENSE
+    crunched_md5sums['e659f77bfd9002659e112d0d3d59b2c1'] = 'BSD-2-Clause'
+    # https://raw.githubusercontent.com/silverwind/default-gateway/4.2.0/LICENSE
+    crunched_md5sums['4c641f2d995c47f5cb08bdb4b5b6ea05'] = 'BSD-2-Clause'
+    # https://raw.githubusercontent.com/tad-lispy/node-damerau-levenshtein/v1.0.5/LICENSE
+    crunched_md5sums['2b8c039b2b9a25f0feb4410c4542d346'] = 'BSD-2-Clause'
+    # https://raw.githubusercontent.com/terser/terser/v3.17.0/LICENSE
+    crunched_md5sums['8bd23871802951c9ad63855151204c2c'] = 'BSD-2-Clause'
+    # https://raw.githubusercontent.com/alexei/sprintf.js/1.0.3/LICENSE
+    crunched_md5sums['008c22318c8ea65928bf730ddd0273e3'] = 'BSD-3-Clause'
+    # https://raw.githubusercontent.com/Caligatio/jsSHA/v3.2.0/LICENSE
+    crunched_md5sums['0e46634a01bfef056892949acaea85b1'] = 'BSD-3-Clause'
+    # https://raw.githubusercontent.com/d3/d3-path/v1.0.9/LICENSE
+    crunched_md5sums['b5f72aef53d3b2b432702c30b0215666'] = 'BSD-3-Clause'
+    # https://raw.githubusercontent.com/feross/ieee754/v1.1.13/LICENSE
+    crunched_md5sums['a39327c997c20da0937955192d86232d'] = 'BSD-3-Clause'
+    # https://raw.githubusercontent.com/joyent/node-extsprintf/v1.3.0/LICENSE
+    crunched_md5sums['721f23a96ff4161ca3a5f071bbe18108'] = 'MIT'
+    # https://raw.githubusercontent.com/pvorb/clone/v0.2.0/LICENSE
+    crunched_md5sums['b376d29a53c9573006b9970709231431'] = 'MIT'
+    # https://raw.githubusercontent.com/andris9/encoding/v0.1.12/LICENSE
+    crunched_md5sums['85d8a977ee9d7c5ab4ac03c9b95431c4'] = 'MIT-0'
+    # https://raw.githubusercontent.com/faye/websocket-driver-node/0.7.3/LICENSE.md
+    crunched_md5sums['b66384e7137e41a9b1904ef4d39703b6'] = 'Apache-2.0'
+    # https://raw.githubusercontent.com/less/less.js/v4.1.1/LICENSE
+    crunched_md5sums['b27575459e02221ccef97ec0bfd457ae'] = 'Apache-2.0'
+    # https://raw.githubusercontent.com/microsoft/TypeScript/v3.5.3/LICENSE.txt
+    crunched_md5sums['a54a1a6a39e7f9dbb4a23a42f5c7fd1c'] = 'Apache-2.0'
+    # https://raw.githubusercontent.com/request/request/v2.87.0/LICENSE
+    crunched_md5sums['1034431802e57486b393d00c5d262b8a'] = 'Apache-2.0'
+    # https://raw.githubusercontent.com/dchest/tweetnacl-js/v0.14.5/LICENSE
+    crunched_md5sums['75605e6bdd564791ab698fca65c94a4f'] = 'Unlicense'
+    # https://raw.githubusercontent.com/stackgl/gl-mat3/v2.0.0/LICENSE.md
+    crunched_md5sums['75512892d6f59dddb6d1c7e191957e9c'] = 'Zlib'
+
+    commonlicdir = d.getVar('COMMON_LICENSE_DIR')
+    for fn in sorted(os.listdir(commonlicdir)):
+        md5value, lictext = crunch_license(os.path.join(commonlicdir, fn))
+        if md5value not in crunched_md5sums:
+            crunched_md5sums[md5value] = fn
+        elif fn != crunched_md5sums[md5value]:
+            bb.debug(2, "crunched_md5sums['%s'] is already set to '%s' rather than '%s'" % (md5value, crunched_md5sums[md5value], fn))
+        else:
+            bb.debug(2, "crunched_md5sums['%s'] is already set to '%s'" % (md5value, crunched_md5sums[md5value]))
+
+    return crunched_md5sums
+
+
+def crunch_license(licfile):
+    '''
+    Remove non-material text from a license file and then calculate its
+    md5sum. This works well for licenses that contain a copyright statement,
+    but is also a useful way to handle people's insistence upon reformatting
+    the license text slightly (with no material difference to the text of the
+    license).
+    '''
+
+    import oe.utils
+
+    # Note: these are carefully constructed!
+    license_title_re = re.compile(r'^#*\(? *(This is )?([Tt]he )?.{0,15} ?[Ll]icen[sc]e( \(.{1,10}\))?\)?[:\.]? ?#*$')
+    license_statement_re = re.compile(r'^((This (project|software)|.{1,10}) is( free software)? (released|licen[sc]ed)|(Released|Licen[cs]ed)) under the .{1,10} [Ll]icen[sc]e:?$')
+    copyright_re = re.compile(r'^ *[#\*]* *(Modified work |MIT LICENSED )?Copyright ?(\([cC]\))? .*$')
+    disclaimer_re = re.compile(r'^ *\*? ?All [Rr]ights [Rr]eserved\.$')
+    email_re = re.compile(r'^.*<[\w\.-]*@[\w\.\-]*>$')
+    header_re = re.compile(r'^(\/\**!?)? ?[\-=\*]* ?(\*\/)?$')
+    tag_re = re.compile(r'^ *@?\(?([Ll]icense|MIT)\)?$')
+    url_re = re.compile(r'^ *[#\*]* *https?:\/\/[\w\.\/\-]+$')
+
+    lictext = []
+    with open(licfile, 'r', errors='surrogateescape') as f:
+        for line in f:
+            # Drop opening statements
+            if copyright_re.match(line):
+                continue
+            elif disclaimer_re.match(line):
+                continue
+            elif email_re.match(line):
+                continue
+            elif header_re.match(line):
+                continue
+            elif tag_re.match(line):
+                continue
+            elif url_re.match(line):
+                continue
+            elif license_title_re.match(line):
+                continue
+            elif license_statement_re.match(line):
+                continue
+            # Strip comment symbols
+            line = line.replace('*', '') \
+                       .replace('#', '')
+            # Unify spelling
+            line = line.replace('sub-license', 'sublicense')
+            # Squash spaces
+            line = oe.utils.squashspaces(line.strip())
+            # Replace smart quotes, double quotes and backticks with single quotes
+            line = line.replace(u"\u2018", "'").replace(u"\u2019", "'").replace(u"\u201c","'").replace(u"\u201d", "'").replace('"', '\'').replace('`', '\'')
+            # Unify brackets
+            line = line.replace("{", "[").replace("}", "]")
+            if line:
+                lictext.append(line)
+
+    m = hashlib.md5()
+    try:
+        m.update(' '.join(lictext).encode('utf-8'))
+        md5val = m.hexdigest()
+    except UnicodeEncodeError:
+        md5val = None
+        lictext = ''
+    return md5val, lictext
+
+
+def find_license_files(srctree):
+    licspecs = ['*LICEN[CS]E*', 'COPYING*', '*[Ll]icense*', 'LEGAL*', '[Ll]egal*', '*GPL*', 'README.lic*', 'COPYRIGHT*', '[Cc]opyright*', 'e[dp]l-v10']
+    skip_extensions = (".html", ".js", ".json", ".svg", ".ts", ".go")
+    licfiles = []
+    for root, dirs, files in os.walk(srctree):
+        for fn in files:
+            if fn.endswith(skip_extensions):
+                continue
+            for spec in licspecs:
+                if fnmatch.fnmatch(fn, spec):
+                    fullpath = os.path.join(root, fn)
+                    if not fullpath in licfiles:
+                        licfiles.append(fullpath)
+
+    return licfiles
+
+
+def match_licenses(licfiles, srctree, d):
+    import bb
+    md5sums = get_license_md5sums(d)
+
+    crunched_md5sums = crunch_known_licenses(d)
+
+    licenses = []
+    for licfile in sorted(licfiles):
+        resolved_licfile = d.expand(licfile)
+        md5value = bb.utils.md5_file(resolved_licfile)
+        license = md5sums.get(md5value, None)
+        if not license:
+            crunched_md5, lictext = crunch_license(resolved_licfile)
+            license = crunched_md5sums.get(crunched_md5, None)
+            if lictext and not license:
+                license = 'Unknown'
+                logger.info("Please add the following line for '%s' to a 'license-hashes.csv' " \
+                    "and replace `Unknown` with the license:\n" \
+                    "%s,Unknown" % (os.path.relpath(licfile, srctree + "/.."), md5value))
+        if license:
+            licenses.append((license, os.path.relpath(licfile, srctree), md5value))
+
+    return licenses
+
+
+def find_licenses(srctree, d):
+    licfiles = find_license_files(srctree)
+    licenses = match_licenses(licfiles, srctree, d)
+
+    # FIXME should we grab at least one source file with a license header and add that too?
+
+    return licenses

From patchwork Fri Jun 13 13:16:12 2025
From: Ross Burton
To: openembedded-core@lists.openembedded.org
Subject: [PATCH 03/10] recipetool: use oe.license_finder
Date: Fri, 13 Jun 2025 14:16:12 +0100
Message-ID: <20250613131620.221912-3-ross.burton@arm.com>
In-Reply-To: <20250613131620.221912-1-ross.burton@arm.com>
X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/218603

Delete the now-redundant code and import oe.license_finder instead.
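The matching that recipetool now imports from oe.license_finder works in two stages. `match_licenses()` first looks up the plain MD5 of each license file, and only if that misses does it "crunch" the text (strip copyright lines, comment markers, whitespace differences and so on) and look up the MD5 of what remains. The sketch below is a heavily simplified, hypothetical illustration of that idea. It works on strings rather than files, uses one regular expression instead of the real set, skips the logging, and the license texts and the `"MIT"` label in the table are made up.

```python
import hashlib
import re

# Hypothetical, simplified normaliser: the real crunch_license() drops more
# kinds of boilerplate and also unifies quotes, brackets and spelling.
COPYRIGHT_RE = re.compile(r"^ *[#*]* *Copyright\b.*$")


def crunch(text):
    """Drop copyright lines, comment markers and extra whitespace, then MD5 the rest."""
    kept = []
    for line in text.splitlines():
        if COPYRIGHT_RE.match(line):
            continue
        line = line.replace("*", "").replace("#", "")
        line = " ".join(line.split())  # squash runs of whitespace
        if line:
            kept.append(line)
    return hashlib.md5(" ".join(kept).encode("utf-8")).hexdigest()


def match(text, exact_md5s, crunched_md5s):
    """Try the exact checksum first, then the crunched checksum, else 'Unknown'."""
    exact = hashlib.md5(text.encode("utf-8")).hexdigest()
    if exact in exact_md5s:
        return exact_md5s[exact]
    return crunched_md5s.get(crunch(text), "Unknown")


reference = "Copyright (c) 2020 Alice\n\nPermission is hereby granted, free of charge.\n"
reformatted = "# Copyright 2024 Bob\n#   Permission is hereby granted,\n#   free of charge.\n"

crunched_table = {crunch(reference): "MIT"}
print(match(reformatted, {}, crunched_table))        # MIT
print(match("Some other text", {}, crunched_table))  # Unknown
```

The exact-checksum stage is cheap and unambiguous. The crunched stage is what lets a license with a different copyright holder or comment style still map to a known license.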
Signed-off-by: Ross Burton --- scripts/lib/recipetool/create.py | 225 +-------------------------- scripts/lib/recipetool/create_npm.py | 3 +- scripts/lib/recipetool/licenses.csv | 37 ----- 3 files changed, 3 insertions(+), 262 deletions(-) delete mode 100644 scripts/lib/recipetool/licenses.csv diff --git a/scripts/lib/recipetool/create.py b/scripts/lib/recipetool/create.py index 94d52d60772..3c6ef6719fa 100644 --- a/scripts/lib/recipetool/create.py +++ b/scripts/lib/recipetool/create.py @@ -18,6 +18,7 @@ from urllib.parse import urlparse, urldefrag, urlsplit import hashlib import bb.fetch2 logger = logging.getLogger('recipetool') +from oe.license_finder import find_licenses tinfoil = None plugins = None @@ -1040,230 +1041,6 @@ def handle_license_vars(srctree, lines_before, handled, extravalues, d): handled.append(('license', licvalues)) return licvalues -def get_license_md5sums(d, static_only=False, linenumbers=False): - import bb.utils - import csv - md5sums = {} - if not static_only and not linenumbers: - # Gather md5sums of license files in common license dir - commonlicdir = d.getVar('COMMON_LICENSE_DIR') - for fn in os.listdir(commonlicdir): - md5value = bb.utils.md5_file(os.path.join(commonlicdir, fn)) - md5sums[md5value] = fn - - # The following were extracted from common values in various recipes - # (double checking the license against the license file itself, not just - # the LICENSE value in the recipe) - - # Read license md5sums from csv file - scripts_path = os.path.dirname(os.path.realpath(__file__)) - for path in (d.getVar('BBPATH').split(':') - + [os.path.join(scripts_path, '..', '..')]): - csv_path = os.path.join(path, 'lib', 'recipetool', 'licenses.csv') - if os.path.isfile(csv_path): - with open(csv_path, newline='') as csv_file: - fieldnames = ['md5sum', 'license', 'beginline', 'endline', 'md5'] - reader = csv.DictReader(csv_file, delimiter=',', fieldnames=fieldnames) - for row in reader: - if linenumbers: - md5sums[row['md5sum']] = ( - 
row['license'], row['beginline'], row['endline'], row['md5']) - else: - md5sums[row['md5sum']] = row['license'] - - return md5sums - -def crunch_known_licenses(d): - ''' - Calculate the MD5 checksums for the crunched versions of all common - licenses. Also add additional known checksums. - ''' - - crunched_md5sums = {} - - # common licenses - crunched_md5sums['ad4e9d34a2e966dfe9837f18de03266d'] = 'GFDL-1.1-only' - crunched_md5sums['d014fb11a34eb67dc717fdcfc97e60ed'] = 'GFDL-1.2-only' - crunched_md5sums['e020ca655b06c112def28e597ab844f1'] = 'GFDL-1.3-only' - - # The following two were gleaned from the "forever" npm package - crunched_md5sums['0a97f8e4cbaf889d6fa51f84b89a79f6'] = 'ISC' - # https://github.com/waffle-gl/waffle/blob/master/LICENSE.txt - crunched_md5sums['50fab24ce589d69af8964fdbfe414c60'] = 'BSD-2-Clause' - # https://github.com/spigwitmer/fakeds1963s/blob/master/LICENSE - crunched_md5sums['88a4355858a1433fea99fae34a44da88'] = 'GPL-2.0-only' - # http://www.gnu.org/licenses/old-licenses/gpl-2.0.txt - crunched_md5sums['063b5c3ebb5f3aa4c85a2ed18a31fbe7'] = 'GPL-2.0-only' - # https://github.com/FFmpeg/FFmpeg/blob/master/COPYING.LGPLv2.1 - crunched_md5sums['7f5202f4d44ed15dcd4915f5210417d8'] = 'LGPL-2.1-only' - # unixODBC-2.3.4 COPYING - crunched_md5sums['3debde09238a8c8e1f6a847e1ec9055b'] = 'LGPL-2.1-only' - # https://github.com/FFmpeg/FFmpeg/blob/master/COPYING.LGPLv3 - crunched_md5sums['f90c613c51aa35da4d79dd55fc724ceb'] = 'LGPL-3.0-only' - # https://raw.githubusercontent.com/eclipse/mosquitto/v1.4.14/epl-v10 - crunched_md5sums['efe2cb9a35826992b9df68224e3c2628'] = 'EPL-1.0' - - # https://raw.githubusercontent.com/jquery/esprima/3.1.3/LICENSE.BSD - crunched_md5sums['80fa7b56a28e8c902e6af194003220a5'] = 'BSD-2-Clause' - # https://raw.githubusercontent.com/npm/npm-install-checks/master/LICENSE - crunched_md5sums['e659f77bfd9002659e112d0d3d59b2c1'] = 'BSD-2-Clause' - # https://raw.githubusercontent.com/silverwind/default-gateway/4.2.0/LICENSE - 
crunched_md5sums['4c641f2d995c47f5cb08bdb4b5b6ea05'] = 'BSD-2-Clause' - # https://raw.githubusercontent.com/tad-lispy/node-damerau-levenshtein/v1.0.5/LICENSE - crunched_md5sums['2b8c039b2b9a25f0feb4410c4542d346'] = 'BSD-2-Clause' - # https://raw.githubusercontent.com/terser/terser/v3.17.0/LICENSE - crunched_md5sums['8bd23871802951c9ad63855151204c2c'] = 'BSD-2-Clause' - # https://raw.githubusercontent.com/alexei/sprintf.js/1.0.3/LICENSE - crunched_md5sums['008c22318c8ea65928bf730ddd0273e3'] = 'BSD-3-Clause' - # https://raw.githubusercontent.com/Caligatio/jsSHA/v3.2.0/LICENSE - crunched_md5sums['0e46634a01bfef056892949acaea85b1'] = 'BSD-3-Clause' - # https://raw.githubusercontent.com/d3/d3-path/v1.0.9/LICENSE - crunched_md5sums['b5f72aef53d3b2b432702c30b0215666'] = 'BSD-3-Clause' - # https://raw.githubusercontent.com/feross/ieee754/v1.1.13/LICENSE - crunched_md5sums['a39327c997c20da0937955192d86232d'] = 'BSD-3-Clause' - # https://raw.githubusercontent.com/joyent/node-extsprintf/v1.3.0/LICENSE - crunched_md5sums['721f23a96ff4161ca3a5f071bbe18108'] = 'MIT' - # https://raw.githubusercontent.com/pvorb/clone/v0.2.0/LICENSE - crunched_md5sums['b376d29a53c9573006b9970709231431'] = 'MIT' - # https://raw.githubusercontent.com/andris9/encoding/v0.1.12/LICENSE - crunched_md5sums['85d8a977ee9d7c5ab4ac03c9b95431c4'] = 'MIT-0' - # https://raw.githubusercontent.com/faye/websocket-driver-node/0.7.3/LICENSE.md - crunched_md5sums['b66384e7137e41a9b1904ef4d39703b6'] = 'Apache-2.0' - # https://raw.githubusercontent.com/less/less.js/v4.1.1/LICENSE - crunched_md5sums['b27575459e02221ccef97ec0bfd457ae'] = 'Apache-2.0' - # https://raw.githubusercontent.com/microsoft/TypeScript/v3.5.3/LICENSE.txt - crunched_md5sums['a54a1a6a39e7f9dbb4a23a42f5c7fd1c'] = 'Apache-2.0' - # https://raw.githubusercontent.com/request/request/v2.87.0/LICENSE - crunched_md5sums['1034431802e57486b393d00c5d262b8a'] = 'Apache-2.0' - # https://raw.githubusercontent.com/dchest/tweetnacl-js/v0.14.5/LICENSE - 
crunched_md5sums['75605e6bdd564791ab698fca65c94a4f'] = 'Unlicense' - # https://raw.githubusercontent.com/stackgl/gl-mat3/v2.0.0/LICENSE.md - crunched_md5sums['75512892d6f59dddb6d1c7e191957e9c'] = 'Zlib' - - commonlicdir = d.getVar('COMMON_LICENSE_DIR') - for fn in sorted(os.listdir(commonlicdir)): - md5value, lictext = crunch_license(os.path.join(commonlicdir, fn)) - if md5value not in crunched_md5sums: - crunched_md5sums[md5value] = fn - elif fn != crunched_md5sums[md5value]: - bb.debug(2, "crunched_md5sums['%s'] is already set to '%s' rather than '%s'" % (md5value, crunched_md5sums[md5value], fn)) - else: - bb.debug(2, "crunched_md5sums['%s'] is already set to '%s'" % (md5value, crunched_md5sums[md5value])) - - return crunched_md5sums - -def crunch_license(licfile): - ''' - Remove non-material text from a license file and then calculate its - md5sum. This works well for licenses that contain a copyright statement, - but is also a useful way to handle people's insistence upon reformatting - the license text slightly (with no material difference to the text of the - license). - ''' - - import oe.utils - - # Note: these are carefully constructed! - license_title_re = re.compile(r'^#*\(? *(This is )?([Tt]he )?.{0,15} ?[Ll]icen[sc]e( \(.{1,10}\))?\)?[:\.]? ?#*$') - license_statement_re = re.compile(r'^((This (project|software)|.{1,10}) is( free software)? (released|licen[sc]ed)|(Released|Licen[cs]ed)) under the .{1,10} [Ll]icen[sc]e:?$') - copyright_re = re.compile(r'^ *[#\*]* *(Modified work |MIT LICENSED )?Copyright ?(\([cC]\))? .*$') - disclaimer_re = re.compile(r'^ *\*? ?All [Rr]ights [Rr]eserved\.$') - email_re = re.compile(r'^.*<[\w\.-]*@[\w\.\-]*>$') - header_re = re.compile(r'^(\/\**!?)? 
?[\-=\*]* ?(\*\/)?$') - tag_re = re.compile(r'^ *@?\(?([Ll]icense|MIT)\)?$') - url_re = re.compile(r'^ *[#\*]* *https?:\/\/[\w\.\/\-]+$') - - lictext = [] - with open(licfile, 'r', errors='surrogateescape') as f: - for line in f: - # Drop opening statements - if copyright_re.match(line): - continue - elif disclaimer_re.match(line): - continue - elif email_re.match(line): - continue - elif header_re.match(line): - continue - elif tag_re.match(line): - continue - elif url_re.match(line): - continue - elif license_title_re.match(line): - continue - elif license_statement_re.match(line): - continue - # Strip comment symbols - line = line.replace('*', '') \ - .replace('#', '') - # Unify spelling - line = line.replace('sub-license', 'sublicense') - # Squash spaces - line = oe.utils.squashspaces(line.strip()) - # Replace smart quotes, double quotes and backticks with single quotes - line = line.replace(u"\u2018", "'").replace(u"\u2019", "'").replace(u"\u201c","'").replace(u"\u201d", "'").replace('"', '\'').replace('`', '\'') - # Unify brackets - line = line.replace("{", "[").replace("}", "]") - if line: - lictext.append(line) - - m = hashlib.md5() - try: - m.update(' '.join(lictext).encode('utf-8')) - md5val = m.hexdigest() - except UnicodeEncodeError: - md5val = None - lictext = '' - return md5val, lictext - -def find_license_files(srctree): - licspecs = ['*LICEN[CS]E*', 'COPYING*', '*[Ll]icense*', 'LEGAL*', '[Ll]egal*', '*GPL*', 'README.lic*', 'COPYRIGHT*', '[Cc]opyright*', 'e[dp]l-v10'] - skip_extensions = (".html", ".js", ".json", ".svg", ".ts", ".go") - licfiles = [] - for root, dirs, files in os.walk(srctree): - for fn in files: - if fn.endswith(skip_extensions): - continue - for spec in licspecs: - if fnmatch.fnmatch(fn, spec): - fullpath = os.path.join(root, fn) - if not fullpath in licfiles: - licfiles.append(fullpath) - - return licfiles - -def match_licenses(licfiles, srctree, d): - import bb - md5sums = get_license_md5sums(d) - - crunched_md5sums = 
crunch_known_licenses(d) - - licenses = [] - for licfile in sorted(licfiles): - resolved_licfile = d.expand(licfile) - md5value = bb.utils.md5_file(resolved_licfile) - license = md5sums.get(md5value, None) - if not license: - crunched_md5, lictext = crunch_license(resolved_licfile) - license = crunched_md5sums.get(crunched_md5, None) - if lictext and not license: - license = 'Unknown' - logger.info("Please add the following line for '%s' to a 'lib/recipetool/licenses.csv' " \ - "and replace `Unknown` with the license:\n" \ - "%s,Unknown" % (os.path.relpath(licfile, srctree + "/.."), md5value)) - if license: - licenses.append((license, os.path.relpath(licfile, srctree), md5value)) - - return licenses - -def find_licenses(srctree, d): - licfiles = find_license_files(srctree) - licenses = match_licenses(licfiles, srctree, d) - - # FIXME should we grab at least one source file with a license header and add that too? - - return licenses - def split_pkg_licenses(licvalues, packages, outlines, fallback_licenses=None, pn='${PN}'): """ Given a list of (license, path, md5sum) as returned by match_licenses(), diff --git a/scripts/lib/recipetool/create_npm.py b/scripts/lib/recipetool/create_npm.py index 3363a0e7ee8..8c4cdd52340 100644 --- a/scripts/lib/recipetool/create_npm.py +++ b/scripts/lib/recipetool/create_npm.py @@ -15,8 +15,9 @@ import bb from bb.fetch2.npm import NpmEnvironment from bb.fetch2.npm import npm_package from bb.fetch2.npmsw import foreach_dependencies +from oe.license_finder import match_licenses, find_license_files from recipetool.create import RecipeHandler -from recipetool.create import match_licenses, find_license_files, generate_common_licenses_chksums +from recipetool.create import generate_common_licenses_chksums from recipetool.create import split_pkg_licenses logger = logging.getLogger('recipetool') diff --git a/scripts/lib/recipetool/licenses.csv b/scripts/lib/recipetool/licenses.csv deleted file mode 100644 index 80851111b31..00000000000 --- 
a/scripts/lib/recipetool/licenses.csv +++ /dev/null @@ -1,37 +0,0 @@ -0636e73ff0215e8d672dc4c32c317bb3,GPL-2.0-only -12f884d2ae1ff87c09e5b7ccc2c4ca7e,GPL-2.0-only -18810669f13b87348459e611d31ab760,GPL-2.0-only -252890d9eee26aab7b432e8b8a616475,LGPL-2.0-only -2d5025d4aa3495befef8f17206a5b0a1,LGPL-2.1-only -3214f080875748938ba060314b4f727d,LGPL-2.0-only -385c55653886acac3821999a3ccd17b3,Artistic-1.0 | GPL-2.0-only -393a5ca445f6965873eca0259a17f833,GPL-2.0-only -3b83ef96387f14655fc854ddc3c6bd57,Apache-2.0 -3bf50002aefd002f49e7bb854063f7e7,LGPL-2.0-only -4325afd396febcb659c36b49533135d4,GPL-2.0-only -4fbd65380cdd255951079008b364516c,LGPL-2.1-only -54c7042be62e169199200bc6477f04d1,BSD-3-Clause -55ca817ccb7d5b5b66355690e9abc605,LGPL-2.0-only -59530bdf33659b29e73d4adb9f9f6552,GPL-2.0-only -5f30f0716dfdd0d91eb439ebec522ec2,LGPL-2.0-only -6a6a8e020838b23406c81b19c1d46df6,LGPL-3.0-only -751419260aa954499f7abaabaa882bbe,GPL-2.0-only -7fbc338309ac38fefcd64b04bb903e34,LGPL-2.1-only -8ca43cbc842c2336e835926c2166c28b,GPL-2.0-only -94d55d512a9ba36caa9b7df079bae19f,GPL-2.0-only -9ac2e7cff1ddaf48b6eab6028f23ef88,GPL-2.0-only -9f604d8a4f8e74f4f5140845a21b6674,LGPL-2.0-only -a6f89e2100d9b6cdffcea4f398e37343,LGPL-2.1-only -b234ee4d69f5fce4486a80fdaf4a4263,GPL-2.0-only -bbb461211a33b134d42ed5ee802b37ff,LGPL-2.1-only -bfe1f75d606912a4111c90743d6c7325,MPL-1.1-only -c93c0550bd3173f4504b2cbd8991e50b,GPL-2.0-only -d32239bcb673463ab874e80d47fae504,GPL-3.0-only -d7810fab7487fb0aad327b76f1be7cd7,GPL-2.0-only -d8045f3b8f929c1cb29a1e3fd737b499,LGPL-2.1-only -db979804f025cf55aabec7129cb671ed,LGPL-2.0-only -eb723b61539feef013de476e68b5c50a,GPL-2.0-only -ebb5c50ab7cab4baeffba14977030c07,GPL-2.0-only -f27defe1e96c2e1ecd4e0c9be8967949,GPL-3.0-only -fad9b3332be894bab9bc501572864b29,LGPL-2.1-only -fbc093901857fcd118f065f900982c24,LGPL-2.1-only From patchwork Fri Jun 13 13:16:13 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ross 
Burton X-Patchwork-Id: 64925 From: Ross Burton To: openembedded-core@lists.openembedded.org Subject: [PATCH 04/10] oe/license_finder: skip .sh files when looking for licenses Date: Fri, 13 Jun 2025 14:16:13 +0100 Message-ID: <20250613131620.221912-4-ross.burton@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250613131620.221912-1-ross.burton@arm.com> References: <20250613131620.221912-1-ross.burton@arm.com> MIME-Version: 1.0 X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/218604

Shell scripts are not licenses, so skip them.
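As a rough illustration of how the extension filter interacts with the name patterns, the sketch below copies `licspecs` and `skip_extensions` from `find_license_files()`. The helper name `looks_like_license()` is hypothetical, and it assumes the patterns are applied with shell-style `fnmatch` matching, which this diff does not show. Without `.sh` in the tuple, a script such as `update-license.sh` would match `*[Ll]icense*` and be reported as a license candidate.

```python
import fnmatch

# Patterns and skipped extensions copied from find_license_files() in this series.
LICSPECS = ['*LICEN[CS]E*', 'COPYING*', '*[Ll]icense*', 'LEGAL*', '[Ll]egal*',
            '*GPL*', 'README.lic*', 'COPYRIGHT*', '[Cc]opyright*', 'e[dp]l-v10']
SKIP_EXTENSIONS = (".html", ".js", ".json", ".svg", ".ts", ".go", ".sh")


def looks_like_license(filename):
    """Hypothetical helper: True if filename would be picked up as a license candidate."""
    # str.endswith() accepts a tuple, so one call checks every skipped extension.
    if filename.endswith(SKIP_EXTENSIONS):
        return False
    # fnmatchcase keeps the [Ll]-style patterns case-sensitive on every platform.
    return any(fnmatch.fnmatchcase(filename, spec) for spec in LICSPECS)


print(looks_like_license("LICENSE"))            # True
print(looks_like_license("COPYING.LGPL"))       # True
print(looks_like_license("update-license.sh"))  # False: .sh is now skipped
```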
Signed-off-by: Ross Burton --- meta/lib/oe/license_finder.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/meta/lib/oe/license_finder.py b/meta/lib/oe/license_finder.py index 5b09059576e..d5030c033e7 100644 --- a/meta/lib/oe/license_finder.py +++ b/meta/lib/oe/license_finder.py @@ -193,7 +193,7 @@ def crunch_license(licfile): def find_license_files(srctree): licspecs = ['*LICEN[CS]E*', 'COPYING*', '*[Ll]icense*', 'LEGAL*', '[Ll]egal*', '*GPL*', 'README.lic*', 'COPYRIGHT*', '[Cc]opyright*', 'e[dp]l-v10'] - skip_extensions = (".html", ".js", ".json", ".svg", ".ts", ".go") + skip_extensions = (".html", ".js", ".json", ".svg", ".ts", ".go", ".sh") licfiles = [] for root, dirs, files in os.walk(srctree): for fn in files: From patchwork Fri Jun 13 13:16:14 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ross Burton X-Patchwork-Id: 64917 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id A6897C71151 for ; Fri, 13 Jun 2025 13:16:31 +0000 (UTC) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mx.groups.io with SMTP id smtpd.web11.10179.1749820585421779591 for ; Fri, 13 Jun 2025 06:16:25 -0700 Authentication-Results: mx.groups.io; dkim=none (message not signed); spf=pass (domain: arm.com, ip: 217.140.110.172, mailfrom: ross.burton@arm.com) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id B985B1C0A for ; Fri, 13 Jun 2025 06:16:04 -0700 (PDT) Received: from cesw-amp-gbt-1s-m12830-04.lab.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id BDC903F59E for ; Fri, 13 Jun 2025 06:16:24 -0700 (PDT) From: Ross Burton 
To: openembedded-core@lists.openembedded.org Subject: [PATCH 05/10] oe/license_finder: add first_only argument to find_licenses() Date: Fri, 13 Jun 2025 14:16:14 +0100 Message-ID: <20250613131620.221912-5-ross.burton@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250613131620.221912-1-ross.burton@arm.com> References: <20250613131620.221912-1-ross.burton@arm.com> MIME-Version: 1.0 X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/218605

A caller may want only the "top-level" license file rather than every potential candidate, so add a first_only argument (defaulting to False to preserve the existing behaviour) that returns just the first license file found.

Signed-off-by: Ross Burton --- meta/lib/oe/license_finder.py | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/meta/lib/oe/license_finder.py b/meta/lib/oe/license_finder.py index d5030c033e7..96961658e8b 100644 --- a/meta/lib/oe/license_finder.py +++ b/meta/lib/oe/license_finder.py @@ -191,12 +191,18 @@ def crunch_license(licfile): return md5val, lictext -def find_license_files(srctree): +def find_license_files(srctree, first_only=False): + """ + Search srctree for files that look like they could be licenses. + If first_only is True, only return the first file found. + """ licspecs = ['*LICEN[CS]E*', 'COPYING*', '*[Ll]icense*', 'LEGAL*', '[Ll]egal*', '*GPL*', 'README.lic*', 'COPYRIGHT*', '[Cc]opyright*', 'e[dp]l-v10'] skip_extensions = (".html", ".js", ".json", ".svg", ".ts", ".go", ".sh") licfiles = [] for root, dirs, files in os.walk(srctree): - for fn in files: + # Sort files so that LICENSE is before LICENSE.subcomponent, which is + # meaningful if first_only is set.
+ for fn in sorted(files): if fn.endswith(skip_extensions): continue for spec in licspecs: @@ -204,6 +210,8 @@ def find_license_files(srctree): fullpath = os.path.join(root, fn) if not fullpath in licfiles: licfiles.append(fullpath) + if first_only: + return licfiles return licfiles @@ -233,8 +241,8 @@ def match_licenses(licfiles, srctree, d): return licenses -def find_licenses(srctree, d): - licfiles = find_license_files(srctree) +def find_licenses(srctree, d, first_only=False): + licfiles = find_license_files(srctree, first_only) licenses = match_licenses(licfiles, srctree, d) # FIXME should we grab at least one source file with a license header and add that too? From patchwork Fri Jun 13 13:16:15 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ross Burton X-Patchwork-Id: 64920 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id BF4C2C71153 for ; Fri, 13 Jun 2025 13:16:31 +0000 (UTC) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mx.groups.io with SMTP id smtpd.web11.10181.1749820586055168348 for ; Fri, 13 Jun 2025 06:16:26 -0700 Authentication-Results: mx.groups.io; dkim=none (message not signed); spf=pass (domain: arm.com, ip: 217.140.110.172, mailfrom: ross.burton@arm.com) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 6178E1C0A for ; Fri, 13 Jun 2025 06:16:05 -0700 (PDT) Received: from cesw-amp-gbt-1s-m12830-04.lab.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 65D813F59E for ; Fri, 13 Jun 2025 06:16:25 -0700 (PDT) From: Ross Burton To: openembedded-core@lists.openembedded.org Subject: [PATCH 06/10] 
oe/license_finder: consolidate hash->license maps Date: Fri, 13 Jun 2025 14:16:15 +0100 Message-ID: <20250613131620.221912-6-ross.burton@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250613131620.221912-1-ross.burton@arm.com> References: <20250613131620.221912-1-ross.burton@arm.com> MIME-Version: 1.0 X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/218606

Checksum-to-license-name mappings currently live in two places: the license-hashes.csv file and a hard-coded set of assignments in the code. There's no need for both, so remove the assignments and move the hashes into the CSV file.

Signed-off-by: Ross Burton --- meta/files/license-hashes.csv | 41 ++++++++++++++++++++++++ meta/lib/oe/license_finder.py | 59 ----------------------------------- 2 files changed, 41 insertions(+), 59 deletions(-) diff --git a/meta/files/license-hashes.csv b/meta/files/license-hashes.csv index 80851111b31..906660b85df 100644 --- a/meta/files/license-hashes.csv +++ b/meta/files/license-hashes.csv @@ -1,37 +1,78 @@ +008c22318c8ea65928bf730ddd0273e3,BSD-3-Clause +02d4002e9171d41a8fad93aa7faf3956,BSD-3-Clause 0636e73ff0215e8d672dc4c32c317bb3,GPL-2.0-only +063b5c3ebb5f3aa4c85a2ed18a31fbe7,GPL-2.0-only +0a97f8e4cbaf889d6fa51f84b89a79f6,ISC +0ceb9ff3b27d3a8cf451ca3785d73c71,BSD-3-Clause & MIT +0dd48ae8103725bd7b401261520cdfbb,BSD-3-Clause +0e46634a01bfef056892949acaea85b1,BSD-3-Clause +1034431802e57486b393d00c5d262b8a,Apache-2.0 12f884d2ae1ff87c09e5b7ccc2c4ca7e,GPL-2.0-only 18810669f13b87348459e611d31ab760,GPL-2.0-only +19cbd64715b51267a47bf3750cc6a8a5,Apache-2.0 +201414b6610203caed355323b1ab3116,BSD-3-Clause 252890d9eee26aab7b432e8b8a616475,LGPL-2.0-only +2b8c039b2b9a25f0feb4410c4542d346,BSD-2-Clause 2d5025d4aa3495befef8f17206a5b0a1,LGPL-2.1-only
3214f080875748938ba060314b4f727d,LGPL-2.0-only 385c55653886acac3821999a3ccd17b3,Artistic-1.0 | GPL-2.0-only 393a5ca445f6965873eca0259a17f833,GPL-2.0-only 3b83ef96387f14655fc854ddc3c6bd57,Apache-2.0 3bf50002aefd002f49e7bb854063f7e7,LGPL-2.0-only +3debde09238a8c8e1f6a847e1ec9055b,LGPL-2.1-only 4325afd396febcb659c36b49533135d4,GPL-2.0-only +4c641f2d995c47f5cb08bdb4b5b6ea05,BSD-2-Clause +4ee4feb2b545c2231749e5c54ace343e,BSD-3-Clause 4fbd65380cdd255951079008b364516c,LGPL-2.1-only +50fab24ce589d69af8964fdbfe414c60,BSD-2-Clause 54c7042be62e169199200bc6477f04d1,BSD-3-Clause 55ca817ccb7d5b5b66355690e9abc605,LGPL-2.0-only 59530bdf33659b29e73d4adb9f9f6552,GPL-2.0-only +5d4950ecb7b26d2c5e4e7b4e0dd74707,BSD-3-Clause 5f30f0716dfdd0d91eb439ebec522ec2,LGPL-2.0-only 6a6a8e020838b23406c81b19c1d46df6,LGPL-3.0-only +721f23a96ff4161ca3a5f071bbe18108,MIT +7364d1e4653d3584181e9d22d81f275f,CC0-1.0 751419260aa954499f7abaabaa882bbe,GPL-2.0-only +75512892d6f59dddb6d1c7e191957e9c,Zlib +75605e6bdd564791ab698fca65c94a4f,Unlicense +7998cb338f82d15c0eff93b7004d272a,BSD-3-Clause +7f5202f4d44ed15dcd4915f5210417d8,LGPL-2.1-only 7fbc338309ac38fefcd64b04bb903e34,LGPL-2.1-only +80fa7b56a28e8c902e6af194003220a5,BSD-2-Clause +85d8a977ee9d7c5ab4ac03c9b95431c4,MIT-0 +88a4355858a1433fea99fae34a44da88,GPL-2.0-only +8bd23871802951c9ad63855151204c2c,BSD-2-Clause 8ca43cbc842c2336e835926c2166c28b,GPL-2.0-only +939cce1ec101726fa754e698ac871622,BSD-3-Clause 94d55d512a9ba36caa9b7df079bae19f,GPL-2.0-only 9ac2e7cff1ddaf48b6eab6028f23ef88,GPL-2.0-only 9f604d8a4f8e74f4f5140845a21b6674,LGPL-2.0-only +a39327c997c20da0937955192d86232d,BSD-3-Clause +a54a1a6a39e7f9dbb4a23a42f5c7fd1c,Apache-2.0 +a651bb3d8b1c412632e28823bb432b40,BSD-3-Clause a6f89e2100d9b6cdffcea4f398e37343,LGPL-2.1-only +ad4e9d34a2e966dfe9837f18de03266d,GFDL-1.1-only b234ee4d69f5fce4486a80fdaf4a4263,GPL-2.0-only +b27575459e02221ccef97ec0bfd457ae,Apache-2.0 +b376d29a53c9573006b9970709231431,MIT +b5f72aef53d3b2b432702c30b0215666,BSD-3-Clause 
+b66384e7137e41a9b1904ef4d39703b6,Apache-2.0 bbb461211a33b134d42ed5ee802b37ff,LGPL-2.1-only bfe1f75d606912a4111c90743d6c7325,MPL-1.1-only c93c0550bd3173f4504b2cbd8991e50b,GPL-2.0-only +d014fb11a34eb67dc717fdcfc97e60ed,GFDL-1.2-only +d0b68be4a2dc957aaf09144970bc6696,MIT d32239bcb673463ab874e80d47fae504,GPL-3.0-only d7810fab7487fb0aad327b76f1be7cd7,GPL-2.0-only d8045f3b8f929c1cb29a1e3fd737b499,LGPL-2.1-only db979804f025cf55aabec7129cb671ed,LGPL-2.0-only +e020ca655b06c112def28e597ab844f1,GFDL-1.3-only +e659f77bfd9002659e112d0d3d59b2c1,BSD-2-Clause eb723b61539feef013de476e68b5c50a,GPL-2.0-only ebb5c50ab7cab4baeffba14977030c07,GPL-2.0-only +efe2cb9a35826992b9df68224e3c2628,EPL-1.0 f27defe1e96c2e1ecd4e0c9be8967949,GPL-3.0-only +f90c613c51aa35da4d79dd55fc724ceb,LGPL-3.0-only fad9b3332be894bab9bc501572864b29,LGPL-2.1-only fbc093901857fcd118f065f900982c24,LGPL-2.1-only diff --git a/meta/lib/oe/license_finder.py b/meta/lib/oe/license_finder.py index 96961658e8b..097b324c585 100644 --- a/meta/lib/oe/license_finder.py +++ b/meta/lib/oe/license_finder.py @@ -54,65 +54,6 @@ def crunch_known_licenses(d): crunched_md5sums = {} - # common licenses - crunched_md5sums['ad4e9d34a2e966dfe9837f18de03266d'] = 'GFDL-1.1-only' - crunched_md5sums['d014fb11a34eb67dc717fdcfc97e60ed'] = 'GFDL-1.2-only' - crunched_md5sums['e020ca655b06c112def28e597ab844f1'] = 'GFDL-1.3-only' - - # The following two were gleaned from the "forever" npm package - crunched_md5sums['0a97f8e4cbaf889d6fa51f84b89a79f6'] = 'ISC' - # https://github.com/waffle-gl/waffle/blob/master/LICENSE.txt - crunched_md5sums['50fab24ce589d69af8964fdbfe414c60'] = 'BSD-2-Clause' - # https://github.com/spigwitmer/fakeds1963s/blob/master/LICENSE - crunched_md5sums['88a4355858a1433fea99fae34a44da88'] = 'GPL-2.0-only' - # http://www.gnu.org/licenses/old-licenses/gpl-2.0.txt - crunched_md5sums['063b5c3ebb5f3aa4c85a2ed18a31fbe7'] = 'GPL-2.0-only' - # https://github.com/FFmpeg/FFmpeg/blob/master/COPYING.LGPLv2.1 - 
crunched_md5sums['7f5202f4d44ed15dcd4915f5210417d8'] = 'LGPL-2.1-only' - # unixODBC-2.3.4 COPYING - crunched_md5sums['3debde09238a8c8e1f6a847e1ec9055b'] = 'LGPL-2.1-only' - # https://github.com/FFmpeg/FFmpeg/blob/master/COPYING.LGPLv3 - crunched_md5sums['f90c613c51aa35da4d79dd55fc724ceb'] = 'LGPL-3.0-only' - # https://raw.githubusercontent.com/eclipse/mosquitto/v1.4.14/epl-v10 - crunched_md5sums['efe2cb9a35826992b9df68224e3c2628'] = 'EPL-1.0' - - # https://raw.githubusercontent.com/jquery/esprima/3.1.3/LICENSE.BSD - crunched_md5sums['80fa7b56a28e8c902e6af194003220a5'] = 'BSD-2-Clause' - # https://raw.githubusercontent.com/npm/npm-install-checks/master/LICENSE - crunched_md5sums['e659f77bfd9002659e112d0d3d59b2c1'] = 'BSD-2-Clause' - # https://raw.githubusercontent.com/silverwind/default-gateway/4.2.0/LICENSE - crunched_md5sums['4c641f2d995c47f5cb08bdb4b5b6ea05'] = 'BSD-2-Clause' - # https://raw.githubusercontent.com/tad-lispy/node-damerau-levenshtein/v1.0.5/LICENSE - crunched_md5sums['2b8c039b2b9a25f0feb4410c4542d346'] = 'BSD-2-Clause' - # https://raw.githubusercontent.com/terser/terser/v3.17.0/LICENSE - crunched_md5sums['8bd23871802951c9ad63855151204c2c'] = 'BSD-2-Clause' - # https://raw.githubusercontent.com/alexei/sprintf.js/1.0.3/LICENSE - crunched_md5sums['008c22318c8ea65928bf730ddd0273e3'] = 'BSD-3-Clause' - # https://raw.githubusercontent.com/Caligatio/jsSHA/v3.2.0/LICENSE - crunched_md5sums['0e46634a01bfef056892949acaea85b1'] = 'BSD-3-Clause' - # https://raw.githubusercontent.com/d3/d3-path/v1.0.9/LICENSE - crunched_md5sums['b5f72aef53d3b2b432702c30b0215666'] = 'BSD-3-Clause' - # https://raw.githubusercontent.com/feross/ieee754/v1.1.13/LICENSE - crunched_md5sums['a39327c997c20da0937955192d86232d'] = 'BSD-3-Clause' - # https://raw.githubusercontent.com/joyent/node-extsprintf/v1.3.0/LICENSE - crunched_md5sums['721f23a96ff4161ca3a5f071bbe18108'] = 'MIT' - # https://raw.githubusercontent.com/pvorb/clone/v0.2.0/LICENSE - 
crunched_md5sums['b376d29a53c9573006b9970709231431'] = 'MIT' - # https://raw.githubusercontent.com/andris9/encoding/v0.1.12/LICENSE - crunched_md5sums['85d8a977ee9d7c5ab4ac03c9b95431c4'] = 'MIT-0' - # https://raw.githubusercontent.com/faye/websocket-driver-node/0.7.3/LICENSE.md - crunched_md5sums['b66384e7137e41a9b1904ef4d39703b6'] = 'Apache-2.0' - # https://raw.githubusercontent.com/less/less.js/v4.1.1/LICENSE - crunched_md5sums['b27575459e02221ccef97ec0bfd457ae'] = 'Apache-2.0' - # https://raw.githubusercontent.com/microsoft/TypeScript/v3.5.3/LICENSE.txt - crunched_md5sums['a54a1a6a39e7f9dbb4a23a42f5c7fd1c'] = 'Apache-2.0' - # https://raw.githubusercontent.com/request/request/v2.87.0/LICENSE - crunched_md5sums['1034431802e57486b393d00c5d262b8a'] = 'Apache-2.0' - # https://raw.githubusercontent.com/dchest/tweetnacl-js/v0.14.5/LICENSE - crunched_md5sums['75605e6bdd564791ab698fca65c94a4f'] = 'Unlicense' - # https://raw.githubusercontent.com/stackgl/gl-mat3/v2.0.0/LICENSE.md - crunched_md5sums['75512892d6f59dddb6d1c7e191957e9c'] = 'Zlib' - commonlicdir = d.getVar('COMMON_LICENSE_DIR') for fn in sorted(os.listdir(commonlicdir)): md5value, lictext = crunch_license(os.path.join(commonlicdir, fn)) From patchwork Fri Jun 13 13:16:16 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ross Burton X-Patchwork-Id: 64921 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id D1488C71157 for ; Fri, 13 Jun 2025 13:16:31 +0000 (UTC) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mx.groups.io with SMTP id smtpd.web11.10182.1749820586691947776 for ; Fri, 13 Jun 2025 06:16:26 -0700 Authentication-Results: mx.groups.io; dkim=none (message not signed); spf=pass (domain: arm.com, ip: 
217.140.110.172, mailfrom: ross.burton@arm.com) From: Ross Burton To: openembedded-core@lists.openembedded.org Subject: [PATCH 07/10] oe/license_finder: remove unused arguments in get_license_md5sums Date: Fri, 13 Jun 2025 14:16:16 +0100 Message-ID: <20250613131620.221912-7-ross.burton@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250613131620.221912-1-ross.burton@arm.com> References: <20250613131620.221912-1-ross.burton@arm.com> MIME-Version: 1.0 X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/218607

get_license_md5sums() has two optional arguments:

- static_only: if set, don't checksum the licenses in COMMON_LICENSE_DIR
- linenumbers: if set, the CSV file can contain begin/end/md5 values as used in LIC_FILES_CHKSUM

Neither of these is used, and they complicate the logic, so remove them.
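With `linenumbers` gone, each row of `license-hashes.csv` is just a checksum and a license expression. A minimal sketch of the simplified reader, assuming an in-memory file in place of the `files/license-hashes.csv` files that the real code locates through `BBPATH`; the function name `load_hash_csv()` is hypothetical:

```python
import csv
import io


def load_hash_csv(csv_file):
    """Hypothetical sketch: map md5sum -> license name from a two-column CSV."""
    md5sums = {}
    # Passing fieldnames explicitly means the first row is data, not a header.
    reader = csv.DictReader(csv_file, delimiter=',', fieldnames=['md5sum', 'license'])
    for row in reader:
        md5sums[row['md5sum']] = row['license']
    return md5sums


sample = io.StringIO(
    "0636e73ff0215e8d672dc4c32c317bb3,GPL-2.0-only\n"
    "385c55653886acac3821999a3ccd17b3,Artistic-1.0 | GPL-2.0-only\n"
)
hashes = load_hash_csv(sample)
print(hashes["385c55653886acac3821999a3ccd17b3"])  # Artistic-1.0 | GPL-2.0-only
```

License expressions such as `Artistic-1.0 | GPL-2.0-only` contain no commas, so a plain two-column reader is sufficient.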
Signed-off-by: Ross Burton --- meta/lib/oe/license_finder.py | 23 +++++++++-------------- 1 file changed, 9 insertions(+), 14 deletions(-) diff --git a/meta/lib/oe/license_finder.py b/meta/lib/oe/license_finder.py index 097b324c585..be03e5d0846 100644 --- a/meta/lib/oe/license_finder.py +++ b/meta/lib/oe/license_finder.py @@ -14,16 +14,16 @@ import bb logger = logging.getLogger("BitBake.OE.LicenseFinder") -def get_license_md5sums(d, static_only=False, linenumbers=False): +def get_license_md5sums(d): import bb.utils import csv md5sums = {} - if not static_only and not linenumbers: - # Gather md5sums of license files in common license dir - commonlicdir = d.getVar('COMMON_LICENSE_DIR') - for fn in os.listdir(commonlicdir): - md5value = bb.utils.md5_file(os.path.join(commonlicdir, fn)) - md5sums[md5value] = fn + + # Gather md5sums of license files in common license dir + commonlicdir = d.getVar('COMMON_LICENSE_DIR') + for fn in os.listdir(commonlicdir): + md5value = bb.utils.md5_file(os.path.join(commonlicdir, fn)) + md5sums[md5value] = fn # The following were extracted from common values in various recipes # (double checking the license against the license file itself, not just @@ -34,14 +34,9 @@ def get_license_md5sums(d, static_only=False, linenumbers=False): csv_path = os.path.join(path, 'files', 'license-hashes.csv') if os.path.isfile(csv_path): with open(csv_path, newline='') as csv_file: - fieldnames = ['md5sum', 'license', 'beginline', 'endline', 'md5'] - reader = csv.DictReader(csv_file, delimiter=',', fieldnames=fieldnames) + reader = csv.DictReader(csv_file, delimiter=',', fieldnames=['md5sum', 'license']) for row in reader: - if linenumbers: - md5sums[row['md5sum']] = ( - row['license'], row['beginline'], row['endline'], row['md5']) - else: - md5sums[row['md5sum']] = row['license'] + md5sums[row['md5sum']] = row['license'] return md5sums From patchwork Fri Jun 13 13:16:17 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ross Burton X-Patchwork-Id: 64919 From: Ross Burton To: openembedded-core@lists.openembedded.org Subject: [PATCH 08/10] oe/license_finder: don't return the "crunched" license text in crunch_license Date: Fri, 13 Jun 2025 14:16:17 +0100 Message-ID: <20250613131620.221912-8-ross.burton@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250613131620.221912-1-ross.burton@arm.com> References: <20250613131620.221912-1-ross.burton@arm.com> MIME-Version: 1.0 X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/218608

crunch_license() will perform some basic text manipulation to try and canonicalise the license texts.
It also returns the new license text but none of the callers use this, and as a slightly mangled version of the original it has no real purpose. Remove this return value and clean up the callers. Signed-off-by: Ross Burton --- meta/lib/oe/license_finder.py | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/meta/lib/oe/license_finder.py b/meta/lib/oe/license_finder.py index be03e5d0846..cacb4cb19d6 100644 --- a/meta/lib/oe/license_finder.py +++ b/meta/lib/oe/license_finder.py @@ -51,7 +51,7 @@ def crunch_known_licenses(d): commonlicdir = d.getVar('COMMON_LICENSE_DIR') for fn in sorted(os.listdir(commonlicdir)): - md5value, lictext = crunch_license(os.path.join(commonlicdir, fn)) + md5value = crunch_license(os.path.join(commonlicdir, fn)) if md5value not in crunched_md5sums: crunched_md5sums[md5value] = fn elif fn != crunched_md5sums[md5value]: @@ -123,8 +123,7 @@ def crunch_license(licfile): md5val = m.hexdigest() except UnicodeEncodeError: md5val = None - lictext = '' - return md5val, lictext + return md5val def find_license_files(srctree, first_only=False): @@ -164,15 +163,15 @@ def match_licenses(licfiles, srctree, d): md5value = bb.utils.md5_file(resolved_licfile) license = md5sums.get(md5value, None) if not license: - crunched_md5, lictext = crunch_license(resolved_licfile) + crunched_md5 = crunch_license(resolved_licfile) license = crunched_md5sums.get(crunched_md5, None) - if lictext and not license: + if not license: license = 'Unknown' logger.info("Please add the following line for '%s' to a 'license-hashes.csv' " \ "and replace `Unknown` with the license:\n" \ "%s,Unknown" % (os.path.relpath(licfile, srctree + "/.."), md5value)) - if license: - licenses.append((license, os.path.relpath(licfile, srctree), md5value)) + + licenses.append((license, os.path.relpath(licfile, srctree), md5value)) return licenses From patchwork Fri Jun 13 13:16:18 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ross Burton X-Patchwork-Id: 64924 From: Ross Burton To: openembedded-core@lists.openembedded.org Subject: [PATCH 09/10] oe/license_finder: rewrite license checksum loading, scan more licenses Date: Fri, 13 Jun 2025 14:16:18 +0100 Message-ID: <20250613131620.221912-9-ross.burton@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250613131620.221912-1-ross.burton@arm.com> References: <20250613131620.221912-1-ross.burton@arm.com> MIME-Version: 1.0 X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/218609

Rewrite the license checksum generation and loading of CSV files to be clearer.
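A minimal sketch of the rewritten checksum generation, under stated assumptions: every file in each license directory (here, temporary directories standing in for `COMMON_LICENSE_DIR` and one `LICENSE_PATH` entry) is hashed twice, once over its exact bytes and once over a "crunched" form, and both checksums map to the file name. `crunch_md5()` is a hypothetical stand-in that only folds case and whitespace; the real `_crunch_license()` removes more non-material text.

```python
import hashlib
import os
import tempfile


def crunch_md5(text):
    """Hypothetical stand-in for _crunch_license(): ignore case and whitespace
    layout before hashing. The real function strips more non-material text."""
    return hashlib.md5(" ".join(text.split()).lower().encode("utf-8")).hexdigest()


def known_license_hashes(lic_dirs):
    """Map the exact and the crunched checksum of every file in lic_dirs to the
    file name, which doubles as the license name."""
    md5sums = {}
    for lic_dir in lic_dirs:
        for fn in sorted(os.listdir(lic_dir)):
            with open(os.path.join(lic_dir, fn), "rb") as f:
                data = f.read()
            md5sums[hashlib.md5(data).hexdigest()] = fn                # exact contents
            md5sums[crunch_md5(data.decode("utf-8", "replace"))] = fn  # crunched contents
    return md5sums


# Two temporary directories stand in for COMMON_LICENSE_DIR and one LICENSE_PATH entry.
with tempfile.TemporaryDirectory() as common, tempfile.TemporaryDirectory() as layer:
    with open(os.path.join(common, "MIT"), "w", encoding="utf-8") as f:
        f.write("Permission is hereby granted,\nfree of charge.\n")
    with open(os.path.join(layer, "Foo-1.0"), "w", encoding="utf-8") as f:
        f.write("The Foo License 1.0\n")
    hashes = known_license_hashes([common, layer])

print(len(hashes))  # 4: two files, two checksums each
# A re-wrapped, re-cased copy of the MIT text still matches through the crunched hash.
print(hashes[crunch_md5("PERMISSION is hereby granted, free of charge.")])  # MIT
```

Keeping both checksums in one dictionary lets a lookup try the exact hash and then the crunched hash against the same map. The sketch sorts the `os.listdir()` output only to make its result deterministic.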
This also expands the scan of COMMON_LICENSE_DIR to include LICENSE_PATH, which can be extended by layers to provide more license texts. Signed-off-by: Ross Burton --- meta/lib/oe/license_finder.py | 65 ++++++++++++++++------------------- 1 file changed, 29 insertions(+), 36 deletions(-) diff --git a/meta/lib/oe/license_finder.py b/meta/lib/oe/license_finder.py index cacb4cb19d6..1bdc39e1c53 100644 --- a/meta/lib/oe/license_finder.py +++ b/meta/lib/oe/license_finder.py @@ -11,24 +11,18 @@ import os import re import bb +import bb.utils logger = logging.getLogger("BitBake.OE.LicenseFinder") -def get_license_md5sums(d): - import bb.utils +def _load_hash_csv(d): + """ + Load a mapping of (checksum: license name) from all files/license-hashes.csv + files that can be found in the available layers. + """ import csv md5sums = {} - # Gather md5sums of license files in common license dir - commonlicdir = d.getVar('COMMON_LICENSE_DIR') - for fn in os.listdir(commonlicdir): - md5value = bb.utils.md5_file(os.path.join(commonlicdir, fn)) - md5sums[md5value] = fn - - # The following were extracted from common values in various recipes - # (double checking the license against the license file itself, not just - # the LICENSE value in the recipe) - # Read license md5sums from csv file for path in d.getVar('BBPATH').split(':'): csv_path = os.path.join(path, 'files', 'license-hashes.csv') @@ -41,28 +35,28 @@ def get_license_md5sums(d): return md5sums -def crunch_known_licenses(d): - ''' - Calculate the MD5 checksums for the crunched versions of all common - licenses. Also add additional known checksums. - ''' - - crunched_md5sums = {} +def _crunch_known_licenses(d): + """ + Calculate the MD5 checksums for the original and "crunched" versions of all + known licenses. 
+ """ + md5sums = {} - commonlicdir = d.getVar('COMMON_LICENSE_DIR') - for fn in sorted(os.listdir(commonlicdir)): - md5value = crunch_license(os.path.join(commonlicdir, fn)) - if md5value not in crunched_md5sums: - crunched_md5sums[md5value] = fn - elif fn != crunched_md5sums[md5value]: - bb.debug(2, "crunched_md5sums['%s'] is already set to '%s' rather than '%s'" % (md5value, crunched_md5sums[md5value], fn)) - else: - bb.debug(2, "crunched_md5sums['%s'] is already set to '%s'" % (md5value, crunched_md5sums[md5value])) + lic_dirs = [d.getVar('COMMON_LICENSE_DIR')] + (d.getVar('LICENSE_PATH') or "").split() + for lic_dir in lic_dirs: + for fn in os.listdir(lic_dir): + path = os.path.join(lic_dir, fn) + # Hash the exact contents + md5value = bb.utils.md5_file(path) + md5sums[md5value] = fn + # Also hash a "crunched" version + md5value = _crunch_license(path) + md5sums[md5value] = fn - return crunched_md5sums + return md5sums -def crunch_license(licfile): +def _crunch_license(licfile): ''' Remove non-material text from a license file and then calculate its md5sum. 
This works well for licenses that contain a copyright statement, @@ -152,10 +146,9 @@ def find_license_files(srctree, first_only=False): def match_licenses(licfiles, srctree, d): - import bb - md5sums = get_license_md5sums(d) - - crunched_md5sums = crunch_known_licenses(d) + md5sums = {} + md5sums.update(_load_hash_csv(d)) + md5sums.update(_crunch_known_licenses(d)) licenses = [] for licfile in sorted(licfiles): @@ -163,8 +156,8 @@ def match_licenses(licfiles, srctree, d): md5value = bb.utils.md5_file(resolved_licfile) license = md5sums.get(md5value, None) if not license: - crunched_md5 = crunch_license(resolved_licfile) - license = crunched_md5sums.get(crunched_md5, None) + crunched_md5 = _crunch_license(resolved_licfile) + license = md5sums.get(crunched_md5, None) if not license: license = 'Unknown' logger.info("Please add the following line for '%s' to a 'license-hashes.csv' " \ From patchwork Fri Jun 13 13:16:19 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ross Burton X-Patchwork-Id: 64922 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id DF990C71156 for ; Fri, 13 Jun 2025 13:16:31 +0000 (UTC) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mx.groups.io with SMTP id smtpd.web10.10097.1749820588788319456 for ; Fri, 13 Jun 2025 06:16:28 -0700 Authentication-Results: mx.groups.io; dkim=none (message not signed); spf=pass (domain: arm.com, ip: 217.140.110.172, mailfrom: ross.burton@arm.com) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id E8C011C0A for ; Fri, 13 Jun 2025 06:16:07 -0700 (PDT) Received: from cesw-amp-gbt-1s-m12830-04.lab.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com 
[10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id ED3443F59E for ; Fri, 13 Jun 2025 06:16:27 -0700 (PDT) From: Ross Burton To: openembedded-core@lists.openembedded.org Subject: [PATCH 10/10] oe/license_finder: support extra hashes being passed to find_licenses Date: Fri, 13 Jun 2025 14:16:19 +0100 Message-ID: <20250613131620.221912-10-ross.burton@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250613131620.221912-1-ross.burton@arm.com> References: <20250613131620.221912-1-ross.burton@arm.com> MIME-Version: 1.0 X-Groupsio-URL: https://lists.openembedded.org/g/openembedded-core/message/218610

When using the license finder, the caller might already know some additional license hashes, for example if it is updating existing metadata. Allow the caller to pass extra hashes to use when identifying licenses.
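The precedence that results from the `update()` calls in `match_licenses()` can be sketched as follows. `build_hash_map()` and `identify()` are hypothetical names, the lookup collapses the exact and crunched lookups into a single `get()`, and the checksums are placeholders except for the `GPL-2.0-only` entry, which appears in `license-hashes.csv`.

```python
def build_hash_map(csv_hashes, known_hashes, extra_hashes=None):
    """Merge checksum -> license maps in the order match_licenses() uses.

    dict.update() overwrites existing keys, so a checksum that appears in more
    than one source resolves to the value from the last source applied.
    """
    md5sums = {}
    md5sums.update(csv_hashes)          # license-hashes.csv entries
    md5sums.update(known_hashes)        # hashes of COMMON_LICENSE_DIR/LICENSE_PATH texts
    md5sums.update(extra_hashes or {})  # caller-supplied hashes win
    return md5sums


def identify(md5value, md5sums):
    """Return the license name for a checksum, falling back to 'Unknown'."""
    return md5sums.get(md5value, "Unknown")


# Placeholder checksums for illustration only (except the GPL-2.0-only entry).
csv_hashes = {"0636e73ff0215e8d672dc4c32c317bb3": "GPL-2.0-only"}
extra_hashes = {"0123456789abcdef0123456789abcdef": "BSD-3-Clause"}

md5sums = build_hash_map(csv_hashes, {}, extra_hashes)
print(identify("0636e73ff0215e8d672dc4c32c317bb3", md5sums))  # GPL-2.0-only
print(identify("0123456789abcdef0123456789abcdef", md5sums))  # BSD-3-Clause
print(identify("ffffffffffffffffffffffffffffffff", md5sums))  # Unknown
```

Using `None` rather than `{}` as the default is a common Python idiom for avoiding shared mutable defaults; the patch's `extra_hashes={}` is harmless because the dictionary is only read.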
Signed-off-by: Ross Burton --- meta/lib/oe/license_finder.py | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/meta/lib/oe/license_finder.py b/meta/lib/oe/license_finder.py index 1bdc39e1c53..16f5d7c94cb 100644 --- a/meta/lib/oe/license_finder.py +++ b/meta/lib/oe/license_finder.py @@ -145,10 +145,11 @@ def find_license_files(srctree, first_only=False): return licfiles -def match_licenses(licfiles, srctree, d): +def match_licenses(licfiles, srctree, d, extra_hashes={}): md5sums = {} md5sums.update(_load_hash_csv(d)) md5sums.update(_crunch_known_licenses(d)) + md5sums.update(extra_hashes) licenses = [] for licfile in sorted(licfiles): @@ -169,9 +170,9 @@ def match_licenses(licfiles, srctree, d): return licenses -def find_licenses(srctree, d, first_only=False): +def find_licenses(srctree, d, first_only=False, extra_hashes={}): licfiles = find_license_files(srctree, first_only) - licenses = match_licenses(licfiles, srctree, d) + licenses = match_licenses(licfiles, srctree, d, extra_hashes) # FIXME should we grab at least one source file with a license header and add that too?