diff mbox series

[v2,2/5] fetch2/github_release_artifact: fetcher for (private) release artifacts

Message ID 20250227090853.1632280-3-l.goehrs@pengutronix.de
State Not Applicable
Headers show
Series fetch2/github_release_artifact: fetcher for (private) release artifacts | expand

Commit Message

Leonard Göhrs Feb. 27, 2025, 9:08 a.m. UTC
This fetcher enables downloading artifacts attached to GitHub releases
in _private repositories_ (public repositories can just use download URLs
like `https://github.com/rauc/rauc/releases/download/v1.13/rauc-1.13.tar.xz`
which work without authentication).

The `SRC_URI` includes the Github account and repository (`rauc/rauc`),
git tag of the release (`v1.13`) and the name of the artifact file
(`rauc-1.13.tar.xz`):

  SRC_URI = "ghra://github.com/rauc/rauc/v1.13/rauc-1.13.tar.xz"
  SRC_URI[sha256sum] = "1ddb218a5d713c8dbd6e04d5501d96629f1c8e252157...

Authentication is provided using tokens, for example by configuring them
in the `local.conf`¹:

  $ grep TOKEN build/conf/local.conf
  GH_TOKEN = "github_pat_123456789abc...

The token may also be provided using an environment variable,
e.g. for CI builds:

  $ export BB_ENV_PASSTHROUGH_ADDITIONS="GH_TOKEN"
  $ export GH_TOKEN="github_pat_123456789abc...
  $ bitbake rauc

It is also possible to provide per-URI tokens using an URI parameter:

  SRC_URI = "ghra://github.com/rauc/rauc/v1.13/rauc-1.13.tar.xz;token=g...
  SRC_URI[sha256sum] = "1ddb218a5d713c8dbd6e04d5501d96629f1c8e252157...

and per-recipe tokens by setting `GH_TOKEN` inside the recipe:

  GH_TOKEN = "github_pat_123456789abc...
  SRC_URI = "ghra://github.com/rauc/rauc/v1.13/rauc-1.13.tar.xz"
  SRC_URI[sha256sum] = "1ddb218a5d713c8dbd6e04d5501d96629f1c8e252157...

¹ Note:

  When using a personal access token, it should be restricted in scope to
  only allow downloading of release artifacts for the required
  repositories.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
---
 lib/bb/fetch2/__init__.py                |  4 +-
 lib/bb/fetch2/github_release_artifact.py | 93 ++++++++++++++++++++++++
 2 files changed, 96 insertions(+), 1 deletion(-)
 create mode 100644 lib/bb/fetch2/github_release_artifact.py

Comments

Alexander Kanavin Feb. 27, 2025, 9:42 a.m. UTC | #1
On Thu, 27 Feb 2025 at 10:09, Leonard Göhrs via lists.yoctoproject.org
<lgo=pengutronix.de@lists.yoctoproject.org> wrote:

> Authentication is provided using tokens, for example by configuring them
> in the `local.conf`¹:
>
>   $ grep TOKEN build/conf/local.conf
>   GH_TOKEN = "github_pat_123456789abc...
>
> The token may also be provided using an environment variable,
> e.g. for CI builds:
>
>   $ export BB_ENV_PASSTHROUGH_ADDITIONS="GH_TOKEN"
>   $ export GH_TOKEN="github_pat_123456789abc...
>   $ bitbake rauc

Unfortunately I have to say no again. These still put the secret token
into a bitbake variable, and are therefore not any more secure than
just putting the token directly into a recipe. I have to restate: you
need to find a way for wget to pick up the needed token and produce a
header directly from netrc or similar file that is managed outside of
layer/bitbake metadata, so that bitbake never even sees it.

Alex
Alexander Kanavin Feb. 27, 2025, 4:33 p.m. UTC | #2
On Thu, 27 Feb 2025 at 11:37, Leonard Göhrs <l.goehrs@pengutronix.de> wrote:
> I've had a look at the `.netrc` / `.wgetrc` documentation and it looks
> like it possible to provide custom headers in the `.wgetrc`, but it do not
> see a way to specify which hosts they should be sent to.
>
> We could add a mechanism to `wget.py` that searches for a per-host
> `.wgetrc` and maybe allow overrides via a `SRC_URI`-Parameter.
> But I don't really like the idea. One misconfiguration would mean silently
> sending the GitHub token Authentication header on _every_ request to any host.

You can't make the argument that attacker gaining access to protected
artifacts is not a big deal; if it isn't then why are they protected?
Just publish them openly, without those pesky tokens.

You similarly can't make assumption that people will handle those
personal tokens with care; they won't. They'll put them into layers or
otherwise share them, against everything documentation says, because
it's convenient.

AZ_SAS and username/password support in https fetchers are both bad
precedents, and should be removed. The correct precedent is set in the
git fetcher, and with the recent changes to npm fetcher: bitbake is
not allowed to see secrets, period. Bitbake was never designed to
handle sensitive data, and it can easily leak into for example logs,
which are then published in CI for everyone to see.

What could help is wget picking up additional settings from a
wgetrc-like file specified with a command line parameter. Can you
check wget source code to see if /etc/wgetrc and ~/.wgetrc are both
hardcoded, or is there additional parametrization possible? Raising
this with upstream would be also appreciated, perhaps you can propose
a patch. Then we can think of how to provide those custom settings
from a fetcher.

Please also see the discussion around npm authentication we've
recently had, as it has many of the same arguments and approaches
discussed, and we were able to arrive at something that satisfies all
sides.

Alex
diff mbox series

Patch

diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
index 06d4fd011..1946ca6cb 100644
--- a/lib/bb/fetch2/__init__.py
+++ b/lib/bb/fetch2/__init__.py
@@ -1293,7 +1293,7 @@  class FetchData(object):
             elif checksum_plain_name in self.parm:
                 checksum_expected = self.parm[checksum_plain_name]
                 checksum_name = checksum_plain_name
-            elif self.type not in ["http", "https", "ftp", "ftps", "sftp", "s3", "az", "crate", "gs", "gomod", "npm"]:
+            elif self.type not in ["http", "https", "ftp", "ftps", "sftp", "s3", "az", "crate", "gs", "gomod", "npm", "ghra"]:
                 checksum_expected = None
             else:
                 checksum_expected = d.getVarFlag("SRC_URI", checksum_name)
@@ -2069,6 +2069,7 @@  from . import az
 from . import crate
 from . import gcp
 from . import gomod
+from . import github_release_artifact
 
 methods.append(local.Local())
 methods.append(wget.Wget())
@@ -2093,3 +2094,4 @@  methods.append(crate.Crate())
 methods.append(gcp.GCP())
 methods.append(gomod.GoMod())
 methods.append(gomod.GoModGit())
+methods.append(github_release_artifact.GitHubReleaseArtifact())
diff --git a/lib/bb/fetch2/github_release_artifact.py b/lib/bb/fetch2/github_release_artifact.py
new file mode 100644
index 000000000..d5a2646ce
--- /dev/null
+++ b/lib/bb/fetch2/github_release_artifact.py
@@ -0,0 +1,93 @@ 
+"""
+BitBake 'Fetch' GitHub release artifacts implementation
+
+"""
+
+# Copyright (C) 2025 Leonard Göhrs
+#
+# Based on bb.fetch2.wget:
+# Copyright (C) 2003, 2004  Chris Larson
+#
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Based on functions from the base bb module, Copyright 2003 Holger Schurig
+
+import json
+
+from urllib.request import urlopen, Request
+
+from bb.fetch2 import FetchError
+from bb.fetch2.wget import Wget
+
+
+class GitHubReleaseArtifact(Wget):
+    API_HEADERS = {
+        "Accept": "application/vnd.github+json",
+        "X-GitHub-Api-Version": "2022-11-28",
+    }
+
+    DOWNLOAD_HEADERS = {
+        "Accept": "application/octet-stream"
+    }
+
+    def supports(self, ud, d):
+        return ud.type in ["ghra"]
+
+    def _resolve_artifact_url(self, ud, d):
+        """Resolve `ghra://` pseudo URLs to `https://` URLs and set auth header.
+
+        This method resolved URLs like `ghra://github.com/rauc/rauc/v1.13/rauc-1.13.tar.xz`
+        to a backing URL like `https://api.github.com/repos/rauc/rauc/releases/assets/222455085`
+        while optionally setting the required authentication headers to download from
+        private repositories.
+        """
+
+        try:
+            user, repo, tag, asset_name = ud.path.strip("/").split("/")
+        except ValueError as e:
+            raise FetchError(
+                f"Expected path like '/<user>/<repo>/<tag>/<asset_name>', got: '{ud.path}'"
+            ) from e
+
+        # The GitHub authentication token may be provided as URL parameter
+        # (to enable using different tokens for different URLs in the same recipe)
+        # or via a variable for cleaner URLs.
+        token = ud.parm.get("token") or d.getVar("GH_TOKEN")
+
+        meta_url = f"https://api.{ud.host}/repos/{user}/{repo}/releases/tags/{tag}"
+
+        auth_headers = {}
+
+        if token is not None:
+            auth_headers["Authorization"] = f"Bearer {token}"
+
+        try:
+            req = Request(url=meta_url, headers=(auth_headers | self.API_HEADERS))
+            with urlopen(req) as resp:
+                result = json.load(resp)
+
+        except Exception as e:
+            raise FetchError(f"Error downloading artifact list: {e}") from e
+
+        asset_urls = dict((asset["name"], asset["url"]) for asset in result["assets"])
+
+        if asset_name not in asset_urls:
+            asset_list = ", ".join(asset_urls.keys())
+            raise FetchError(
+                f"Did not find asset '{asset_name}' in release asset list: {asset_list}"
+            )
+
+        # Override the `url` and `headers` in the FetchData object,
+        # enabling the Wget class to perform the actual downloading.
+        ud.url = asset_urls[asset_name]
+        ud.headers = auth_headers | self.DOWNLOAD_HEADERS
+
+    def checkstatus(self, fetch, ud, d):
+        self._resolve_artifact_url(ud, d)
+
+        return super().checkstatus(fetch, ud, d)
+
+    def download(self, ud, d):
+        self._resolve_artifact_url(ud, d)
+
+        return super().download(ud, d)