diff mbox series

[v2,2/5] fetch2/github_release_artifact: fetcher for (private) release artifacts

Message ID 20250227090853.1632280-3-l.goehrs@pengutronix.de
State New
Headers show
Series fetch2/github_release_artifact: fetcher for (private) release artifacts | expand

Commit Message

Leonard Göhrs Feb. 27, 2025, 9:08 a.m. UTC
This fetcher enables downloading artifacts attached to GitHub releases
in _private repositories_ (public repositories can just use download URLs
like `https://github.com/rauc/rauc/releases/download/v1.13/rauc-1.13.tar.xz`
which work without authentication).

The `SRC_URI` includes the Github account and repository (`rauc/rauc`),
git tag of the release (`v1.13`) and the name of the artifact file
(`rauc-1.13.tar.xz`):

  SRC_URI = "ghra://github.com/rauc/rauc/v1.13/rauc-1.13.tar.xz"
  SRC_URI[sha256sum] = "1ddb218a5d713c8dbd6e04d5501d96629f1c8e252157...

Authentication is provided using tokens, for example by configuring them
in the `local.conf`¹:

  $ grep TOKEN build/conf/local.conf
  GH_TOKEN = "github_pat_123456789abc...

The token may also be provided using an environment variable,
e.g. for CI builds:

  $ export BB_ENV_PASSTHROUGH_ADDITIONS="GH_TOKEN"
  $ export GH_TOKEN="github_pat_123456789abc...
  $ bitbake rauc

It is also possible to provide per-URI tokens using an URI parameter:

  SRC_URI = "ghra://github.com/rauc/rauc/v1.13/rauc-1.13.tar.xz;token=g...
  SRC_URI[sha256sum] = "1ddb218a5d713c8dbd6e04d5501d96629f1c8e252157...

and per-recipe tokens by setting `GH_TOKEN` inside the recipe:

  GH_TOKEN = "github_pat_123456789abc...
  SRC_URI = "ghra://github.com/rauc/rauc/v1.13/rauc-1.13.tar.xz"
  SRC_URI[sha256sum] = "1ddb218a5d713c8dbd6e04d5501d96629f1c8e252157...

¹ Note:

  When using a personal access token, it should be restricted in scope to
  only allow downloading of release artifacts for the required
  repositories.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
---
 lib/bb/fetch2/__init__.py                |  4 +-
 lib/bb/fetch2/github_release_artifact.py | 93 ++++++++++++++++++++++++
 2 files changed, 96 insertions(+), 1 deletion(-)
 create mode 100644 lib/bb/fetch2/github_release_artifact.py

Comments

Alexander Kanavin Feb. 27, 2025, 9:42 a.m. UTC | #1
On Thu, 27 Feb 2025 at 10:09, Leonard Göhrs via lists.yoctoproject.org
<lgo=pengutronix.de@lists.yoctoproject.org> wrote:

> Authentication is provided using tokens, for example by configuring them
> in the `local.conf`¹:
>
>   $ grep TOKEN build/conf/local.conf
>   GH_TOKEN = "github_pat_123456789abc...
>
> The token may also be provided using an environment variable,
> e.g. for CI builds:
>
>   $ export BB_ENV_PASSTHROUGH_ADDITIONS="GH_TOKEN"
>   $ export GH_TOKEN="github_pat_123456789abc...
>   $ bitbake rauc

Unfortunately I have to say no again. These still put the secret token
into a bitbake variable, and are therefore not any more secure than
just putting the token directly into a recipe. I have to restate: you
need to find a way for wget to pick up the needed token and produce a
header directly from netrc or similar file that is managed outside of
layer/bitbake metadata, so that bitbake never even sees it.

Alex
Leonard Göhrs Feb. 27, 2025, 10:37 a.m. UTC | #2
Hi,

On 27.02.25 10:42, Alexander Kanavin via lists.openembedded.org wrote:
> On Thu, 27 Feb 2025 at 10:09, Leonard Göhrs via lists.yoctoproject.org
> <lgo=pengutronix.de@lists.yoctoproject.org> wrote:
> 
>> Authentication is provided using tokens, for example by configuring them
>> in the `local.conf`¹:
>>
>>    $ grep TOKEN build/conf/local.conf
>>    GH_TOKEN = "github_pat_123456789abc...
>>
>> The token may also be provided using an environment variable,
>> e.g. for CI builds:
>>
>>    $ export BB_ENV_PASSTHROUGH_ADDITIONS="GH_TOKEN"
>>    $ export GH_TOKEN="github_pat_123456789abc...
>>    $ bitbake rauc
> 
> Unfortunately I have to say no again. These still put the secret token
> into a bitbake variable, and are therefore not any more secure than
> just putting the token directly into a recipe. 

The fetcher is written under the assumption that users will either use
the fetcher inside a GitHub action, where a GitHub token valid for the
duration of the build is automatically provided, or on their machine,
using a Fine-grained personal access token[1] that only allows download
access to a limited set of repositories.

I agree that it would be a very bad idea to use a classic personal access
token or a Fine-grained personal access token with too many permissions.
The documentation should be written in a way that makes this quite clear
and should warn of the dangers of doing so.

Given the fine grained permission model I don't really see why the token
needs stricter protection than e.g. the `AZ_SAS` secret used in the Azure
Storage fetcher or the `username:password` combinations used e.g.
in `https://`-URLs.

An attacker that is able to gain access to the token, which would enable
them to download release artifacts, would likely also be able to just gain
access to the artifacts that bitbake has downloaded.
The only thing they would have gained is access to future and prior
releases as well.

> I have to restate: you need to find a way for wget to pick up the
> needed token and produce a header directly from netrc or similar
> file that is managed outside of layer/bitbake metadata, so that
> bitbake never even sees it.
I've had a look at the `.netrc` / `.wgetrc` documentation and it looks
like it possible to provide custom headers in the `.wgetrc`, but it do not
see a way to specify which hosts they should be sent to.

We could add a mechanism to `wget.py` that searches for a per-host
`.wgetrc` and maybe allow overrides via a `SRC_URI`-Parameter.
But I don't really like the idea. One misconfiguration would mean silently
sending the GitHub token Authentication header on _every_ request to any host.

Leonard

[1]: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#fine-grained-personal-access-tokens
Alexander Kanavin Feb. 27, 2025, 4:33 p.m. UTC | #3
On Thu, 27 Feb 2025 at 11:37, Leonard Göhrs <l.goehrs@pengutronix.de> wrote:
> I've had a look at the `.netrc` / `.wgetrc` documentation and it looks
> like it possible to provide custom headers in the `.wgetrc`, but it do not
> see a way to specify which hosts they should be sent to.
>
> We could add a mechanism to `wget.py` that searches for a per-host
> `.wgetrc` and maybe allow overrides via a `SRC_URI`-Parameter.
> But I don't really like the idea. One misconfiguration would mean silently
> sending the GitHub token Authentication header on _every_ request to any host.

You can't make the argument that attacker gaining access to protected
artifacts is not a big deal; if it isn't then why are they protected?
Just publish them openly, without those pesky tokens.

You similarly can't make assumption that people will handle those
personal tokens with care; they won't. They'll put them into layers or
otherwise share them, against everything documentation says, because
it's convenient.

AZ_SAS and username/password support in https fetchers are both bad
precedents, and should be removed. The correct precedent is set in the
git fetcher, and with the recent changes to npm fetcher: bitbake is
not allowed to see secrets, period. Bitbake was never designed to
handle sensitive data, and it can easily leak into for example logs,
which are then published in CI for everyone to see.

What could help is wget picking up additional settings from a
wgetrc-like file specified with a command line parameter. Can you
check wget source code to see if /etc/wgetrc and ~/.wgetrc are both
hardcoded, or is there additional parametrization possible? Raising
this with upstream would be also appreciated, perhaps you can propose
a patch. Then we can think of how to provide those custom settings
from a fetcher.

Please also see the discussion around npm authentication we've
recently had, as it has many of the same arguments and approaches
discussed, and we were able to arrive at something that satisfies all
sides.

Alex
diff mbox series

Patch

diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
index 06d4fd011..1946ca6cb 100644
--- a/lib/bb/fetch2/__init__.py
+++ b/lib/bb/fetch2/__init__.py
@@ -1293,7 +1293,7 @@  class FetchData(object):
             elif checksum_plain_name in self.parm:
                 checksum_expected = self.parm[checksum_plain_name]
                 checksum_name = checksum_plain_name
-            elif self.type not in ["http", "https", "ftp", "ftps", "sftp", "s3", "az", "crate", "gs", "gomod", "npm"]:
+            elif self.type not in ["http", "https", "ftp", "ftps", "sftp", "s3", "az", "crate", "gs", "gomod", "npm", "ghra"]:
                 checksum_expected = None
             else:
                 checksum_expected = d.getVarFlag("SRC_URI", checksum_name)
@@ -2069,6 +2069,7 @@  from . import az
 from . import crate
 from . import gcp
 from . import gomod
+from . import github_release_artifact
 
 methods.append(local.Local())
 methods.append(wget.Wget())
@@ -2093,3 +2094,4 @@  methods.append(crate.Crate())
 methods.append(gcp.GCP())
 methods.append(gomod.GoMod())
 methods.append(gomod.GoModGit())
+methods.append(github_release_artifact.GitHubReleaseArtifact())
diff --git a/lib/bb/fetch2/github_release_artifact.py b/lib/bb/fetch2/github_release_artifact.py
new file mode 100644
index 000000000..d5a2646ce
--- /dev/null
+++ b/lib/bb/fetch2/github_release_artifact.py
@@ -0,0 +1,93 @@ 
+"""
+BitBake 'Fetch' GitHub release artifacts implementation
+
+"""
+
+# Copyright (C) 2025 Leonard Göhrs
+#
+# Based on bb.fetch2.wget:
+# Copyright (C) 2003, 2004  Chris Larson
+#
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Based on functions from the base bb module, Copyright 2003 Holger Schurig
+
+import json
+
+from urllib.request import urlopen, Request
+
+from bb.fetch2 import FetchError
+from bb.fetch2.wget import Wget
+
+
+class GitHubReleaseArtifact(Wget):
+    API_HEADERS = {
+        "Accept": "application/vnd.github+json",
+        "X-GitHub-Api-Version": "2022-11-28",
+    }
+
+    DOWNLOAD_HEADERS = {
+        "Accept": "application/octet-stream"
+    }
+
+    def supports(self, ud, d):
+        return ud.type in ["ghra"]
+
+    def _resolve_artifact_url(self, ud, d):
+        """Resolve `ghra://` pseudo URLs to `https://` URLs and set auth header.
+
+        This method resolved URLs like `ghra://github.com/rauc/rauc/v1.13/rauc-1.13.tar.xz`
+        to a backing URL like `https://api.github.com/repos/rauc/rauc/releases/assets/222455085`
+        while optionally setting the required authentication headers to download from
+        private repositories.
+        """
+
+        try:
+            user, repo, tag, asset_name = ud.path.strip("/").split("/")
+        except ValueError as e:
+            raise FetchError(
+                f"Expected path like '/<user>/<repo>/<tag>/<asset_name>', got: '{ud.path}'"
+            ) from e
+
+        # The GitHub authentication token may be provided as URL parameter
+        # (to enable using different tokens for different URLs in the same recipe)
+        # or via a variable for cleaner URLs.
+        token = ud.parm.get("token") or d.getVar("GH_TOKEN")
+
+        meta_url = f"https://api.{ud.host}/repos/{user}/{repo}/releases/tags/{tag}"
+
+        auth_headers = {}
+
+        if token is not None:
+            auth_headers["Authorization"] = f"Bearer {token}"
+
+        try:
+            req = Request(url=meta_url, headers=(auth_headers | self.API_HEADERS))
+            with urlopen(req) as resp:
+                result = json.load(resp)
+
+        except Exception as e:
+            raise FetchError(f"Error downloading artifact list: {e}") from e
+
+        asset_urls = dict((asset["name"], asset["url"]) for asset in result["assets"])
+
+        if asset_name not in asset_urls:
+            asset_list = ", ".join(asset_urls.keys())
+            raise FetchError(
+                f"Did not find asset '{asset_name}' in release asset list: {asset_list}"
+            )
+
+        # Override the `url` and `headers` in the FetchData object,
+        # enabling the Wget class to perform the actual downloading.
+        ud.url = asset_urls[asset_name]
+        ud.headers = auth_headers | self.DOWNLOAD_HEADERS
+
+    def checkstatus(self, fetch, ud, d):
+        self._resolve_artifact_url(ud, d)
+
+        return super().checkstatus(fetch, ud, d)
+
+    def download(self, ud, d):
+        self._resolve_artifact_url(ud, d)
+
+        return super().download(ud, d)