Message ID | 20250227090853.1632280-3-l.goehrs@pengutronix.de |
---|---|
State | New |
Headers | show |
Series | fetch2/github_release_artifact: fetcher for (private) release artifacts | expand |
On Thu, 27 Feb 2025 at 10:09, Leonard Göhrs via lists.yoctoproject.org <lgo=pengutronix.de@lists.yoctoproject.org> wrote: > Authentication is provided using tokens, for example by configuring them > in the `local.conf`¹: > > $ grep TOKEN build/conf/local.conf > GH_TOKEN = "github_pat_123456789abc... > > The token may also be provided using an environment variable, > e.g. for CI builds: > > $ export BB_ENV_PASSTHROUGH_ADDITIONS="GH_TOKEN" > $ export GH_TOKEN="github_pat_123456789abc... > $ bitbake rauc Unfortunately I have to say no again. These still put the secret token into a bitbake variable, and are therefore not any more secure than just putting the token directly into a recipe. I have to restate: you need to find a way for wget to pick up the needed token and produce a header directly from netrc or similar file that is managed outside of layer/bitbake metadata, so that bitbake never even sees it. Alex
Hi, On 27.02.25 10:42, Alexander Kanavin via lists.openembedded.org wrote: > On Thu, 27 Feb 2025 at 10:09, Leonard Göhrs via lists.yoctoproject.org > <lgo=pengutronix.de@lists.yoctoproject.org> wrote: > >> Authentication is provided using tokens, for example by configuring them >> in the `local.conf`¹: >> >> $ grep TOKEN build/conf/local.conf >> GH_TOKEN = "github_pat_123456789abc... >> >> The token may also be provided using an environment variable, >> e.g. for CI builds: >> >> $ export BB_ENV_PASSTHROUGH_ADDITIONS="GH_TOKEN" >> $ export GH_TOKEN="github_pat_123456789abc... >> $ bitbake rauc > > Unfortunately I have to say no again. These still put the secret token > into a bitbake variable, and are therefore not any more secure than > just putting the token directly into a recipe. The fetcher is written under the assumption that users will either use the fetcher inside a GitHub action, where a GitHub token valid for the duration of the build is automatically provided, or on their machine, using a Fine-grained personal access token[1] that only allows download access to a limited set of repositories. I agree that it would be a very bad idea to use a classic personal access token or a Fine-grained personal access token with too many permissions. The documentation should be written in a way that makes this quite clear and should warn of the dangers of doing so. Given the fine grained permission model I don't really see why the token needs stricter protection than e.g. the `AZ_SAS` secret used in the Azure Storage fetcher or the `username:password` combinations used e.g. in `https://`-URLs. An attacker that is able to gain access to the token, which would enable them to download release artifacts, would likely also be able to just gain access to the artifacts that bitbake has downloaded. The only thing they would have gained is access to future and prior releases as well. > I have to restate: you need to find a way for wget to pick up the > needed token and produce a header directly from netrc or similar > file that is managed outside of layer/bitbake metadata, so that > bitbake never even sees it. I've had a look at the `.netrc` / `.wgetrc` documentation and it looks like it possible to provide custom headers in the `.wgetrc`, but it do not see a way to specify which hosts they should be sent to. We could add a mechanism to `wget.py` that searches for a per-host `.wgetrc` and maybe allow overrides via a `SRC_URI`-Parameter. But I don't really like the idea. One misconfiguration would mean silently sending the GitHub token Authentication header on _every_ request to any host. Leonard [1]: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#fine-grained-personal-access-tokens
On Thu, 27 Feb 2025 at 11:37, Leonard Göhrs <l.goehrs@pengutronix.de> wrote: > I've had a look at the `.netrc` / `.wgetrc` documentation and it looks > like it possible to provide custom headers in the `.wgetrc`, but it do not > see a way to specify which hosts they should be sent to. > > We could add a mechanism to `wget.py` that searches for a per-host > `.wgetrc` and maybe allow overrides via a `SRC_URI`-Parameter. > But I don't really like the idea. One misconfiguration would mean silently > sending the GitHub token Authentication header on _every_ request to any host. You can't make the argument that attacker gaining access to protected artifacts is not a big deal; if it isn't then why are they protected? Just publish them openly, without those pesky tokens. You similarly can't make assumption that people will handle those personal tokens with care; they won't. They'll put them into layers or otherwise share them, against everything documentation says, because it's convenient. AZ_SAS and username/password support in https fetchers are both bad precedents, and should be removed. The correct precedent is set in the git fetcher, and with the recent changes to npm fetcher: bitbake is not allowed to see secrets, period. Bitbake was never designed to handle sensitive data, and it can easily leak into for example logs, which are then published in CI for everyone to see. What could help is wget picking up additional settings from a wgetrc-like file specified with a command line parameter. Can you check wget source code to see if /etc/wgetrc and ~/.wgetrc are both hardcoded, or is there additional parametrization possible? Raising this with upstream would be also appreciated, perhaps you can propose a patch. Then we can think of how to provide those custom settings from a fetcher. Please also see the discussion around npm authentication we've recently had, as it has many of the same arguments and approaches discussed, and we were able to arrive at something that satisfies all sides. Alex
diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py index 06d4fd011..1946ca6cb 100644 --- a/lib/bb/fetch2/__init__.py +++ b/lib/bb/fetch2/__init__.py @@ -1293,7 +1293,7 @@ class FetchData(object): elif checksum_plain_name in self.parm: checksum_expected = self.parm[checksum_plain_name] checksum_name = checksum_plain_name - elif self.type not in ["http", "https", "ftp", "ftps", "sftp", "s3", "az", "crate", "gs", "gomod", "npm"]: + elif self.type not in ["http", "https", "ftp", "ftps", "sftp", "s3", "az", "crate", "gs", "gomod", "npm", "ghra"]: checksum_expected = None else: checksum_expected = d.getVarFlag("SRC_URI", checksum_name) @@ -2069,6 +2069,7 @@ from . import az from . import crate from . import gcp from . import gomod +from . import github_release_artifact methods.append(local.Local()) methods.append(wget.Wget()) @@ -2093,3 +2094,4 @@ methods.append(crate.Crate()) methods.append(gcp.GCP()) methods.append(gomod.GoMod()) methods.append(gomod.GoModGit()) +methods.append(github_release_artifact.GitHubReleaseArtifact()) diff --git a/lib/bb/fetch2/github_release_artifact.py b/lib/bb/fetch2/github_release_artifact.py new file mode 100644 index 000000000..d5a2646ce --- /dev/null +++ b/lib/bb/fetch2/github_release_artifact.py @@ -0,0 +1,93 @@ +""" +BitBake 'Fetch' GitHub release artifacts implementation + +""" + +# Copyright (C) 2025 Leonard Göhrs +# +# Based on bb.fetch2.wget: +# Copyright (C) 2003, 2004 Chris Larson +# +# SPDX-License-Identifier: GPL-2.0-only +# +# Based on functions from the base bb module, Copyright 2003 Holger Schurig + +import json + +from urllib.request import urlopen, Request + +from bb.fetch2 import FetchError +from bb.fetch2.wget import Wget + + +class GitHubReleaseArtifact(Wget): + API_HEADERS = { + "Accept": "application/vnd.github+json", + "X-GitHub-Api-Version": "2022-11-28", + } + + DOWNLOAD_HEADERS = { + "Accept": "application/octet-stream" + } + + def supports(self, ud, d): + return ud.type in ["ghra"] + + def _resolve_artifact_url(self, ud, d): + """Resolve `ghra://` pseudo URLs to `https://` URLs and set auth header. + + This method resolved URLs like `ghra://github.com/rauc/rauc/v1.13/rauc-1.13.tar.xz` + to a backing URL like `https://api.github.com/repos/rauc/rauc/releases/assets/222455085` + while optionally setting the required authentication headers to download from + private repositories. + """ + + try: + user, repo, tag, asset_name = ud.path.strip("/").split("/") + except ValueError as e: + raise FetchError( + f"Expected path like '/<user>/<repo>/<tag>/<asset_name>', got: '{ud.path}'" + ) from e + + # The GitHub authentication token may be provided as URL parameter + # (to enable using different tokens for different URLs in the same recipe) + # or via a variable for cleaner URLs. + token = ud.parm.get("token") or d.getVar("GH_TOKEN") + + meta_url = f"https://api.{ud.host}/repos/{user}/{repo}/releases/tags/{tag}" + + auth_headers = {} + + if token is not None: + auth_headers["Authorization"] = f"Bearer {token}" + + try: + req = Request(url=meta_url, headers=(auth_headers | self.API_HEADERS)) + with urlopen(req) as resp: + result = json.load(resp) + + except Exception as e: + raise FetchError(f"Error downloading artifact list: {e}") from e + + asset_urls = dict((asset["name"], asset["url"]) for asset in result["assets"]) + + if asset_name not in asset_urls: + asset_list = ", ".join(asset_urls.keys()) + raise FetchError( + f"Did not find asset '{asset_name}' in release asset list: {asset_list}" + ) + + # Override the `url` and `headers` in the FetchData object, + # enabling the Wget class to perform the actual downloading. + ud.url = asset_urls[asset_name] + ud.headers = auth_headers | self.DOWNLOAD_HEADERS + + def checkstatus(self, fetch, ud, d): + self._resolve_artifact_url(ud, d) + + return super().checkstatus(fetch, ud, d) + + def download(self, ud, d): + self._resolve_artifact_url(ud, d) + + return super().download(ud, d)
This fetcher enables downloading artifacts attached to GitHub releases in _private repositories_ (public repositories can just use download URLs like `https://github.com/rauc/rauc/releases/download/v1.13/rauc-1.13.tar.xz` which work without authentication). The `SRC_URI` includes the Github account and repository (`rauc/rauc`), git tag of the release (`v1.13`) and the name of the artifact file (`rauc-1.13.tar.xz`): SRC_URI = "ghra://github.com/rauc/rauc/v1.13/rauc-1.13.tar.xz" SRC_URI[sha256sum] = "1ddb218a5d713c8dbd6e04d5501d96629f1c8e252157... Authentication is provided using tokens, for example by configuring them in the `local.conf`¹: $ grep TOKEN build/conf/local.conf GH_TOKEN = "github_pat_123456789abc... The token may also be provided using an environment variable, e.g. for CI builds: $ export BB_ENV_PASSTHROUGH_ADDITIONS="GH_TOKEN" $ export GH_TOKEN="github_pat_123456789abc... $ bitbake rauc It is also possible to provide per-URI tokens using an URI parameter: SRC_URI = "ghra://github.com/rauc/rauc/v1.13/rauc-1.13.tar.xz;token=g... SRC_URI[sha256sum] = "1ddb218a5d713c8dbd6e04d5501d96629f1c8e252157... and per-recipe tokens by setting `GH_TOKEN` inside the recipe: GH_TOKEN = "github_pat_123456789abc... SRC_URI = "ghra://github.com/rauc/rauc/v1.13/rauc-1.13.tar.xz" SRC_URI[sha256sum] = "1ddb218a5d713c8dbd6e04d5501d96629f1c8e252157... ¹ Note: When using a personal access token, it should be restricted in scope to only allow downloading of release artifacts for the required repositories. Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de> --- lib/bb/fetch2/__init__.py | 4 +- lib/bb/fetch2/github_release_artifact.py | 93 ++++++++++++++++++++++++ 2 files changed, 96 insertions(+), 1 deletion(-) create mode 100644 lib/bb/fetch2/github_release_artifact.py