From patchwork Tue Dec 3 20:02:41 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ross Burton
X-Patchwork-Id: 53539
From: Ross Burton
To: bitbake-devel@lists.openembedded.org
Subject: [PATCH 1/2] fetch2/wget: handle HTTP 308 Permanent Redirect
Date: Tue, 3 Dec 2024 20:02:41 +0000
Message-Id: <20241203200242.2955858-1-ross.burton@arm.com>
X-Mailer: git-send-email 2.34.1
X-Groupsio-URL: https://lists.openembedded.org/g/bitbake-devel/message/16858

urllib.request.HTTPRedirectHandler.redirect_request() doesn't handle HTTP
response code 308 (Permanent Redirect). This was fixed in c379bc5, but it
can't be worked around without copying the entire redirect_request() method.
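
As a rough illustration of what copying the method buys us, here is a
standalone sketch that condenses the redirect_request() override from the
diff and calls it directly, with no network access. The example.com URLs
are placeholders, and the sketch raises urllib.error.HTTPError, which is
assumed to be the intended exception class (the urllib package itself has
no HTTPError attribute).

```python
import urllib.error
import urllib.request


class FixedHTTPRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Condensed sketch of the patched handler (not the BitBake source)."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        m = req.get_method()
        # GET/HEAD may follow 301, 302, 303, 307 and 308; POST only 301-303.
        if not (code in (301, 302, 303, 307, 308) and m in ("GET", "HEAD")
                or code in (301, 302, 303) and m == "POST"):
            raise urllib.error.HTTPError(req.full_url, code, msg, headers, fp)
        # Escape spaces in the redirect target, as CPython does.
        newurl = newurl.replace(" ", "%20")
        # The new request carries no body, so drop body-describing headers.
        newheaders = {k: v for k, v in req.headers.items()
                      if k.lower() not in ("content-length", "content-type")}
        # Keep HEAD as HEAD; anything else (GET, or POST after 301-303) becomes GET.
        return urllib.request.Request(newurl,
                                      method="HEAD" if m == "HEAD" else "GET",
                                      headers=newheaders,
                                      origin_req_host=req.origin_req_host,
                                      unverifiable=True)

    # Send 308 responses through the same Location-handling path as 302.
    http_error_308 = urllib.request.HTTPRedirectHandler.http_error_302


handler = FixedHTTPRedirectHandler()
head = urllib.request.Request("https://example.com/old", method="HEAD")
new = handler.redirect_request(head, None, 308, "Permanent Redirect", {},
                               "https://example.com/new path")
print(new.get_method(), new.full_url)  # HEAD https://example.com/new%20path
```

A POST answered with 307 or 308 still raises HTTPError, matching the
condition in the patch; only GET and HEAD follow those two codes.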

When we can depend on Python 3.13, FixedHTTPRedirectHandler can be removed.

Signed-off-by: Ross Burton
---
 bitbake/lib/bb/fetch2/wget.py | 42 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 37 insertions(+), 5 deletions(-)

diff --git a/bitbake/lib/bb/fetch2/wget.py b/bitbake/lib/bb/fetch2/wget.py
index 773d41ca81a..fcb71246d9a 100644
--- a/bitbake/lib/bb/fetch2/wget.py
+++ b/bitbake/lib/bb/fetch2/wget.py
@@ -305,13 +305,45 @@ class Wget(FetchMethod):
 
         class FixedHTTPRedirectHandler(urllib.request.HTTPRedirectHandler):
             """
-            urllib2.HTTPRedirectHandler resets the method to GET on redirect,
-            when we want to follow redirects using the original method.
+            urllib.request.HTTPRedirectHandler before 3.13 has two flaws:
+
+            It resets the method to GET on redirect when we want to follow
+            redirects using the original method (typically HEAD). This was fixed
+            in 759e8e7.
+
+            It also doesn't handle 308 (Permanent Redirect). This was fixed in
+            c379bc5.
+
+            Until we depend on Python 3.13 onwards, copy the redirect_request
+            method to fix these issues.
             """
             def redirect_request(self, req, fp, code, msg, headers, newurl):
-                newreq = urllib.request.HTTPRedirectHandler.redirect_request(self, req, fp, code, msg, headers, newurl)
-                newreq.get_method = req.get_method
-                return newreq
+                m = req.get_method()
+                if (not (code in (301, 302, 303, 307, 308) and m in ("GET", "HEAD")
+                    or code in (301, 302, 303) and m == "POST")):
+                    raise urllib.error.HTTPError(req.full_url, code, msg, headers, fp)
+
+                # Strictly (according to RFC 2616), 301 or 302 in response to
+                # a POST MUST NOT cause a redirection without confirmation
+                # from the user (of urllib.request, in this case). In practice,
+                # essentially all clients do redirect in this case, so we do
+                # the same.
+
+                # Be conciliant with URIs containing a space. This is mainly
+                # redundant with the more complete encoding done in http_error_302(),
+                # but it is kept for compatibility with other callers.
+                newurl = newurl.replace(' ', '%20')
+
+                CONTENT_HEADERS = ("content-length", "content-type")
+                newheaders = {k: v for k, v in req.headers.items()
+                              if k.lower() not in CONTENT_HEADERS}
+                return urllib.request.Request(newurl,
+                                              method="HEAD" if m == "HEAD" else "GET",
+                                              headers=newheaders,
+                                              origin_req_host=req.origin_req_host,
+                                              unverifiable=True)
+
+            http_error_308 = urllib.request.HTTPRedirectHandler.http_error_302
 
         # We need to update the environment here as both the proxy and HTTPS
         # handlers need variables set. The proxy needs http_proxy and friends to

From patchwork Tue Dec 3 20:02:42 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ross Burton
X-Patchwork-Id: 53540
From: Ross Burton
To: bitbake-devel@lists.openembedded.org
Subject: [PATCH 2/2] fetch2/wget: URL-escape the path when constructing a URL for checkstatus
Date: Tue, 3 Dec 2024 20:02:42 +0000
Message-Id: <20241203200242.2955858-2-ross.burton@arm.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20241203200242.2955858-1-ross.burton@arm.com>
References: <20241203200242.2955858-1-ross.burton@arm.com>
X-Groupsio-URL: https://lists.openembedded.org/g/bitbake-devel/message/16859

ud.path has been unescaped (e.g. %20 has become a space), but as we're
reconstructing a URL we should re-escape it. For example, unzip has a SRC_URI
containing "UnZip%206.x%20%28latest%29/UnZip%206.0/unzip60.tar.gz", which
then throws exceptions if the unescaped string " (latest)" is used.

Also, this code uses the extracted ud.host and ud.path variables. These are
unescaped but potentially stale, as e.g. the cargo fetcher subclasses Wget()
and reassigns ud.url on construction.

Simplify the code by reconstructing a URL from ud.url directly instead of
bouncing through intermediate variables that may be wrong or unescaped.

Signed-off-by: Ross Burton
---
 bitbake/lib/bb/fetch2/wget.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/bitbake/lib/bb/fetch2/wget.py b/bitbake/lib/bb/fetch2/wget.py
index fcb71246d9a..198426065b4 100644
--- a/bitbake/lib/bb/fetch2/wget.py
+++ b/bitbake/lib/bb/fetch2/wget.py
@@ -376,8 +376,8 @@ class Wget(FetchMethod):
             opener = urllib.request.build_opener(*handlers)
 
             try:
-                uri_base = ud.url.split(";")[0]
-                uri = "{}://{}{}".format(urllib.parse.urlparse(uri_base).scheme, ud.host, ud.path)
+                parts = urllib.parse.urlparse(ud.url.split(";")[0])
+                uri = "{}://{}{}".format(parts.scheme, parts.netloc, parts.path)
                 r = urllib.request.Request(uri)
                 r.get_method = lambda: "HEAD"
                 # Some servers (FusionForge, as used on Alioth) require that the
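
The URL reconstruction in this patch can be tried in isolation. The sketch
below assumes a made-up host (downloads.example.org) and a
";downloadfilename=" parameter purely for illustration; only the path comes
from the unzip example in the commit message. checkstatus_uri() is a
hypothetical helper mirroring the two new lines in the diff. It shows that
urlparse() leaves percent-escapes intact, whereas an unquoted path (as
ud.path is) brings back the spaces and parentheses.

```python
import urllib.parse

# Hypothetical URL: host and ;downloadfilename= are illustrative only;
# the path is the unzip example from the commit message.
url = ("https://downloads.example.org/infozip/"
       "UnZip%206.x%20%28latest%29/UnZip%206.0/unzip60.tar.gz"
       ";downloadfilename=unzip60.tar.gz")


def checkstatus_uri(url):
    """Rebuild scheme://netloc/path, dropping BitBake ;parameters."""
    parts = urllib.parse.urlparse(url.split(";")[0])
    # urlparse() splits the URL but does not unquote any component.
    return "{}://{}{}".format(parts.scheme, parts.netloc, parts.path)


uri = checkstatus_uri(url)
print(uri)
# https://downloads.example.org/infozip/UnZip%206.x%20%28latest%29/UnZip%206.0/unzip60.tar.gz

# An unescaped path, like ud.path, reintroduces characters that are not
# valid in a request line:
print(urllib.parse.unquote(urllib.parse.urlparse(uri).path))
# /infozip/UnZip 6.x (latest)/UnZip 6.0/unzip60.tar.gz
```

Note that, as in the original code, any query string (parts.query) is not
carried over into the rebuilt URI.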