[2/2] fetch2/wget: URL-escape the path when constructing a URL for checkstatus

Message ID 20241203200242.2955858-2-ross.burton@arm.com
State New
Series [1/2] fetch2/wget: handle HTTP 308 Permanent Redirect

Commit Message

Ross Burton Dec. 3, 2024, 8:02 p.m. UTC
ud.path has been unescaped (e.g. %20 becomes a space), but as we're
reconstructing a URL we should re-escape it. For example, unzip has a SRC_URI
containing "UnZip%206.x%20%28latest%29/UnZip%206.0/unzip60.tar.gz", which
then throws exceptions if the unescaped string " (latest)" is used.

Also, this code uses the extracted ud.host and ud.path variables. These
are unescaped and potentially stale, as e.g. the cargo fetcher subclasses
Wget() and reassigns ud.url on construction.

Simplify the code by reconstructing a URL from ud.url directly instead
of bouncing through intermediate variables that may be wrong or
unescaped.

Signed-off-by: Ross Burton <ross.burton@arm.com>
---
 bitbake/lib/bb/fetch2/wget.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
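
The difference between the two ways of building the checkstatus URL can be sketched outside BitBake with urllib.parse alone. This is only an illustration, not the fetcher code: the host and the ";name=unzip" parameter are made up, and unquote() stands in for whatever produced the unescaped ud.path. The path itself is the one quoted in the commit message.

```python
import urllib.parse

# Hypothetical SRC_URI-style URL: host and ";name=unzip" are invented for
# this sketch, the path is the one quoted in the commit message.
url = "https://example.com/UnZip%206.x%20%28latest%29/UnZip%206.0/unzip60.tar.gz;name=unzip"

# Drop the ";key=value" parameters, then split the remaining URL.
parts = urllib.parse.urlparse(url.split(";")[0])

# urlparse() only splits the URL; percent-escapes in the path are left as-is,
# so the rebuilt URL is still correctly escaped.
escaped_uri = "{}://{}{}".format(parts.scheme, parts.netloc, parts.path)

# Stand-in for the old approach: an already-unescaped path (like ud.path)
# pasted back into a URL, which leaves raw spaces and parentheses in it.
unescaped_path = urllib.parse.unquote(parts.path)
unescaped_uri = "{}://{}{}".format(parts.scheme, parts.netloc, unescaped_path)

print(escaped_uri)
print(unescaped_uri)
```

The first URL keeps "%20" and "%28latest%29"; the second contains the literal " (latest)" that the commit message describes as causing exceptions.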

Comments

Ross Burton Dec. 3, 2024, 8:04 p.m. UTC | #1
Note that I’d like someone with more than a passing knowledge of the fetcher to double-check this.

Also I just noticed the commit shortlog is entirely wrong (from an early version).

Ross

Patch

diff --git a/bitbake/lib/bb/fetch2/wget.py b/bitbake/lib/bb/fetch2/wget.py
index fcb71246d9a..198426065b4 100644
--- a/bitbake/lib/bb/fetch2/wget.py
+++ b/bitbake/lib/bb/fetch2/wget.py
@@ -376,8 +376,8 @@  class Wget(FetchMethod):
             opener = urllib.request.build_opener(*handlers)
 
             try:
-                uri_base = ud.url.split(";")[0]
-                uri = "{}://{}{}".format(urllib.parse.urlparse(uri_base).scheme, ud.host, ud.path)
+                parts = urllib.parse.urlparse(ud.url.split(";")[0])
+                uri = "{}://{}{}".format(parts.scheme, parts.netloc, parts.path)
                 r = urllib.request.Request(uri)
                 r.get_method = lambda: "HEAD"
                 # Some servers (FusionForge, as used on Alioth) require that the