[1/2] fetch2: Revert decodeurl() to not use the URI class

Message ID	20250228231501.3905069-1-pkj@axis.com
State	New
Headers	show Return-Path: <peter.kjellerstedt@axis.com> ip: 40.107.105.67, mailfrom: peter.kjellerstedt@axis.com) Received-SPF: Pass (protection.outlook.com: domain of axis.com designates 195.60.68.100 as permitted sender) receiver=protection.outlook.com; client-ip=195.60.68.100; helo=mail.axis.com; pr=C From: Peter Kjellerstedt <pkj@axis.com> To: <bitbake-devel@lists.openembedded.org> Subject: [PATCH 1/2] fetch2: Revert decodeurl() to not use the URI class Date: Sat, 1 Mar 2025 00:15:00 +0100 Message-ID: <20250228231501.3905069-1-pkj@axis.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain
Series	[1/2] fetch2: Revert decodeurl() to not use the URI class \| expand [1/2] fetch2: Revert decodeurl() to not use the URI class [2/2] tests/fetch: Verify regular expression "URLs" that contain a ?

Message ID

20250228231501.3905069-1-pkj@axis.com

State

New

Headers

Received-SPF: Pass (protection.outlook.com: domain of axis.com designates
 195.60.68.100 as permitted sender) receiver=protection.outlook.com;
 client-ip=195.60.68.100; helo=mail.axis.com; pr=C
From: Peter Kjellerstedt <pkj@axis.com>
To: <bitbake-devel@lists.openembedded.org>
Subject: [PATCH 1/2] fetch2: Revert decodeurl() to not use the URI class
Date: Sat, 1 Mar 2025 00:15:00 +0100
Message-ID: <20250228231501.3905069-1-pkj@axis.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 28 Feb 2025 23:15:03.5420
 (UTC)
X-MS-Exchange-CrossTenant-Network-Message-Id: 
 3cb61b74-d357-40a1-d1ab-08dd584dbb60
X-MS-Exchange-CrossTenant-Id: 78703d3c-b907-432f-b066-88f7af9ca3af
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: 
 TenantId=78703d3c-b907-432f-b066-88f7af9ca3af;Ip=[195.60.68.100];Helo=[mail.axis.com]
X-MS-Exchange-CrossTenant-AuthSource: 
	AMS0EPF000001AF.eurprd05.prod.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Anonymous
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: PAVPR02MB9063
List-Id: <bitbake-devel.lists.openembedded.org>
X-Webhook-Received: from li982-79.members.linode.com [45.33.32.79] by
 aws-us-west-2-korg-lkml-1.web.codeaurora.org with HTTPS for
 <bitbake-devel@lists.openembedded.org>; Fri, 28 Feb 2025 23:15:09 -0000
X-Groupsio-URL: https://lists.openembedded.org/g/bitbake-devel/message/17364

Series

[1/2] fetch2: Revert decodeurl() to not use the URI class | expand

Commit Message

Peter Kjellerstedt Feb. 28, 2025, 11:15 p.m. UTC

This reverts the recent change of decodeurl() to use the URI class to
parse the URL instead of doing it itself. While reusing code is
generally a good idea, using urllib.parse.urlparse() (which the URI
class does) to parse the regular expression "URLs" that are used in
PREMIRRORS and MIRRORS does not work. A regular expression URL
containing https?://... would be silently ignored, while a URL using a
negative lookahead such as git://(?!internal\.git\.server).*/.* would
result in a cryptic error:

  Exception: re.error: missing ), unterminated subpattern at position 0

The problem is that urllib.parse.urlparse() treats the ? as the start of
URL parameters and thus stops parsing whatever part of the URL it was
parsing.

Signed-off-by: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
---
 bitbake/lib/bb/fetch2/__init__.py | 46 +++++++++++++++++++++++++++++--
 1 file changed, 43 insertions(+), 3 deletions(-)

Comments

Richard Purdie March 2, 2025, 8:21 a.m. UTC | #1

On Sat, 2025-03-01 at 00:15 +0100, Peter Kjellerstedt via lists.openembedded.org wrote:
> This reverts the recent change of decodeurl() to use the URI class to
> parse the URL instead of doing it itself. While reusing code is
> generally a good idea, using urllib.parse.urlparse() (which the URI
> class does) to parse the regular expression "URLs" that are used in
> PREMIRRORS and MIRRORS does not work. A regular expression URL
> containing https?://... would be silently ignored, while a URL using a
> negative lookahead such as git://(?!internal\.git\.server).*/.* would
> result in a cryptic error:
> 
>   Exception: re.error: missing ), unterminated subpattern at position 0
> 
> The problem is that urllib.parse.urlparse() treats the ? as the start of
> URL parameters and thus stops parsing whatever part of the URL it was
> parsing.
> 
> Signed-off-by: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
> ---
>  bitbake/lib/bb/fetch2/__init__.py | 46 +++++++++++++++++++++++++++++--
>  1 file changed, 43 insertions(+), 3 deletions(-)

I'm forever coming under pressure to take a patch like the one this
reverts, it is the first thing most people new to the fetcher dislike.
I can understand why.

I wonder if we should restore the function but only use it for parsing
mirror urls, and comment it accordingly?

Cheers,

Richard

diff --git a/bitbake/lib/bb/fetch2/__init__.py b/bitbake/lib/bb/fetch2/__init__.py
index 93fe012ec3..464b9ee4c3 100644
--- a/bitbake/lib/bb/fetch2/__init__.py
+++ b/bitbake/lib/bb/fetch2/__init__.py
@@ -353,9 +353,49 @@  def decodeurl(url):
     user, password, parameters).
     """
 
-    uri = URI(url)
-    path = uri.path if uri.path else "/"
-    return uri.scheme, uri.hostport, path, uri.username, uri.password, uri.params
+    m = re.compile('(?P<type>[^:]*)://((?P<user>[^/;]+)@)?(?P<location>[^;]+)(;(?P<parm>.*))?').match(url)
+    if not m:
+        raise MalformedUrl(url)
+
+    type = m.group('type')
+    location = m.group('location')
+    if not location:
+        raise MalformedUrl(url)
+    user = m.group('user')
+    parm = m.group('parm')
+
+    locidx = location.find('/')
+    if locidx != -1 and type.lower() != 'file':
+        host = location[:locidx]
+        path = location[locidx:]
+    elif type.lower() == 'file':
+        host = ""
+        path = location
+        if user:
+            path = user + '@' + path
+            user = ""
+    else:
+        host = location
+        path = "/"
+    if user:
+        m = re.compile('(?P<user>[^:]+)(:?(?P<pswd>.*))').match(user)
+        if m:
+            user = m.group('user')
+            pswd = m.group('pswd')
+    else:
+        user = ''
+        pswd = ''
+
+    p = collections.OrderedDict()
+    if parm:
+        for s in parm.split(';'):
+            if s:
+                if not '=' in s:
+                    raise MalformedUrl(url, "The URL: '%s' is invalid: parameter %s does not specify a value (missing '=')" % (url, s))
+                s1, s2 = s.split('=', 1)
+                p[s1] = s2
+
+    return type, host, urllib.parse.unquote(path), user, pswd, p
 
 def encodeurl(decoded):
     """Encodes a URL from tokens (scheme, network location, path,

[1/2] fetch2: Revert decodeurl() to not use the URI class

Commit Message

Comments

Patch