diff mbox series

[RFC] bitbake: fetch2: Fix systemd template file parsing

Message ID CAAc_J+_bjkedrH1kj2+qHi=OiEnC9TPxCqpZAVSLiqBs7Xsp6g@mail.gmail.com
State New
Headers show
Series [RFC] bitbake: fetch2: Fix systemd template file parsing | expand

Commit Message

Brennan Coslett Sept. 27, 2024, 4:09 p.m. UTC
From: Brennan Coslett <brennan.coslett@bluerivertech.com>
Date: Fri, Sep 27, 2024 at 10:22 AM
Subject: [RFC PATCH] bitbake: fetch2: Fix systemd template file parsing
To: <bitbake-devel@lists.openembedded.org>
Cc: Brennan Coslett <brennan.coslett@bluerivertech.com>


I noticed that the archiver was failing to collect any source from
the systemd-serialgetty package which has a very simple SRC_URI:

   SRC_URI = "file://serial-getty@.service"

The regex used by fetch2.decodeurl() was treating the unit name as
a user and then returning back a file path of ".service" which the
archiver then failed to find, this adds additional logic to
reassembly the full file name for file type urls. It also adds a test
to bb.tests.fetch for these types of files which only passes with this
change, I had to add additional logic to encodeurl keep the tests happy
as well.

Signed-off-by: Brennan Coslett <brennan.coslett@bluerivertech.com>
---
 bitbake/lib/bb/fetch2/__init__.py | 16 +++++++++++++++-
 bitbake/lib/bb/tests/fetch.py     |  1 +
 2 files changed, 16 insertions(+), 1 deletion(-)

git.openembedded.org', '/bitbake', 's.o-me_ONE',
r'!#$%^&*()-_={}[]\|:?,.<>~`', {'branch': 'main', 'protocol' : 'https'}),
+       "file://serialgetty@.service" : ('file', '', 'serialgetty@.service',
'', '', {}),
     }
     # we require a pathname to encodeurl but users can still pass such
urls to
     # decodeurl and we need to handle them

Comments

Brennan Coslett Oct. 2, 2024, 3:42 p.m. UTC | #1
Anything more I should do with this if no one has any comments? Do I need to convert this to a non-RFC PATCH?

Thanks,
Brennan
Richard Purdie Oct. 2, 2024, 4:42 p.m. UTC | #2
On Wed, 2024-10-02 at 08:42 -0700, Brennan Coslett via
lists.openembedded.org wrote:
> Anything more I should do with this if no one has any comments? Do I
> need to convert this to a non-RFC PATCH?
>  

It wasn't an easy patch to review without looking at the context of the
file. The decodeurl piece looks ok.

I'm more worried about the encodeurl part and am left wondering what
other character encoding issues we might have there. Special casing %40
seems a bit worrying.

I appreciate the patch solves your immediate problem with that
character but I wonder what other issue remain waiting to catch us out.

Cheers,

Richard
Brennan Coslett Oct. 2, 2024, 4:56 p.m. UTC | #3
The decodeurl part is the only part that was needed to actually solve the
problem
(I was able to verify that the missing files ended up in the archiver
output dir, the
encodeurl part was just to make the existing test format happy because it
has
asserts that the value was exactly identical. If you have thoughts on a
better way to
change that, idk if doing that substitution in the test before it checks
for equivalency
but somehow that feels more gross to me? It seems like maybe we should use
the
https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote safe
arg instead
and just set that to include basically everything when encoding local files?

Thanks,
Brennan

On Wed, Oct 2, 2024 at 11:42 AM Richard Purdie <
richard.purdie@linuxfoundation.org> wrote:

> On Wed, 2024-10-02 at 08: 42 -0700, Brennan Coslett via lists.
> openembedded. org wrote: > Anything more I should do with this if no one
> has any comments? Do I > need to convert this to a non-RFC PATCH? >   It
> wasn't an easy patch to review
> 
>
> On Wed, 2024-10-02 at 08:42 -0700, Brennan Coslett vialists.openembedded.org wrote:
> > Anything more I should do with this if no one has any comments? Do I
> > need to convert this to a non-RFC PATCH?
> >
>
> It wasn't an easy patch to review without looking at the context of the
> file. The decodeurl piece looks ok.
>
> I'm more worried about the encodeurl part and am left wondering what
> other character encoding issues we might have there. Special casing %40
> seems a bit worrying.
>
> I appreciate the patch solves your immediate problem with that
> character but I wonder what other issue remain waiting to catch us out.
>
> Cheers,
>
> Richard
>
>
Richard Purdie Oct. 2, 2024, 5:02 p.m. UTC | #4
On Wed, 2024-10-02 at 11:56 -0500, Brennan Coslett wrote:
> The decodeurl part is the only part that was needed to actually solve the problem
> (I was able to verify that the missing files ended up in the archiver output dir, the
> encodeurl part was just to make the existing test format happy because it has
> asserts that the value was exactly identical.

The test is correct in that what we put in and out of those functions
is meant to match. If it doesn't that opens up other problems.

>  If you have thoughts on a better way to
> change that, idk if doing that substitution in the test before it checks for equivalency
> but somehow that feels more gross to me? It seems like maybe we should use the
> https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote safe arg instead
> and just set that to include basically everything when encoding local files?

I suspect most references in our file urls are not encoded so it should
probably be the other way and not be doing encoding on file url path
components at all?

I'm just working off distant memories of the code though so I'm not
sure how right my instinct is :(.

Cheers,

Richard
diff mbox series

Patch

diff --git a/bitbake/lib/bb/fetch2/__init__.py
b/bitbake/lib/bb/fetch2/__init__.py
index ddee4400bb..3d6973a6e5 100644
--- a/bitbake/lib/bb/fetch2/__init__.py
+++ b/bitbake/lib/bb/fetch2/__init__.py
@@ -369,6 +369,13 @@  def decodeurl(url):
         path = location[locidx:]
     elif type.lower() == 'file':
         host = ""
+        # The regex we use above fails on file:// urls for systemd
+        # template files which generally follow the format of
+        # "<unit name>@.service" and split out the unit name
+        # into the user group, so manually put it back
+        if user:
+            location = "%s@%s" % (user, location)
+            user = None
         path = location
     else:
         host = location
@@ -414,7 +421,14 @@  def encodeurl(decoded):
         # Standardise path to ensure comparisons work
         while '//' in path:
             path = path.replace("//", "/")
-        url.append("%s" % urllib.parse.quote(path))
+        urllib_parsed = urllib.parse.urlparse(path).path
+        if type == "file":
+            # urllib.parse.urlparse automatically converts "@" to "%40"
+            # which then fails the bitbake self-tests (the rest of the
+            # stack was able to handle this fine) so convert it back
+            # after the fact.
+            urllib_parsed = urllib_parsed.replace('%40', '@')
+        url.append(urllib_parsed)
     if p:
         for parm in p:
             url.append(";%s=%s" % (parm, p[parm]))
diff --git a/bitbake/lib/bb/tests/fetch.py b/bitbake/lib/bb/tests/fetch.py
index 4e26e38811..dff7780e33 100644
--- a/bitbake/lib/bb/tests/fetch.py
+++ b/bitbake/lib/bb/tests/fetch.py
@@ -1390,6 +1390,7 @@  class URLHandle(unittest.TestCase):
        "file://somelocation;someparam=1": ('file', '', 'somelocation', '',
'', {'someparam': '1'}),
        "https://somesite.com/somerepo.git;user=anyUser:idtoken=1234" :
('https', 'somesite.com', '/somerepo.git', '', '', {'user':
'anyUser:idtoken=1234'}),
        r'git://s.o-me_ONE:!#$%^&*()-_={}[]\|:?,.<>~`@
git.openembedded.org/bitbake;branch=main;protocol=https': ('git', '