Message ID | CAAc_J+_bjkedrH1kj2+qHi=OiEnC9TPxCqpZAVSLiqBs7Xsp6g@mail.gmail.com |
---|---|
State | New |
Headers | show |
Series | [RFC] bitbake: fetch2: Fix systemd template file parsing | expand |
Anything more I should do with this if no one has any comments? Do I need to convert this to a non-RFC PATCH? Thanks, Brennan
On Wed, 2024-10-02 at 08:42 -0700, Brennan Coslett via lists.openembedded.org wrote: > Anything more I should do with this if no one has any comments? Do I > need to convert this to a non-RFC PATCH? > It wasn't an easy patch to review without looking at the context of the file. The decodeurl piece looks ok. I'm more worried about the encodeurl part and am left wondering what other character encoding issues we might have there. Special casing %40 seems a bit worrying. I appreciate the patch solves your immediate problem with that character but I wonder what other issue remain waiting to catch us out. Cheers, Richard
The decodeurl part is the only part that was needed to actually solve the problem (I was able to verify that the missing files ended up in the archiver output dir, the encodeurl part was just to make the existing test format happy because it has asserts that the value was exactly identical. If you have thoughts on a better way to change that, idk if doing that substitution in the test before it checks for equivalency but somehow that feels more gross to me? It seems like maybe we should use the https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote safe arg instead and just set that to include basically everything when encoding local files? Thanks, Brennan On Wed, Oct 2, 2024 at 11:42 AM Richard Purdie < richard.purdie@linuxfoundation.org> wrote: > On Wed, 2024-10-02 at 08: 42 -0700, Brennan Coslett via lists. > openembedded. org wrote: > Anything more I should do with this if no one > has any comments? Do I > need to convert this to a non-RFC PATCH? > It > wasn't an easy patch to review > > > On Wed, 2024-10-02 at 08:42 -0700, Brennan Coslett vialists.openembedded.org wrote: > > Anything more I should do with this if no one has any comments? Do I > > need to convert this to a non-RFC PATCH? > > > > It wasn't an easy patch to review without looking at the context of the > file. The decodeurl piece looks ok. > > I'm more worried about the encodeurl part and am left wondering what > other character encoding issues we might have there. Special casing %40 > seems a bit worrying. > > I appreciate the patch solves your immediate problem with that > character but I wonder what other issue remain waiting to catch us out. > > Cheers, > > Richard > >
On Wed, 2024-10-02 at 11:56 -0500, Brennan Coslett wrote: > The decodeurl part is the only part that was needed to actually solve the problem > (I was able to verify that the missing files ended up in the archiver output dir, the > encodeurl part was just to make the existing test format happy because it has > asserts that the value was exactly identical. The test is correct in that what we put in and out of those functions is meant to match. If it doesn't that opens up other problems. > If you have thoughts on a better way to > change that, idk if doing that substitution in the test before it checks for equivalency > but somehow that feels more gross to me? It seems like maybe we should use the > https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote safe arg instead > and just set that to include basically everything when encoding local files? I suspect most references in our file urls are not encoded so it should probably be the other way and not be doing encoding on file url path components at all? I'm just working off distant memories of the code though so I'm not sure how right my instinct is :(. Cheers, Richard
diff --git a/bitbake/lib/bb/fetch2/__init__.py b/bitbake/lib/bb/fetch2/__init__.py index ddee4400bb..3d6973a6e5 100644 --- a/bitbake/lib/bb/fetch2/__init__.py +++ b/bitbake/lib/bb/fetch2/__init__.py @@ -369,6 +369,13 @@ def decodeurl(url): path = location[locidx:] elif type.lower() == 'file': host = "" + # The regex we use above fails on file:// urls for systemd + # template files which generally follow the format of + # "<unit name>@.service" and split out the unit name + # into the user group, so manually put it back + if user: + location = "%s@%s" % (user, location) + user = None path = location else: host = location @@ -414,7 +421,14 @@ def encodeurl(decoded): # Standardise path to ensure comparisons work while '//' in path: path = path.replace("//", "/") - url.append("%s" % urllib.parse.quote(path)) + urllib_parsed = urllib.parse.urlparse(path).path + if type == "file": + # urllib.parse.urlparse automatically converts "@" to "%40" + # which then fails the bitbake self-tests (the rest of the + # stack was able to handle this fine) so convert it back + # after the fact. + urllib_parsed = urllib_parsed.replace('%40', '@') + url.append(urllib_parsed) if p: for parm in p: url.append(";%s=%s" % (parm, p[parm])) diff --git a/bitbake/lib/bb/tests/fetch.py b/bitbake/lib/bb/tests/fetch.py index 4e26e38811..dff7780e33 100644 --- a/bitbake/lib/bb/tests/fetch.py +++ b/bitbake/lib/bb/tests/fetch.py @@ -1390,6 +1390,7 @@ class URLHandle(unittest.TestCase): "file://somelocation;someparam=1": ('file', '', 'somelocation', '', '', {'someparam': '1'}), "https://somesite.com/somerepo.git;user=anyUser:idtoken=1234" : ('https', 'somesite.com', '/somerepo.git', '', '', {'user': 'anyUser:idtoken=1234'}), r'git://s.o-me_ONE:!#$%^&*()-_={}[]\|:?,.<>~`@ git.openembedded.org/bitbake;branch=main;protocol=https': ('git', '