diff mbox series

URL encoding/decoding bug with files like "serialgetty@.service" or "sshd@.service" with archiver.bbclass #bitbake #kirkstone #poky

Message ID bhfQ.1727300278439114446.Ff3Z@lists.openembedded.org
State New
Headers show
Series URL encoding/decoding bug with files like "serialgetty@.service" or "sshd@.service" with archiver.bbclass #bitbake #kirkstone #poky | expand

Commit Message

Brennan Coslett Sept. 25, 2024, 9:37 p.m. UTC
Hey All,

I believe there is a bug with how bitbake handles systemd template files when archiving source, as the files are ending up in the rootfs. I was working on getting a script working to package up the tmp/deploy/source directory and noticed that while the archiver was creating tmp/deploy/sources/aarch64-shasta-linux/systemd-serialgetty-1.0-r5/ there wasn't anything inside it. I think this is the base case for the bug as it has a SRC_URI of "file://serial-getty@.service" vs some of the other packages which include these template systemd files that end in @.service contain a ton of other files (I verified the same issue shows up with openssh).

I assume that it will happen with all of these files:
```
$ find layers/ meta-* -name '*@*'
layers/meta-openembedded/meta-networking/recipes-support/nuttcp/nuttcp/nuttcp@.service
layers/meta-openembedded/meta-networking/recipes-support/openvpn/openvpn/openvpn@.service
layers/poky/meta/recipes-connectivity/openssh/openssh/sshd@.service
layers/poky/meta/recipes-connectivity/dhcpcd/files/dhcpcd@.service
layers/poky/meta/recipes-connectivity/ppp/ppp/ppp@.service
layers/poky/meta/recipes-core/dropbear/dropbear/dropbear@.service
layers/poky/meta/recipes-core/systemd/systemd-serialgetty/serial-getty@.service
```

I was able to add some additional debug prints in archiver.bbclass and tried to dig further but the insides of `fetch2` were a bit over my head.
```
```

This was tested with kirkstone, (this commit specifically e938b18b5342bd28eadb44ad39dbf1f5cf5be09b):
```
DEBUG: systemd-serialgetty-1.0-r5 do_ar_original: BRENNAN full SRC_URI ['file://serial-getty@.service']
DEBUG: systemd-serialgetty-1.0-r5 do_ar_original: BRENNAN ['file://.service']
```
It then can't find that file and so then the archiver doesn't package anything up.

Hopefully bitbake-devel is the right mailing list for folks with more knowledge of this stuff?

Thanks,
Brennan Coslett

Comments

Alexander Kanavin Sept. 26, 2024, 9:47 a.m. UTC | #1
On Wed, 25 Sept 2024 at 23:37, Brennan Coslett via
lists.openembedded.org
<brennan.coslett=bluerivertech.com@lists.openembedded.org> wrote:
> It then can't find that file and so then the archiver doesn't package anything up.
>
> Hopefully bitbake-devel is the right mailing list for folks with more knowledge of this stuff?

Not really. Archiver is implemented in oe-core, and so oe-core list is
a better place. It also helps if you can provide specific numbered
steps to reproduce the problem on poky master.

Alex
Brennan Coslett Sept. 26, 2024, 2:57 p.m. UTC | #2
I think the bug is in `bb.fetch.decodeurl`, I just noticed it because of the archiver.
@Alex if that means it should still move over to the oe-core list, let me know and I can do that.

Steps to replicate:
1. Checkout poky master (just replicated with 8634e46b4040b6008410b6d77fecb5cbaec7e90e)
2. source oe-init-build-env
3. append the following to the auto-generated local.conf:
+ require conf/distro/include/init-manager-systemd.inc
+ INHERIT += "archiver"
+ ARCHIVER_MODE[src] = "original"
4. Run "bitbake systemd-serialgetty", the source dir will be created but it wont contain anything:
brennan.coslett@lpgt-bcosl001:~/missing-source/poky/build$ ls tmp/deploy/sources/x86_64-poky-linux/systemd-serialgetty-1.0-r0/
brennan.coslett@lpgt-bcosl001:~/missing-source/poky/build$

I think something in the regex inside the `decodeurl` function in `bitbake/lib/bb/fetch2/__init__.py` needs to change to stop it from
parsing out `<name>@`  in `<name>@.service` into the `user` field or maybe it needs additional logic inside that function when its a
local file to put the parsed out `user` group back into the `location` group? This appears to fix it but i'm not sure if this is actually valid,
if it is I can submit it as an actual patch. I think there should probably be a corresponding addition of a fetch test around these types of
files?

diff --git a/bitbake/lib/bb/fetch2/__init__.py b/bitbake/lib/bb/fetch2/__init__.py
index ddee4400bb..2ea834416b 100644
--- a/bitbake/lib/bb/fetch2/__init__.py
+++ b/bitbake/lib/bb/fetch2/__init__.py
@@ -369,6 +369,11 @@ def decodeurl(url):
path = location[locidx:]
elif type.lower() == 'file':
host = ""
+        # parsing out a user doesn't make sense for
+        # local files?
+        if user:
+            location = "%s@%s" % (user, location)
+            user = None
path = location
else:
host = location

I was able to see the `serialgetty@.service` file in the output folder:
brennan.coslett@lpgt-bcosl001:~/missing-source/poky/build$ bitbake systemd-serialgetty
...
NOTE: Tasks Summary: Attempted 969 tasks of which 871 didn't need to be rerun and all succeeded.
Summary: There was 1 WARNING message.
brennan.coslett@lpgt-bcosl001:~/missing-source/poky/build$ ls tmp/deploy/sources/x86_64-poky-linux/systemd-serialgetty-1.0-r0/
serial-getty@.service

Thanks,
Brennan Coslett
Alexander Kanavin Sept. 26, 2024, 3:03 p.m. UTC | #3
On Thu, 26 Sept 2024 at 16:57, Brennan Coslett via
lists.openembedded.org
<brennan.coslett=bluerivertech.com@lists.openembedded.org> wrote:
>
> I think the bug is in `bb.fetch.decodeurl`, I just noticed it because of the archiver.
> @Alex if that means it should still move over to the oe-core list, let me know and I can do that.

Right, let's stay here.

> Steps to replicate:
> 1. Checkout poky master (just replicated with 8634e46b4040b6008410b6d77fecb5cbaec7e90e)
> 2. source oe-init-build-env
> 3. append the following to the auto-generated local.conf:
>     + require conf/distro/include/init-manager-systemd.inc
>     + INHERIT += "archiver"
>     + ARCHIVER_MODE[src] = "original"
> 4. Run "bitbake systemd-serialgetty", the source dir will be created but it wont contain anything:
>     brennan.coslett@lpgt-bcosl001:~/missing-source/poky/build$ ls tmp/deploy/sources/x86_64-poky-linux/systemd-serialgetty-1.0-r0/
>     brennan.coslett@lpgt-bcosl001:~/missing-source/poky/build$
>
> I think something in the regex inside the `decodeurl` function in `bitbake/lib/bb/fetch2/__init__.py` needs to change to stop it from
> parsing out `<name>@`  in `<name>@.service` into the `user` field or maybe it needs additional logic inside that function when its a
> local file to put the parsed out `user` group back into the `location` group? This appears to fix it but i'm not sure if this is actually valid,
> if it is I can submit it as an actual patch. I think there should probably be a corresponding addition of a fetch test around these types of
> files?

Let's do test driven development here :) There are tests for fetchers
in bitbake/lib/bb/tests/fetch.py - if you can check and add a test
that exposes this issue (e.g. fails without the fix and passes with
it), then we can discuss what the best fix would be. Run them with
bitbake-selftest.

Alex
Brennan Coslett Sept. 26, 2024, 3:31 p.m. UTC | #4
Okay I added an additional test case to URLHandle which then tests encoding
and decoding the original patch I sent worked for decoding
but in the encoding test it converted "@" to "%40" which I dont think we
need for local files. The test-case looks like:

diff --git a/bitbake/lib/bb/tests/fetch.py b/bitbake/lib/bb/tests/fetch.py
index 4e26e38811..dff7780e33 100644
--- a/bitbake/lib/bb/tests/fetch.py
+++ b/bitbake/lib/bb/tests/fetch.py
@@ -1390,6 +1390,7 @@ class URLHandle(unittest.TestCase):
        "file://somelocation;someparam=1": ('file', '', 'somelocation', '',
'', {'someparam': '1'}),
        "https://somesite.com/somerepo.git;user=anyUser:idtoken=1234" :
('https', 'somesite.com', '/somerepo.git', '', '', {'user':
'anyUser:idtoken=1234'}),
        r'git://s.o-me_ONE:!#$%^&*()-_={}[]\|:?,.<>~`@
git.openembedded.org/bitbake;branch=main;protocol=https': ('git', '
git.openembedded.org', '/bitbake', 's.o-me_ONE',
r'!#$%^&*()-_={}[]\|:?,.<>~`', {'branch': 'main', 'prot
ocol' : 'https'}),
+       "file://serialgetty@.service" : ('file', '', 'serialgetty@.service',
'', '', {}),
     }
     # we require a pathname to encodeurl but users can still pass such
urls to
     # decodeurl and we need to handle them

So adding the encodeurl change I needed:

diff --git a/bitbake/lib/bb/fetch2/__init__.py
b/bitbake/lib/bb/fetch2/__init__.py
index ddee4400bb..4ea561e1f8 100644
--- a/bitbake/lib/bb/fetch2/__init__.py
+++ b/bitbake/lib/bb/fetch2/__init__.py
@@ -369,6 +369,11 @@ def decodeurl(url):
         path = location[locidx:]
     elif type.lower() == 'file':
         host = ""
+        # parsing out a user doesn't make sense for
+        # local files?
+        if user:
+            location = "%s@%s" % (user, location)
+            user = None
         path = location
     else:
         host = location
@@ -414,7 +419,12 @@ def encodeurl(decoded):
         # Standardise path to ensure comparisons work
         while '//' in path:
             path = path.replace("//", "/")
-        url.append("%s" % urllib.parse.quote(path))
+        urllib_parsed = urllib.parse.urlparse(path).path
+        if type == "file":
+            # Local systemd template file paths don't need the
+            # escaped character.
+            urllib_parsed = urllib_parsed.replace('%40', '@')
+        url.append(urllib_parsed)
     if p:
         for parm in p:
             url.append(";%s=%s" % (parm, p[parm]))
- Brennan

On Thu, Sep 26, 2024 at 10:03 AM Alexander Kanavin <alex.kanavin@gmail.com>
wrote:

> On Thu, 26 Sept 2024 at 16:57, Brennan Coslett via
> lists.openembedded.org
> <brennan.coslett=bluerivertech.com@lists.openembedded.org> wrote:
> >
> > I think the bug is in `bb.fetch.decodeurl`, I just noticed it because of
> the archiver.
> > @Alex if that means it should still move over to the oe-core list, let
> me know and I can do that.
>
> Right, let's stay here.
>
> > Steps to replicate:
> > 1. Checkout poky master (just replicated with
> 8634e46b4040b6008410b6d77fecb5cbaec7e90e)
> > 2. source oe-init-build-env
> > 3. append the following to the auto-generated local.conf:
> >     + require conf/distro/include/init-manager-systemd.inc
> >     + INHERIT += "archiver"
> >     + ARCHIVER_MODE[src] = "original"
> > 4. Run "bitbake systemd-serialgetty", the source dir will be created but
> it wont contain anything:
> >     brennan.coslett@lpgt-bcosl001:~/missing-source/poky/build$ ls
> tmp/deploy/sources/x86_64-poky-linux/systemd-serialgetty-1.0-r0/
> >     brennan.coslett@lpgt-bcosl001:~/missing-source/poky/build$
> >
> > I think something in the regex inside the `decodeurl` function in
> `bitbake/lib/bb/fetch2/__init__.py` needs to change to stop it from
> > parsing out `<name>@`  in `<name>@.service` into the `user` field or
> maybe it needs additional logic inside that function when its a
> > local file to put the parsed out `user` group back into the `location`
> group? This appears to fix it but i'm not sure if this is actually valid,
> > if it is I can submit it as an actual patch. I think there should
> probably be a corresponding addition of a fetch test around these types of
> > files?
>
> Let's do test driven development here :) There are tests for fetchers
> in bitbake/lib/bb/tests/fetch.py - if you can check and add a test
> that exposes this issue (e.g. fails without the fix and passes with
> it), then we can discuss what the best fix would be. Run them with
> bitbake-selftest.
>
> Alex
>
Alexander Kanavin Sept. 27, 2024, 10:53 a.m. UTC | #5
Can you turn these into patches, and send them as RFC?

Alex

On Thu, 26 Sept 2024 at 17:31, Brennan Coslett
<brennan.coslett@bluerivertech.com> wrote:
>
> Okay I added an additional test case to URLHandle which then tests encoding and decoding the original patch I sent worked for decoding
> but in the encoding test it converted "@" to "%40" which I dont think we need for local files. The test-case looks like:
>
> diff --git a/bitbake/lib/bb/tests/fetch.py b/bitbake/lib/bb/tests/fetch.py
> index 4e26e38811..dff7780e33 100644
> --- a/bitbake/lib/bb/tests/fetch.py
> +++ b/bitbake/lib/bb/tests/fetch.py
> @@ -1390,6 +1390,7 @@ class URLHandle(unittest.TestCase):
>         "file://somelocation;someparam=1": ('file', '', 'somelocation', '', '', {'someparam': '1'}),
>         "https://somesite.com/somerepo.git;user=anyUser:idtoken=1234" : ('https', 'somesite.com', '/somerepo.git', '', '', {'user': 'anyUser:idtoken=1234'}),
>         r'git://s.o-me_ONE:!#$%^&*()-_={}[]\|:?,.<>~`@git.openembedded.org/bitbake;branch=main;protocol=https': ('git', 'git.openembedded.org', '/bitbake', 's.o-me_ONE', r'!#$%^&*()-_={}[]\|:?,.<>~`', {'branch': 'main', 'prot
> ocol' : 'https'}),
> +       "file://serialgetty@.service" : ('file', '', 'serialgetty@.service', '', '', {}),
>      }
>      # we require a pathname to encodeurl but users can still pass such urls to
>      # decodeurl and we need to handle them
>
> So adding the encodeurl change I needed:
>
> diff --git a/bitbake/lib/bb/fetch2/__init__.py b/bitbake/lib/bb/fetch2/__init__.py
> index ddee4400bb..4ea561e1f8 100644
> --- a/bitbake/lib/bb/fetch2/__init__.py
> +++ b/bitbake/lib/bb/fetch2/__init__.py
> @@ -369,6 +369,11 @@ def decodeurl(url):
>          path = location[locidx:]
>      elif type.lower() == 'file':
>          host = ""
> +        # parsing out a user doesn't make sense for
> +        # local files?
> +        if user:
> +            location = "%s@%s" % (user, location)
> +            user = None
>          path = location
>      else:
>          host = location
> @@ -414,7 +419,12 @@ def encodeurl(decoded):
>          # Standardise path to ensure comparisons work
>          while '//' in path:
>              path = path.replace("//", "/")
> -        url.append("%s" % urllib.parse.quote(path))
> +        urllib_parsed = urllib.parse.urlparse(path).path
> +        if type == "file":
> +            # Local systemd template file paths don't need the
> +            # escaped character.
> +            urllib_parsed = urllib_parsed.replace('%40', '@')
> +        url.append(urllib_parsed)
>      if p:
>          for parm in p:
>              url.append(";%s=%s" % (parm, p[parm]))
> - Brennan
>
> On Thu, Sep 26, 2024 at 10:03 AM Alexander Kanavin <alex.kanavin@gmail.com> wrote:
>>
>> On Thu, 26 Sept 2024 at 16:57, Brennan Coslett via
>> lists.openembedded.org
>> <brennan.coslett=bluerivertech.com@lists.openembedded.org> wrote:
>> >
>> > I think the bug is in `bb.fetch.decodeurl`, I just noticed it because of the archiver.
>> > @Alex if that means it should still move over to the oe-core list, let me know and I can do that.
>>
>> Right, let's stay here.
>>
>> > Steps to replicate:
>> > 1. Checkout poky master (just replicated with 8634e46b4040b6008410b6d77fecb5cbaec7e90e)
>> > 2. source oe-init-build-env
>> > 3. append the following to the auto-generated local.conf:
>> >     + require conf/distro/include/init-manager-systemd.inc
>> >     + INHERIT += "archiver"
>> >     + ARCHIVER_MODE[src] = "original"
>> > 4. Run "bitbake systemd-serialgetty", the source dir will be created but it wont contain anything:
>> >     brennan.coslett@lpgt-bcosl001:~/missing-source/poky/build$ ls tmp/deploy/sources/x86_64-poky-linux/systemd-serialgetty-1.0-r0/
>> >     brennan.coslett@lpgt-bcosl001:~/missing-source/poky/build$
>> >
>> > I think something in the regex inside the `decodeurl` function in `bitbake/lib/bb/fetch2/__init__.py` needs to change to stop it from
>> > parsing out `<name>@`  in `<name>@.service` into the `user` field or maybe it needs additional logic inside that function when its a
>> > local file to put the parsed out `user` group back into the `location` group? This appears to fix it but i'm not sure if this is actually valid,
>> > if it is I can submit it as an actual patch. I think there should probably be a corresponding addition of a fetch test around these types of
>> > files?
>>
>> Let's do test driven development here :) There are tests for fetchers
>> in bitbake/lib/bb/tests/fetch.py - if you can check and add a test
>> that exposes this issue (e.g. fails without the fix and passes with
>> it), then we can discuss what the best fix would be. Run them with
>> bitbake-selftest.
>>
>> Alex
diff mbox series

Patch

diff --git a/meta/classes/archiver.bbclass b/meta/classes/archiver.bbclass
index 4a5865d7b5..daf7b4605b 100644
--- a/meta/classes/archiver.bbclass
+++ b/meta/classes/archiver.bbclass
@@ -181,6 +181,7 @@  python do_ar_original() {
ar_outdir = d.getVar('ARCHIVER_OUTDIR')
bb.note('Archiving the original source...')
urls = d.getVar("SRC_URI").split()
+    bb.debug(1, "BRENNAN full SRC_URI %s" % urls)
# destsuffix (git fetcher) and subdir (everything else) are allowed to be
# absolute paths (for example, destsuffix=${S}/foobar).
# That messes with unpacking inside our tmpdir below, because the fetchers
@@ -198,7 +199,7 @@  python do_ar_original() {
del decoded[5][param]
encoded = bb.fetch2.encodeurl(decoded)
urls[i] = encoded
-
+    bb.debug(1, "BRENNAN %s" % urls)
# Cleanup SRC_URI before call bb.fetch2.Fetch() since now SRC_URI is in the
# variable "urls", otherwise there might be errors like:
# The SRCREV_FORMAT variable must be set when multiple SCMs are used