Message ID | 20240227173527.726401-1-cordlandwehr@kde.org |
---|---|
State | New |
Headers | show |
Series | fetch2/gitsm.py: handle spaces in git repository paths | expand |
On Tue, 2024-02-27 at 18:35 +0100, Andreas Cord-Landwehr wrote: > Handle encoding of spaces for the obscure but real-world scenario of > having a git repository with submodules, where the urls to submodules > contain spaces, eg. https://example.com/foo/repo%20with%20space.git > > Git submodules are referenced with single escaping, yet the fetcher > requires double-escaping of spaces. > > Signed-off-by: Andreas Cord-Landwehr <cordlandwehr@kde.org> > --- > bitbake/lib/bb/fetch2/gitsm.py | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/bitbake/lib/bb/fetch2/gitsm.py b/bitbake/lib/bb/fetch2/gitsm.py > index f7f3af7212..e37be055eb 100644 > --- a/bitbake/lib/bb/fetch2/gitsm.py > +++ b/bitbake/lib/bb/fetch2/gitsm.py > @@ -23,6 +23,7 @@ import copy > import shutil > import tempfile > from bb.fetch2.git import Git > +from bb.fetch2 import URI > from bb.fetch2 import runfetchcmd > from bb.fetch2 import logger > from bb.fetch2 import Fetch > @@ -93,6 +94,11 @@ class GitSM(Git): > newud.path = os.path.normpath(os.path.join(newud.path, uris[m])) > uris[m] = Git._get_repo_url(self, newud) > > + # Double-encode before handing over to git to handle spaces in URLs > + double_encoded_uri = URI(uris[m]) > + double_encoded_uri.path = double_encoded_uri.path_quoted > + uris[m] = str(double_encoded_uri) > + > for module in submodules: > # Translate the module url into a SRC_URI Thanks for the patch, I think this may need some work though. Double encoding feels like a workaround for the issue and this probably needs a different fix where we only decode the data once. That is likely more invasive but ultimately more correct else I worry we'd have to triple encode second level submodules or other horrible things. Also is there a testcase we could add to bitbake-selftest to cover this? The fetcher is tricky to work on due to all the corner cases so we do try to cover them in the testsuite. Cheers, Richard
Hi Richard, thank you for the fast response! At the moment I only have access to a closed server (AzureDevOps based) which allows creation of repositories with spaces. For GitHub eg. I checked that they replace spaces with a hyphen and so avoid any problems. Probably a simple rewrite rule may work though...? But I do not have a stable server at hand that I could offer as a permanent source for tests. But I will check if I can find a solution. Regarding the issue itself. Already the git fetcher requires %20 spaces to be encoded as %2520 in SRC_URI statements. So I expected that URLs are always expected to be double encoded. Is that not the case? Best regards, Andreas Am 28. Februar 2024 10:21:19 MEZ schrieb Richard Purdie <richard.purdie@linuxfoundation.org>: >On Tue, 2024-02-27 at 18:35 +0100, Andreas Cord-Landwehr wrote: >> Handle encoding of spaces for the obscure but real-world scenario of >> having a git repository with submodules, where the urls to submodules >> contain spaces, eg. https://example.com/foo/repo%20with%20space.git >> >> Git submodules are referenced with single escaping, yet the fetcher >> requires double-escaping of spaces. >> >> Signed-off-by: Andreas Cord-Landwehr <cordlandwehr@kde.org> >> --- >> bitbake/lib/bb/fetch2/gitsm.py | 6 ++++++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/bitbake/lib/bb/fetch2/gitsm.py b/bitbake/lib/bb/fetch2/gitsm.py >> index f7f3af7212..e37be055eb 100644 >> --- a/bitbake/lib/bb/fetch2/gitsm.py >> +++ b/bitbake/lib/bb/fetch2/gitsm.py >> @@ -23,6 +23,7 @@ import copy >> import shutil >> import tempfile >> from bb.fetch2.git import Git >> +from bb.fetch2 import URI >> from bb.fetch2 import runfetchcmd >> from bb.fetch2 import logger >> from bb.fetch2 import Fetch >> @@ -93,6 +94,11 @@ class GitSM(Git): >> newud.path = os.path.normpath(os.path.join(newud.path, uris[m])) >> uris[m] = Git._get_repo_url(self, newud) >> >> + # Double-encode before handing over to git to handle spaces in URLs >> + double_encoded_uri = URI(uris[m]) >> + double_encoded_uri.path = double_encoded_uri.path_quoted >> + uris[m] = str(double_encoded_uri) >> + >> for module in submodules: >> # Translate the module url into a SRC_URI > >Thanks for the patch, I think this may need some work though. > >Double encoding feels like a workaround for the issue and this probably >needs a different fix where we only decode the data once. That is >likely more invasive but ultimately more correct else I worry we'd have >to triple encode second level submodules or other horrible things. > >Also is there a testcase we could add to bitbake-selftest to cover >this? The fetcher is tricky to work on due to all the corner cases so >we do try to cover them in the testsuite. > >Cheers, > >Richard > > >
diff --git a/bitbake/lib/bb/fetch2/gitsm.py b/bitbake/lib/bb/fetch2/gitsm.py index f7f3af7212..e37be055eb 100644 --- a/bitbake/lib/bb/fetch2/gitsm.py +++ b/bitbake/lib/bb/fetch2/gitsm.py @@ -23,6 +23,7 @@ import copy import shutil import tempfile from bb.fetch2.git import Git +from bb.fetch2 import URI from bb.fetch2 import runfetchcmd from bb.fetch2 import logger from bb.fetch2 import Fetch @@ -93,6 +94,11 @@ class GitSM(Git): newud.path = os.path.normpath(os.path.join(newud.path, uris[m])) uris[m] = Git._get_repo_url(self, newud) + # Double-encode before handing over to git to handle spaces in URLs + double_encoded_uri = URI(uris[m]) + double_encoded_uri.path = double_encoded_uri.path_quoted + uris[m] = str(double_encoded_uri) + for module in submodules: # Translate the module url into a SRC_URI
Handle encoding of spaces for the obscure but real-world scenario of having a git repository with submodules, where the urls to submodules contain spaces, eg. https://example.com/foo/repo%20with%20space.git Git submodules are referenced with single escaping, yet the fetcher requires double-escaping of spaces. Signed-off-by: Andreas Cord-Landwehr <cordlandwehr@kde.org> --- bitbake/lib/bb/fetch2/gitsm.py | 6 ++++++ 1 file changed, 6 insertions(+)