Message ID | a7e199839adbd9e003c04d1bd021f5f0d8f2cc3c.camel@mercedes-benz.com |
---|---|
State | New |
Headers | show |
Series | wget: Add CHECKMD_wget option | expand |
On Mon, 2022-12-19 at 08:55 +0000, alp.oezmert via lists.openembedded.org wrote: > `CHECKCMD_wget` defines a command that the wget fetcher module will execute > instead of the built-in HTTP client when checking if URLs exist. This says what the patch does, but not why. I can read the patch and see what the code change is but I can't know why we need to so this information is much more important for the commit message. > This > allows to completely match the behavior of the wget fetcher including proxy > and authentication. We're changing the behaviour of the wget fetcher to match the wget fetcher? This doesn't make any sense. > By default, this option is empty, and wget fetcher behavior is not changed. Why do we need this? Why can't the HTTP client be used? What this appears to do is add two different code paths meaning we need to double our testing. I really don't want to do that. Can't the inbuilt http client code be fixed to work with whatever auth/proxy issue I'm guessing you're running into? Cheers, Richard
Hello all! On Mon, 2022-12-19 at 10:58 +0000, Richard Purdie wrote: > > On Mon, 2022-12-19 at 08:55 +0000, alp.oezmert via > lists.openembedded.org wrote: > > `CHECKCMD_wget` defines a command that the wget fetcher module will > > execute > > instead of the built-in HTTP client when checking if URLs exist. > > This says what the patch does, but not why. I can read the patch and > see what the code change is but I can't know why we need to so this > information is much more important for the commit message. > > > This > > allows to completely match the behavior of the wget fetcher > > including proxy > > and authentication. > > We're changing the behaviour of the wget fetcher to match the wget > fetcher? This doesn't make any sense. The proposed change adds an option to change behavior of wget fetcher module `fetch2/wget.py` during checking step (`Wget.checkstatus` function) so it can match the fetching behavior that uses `.wgetrc` . > > By default, this option is empty, and wget fetcher behavior is not > > changed. > > Why do we need this? Why can't the HTTP client be used? During downloading, the `wget` command is used and this includes the configuration in .wgetrc. During the checking that precedes downloading, the built-in HTTP-client is used but it doesn't support all configurations of `wget`, for example authentication with client certificates or proxy settings. > What this appears to do is add two different code paths meaning we > need > to double our testing. I really don't want to do that. The change contains two new test cases that pass: ``` ./bin/bitbake-selftest -k test_wget_checkstatus_external -v [...] test_wget_checkstatus_external_auto (bb.tests.fetch.FetchCheckStatusTest) ... ok test_wget_checkstatus_external_wget (bb.tests.fetch.FetchCheckStatusTest) ... ok ---------------------------------------------------------------------- Ran 2 tests in 13.510s OK ``` > Can't the inbuilt http client code be fixed to work with whatever > auth/proxy issue I'm guessing you're running into? As far as I can tell, not on the short term. > Cheers, > > Richard Let me know if you would like to have changes to the commit message. Best regards, Alp -- Dr.-Ing. Alp Özmert | alp.oezmert@mercedes-benz.com | Expert Software Engineer | MBition GmbH | Dovestraße 1, 10587 Berlin, Germany Geschäftsführer/Board of Directors: Frank Preidel, Steven Bentley, Daniel Burk, Registergericht/Court of Registry: Amtsgericht Charlottenburg HRB-Nr./Commercial Register No.: 188241 B If you are not the addressee, please inform us immediately that you have received this e-mail by mistake, and delete it. We thank you for your support.
On Mon, 2022-12-19 at 13:44 +0000, alp.oezmert@mercedes-benz.com wrote: > On Mon, 2022-12-19 at 10:58 +0000, Richard Purdie wrote: > > > > On Mon, 2022-12-19 at 08:55 +0000, alp.oezmert via > > lists.openembedded.org wrote: > > > `CHECKCMD_wget` defines a command that the wget fetcher module will > > > execute > > > instead of the built-in HTTP client when checking if URLs exist. > > > > This says what the patch does, but not why. I can read the patch and > > see what the code change is but I can't know why we need to so this > > information is much more important for the commit message. > > > > > This > > > allows to completely match the behavior of the wget fetcher > > > including proxy > > > and authentication. > > > > We're changing the behaviour of the wget fetcher to match the wget > > fetcher? This doesn't make any sense. > > The proposed change adds an option to change behavior of wget > fetcher module `fetch2/wget.py` during checking step > (`Wget.checkstatus` function) so it can match the fetching behavior > that uses `.wgetrc` . The *key* piece of information here is wgetrc. At least I can now understand your issue, that kind of information should be mentioned in the commit message. > > > By default, this option is empty, and wget fetcher behavior is not > > > changed. > > > > Why do we need this? Why can't the HTTP client be used? > > During downloading, the `wget` command is used and this includes the > configuration in .wgetrc. During the checking that precedes > downloading, the built-in HTTP-client is used but it doesn't support > all configurations of `wget`, for example authentication with client > certificates or proxy settings. bitbake and this code will handle settings for http_proxy in the environment for example. You are correct that the checkstatus option isn't going to work for all (or any) settings in wgetrc though. The problem is you just added some code which means we have to document that: "If you use a wgetrc file, you must set CHECKCMD_wget in order for checkstatus to work correctly for your SRC_URI". Clearly it doesn't make sense to have the user need to set such a variable in order to make checkstatus work. As such this patch is therefore not the right solution. > > What this appears to do is add two different code paths meaning we > > need > > to double our testing. I really don't want to do that. > > The change contains two new test cases that pass: > > ``` > ./bin/bitbake-selftest -k test_wget_checkstatus_external -v > [...] > test_wget_checkstatus_external_auto > (bb.tests.fetch.FetchCheckStatusTest) ... ok > test_wget_checkstatus_external_wget > (bb.tests.fetch.FetchCheckStatusTest) ... ok > > ---------------------------------------------------------------------- > Ran 2 tests in 13.510s > > OK > ``` I'm not convinced this is a very good test. a) Whilst you test the new variable works, you don't test whether our other tests work with/without the variable set. Can someone set this and have all our other functionality still working? If so, why shouldn't we drop all that other urllib code? b) The test also doesn't touch on wgetrc at all which is the original problem, so we're likely to continue to break that in future. > > Can't the inbuilt http client code be fixed to work with whatever > > auth/proxy issue I'm guessing you're running into? > > As far as I can tell, not on the short term. I suspect we'll need to look into that as the patch as proposed isn't appropriate. > > > Cheers, > > > > Richard > > Let me know if you would like to have changes to the commit message. The commit message is wrong and would need to be fixed but this needs more than that to make it acceptable to merge as I do not like having duplicate overlapping code paths with partial functionality each. Cheers, Richard
Hi all! On Mon, 2022-12-19 at 14:26 +0000, Richard Purdie wrote: > [**EXTERNAL E-MAIL**] > > On Mon, 2022-12-19 at 13:44 +0000, alp.oezmert@mercedes-benz.com > wrote: > > On Mon, 2022-12-19 at 10:58 +0000, Richard Purdie wrote: > > > On Mon, 2022-12-19 at 08:55 +0000, alp.oezmert via > > > lists.openembedded.org wrote: > > > > `CHECKCMD_wget` defines a command that the wget fetcher module > > > > will > > > > execute > > > > instead of the built-in HTTP client when checking if URLs > > > > exist. > > > > > > This says what the patch does, but not why. I can read the patch > > > and > > > see what the code change is but I can't know why we need to so > > > this > > > information is much more important for the commit message. > > > > > > > This > > > > allows to completely match the behavior of the wget fetcher > > > > including proxy > > > > and authentication. > > > > > > We're changing the behaviour of the wget fetcher to match the > > > wget > > > fetcher? This doesn't make any sense. > > > > The proposed change adds an option to change behavior of wget > > fetcher module `fetch2/wget.py` during checking step > > (`Wget.checkstatus` function) so it can match the fetching behavior > > that uses `.wgetrc` . > > The *key* piece of information here is wgetrc. At least I can now > understand your issue, that kind of information should be mentioned > in > the commit message. > > > > > By default, this option is empty, and wget fetcher behavior is > > > > not > > > > changed. > > > > > > Why do we need this? Why can't the HTTP client be used? > > > > During downloading, the `wget` command is used and this includes > > the > > configuration in .wgetrc. During the checking that precedes > > downloading, the built-in HTTP-client is used but it doesn't > > support > > all configurations of `wget`, for example authentication with > > client > > certificates or proxy settings. > > bitbake and this code will handle settings for http_proxy in the > environment for example. You are correct that the checkstatus option > isn't going to work for all (or any) settings in wgetrc though. > > The problem is you just added some code which means we have to > document > that: > > "If you use a wgetrc file, you must set CHECKCMD_wget in order for > checkstatus to work correctly for your SRC_URI". > > Clearly it doesn't make sense to have the user need to set such a > variable in order to make checkstatus work. As such this patch is > therefore not the right solution. > > > > What this appears to do is add two different code paths meaning > > > we > > > need > > > to double our testing. I really don't want to do that. > > > > The change contains two new test cases that pass: > > > > ``` > > ./bin/bitbake-selftest -k test_wget_checkstatus_external -v > > [...] > > test_wget_checkstatus_external_auto > > (bb.tests.fetch.FetchCheckStatusTest) ... ok > > test_wget_checkstatus_external_wget > > (bb.tests.fetch.FetchCheckStatusTest) ... ok > > > > ----------------------------------------------------------------- > > ----- > > Ran 2 tests in 13.510s > > > > OK > > ``` > > I'm not convinced this is a very good test. > > a) Whilst you test the new variable works, you don't test whether our > other tests work with/without the variable set. Can someone set this > and have all our other functionality still working? If so, why > shouldn't we drop all that other urllib code? I am indeed wondering the following: - Was the intention by the urllib code to gain execution speed by connection pooling? - Are the advantages of the above worth the different behavior between `Wget.checkstatus` and `Wget.download` as the latter uses the `wget` command? A design decision to drop all `urllib` code for the sake of uniform interface and unix composability would make sense to me, but this is just my point of view. > b) The test also doesn't touch on wgetrc at all which is the original > problem, so we're likely to continue to break that in future. > > > > Can't the inbuilt http client code be fixed to work with whatever > > > auth/proxy issue I'm guessing you're running into? > > > > As far as I can tell, not on the short term. > > I suspect we'll need to look into that as the patch as proposed isn't > appropriate. > > > > Cheers, > > > > > > Richard > > > > Let me know if you would like to have changes to the commit > > message. > > The commit message is wrong and would need to be fixed but this needs > more than that to make it acceptable to merge as I do not like having > duplicate overlapping code paths with partial functionality each. > > Cheers, > > Richard > Best regards, Alp -- Dr.-Ing. Alp Özmert | alp.oezmert@mercedes-benz.com | Expert Software Engineer | MBition GmbH | Dovestraße 1, 10587 Berlin, Germany Geschäftsführer/Board of Directors: Frank Preidel, Steven Bentley, Daniel Burk, Registergericht/Court of Registry: Amtsgericht Charlottenburg HRB-Nr./Commercial Register No.: 188241 B If you are not the addressee, please inform us immediately that you have received this e-mail by mistake, and delete it. We thank you for your support.
On Mon, 2022-12-19 at 14:57 +0000, alp.oezmert@mercedes-benz.com wrote: > Hi all! > > On Mon, 2022-12-19 at 14:26 +0000, Richard Purdie wrote: > > [**EXTERNAL E-MAIL**] > > > > On Mon, 2022-12-19 at 13:44 +0000, alp.oezmert@mercedes-benz.com > > wrote: > > > On Mon, 2022-12-19 at 10:58 +0000, Richard Purdie wrote: > > > > On Mon, 2022-12-19 at 08:55 +0000, alp.oezmert via > > > > lists.openembedded.org wrote: > > > > > `CHECKCMD_wget` defines a command that the wget fetcher module > > > > > will > > > > > execute > > > > > instead of the built-in HTTP client when checking if URLs > > > > > exist. > > > > > > > > This says what the patch does, but not why. I can read the patch > > > > and > > > > see what the code change is but I can't know why we need to so > > > > this > > > > information is much more important for the commit message. > > > > > > > > > This > > > > > allows to completely match the behavior of the wget fetcher > > > > > including proxy > > > > > and authentication. > > > > > > > > We're changing the behaviour of the wget fetcher to match the > > > > wget > > > > fetcher? This doesn't make any sense. > > > > > > The proposed change adds an option to change behavior of wget > > > fetcher module `fetch2/wget.py` during checking step > > > (`Wget.checkstatus` function) so it can match the fetching behavior > > > that uses `.wgetrc` . > > > > The *key* piece of information here is wgetrc. At least I can now > > understand your issue, that kind of information should be mentioned > > in > > the commit message. > > > > > > > By default, this option is empty, and wget fetcher behavior is > > > > > not > > > > > changed. > > > > > > > > Why do we need this? Why can't the HTTP client be used? > > > > > > During downloading, the `wget` command is used and this includes > > > the > > > configuration in .wgetrc. During the checking that precedes > > > downloading, the built-in HTTP-client is used but it doesn't > > > support > > > all configurations of `wget`, for example authentication with > > > client > > > certificates or proxy settings. > > > > bitbake and this code will handle settings for http_proxy in the > > environment for example. You are correct that the checkstatus option > > isn't going to work for all (or any) settings in wgetrc though. > > > > The problem is you just added some code which means we have to > > document > > that: > > > > "If you use a wgetrc file, you must set CHECKCMD_wget in order for > > checkstatus to work correctly for your SRC_URI". > > > > Clearly it doesn't make sense to have the user need to set such a > > variable in order to make checkstatus work. As such this patch is > > therefore not the right solution. > > > > > > What this appears to do is add two different code paths meaning > > > > we > > > > need > > > > to double our testing. I really don't want to do that. > > > > > > The change contains two new test cases that pass: > > > > > > ``` > > > ./bin/bitbake-selftest -k test_wget_checkstatus_external -v > > > [...] > > > test_wget_checkstatus_external_auto > > > (bb.tests.fetch.FetchCheckStatusTest) ... ok > > > test_wget_checkstatus_external_wget > > > (bb.tests.fetch.FetchCheckStatusTest) ... ok > > > > > > ----------------------------------------------------------------- > > > ----- > > > Ran 2 tests in 13.510s > > > > > > OK > > > ``` > > > > I'm not convinced this is a very good test. > > > > a) Whilst you test the new variable works, you don't test whether our > > other tests work with/without the variable set. Can someone set this > > and have all our other functionality still working? If so, why > > shouldn't we drop all that other urllib code? > > I am indeed wondering the following: > > - Was the intention by the urllib code to gain execution speed by > connection pooling? It was one of them, yes. In particular, the sstate existence checks are done with this fetcher code and we need to make several hundred of those at once in a timely fashion to give a reasonable user experience. > - Are the advantages of the above worth the different behavior between > `Wget.checkstatus` and `Wget.download` as the latter uses the `wget` > command? Yes, the advantages were significant. > A design decision to drop all `urllib` code for the sake of uniform > interface and unix composability would make sense to me, but this is > just my point of view. Much as I also like a simple life, we do need the performance that code offers and I suspect much of the userbase would not want it removed. Cheers, Richard
If the problem is in having to provide a magic corporate certificate, we can adjust the existing urllib code so it can be supplied? Alex On Mon, 19 Dec 2022 at 16:02, Richard Purdie <richard.purdie@linuxfoundation.org> wrote: > > On Mon, 2022-12-19 at 14:57 +0000, alp.oezmert@mercedes-benz.com wrote: > > Hi all! > > > > On Mon, 2022-12-19 at 14:26 +0000, Richard Purdie wrote: > > > [**EXTERNAL E-MAIL**] > > > > > > On Mon, 2022-12-19 at 13:44 +0000, alp.oezmert@mercedes-benz.com > > > wrote: > > > > On Mon, 2022-12-19 at 10:58 +0000, Richard Purdie wrote: > > > > > On Mon, 2022-12-19 at 08:55 +0000, alp.oezmert via > > > > > lists.openembedded.org wrote: > > > > > > `CHECKCMD_wget` defines a command that the wget fetcher module > > > > > > will > > > > > > execute > > > > > > instead of the built-in HTTP client when checking if URLs > > > > > > exist. > > > > > > > > > > This says what the patch does, but not why. I can read the patch > > > > > and > > > > > see what the code change is but I can't know why we need to so > > > > > this > > > > > information is much more important for the commit message. > > > > > > > > > > > This > > > > > > allows to completely match the behavior of the wget fetcher > > > > > > including proxy > > > > > > and authentication. > > > > > > > > > > We're changing the behaviour of the wget fetcher to match the > > > > > wget > > > > > fetcher? This doesn't make any sense. > > > > > > > > The proposed change adds an option to change behavior of wget > > > > fetcher module `fetch2/wget.py` during checking step > > > > (`Wget.checkstatus` function) so it can match the fetching behavior > > > > that uses `.wgetrc` . > > > > > > The *key* piece of information here is wgetrc. At least I can now > > > understand your issue, that kind of information should be mentioned > > > in > > > the commit message. > > > > > > > > > By default, this option is empty, and wget fetcher behavior is > > > > > > not > > > > > > changed. > > > > > > > > > > Why do we need this? Why can't the HTTP client be used? > > > > > > > > During downloading, the `wget` command is used and this includes > > > > the > > > > configuration in .wgetrc. During the checking that precedes > > > > downloading, the built-in HTTP-client is used but it doesn't > > > > support > > > > all configurations of `wget`, for example authentication with > > > > client > > > > certificates or proxy settings. > > > > > > bitbake and this code will handle settings for http_proxy in the > > > environment for example. You are correct that the checkstatus option > > > isn't going to work for all (or any) settings in wgetrc though. > > > > > > The problem is you just added some code which means we have to > > > document > > > that: > > > > > > "If you use a wgetrc file, you must set CHECKCMD_wget in order for > > > checkstatus to work correctly for your SRC_URI". > > > > > > Clearly it doesn't make sense to have the user need to set such a > > > variable in order to make checkstatus work. As such this patch is > > > therefore not the right solution. > > > > > > > > What this appears to do is add two different code paths meaning > > > > > we > > > > > need > > > > > to double our testing. I really don't want to do that. > > > > > > > > The change contains two new test cases that pass: > > > > > > > > ``` > > > > ./bin/bitbake-selftest -k test_wget_checkstatus_external -v > > > > [...] > > > > test_wget_checkstatus_external_auto > > > > (bb.tests.fetch.FetchCheckStatusTest) ... ok > > > > test_wget_checkstatus_external_wget > > > > (bb.tests.fetch.FetchCheckStatusTest) ... ok > > > > > > > > ----------------------------------------------------------------- > > > > ----- > > > > Ran 2 tests in 13.510s > > > > > > > > OK > > > > ``` > > > > > > I'm not convinced this is a very good test. > > > > > > a) Whilst you test the new variable works, you don't test whether our > > > other tests work with/without the variable set. Can someone set this > > > and have all our other functionality still working? If so, why > > > shouldn't we drop all that other urllib code? > > > > I am indeed wondering the following: > > > > - Was the intention by the urllib code to gain execution speed by > > connection pooling? > > It was one of them, yes. In particular, the sstate existence checks are > done with this fetcher code and we need to make several hundred of > those at once in a timely fashion to give a reasonable user experience. > > > - Are the advantages of the above worth the different behavior between > > `Wget.checkstatus` and `Wget.download` as the latter uses the `wget` > > command? > > Yes, the advantages were significant. > > > A design decision to drop all `urllib` code for the sake of uniform > > interface and unix composability would make sense to me, but this is > > just my point of view. > > Much as I also like a simple life, we do need the performance that code > offers and I suspect much of the userbase would not want it removed. > > Cheers, > > Richard > > > > > > -=-=-=-=-=-=-=-=-=-=-=- > Links: You receive all messages sent to this group. > View/Reply Online (#14204): https://lists.openembedded.org/g/bitbake-devel/message/14204 > Mute This Topic: https://lists.openembedded.org/mt/95761427/1686489 > Group Owner: bitbake-devel+owner@lists.openembedded.org > Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [alex.kanavin@gmail.com] > -=-=-=-=-=-=-=-=-=-=-=- >
diff --git a/lib/bb/fetch2/wget.py b/lib/bb/fetch2/wget.py index 821afa5b..56afd24e 100644 --- a/lib/bb/fetch2/wget.py +++ b/lib/bb/fetch2/wget.py @@ -153,6 +153,13 @@ class Wget(FetchMethod): return True def checkstatus(self, fetch, ud, d, try_again=True): + checkcmd = d.getVar("CHECKCMD_wget") + if checkcmd: + if checkcmd == "auto": + checkcmd = "%s --spider --quiet" % self.basecmd + uri = ud.url.split(";")[0] + return os.system("%s %s" % (checkcmd, uri)) == os.EX_OK + class HTTPConnectionCache(http.client.HTTPConnection): if fetch.connection_cache: def connect(self): diff --git a/lib/bb/tests/fetch.py b/lib/bb/tests/fetch.py index 0af06e46..6eb77e5c 100644 --- a/lib/bb/tests/fetch.py +++ b/lib/bb/tests/fetch.py @@ -1498,6 +1498,20 @@ class FetchCheckStatusTest(FetcherTest): ret = m.checkstatus(fetch, ud, self.d) self.assertTrue(ret, msg="URI %s, can't check status" % (u)) + @skipIfNoNetwork() + def test_wget_checkstatus_external_wget(self): + # exclude test case ftp:// as wget has been observed to wrongly return '8' + self.test_wget_uris = [uri for uri in self.test_wget_uris if "ftp://" not in uri] + self.d.setVar("CHECKCMD_wget", "wget --spider --quiet") + self.test_wget_checkstatus() + + @skipIfNoNetwork() + def test_wget_checkstatus_external_auto(self): + # exclude test case ftp:// as wget has been observed to wrongly return '8' + self.test_wget_uris = [uri for uri in self.test_wget_uris if "ftp://" not in uri] + self.d.setVar("CHECKCMD_wget", "auto") + self.test_wget_checkstatus() + @skipIfNoNetwork() def test_wget_checkstatus_connection_cache(self): from bb.fetch2 import FetchConnectionCache
`CHECKCMD_wget` defines a command that the wget fetcher module will execute instead of the built-in HTTP client when checking if URLs exist. This allows to completely match the behavior of the wget fetcher including proxy and authentication. By default, this option is empty, and wget fetcher behavior is not changed. Signed-off-by: Alp Özmert <alp.oezmert@mercedes-benz.com> --- lib/bb/fetch2/wget.py | 7 +++++++ lib/bb/tests/fetch.py | 14 ++++++++++++++ 2 files changed, 21 insertions(+) -- 2.38.1 If you are not the addressee, please inform us immediately that you have received this e-mail by mistake, and delete it. We thank you for your support.