
[RFC] fetch2/wget: set User-Agent to 'bitbake/version' in checkstatus()

Message ID 20241014133504.2390211-1-alex.kanavin@gmail.com
State New

Commit Message

Alexander Kanavin Oct. 14, 2024, 1:35 p.m. UTC
From: Alexander Kanavin <alex@linutronix.de>

This eliminates the last usage of the 'fake mozilla' User-Agent in bitbake,
which then identifies itself truthfully everywhere, either as bitbake or as
wget (when that is used).

I understand this will make people nervous, so I want to provide
an extended description.

1. How was this tested?

- bitbake-selftest -k FetchCheckStatusTest
(tests a few hardcoded URIs, all passed)

- bitbake -k -c checkuri world
(runs checkstatus() over all recipes in oe-core, and all passed again -
this hopefully goes a long way to reassure everyone that hosts around
the world and various CDNs typically do not have a problem with user-agent
strings they haven't seen before, or with the bitbake user-agent specifically)

2. What about that removed cloudflare comment?

I dug into the git history, and I think the comment is not fully accurate. First,
the 'fake mozilla' agent is used only for checkstatus() - in actual fetching with
wget it is not. And that has not been a problem for anyone.

Second, here's how the comment came about. Usage of 'fake mozilla' was introduced here:
https://git.yoctoproject.org/poky/commit/?h=master&id=ab26fdae9e5ae56bb84196698d3fa4fd568fe903

At that point it did not have to be specifically 'mozilla': the commit message
indicates that any User-Agent would have been ok. Mozilla was simply copied
from the upstream version check for convenience.

Later on, the string was updated to a more recent Mozilla:
https://git.yoctoproject.org/poky/commit/?h=master&id=9f123238261a68e37cec634782e9320633cac5d4

The claim in the comment added there became something else: that the User-Agent
*must* be a browser, without evidence or tests, even though it demonstrably
doesn't have to be - wget is ok.

3. What if someone has a server that is ok with the wget agent, but not ok with the bitbake agent?

Please see point one. It's not impossible, but I think it's highly unlikely. I do think
we should rather tell servers the truth, and learn where the actual issues are. Then
we can consider options - whether that would be pretending to be wget, or allowing the
user-agent to be configured. We should also add such servers to bitbake-selftest so we
know what they are.

Signed-off-by: Alexander Kanavin <alex@linutronix.de>
---
 bitbake/lib/bb/fetch2/wget.py | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

Comments

Ryan Eatmon Oct. 16, 2024, 2:32 p.m. UTC | #1
Initial testing behind the TI firewall showed no issues with these two
changes.  It *should* be ok to move ahead with them.



Patch

diff --git a/bitbake/lib/bb/fetch2/wget.py b/bitbake/lib/bb/fetch2/wget.py
index 493a5b62ee2..7856d10fa4b 100644
--- a/bitbake/lib/bb/fetch2/wget.py
+++ b/bitbake/lib/bb/fetch2/wget.py
@@ -53,11 +53,6 @@  class WgetProgressHandler(bb.progress.LineFilterProgressHandler):
 class Wget(FetchMethod):
     """Class to fetch urls via 'wget'"""
 
-    # CDNs like CloudFlare may do a 'browser integrity test' which can fail
-    # with the standard wget/urllib User-Agent, so pretend to be a modern
-    # browser.
-    user_agent = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0"
-
     def check_certs(self, d):
         """
         Should certificates be checked?
@@ -356,7 +351,7 @@  class Wget(FetchMethod):
                 # Some servers (FusionForge, as used on Alioth) require that the
                 # optional Accept header is set.
                 r.add_header("Accept", "*/*")
-                r.add_header("User-Agent", self.user_agent)
+                r.add_header("User-Agent", "bitbake/{}".format(bb.__version__))
                 def add_basic_auth(login_str, request):
                     '''Adds Basic auth to http request, pass in login:password as string'''
                     import base64
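
For readers who want to try the effect of the header change outside bitbake, here is a minimal standalone sketch using only the Python standard library. It is not the code from wget.py: the function name build_status_request, the optional user_agent override, the example URL and the version string are all hypothetical, and the use of a HEAD request is an assumption about how such a status check is typically done. It only builds the request and does not send it.

```python
import urllib.request


def build_status_request(url, bitbake_version, user_agent=None):
    """Build (but do not send) a HEAD request for a URL availability check.

    Hypothetical helper for illustration only; not part of bitbake.
    """
    if user_agent is None:
        # Default mirrors the format string used in the patch above.
        user_agent = "bitbake/{}".format(bitbake_version)
    request = urllib.request.Request(url, method="HEAD")
    # The optional Accept header is set as in the surrounding wget.py context.
    request.add_header("Accept", "*/*")
    request.add_header("User-Agent", user_agent)
    return request


# Placeholder URL and version, chosen only for the example.
request = build_status_request("https://example.com/src.tar.gz", "2.8.0")
# urllib stores header names capitalized, hence "User-agent" on lookup.
print(request.get_method(), request.get_header("User-agent"))
```

The user_agent parameter illustrates one of the options mentioned in point 3 (making the string configurable); passing "Wget/1.21" or similar would let someone compare how a given server responds to different agents.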