
[v2] fetch2/wget: Add user_agent parameter so it can be used optionally

Message ID 20240609220226.977-1-egyszeregy@freemail.hu
State New

Commit Message

Livius June 9, 2024, 10:02 p.m. UTC
From: Benjamin Szőke <egyszeregy@freemail.hu>

Add an optional "user_agent" parameter to the wget fetcher so that
it can be used when HTTP servers block requests with the default
wget user agent.

Signed-off-by: Benjamin Szőke <egyszeregy@freemail.hu>
---
 conf/bitbake.conf                             |  1 +
 .../bitbake-user-manual-fetching.rst          | 23 +++++++++++++------
 .../bitbake-user-manual-ref-variables.rst     |  5 ++++
 lib/bb/fetch2/wget.py                         | 11 +++++----
 4 files changed, 28 insertions(+), 12 deletions(-)

+        self.user_agent = d.getVar("BB_USER_AGENT")
 
     def check_certs(self, d):
         """
@@ -89,6 +86,10 @@ class Wget(FetchMethod):
 
         self.basecmd = d.getVar("FETCHCMD_wget") or "/usr/bin/env wget -t 2 -T 30"
 
+        is_user_agent_enabled = ud.parm.get("user_agent","0") == "1"
+        if is_user_agent_enabled:
+            self.basecmd += f" --user-agent='{self.user_agent}'"
+
         if ud.type == 'ftp' or ud.type == 'ftps':
             self.basecmd += " --passive-ftp"
 
-- 
2.45.2.windows.1

Comments

Alexander Kanavin June 10, 2024, 4:23 a.m. UTC | #1
No. I really do not like the idea that we should pretend to be Firefox. And
you didn’t answer Ross's questions properly.

Fix the servers please.

Alex



Alexander Kanavin June 10, 2024, 4:33 a.m. UTC | #2
On second thought, you did mention that FETCHCMD_wget can be used to set a
custom user agent in one of the tickets, so if this can be done per recipe
there is no need to add a new variable.

Alex
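
For illustration only, the per-recipe approach mentioned above can be sketched in Python. A plain dict stands in for the BitBake datastore (an assumption, not the real API), the fallback expression mirrors the one shown in the lib/bb/fetch2/wget.py context of the patch, and the agent string is a made-up example.

```python
# Default command line, copied from the context lines of the patch.
DEFAULT_WGET = "/usr/bin/env wget -t 2 -T 30"


def base_command(datastore):
    """Return the wget command, preferring a FETCHCMD_wget override if set.

    `datastore` is a plain dict used here as a hypothetical stand-in for
    the BitBake datastore object.
    """
    return datastore.get("FETCHCMD_wget") or DEFAULT_WGET


# Hypothetical recipe-level override that appends a custom user agent.
recipe_vars = {
    "FETCHCMD_wget": DEFAULT_WGET + " --user-agent='example-agent/1.0'",
}

default_cmd = base_command({})
override_cmd = base_command(recipe_vars)
```

With no override the default command is returned unchanged; with the override, the extra `--user-agent` option travels with the command for that recipe only, which is the behaviour the comment above relies on.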

Livius June 10, 2024, 9:03 a.m. UTC | #3
Yes, FETCHCMD_wget can solve it, but it is a workaround and not a nice solution. Moreover, it takes a lot of experience to understand where and how it needs to be used, so it is not optimal for Yocto end users, and there will be no official documentation saying that this ugly solution is the recommended way to solve these issues in recipes. I cannot believe this kind of "hacking" is the best long-term solution; in my opinion this issue will need to be solved in many more recipes in the future (more and more HTTP servers are going to go crazy).

All the information about this situation can be found in the commit message of your last commit regarding the user agent:
https://github.com/openembedded/bitbake/commit/d6fa261a9603677f0b3abbd309c1ca6073b63f4c
Your patch did not solve this in a good way, so it was reverted: JFrog Artifactory does not like it when wget uses a browser user agent, so this must not be a hardcoded wget download mode.
https://github.com/openembedded/bitbake/commit/feef5cd12e877f42ffcace168d44b0e6eb80a907

I also ran into this same issue, because dear AMD changed their HTTP servers to be stupid with wget (all old Yocto releases are now affected as well, thanks AMD).
https://support.xilinx.com/s/question/0D54U00008RolRMSAZ/yocto-metaxilinxcore-cannot-find-pmuromnative-url?language=en_US

None of us can solve the server issues; neither you nor I maintain the servers of these big tech companies, and I think they will never care about this Yocto issue. So a nice and easy solution to these issues needs to be provided by the Yocto Project/BitBake.

It is clear to me that some HTTP hosts do not like wget downloads, so a fake browser user agent must be used for them, and unfortunately other hosts have the opposite problem, where a browser user agent can require an interactive browser for the download (which wget cannot do).

So in my opinion, the user agent must be an enable/disable parameter of the wget fetcher. It will help recipe maintainers to use it if needed. The default "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0" user agent in BB_USER_AGENT will be fine in 99% of the cases where it is needed, but it can also be changed easily in a recipe file if the maintainer finds that a different user agent is needed for a specific HTTP host/URL.
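
As a rough sketch of the opt-in flow proposed in the patch (not BitBake's actual implementation), the following Python mirrors the added lines from lib/bb/fetch2/wget.py. The helper `split_params`, the function `wget_command`, and the use of a plain dict in place of the BitBake datastore are all assumptions made for illustration.

```python
def split_params(src_uri):
    """Split 'url;key=value;...' into (url, {key: value}).

    Hypothetical helper; BitBake has its own URL decoding.
    """
    url, *raw = src_uri.split(";")
    params = dict(item.split("=", 1) for item in raw)
    return url, params


def wget_command(params, datastore):
    """Build the wget command; add --user-agent only when user_agent=1."""
    basecmd = datastore.get("FETCHCMD_wget") or "/usr/bin/env wget -t 2 -T 30"
    # Same opt-in check as in the patch: the parameter defaults to "0".
    if params.get("user_agent", "0") == "1":
        agent = datastore.get("BB_USER_AGENT")
        basecmd += f" --user-agent='{agent}'"
    return basecmd


# Example values; the agent string here is a placeholder.
datastore = {"BB_USER_AGENT": "Mozilla/5.0 (example)"}
url, params = split_params("https://oe.handhelds.org/not_there.aac;user_agent=1")
cmd_enabled = wget_command(params, datastore)
cmd_default = wget_command({}, datastore)
```

A URL without `;user_agent=1` leaves the command untouched, so existing recipes would keep their current behaviour under this scheme, and a recipe could set its own BB_USER_AGENT value when the default string does not work for a particular host.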
Alexander Kanavin June 10, 2024, 9:26 a.m. UTC | #4
I can't agree with this. Setting a custom user agent is not a solution
either, it's a workaround, just slightly less cumbersome than using
FETCHCMD_wget.

If you make it easy to set custom user agents, then the actual issue
(broken servers) will never get reported, let alone solved. You don't
need to report it to server administrators; the correct people are
those that placed the download artefacts onto those servers. They
might be open to moving those artefacts elsewhere, or if they are
uncooperative as well, the recipe maintainers can move them and update
the recipes (which has to happen for older Yocto releases in any case).

Did you contact meta-xilinx maintainers about the issue? Can you
report it to them first?

Alex

Richard Purdie June 10, 2024, 9:41 a.m. UTC | #5
On Mon, 2024-06-10 at 06:23 +0200, Alexander Kanavin via
lists.yoctoproject.org wrote:
> No. I really do not like the idea that we should pretend to be
> Firefox. And you didn’t answer Ross's questions properly.

We already do that unfortunately.

I think the biggest concern I have about the patch is the variable
name, it should be namespaced into the fetcher (BB_FETCH_USER_AGENT?).

I would like to see some answers to Ross' questions as well though. It
would be good to understand what kinds of problems people are facing
that need this.

The trouble is that every time we add more configuration to bitbake, it
complicates testing, and if we can settle on one way of working for the
majority, it does help reduce complexity. This means adding new
configuration and code paths does need careful thought.

Cheers,

Richard
Livius June 10, 2024, 9:44 a.m. UTC | #6
Are you motivated to collect all of the broken URLs in a list and keep updating/reporting them to the hosting companies? (More than half of them will never respond to you at all.)

I have no time for that; I am much more motivated to make this a new "user-agent feature" of the wget fetcher, as a slightly less cumbersome solution. The main goal is that it can be used quickly when needed. This is what Yocto users/maintainers need, with proper documentation, rather than ugly undocumented suggestions and workarounds that are only available from random forum topics and mailing list threads.
Alexander Kanavin June 10, 2024, 9:48 a.m. UTC | #7
Not to the server hoster companies. I asked you to report this to
upstream maintainers that placed those artefacts there, and to the
layer maintainers that wrote recipes that fetch those artefacts. Don't
jump into writing a whole new universal solution before you get
feedback from meta-xilinx folks, and a possible alternative solution
from them.

Alex


Livius June 10, 2024, 11:37 p.m. UTC | #8
Possible answer for Ross:
https://lists.openembedded.org/g/bitbake-devel/message/16344
Livius June 10, 2024, 11:39 p.m. UTC | #9
Possible answer for Ross:
https://lists.openembedded.org/g/bitbake-devel/message/16344

Patch

diff --git a/conf/bitbake.conf b/conf/bitbake.conf
index f5a5a333a..045cc7dbd 100644
--- a/conf/bitbake.conf
+++ b/conf/bitbake.conf
@@ -44,3 +44,4 @@ TARGET_ARCH = "${BUILD_ARCH}"
 TMPDIR = "${TOPDIR}/tmp"
 WORKDIR = "${TMPDIR}/work/${PF}"
 GITPKGV = "${@bb.fetch2.get_srcrev(d, 'gitpkgv_revision')}"
+BB_USER_AGENT ??= "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0"
diff --git a/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst b/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst
index fb4f0a23d..4da558f0c 100644
--- a/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst
+++ b/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst
@@ -221,13 +221,21 @@ HTTP/FTP wget fetcher (``http://``, ``ftp://``, ``https://``)
 This fetcher obtains files from web and FTP servers. Internally, the
 fetcher uses the wget utility.
 
-The executable and parameters used are specified by the
-``FETCHCMD_wget`` variable, which defaults to sensible values. The
-fetcher supports a parameter "downloadfilename" that allows the name of
-the downloaded file to be specified. Specifying the name of the
-downloaded file is useful for avoiding collisions in
-:term:`DL_DIR` when dealing with multiple files that
-have the same name.
+The executable and parameters used are specified by the ``FETCHCMD_wget``
+variable, which defaults to sensible values. The fetcher supports
+parameters:
+
+-  *downloadfilename:* That allows the name of the downloaded file
+   to be specified.
+
+-  *user_agent:* Enable to use a default ``Mozilla/5.0`` user-agent
+   which is defined in :term:`BB_USER_AGENT`.
+
+Specifying the name of the downloaded file is useful for avoiding
+collisions in :term:`DL_DIR` when dealing with multiple files
+that have the same name. A few HTTP servers block requests with
+the default wget user-agent, in this case specifying a valid
+user-agent can solve this issue.
 
 If a username and password are specified in the ``SRC_URI``, a Basic
 Authorization header will be added to each request, including across redirects.
@@ -239,6 +247,7 @@ Some example URLs are as follows::
    SRC_URI = "http://oe.handhelds.org/not_there.aac"
    SRC_URI = "ftp://oe.handhelds.org/not_there_as_well.aac"
    SRC_URI = "ftp://you@oe.handhelds.org/home/you/secret.plan"
+   SRC_URI = "https://oe.handhelds.org/not_there.aac;user_agent=1"
 
 .. note::
 
diff --git a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
index 899e584f9..1132d44b9 100644
--- a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
+++ b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
@@ -699,6 +699,11 @@ overview of their function and contents.
       Within an executing task, this variable holds the hash of the task as
       returned by the currently enabled signature generator.
 
+   :term:`BB_USER_AGENT`
+      Specifies a user-agent string which BitBake uses if "user_agent"
+      parameter is enabled for HTTP/FTP wget fetcher. Default value can
+      be found in conf\bitbake.conf.
+
    :term:`BB_VERBOSE_LOGS`
       Controls how verbose BitBake is during builds. If set, shell scripts
       echo commands and shell script output appears on standard out
diff --git a/lib/bb/fetch2/wget.py b/lib/bb/fetch2/wget.py
index 2e9211763..098cac52b 100644
--- a/lib/bb/fetch2/wget.py
+++ b/lib/bb/fetch2/wget.py
@@ -52,11 +52,8 @@ class WgetProgressHandler(bb.progress.LineFilterProgressHandler):
 
 class Wget(FetchMethod):
     """Class to fetch urls via 'wget'"""
-
-    # CDNs like CloudFlare may do a 'browser integrity test' which can fail
-    # with the standard wget/urllib User-Agent, so pretend to be a modern
-    # browser.
-    user_agent = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0"
+    def init(self, d):