
[v2] fetch2/wget: Add user_agent parameter so it can be used optionally

Message ID 20240609220226.977-1-egyszeregy@freemail.hu
State New
Series [v2] fetch2/wget: Add user_agent parameter so it can be used optionally

Commit Message

Livius June 9, 2024, 10:02 p.m. UTC

From: Benjamin Szőke <egyszeregy@freemail.hu>

Add an optional "user_agent" parameter to the wget fetcher so that it
can be used when HTTP servers block requests that carry the default
wget user agent.

Signed-off-by: Benjamin Szőke <egyszeregy@freemail.hu>
---
 conf/bitbake.conf                             |  1 +
 .../bitbake-user-manual-fetching.rst          | 23 +++++++++++++------
 .../bitbake-user-manual-ref-variables.rst     |  5 ++++
 lib/bb/fetch2/wget.py                         | 11 +++++----
 4 files changed, 28 insertions(+), 12 deletions(-)

+        self.user_agent = d.getVar("BB_USER_AGENT")
 
     def check_certs(self, d):
         """
@@ -89,6 +86,10 @@ class Wget(FetchMethod):
 
         self.basecmd = d.getVar("FETCHCMD_wget") or "/usr/bin/env wget -t 2 -T 30"
 
+        is_user_agent_enabled = ud.parm.get("user_agent","0") == "1"
+        if is_user_agent_enabled:
+            self.basecmd += f" --user-agent='{self.user_agent}'"
+
         if ud.type == 'ftp' or ud.type == 'ftps':
             self.basecmd += " --passive-ftp"
 
-- 
2.45.2.windows.1
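To make the behaviour of the wget.py change in this patch concrete, here is a minimal, self-contained Python sketch of the same decision; it is not BitBake code. `parse_src_uri_params` and `build_wget_cmd` are hypothetical helpers standing in for BitBake's own URL decoding and `ud.parm`, and the default string is the one the patch adds to `conf/bitbake.conf`. One deliberate difference: the sketch quotes the value with `shlex.quote` instead of wrapping it in hand-written single quotes, which would break if a configured user agent ever contained a `'`.

```python
import shlex

# Default copied from the patch's conf/bitbake.conf hunk (BB_USER_AGENT).
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:126.0) "
    "Gecko/20100101 Firefox/126.0"
)


def parse_src_uri_params(src_uri: str) -> dict[str, str]:
    """Hypothetical helper: split ';key=value' parameters off a SRC_URI entry."""
    params = {}
    for item in src_uri.split(";")[1:]:
        key, sep, value = item.partition("=")
        if sep:
            params[key] = value
    return params


def build_wget_cmd(basecmd: str, parm: dict[str, str], url_type: str,
                   user_agent: str = DEFAULT_USER_AGENT) -> str:
    """Mirror the patch: append --user-agent only when user_agent=1 is set."""
    cmd = basecmd
    if parm.get("user_agent", "0") == "1":
        # shlex.quote keeps the agent a single shell word even if it has quotes.
        cmd += " --user-agent=" + shlex.quote(user_agent)
    if url_type in ("ftp", "ftps"):
        cmd += " --passive-ftp"
    return cmd


uri = "https://oe.handhelds.org/not_there.aac;user_agent=1"
print(build_wget_cmd("/usr/bin/env wget -t 2 -T 30",
                     parse_src_uri_params(uri), "https"))
```

Without `;user_agent=1` (or with `;user_agent=0`) the base command is left untouched, which is what keeps the option opt-in.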

Comments

Alexander Kanavin June 10, 2024, 4:23 a.m. UTC | #1
No. I really do not like the idea that we should pretend to be Firefox. And
you didn’t answer Ross's questions properly.

Fix the servers please.

Alex

Alexander Kanavin June 10, 2024, 4:33 a.m. UTC | #2
On second thought, you did mention that FETCHCMD_wget can be used to set a
custom user agent in one of the tickets, so if this can be done per recipe,
there is no need to add a new variable.

Alex
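Alexander's alternative needs no new code path: the fetcher already reads `FETCHCMD_wget` and only falls back to `/usr/bin/env wget -t 2 -T 30` when it is unset, so a recipe can carry its own `--user-agent` option. The Python sketch below assumes a small hypothetical `FakeDatastore` class standing in for the BitBake datastore `d`, and an example agent string (`ExampleFetcher/1.0`) that is purely illustrative; it only reproduces the fallback expression from wget.py to show that the override reaches the command line unchanged.

```python
class FakeDatastore:
    """Hypothetical stand-in for the BitBake datastore; only getVar() is modelled."""

    def __init__(self, values: dict[str, str]):
        self._values = values

    def getVar(self, name: str):
        return self._values.get(name)


def resolve_basecmd(d) -> str:
    # Same fallback expression the wget fetcher uses for its base command.
    return d.getVar("FETCHCMD_wget") or "/usr/bin/env wget -t 2 -T 30"


# Hypothetical per-recipe override carrying a custom user agent.
recipe = FakeDatastore({
    "FETCHCMD_wget": "/usr/bin/env wget -t 2 -T 30 --user-agent='ExampleFetcher/1.0'",
})
print(resolve_basecmd(recipe))
print(resolve_basecmd(FakeDatastore({})))
```

In a recipe this corresponds to an assignment such as `FETCHCMD_wget = "/usr/bin/env wget -t 2 -T 30 --user-agent='...'"`, which only affects that recipe's fetches.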

Richard Purdie June 10, 2024, 9:41 a.m. UTC | #3
On Mon, 2024-06-10 at 06:23 +0200, Alexander Kanavin via
lists.yoctoproject.org wrote:
> No. I really do not like the idea that we should pretend to be
> Firefox. And you didn’t answer Ross's questions properly.

We already do that unfortunately.

I think the biggest concern I have about the patch is the variable
name, it should be namespaced into the fetcher (BB_FETCH_USER_AGENT?).

I would like to see some answers to Ross' questions as well though. It
would be good to understand what kinds of problems people are facing
that need this.

The trouble is that every time we add more configuration to bitbake, it
complicates testing, and if we can see one way of working for the
majority, it does help reduce complexity. This means adding new
configuration and code paths does need careful thought.

Cheers,

Richard
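Richard's naming suggestion could be combined with a fallback so that nothing changes for builds that never set the variable. This is a hypothetical Python sketch: `BB_FETCH_USER_AGENT` is only the name floated in review (with a question mark), `fetch_user_agent` is an illustrative helper, and falling back to the Firefox 84 string that wget.py hardcodes today is an assumption, not something agreed in this thread.

```python
# The string wget.py currently hardcodes as the Wget.user_agent class attribute.
BUILTIN_USER_AGENT = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) "
    "Gecko/20100101 Firefox/84.0"
)


def fetch_user_agent(getvar) -> str:
    """Return the configured agent, or the built-in one when unset or empty.

    `getvar` is any callable shaped like d.getVar(name). BB_FETCH_USER_AGENT
    is the name proposed in review, not an existing BitBake variable.
    """
    return getvar("BB_FETCH_USER_AGENT") or BUILTIN_USER_AGENT


# Hypothetical configured value, for illustration only.
configured = {"BB_FETCH_USER_AGENT": "MyBuildFarm/2.0"}
print(fetch_user_agent(configured.get))
print(fetch_user_agent({}.get))
```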

Patch

diff --git a/conf/bitbake.conf b/conf/bitbake.conf
index f5a5a333a..045cc7dbd 100644
--- a/conf/bitbake.conf
+++ b/conf/bitbake.conf
@@ -44,3 +44,4 @@ TARGET_ARCH = "${BUILD_ARCH}"
 TMPDIR = "${TOPDIR}/tmp"
 WORKDIR = "${TMPDIR}/work/${PF}"
 GITPKGV = "${@bb.fetch2.get_srcrev(d, 'gitpkgv_revision')}"
+BB_USER_AGENT ??= "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0"
diff --git a/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst b/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst
index fb4f0a23d..4da558f0c 100644
--- a/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst
+++ b/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst
@@ -221,13 +221,21 @@ HTTP/FTP wget fetcher (``http://``, ``ftp://``, ``https://``)
 This fetcher obtains files from web and FTP servers. Internally, the
 fetcher uses the wget utility.
 
-The executable and parameters used are specified by the
-``FETCHCMD_wget`` variable, which defaults to sensible values. The
-fetcher supports a parameter "downloadfilename" that allows the name of
-the downloaded file to be specified. Specifying the name of the
-downloaded file is useful for avoiding collisions in
-:term:`DL_DIR` when dealing with multiple files that
-have the same name.
+The executable and parameters used are specified by the ``FETCHCMD_wget``
+variable, which defaults to sensible values. The fetcher supports
+parameters:
+
+-  *downloadfilename:* That allows the name of the downloaded file
+   to be specified.
+
+-  *user_agent:* Enable to use a default ``Mozilla/5.0`` user-agent
+   which is defined in :term:`BB_USER_AGENT`.
+
+Specifying the name of the downloaded file is useful for avoiding
+collisions in :term:`DL_DIR` when dealing with multiple files
+that have the same name. A few HTTP servers block requests with
+the default wget user-agent, in this case specifying a valid
+user-agent can solve this issue.
 
 If a username and password are specified in the ``SRC_URI``, a Basic
 Authorization header will be added to each request, including across redirects.
@@ -239,6 +247,7 @@ Some example URLs are as follows::
    SRC_URI = "http://oe.handhelds.org/not_there.aac"
    SRC_URI = "ftp://oe.handhelds.org/not_there_as_well.aac"
    SRC_URI = "ftp://you@oe.handhelds.org/home/you/secret.plan"
+   SRC_URI = "https://oe.handhelds.org/not_there.aac;user_agent=1"
 
 .. note::
 
diff --git a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
index 899e584f9..1132d44b9 100644
--- a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
+++ b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
@@ -699,6 +699,11 @@ overview of their function and contents.
       Within an executing task, this variable holds the hash of the task as
       returned by the currently enabled signature generator.
 
+   :term:`BB_USER_AGENT`
+      Specifies a user-agent string which BitBake uses if "user_agent"
+      parameter is enabled for HTTP/FTP wget fetcher. Default value can
+      be found in conf\bitbake.conf.
+
    :term:`BB_VERBOSE_LOGS`
       Controls how verbose BitBake is during builds. If set, shell scripts
       echo commands and shell script output appears on standard out
diff --git a/lib/bb/fetch2/wget.py b/lib/bb/fetch2/wget.py
index 2e9211763..098cac52b 100644
--- a/lib/bb/fetch2/wget.py
+++ b/lib/bb/fetch2/wget.py
@@ -52,11 +52,8 @@ class WgetProgressHandler(bb.progress.LineFilterProgressHandler):
 
 class Wget(FetchMethod):
     """Class to fetch urls via 'wget'"""
-
-    # CDNs like CloudFlare may do a 'browser integrity test' which can fail
-    # with the standard wget/urllib User-Agent, so pretend to be a modern
-    # browser.
-    user_agent = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0"
+    def init(self, d):