[v3] fetch2/wget: Add user_agent parameter so it can be used optionally

Message ID 20240611230649.1473-1-egyszeregy@freemail.hu
State New

Commit Message

Livius June 11, 2024, 11:06 p.m. UTC

From: Benjamin Szőke <egyszeregy@freemail.hu>

Add an optional "user_agent" parameter to the wget fetcher so that it
can be used when HTTP servers block requests with the default wget
user-agent.

Signed-off-by: Benjamin Szőke <egyszeregy@freemail.hu>
---
 conf/bitbake.conf                             |  1 +
 .../bitbake-user-manual-fetching.rst          | 23 +++++++++++++------
 .../bitbake-user-manual-ref-variables.rst     |  5 ++++
 lib/bb/fetch2/wget.py                         | 11 +++++----
 4 files changed, 28 insertions(+), 12 deletions(-)

+        self.user_agent = d.getVar("BB_FETCH_USER_AGENT")
 
     def check_certs(self, d):
         """
@@ -89,6 +86,10 @@ class Wget(FetchMethod):
 
         self.basecmd = d.getVar("FETCHCMD_wget") or "/usr/bin/env wget -t 2 -T 30"
 
+        is_user_agent_enabled = ud.parm.get("user_agent","0") == "1"
+        if is_user_agent_enabled:
+            self.basecmd += f" --user-agent='{self.user_agent}'"
+
         if ud.type == 'ftp' or ud.type == 'ftps':
             self.basecmd += " --passive-ftp"
 
-- 
2.45.2.windows.1
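
For readers who want to see the effect of the opt-in outside of BitBake, the command construction in the wget.py hunk can be sketched roughly as follows. This is only an illustration under stated assumptions: build_wget_basecmd is a hypothetical helper, not a BitBake function, a plain dict stands in for ud.parm, and "bitbake/2.8.0" is a made-up value for the expanded BB_FETCH_USER_AGENT.

```python
# Default command used by the fetcher when FETCHCMD_wget is unset (from the patch context).
DEFAULT_BASECMD = "/usr/bin/env wget -t 2 -T 30"


def build_wget_basecmd(parm, user_agent, fetchcmd=None, url_type="https"):
    """Hypothetical stand-in for the patched command construction.

    parm       -- dict of ";key=value" URL parameters (stands in for ud.parm)
    user_agent -- expanded value of BB_FETCH_USER_AGENT (assumed already resolved)
    fetchcmd   -- value of FETCHCMD_wget, or None to use the default
    url_type   -- URL scheme (stands in for ud.type)
    """
    basecmd = fetchcmd or DEFAULT_BASECMD

    # The option is only appended when the URL carries ";user_agent=1".
    if parm.get("user_agent", "0") == "1":
        basecmd += f" --user-agent='{user_agent}'"

    if url_type in ("ftp", "ftps"):
        basecmd += " --passive-ftp"

    return basecmd


if __name__ == "__main__":
    # Without the parameter, wget keeps its own default user-agent.
    print(build_wget_basecmd({}, "bitbake/2.8.0"))
    # With ";user_agent=1", the configured string is passed to wget.
    print(build_wget_basecmd({"user_agent": "1"}, "bitbake/2.8.0"))
```

Note that any value other than the literal string "1" leaves the command unchanged, so the parameter acts as an on/off switch and the user-agent string itself always comes from the variable.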

Comments

Alexander Kanavin June 12, 2024, 7:45 a.m. UTC | #1
It helps to include all relevant links in the commit message, e.g.
the issues with Xilinx's PMU_ROM that you had:
https://support.xilinx.com/s/question/0D54U00008RolRMSAZ/yocto-metaxilinxcore-cannot-find-pmuromnative-url?language=en_US

I just tried to reproduce this locally, and there is no issue with wget:

alex@alex-lx-laptop:~$ wget https://www.xilinx.com/bin/public/openDownload?filename=PMU_ROM.tar.gz
--2024-06-12 09:29:55--  https://www.xilinx.com/bin/public/openDownload?filename=PMU_ROM.tar.gz
Resolving www.xilinx.com (www.xilinx.com)... 184.25.239.57, 184.25.239.48
Connecting to www.xilinx.com (www.xilinx.com)|184.25.239.57|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://xilinx-ax-dl.entitlenow.com/dl/ul/2021/12/20/R210497260/PMU_ROM.tar.gz?hash=oU3pGSFXZ2pq00I2gGZX-g&expires=1718209784&filename=PMU_ROM.tar.gz [following]
--2024-06-12 09:29:56--  https://xilinx-ax-dl.entitlenow.com/dl/ul/2021/12/20/R210497260/PMU_ROM.tar.gz?hash=oU3pGSFXZ2pq00I2gGZX-g&expires=1718209784&filename=PMU_ROM.tar.gz
Resolving xilinx-ax-dl.entitlenow.com (xilinx-ax-dl.entitlenow.com)... 184.25.239.24, 184.25.239.83
Connecting to xilinx-ax-dl.entitlenow.com (xilinx-ax-dl.entitlenow.com)|184.25.239.24|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8031 (7.8K) [application/x-gzip]
Saving to: ‘openDownload?filename=PMU_ROM.tar.gz’

openDownload?filename=PMU_ROM.tar.gz 100%[=====================================================================================================================>]   7.84K  --.-KB/s    in 0s

2024-06-12 09:29:57 (87.0 MB/s) - ‘openDownload?filename=PMU_ROM.tar.gz’ saved [8031/8031]


The sha256sum matches what is in the recipe:

alex@alex-lx-laptop:~$ sha256sum openDownload\?filename\=PMU_ROM.tar.gz
f9a450ef960979463ea0a87a35fafb4a5b62d3a741de30cbcef04c8edc22a7cf  openDownload?filename=PMU_ROM.tar.gz

The report indicated that there was a broken redirect via amd.com
(which does require a browser user-agent), but this no longer seems
to be happening. Can you check?

All of this is to say, I'm still reluctant to support making the code
more complex when there is no proven need for it.

Alex


On Wed, 12 Jun 2024 at 01:07, Livius via lists.yoctoproject.org
<egyszeregy=freemail.hu@lists.yoctoproject.org> wrote:
>
> From: Benjamin Szőke <egyszeregy@freemail.hu>
>
> Add an optional "user_agent" parameter to the wget fetcher so that it
> can be used when HTTP servers block requests with the default wget
> user-agent.
>
> Signed-off-by: Benjamin Szőke <egyszeregy@freemail.hu>
> ---
>  conf/bitbake.conf                             |  1 +
>  .../bitbake-user-manual-fetching.rst          | 23 +++++++++++++------
>  .../bitbake-user-manual-ref-variables.rst     |  5 ++++
>  lib/bb/fetch2/wget.py                         | 11 +++++----
>  4 files changed, 28 insertions(+), 12 deletions(-)
>
> diff --git a/conf/bitbake.conf b/conf/bitbake.conf
> index f5a5a333a..a16f72d25 100644
> --- a/conf/bitbake.conf
> +++ b/conf/bitbake.conf
> @@ -44,3 +44,4 @@ TARGET_ARCH = "${BUILD_ARCH}"
>  TMPDIR = "${TOPDIR}/tmp"
>  WORKDIR = "${TMPDIR}/work/${PF}"
>  GITPKGV = "${@bb.fetch2.get_srcrev(d, 'gitpkgv_revision')}"
> +BB_FETCH_USER_AGENT ??= "bitbake/${BB_VERSION}"
> diff --git a/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst b/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst
> index fb4f0a23d..0ee07992f 100644
> --- a/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst
> +++ b/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst
> @@ -221,13 +221,21 @@ HTTP/FTP wget fetcher (``http://``, ``ftp://``, ``https://``)
>  This fetcher obtains files from web and FTP servers. Internally, the
>  fetcher uses the wget utility.
>
> -The executable and parameters used are specified by the
> -``FETCHCMD_wget`` variable, which defaults to sensible values. The
> -fetcher supports a parameter "downloadfilename" that allows the name of
> -the downloaded file to be specified. Specifying the name of the
> -downloaded file is useful for avoiding collisions in
> -:term:`DL_DIR` when dealing with multiple files that
> -have the same name.
> +The executable and parameters used are specified by the ``FETCHCMD_wget``
> +variable, which defaults to sensible values. The fetcher supports
> +parameters:
> +
> +-  *downloadfilename:* That allows the name of the downloaded file
> +   to be specified.
> +
> +-  *user_agent:* Enable to use a default ``bitbake/${BB_VERSION}`` user-agent
> +   which is defined in :term:`BB_FETCH_USER_AGENT`.
> +
> +Specifying the name of the downloaded file is useful for avoiding
> +collisions in :term:`DL_DIR` when dealing with multiple files
> +that have the same name. A few HTTP servers block requests with
> +the default wget user-agent, in this case specifying a valid
> +user-agent can solve this issue.
>
>  If a username and password are specified in the ``SRC_URI``, a Basic
>  Authorization header will be added to each request, including across redirects.
> @@ -239,6 +247,7 @@ Some example URLs are as follows::
>     SRC_URI = "http://oe.handhelds.org/not_there.aac"
>     SRC_URI = "ftp://oe.handhelds.org/not_there_as_well.aac"
>     SRC_URI = "ftp://you@oe.handhelds.org/home/you/secret.plan"
> +   SRC_URI = "https://oe.handhelds.org/not_there.aac;user_agent=1"
>
>  .. note::
>
> diff --git a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> index 899e584f9..3f310bd72 100644
> --- a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> +++ b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
> @@ -699,6 +699,11 @@ overview of their function and contents.
>        Within an executing task, this variable holds the hash of the task as
>        returned by the currently enabled signature generator.
>
> +   :term:`BB_FETCH_USER_AGENT`
> +      Specifies a user-agent string which BitBake uses if "user_agent"
> +      parameter is enabled for HTTP/FTP wget fetcher. Default value can
> +      be found in ``conf/bitbake.conf``.
> +
>     :term:`BB_VERBOSE_LOGS`
>        Controls how verbose BitBake is during builds. If set, shell scripts
>        echo commands and shell script output appears on standard out
> diff --git a/lib/bb/fetch2/wget.py b/lib/bb/fetch2/wget.py
> index 2e9211763..30a56df71 100644
> --- a/lib/bb/fetch2/wget.py
> +++ b/lib/bb/fetch2/wget.py
> @@ -52,11 +52,8 @@ class WgetProgressHandler(bb.progress.LineFilterProgressHandler):
>
>  class Wget(FetchMethod):
>      """Class to fetch urls via 'wget'"""
> -
> -    # CDNs like CloudFlare may do a 'browser integrity test' which can fail
> -    # with the standard wget/urllib User-Agent, so pretend to be a modern
> -    # browser.
> -    user_agent = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0"
> +    def init(self, d):
> +        self.user_agent = d.getVar("BB_FETCH_USER_AGENT")
>
>      def check_certs(self, d):
>          """
> @@ -89,6 +86,10 @@ class Wget(FetchMethod):
>
>          self.basecmd = d.getVar("FETCHCMD_wget") or "/usr/bin/env wget -t 2 -T 30"
>
> +        is_user_agent_enabled = ud.parm.get("user_agent","0") == "1"
> +        if is_user_agent_enabled:
> +            self.basecmd += f" --user-agent='{self.user_agent}'"
> +
>          if ud.type == 'ftp' or ud.type == 'ftps':
>              self.basecmd += " --passive-ftp"
>
> --
> 2.45.2.windows.1
Patch

diff --git a/conf/bitbake.conf b/conf/bitbake.conf
index f5a5a333a..a16f72d25 100644
--- a/conf/bitbake.conf
+++ b/conf/bitbake.conf
@@ -44,3 +44,4 @@  TARGET_ARCH = "${BUILD_ARCH}"
 TMPDIR = "${TOPDIR}/tmp"
 WORKDIR = "${TMPDIR}/work/${PF}"
 GITPKGV = "${@bb.fetch2.get_srcrev(d, 'gitpkgv_revision')}"
+BB_FETCH_USER_AGENT ??= "bitbake/${BB_VERSION}"
diff --git a/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst b/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst
index fb4f0a23d..0ee07992f 100644
--- a/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst
+++ b/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst
@@ -221,13 +221,21 @@  HTTP/FTP wget fetcher (``http://``, ``ftp://``, ``https://``)
 This fetcher obtains files from web and FTP servers. Internally, the
 fetcher uses the wget utility.
 
-The executable and parameters used are specified by the
-``FETCHCMD_wget`` variable, which defaults to sensible values. The
-fetcher supports a parameter "downloadfilename" that allows the name of
-the downloaded file to be specified. Specifying the name of the
-downloaded file is useful for avoiding collisions in
-:term:`DL_DIR` when dealing with multiple files that
-have the same name.
+The executable and parameters used are specified by the ``FETCHCMD_wget``
+variable, which defaults to sensible values. The fetcher supports
+parameters:
+
+-  *downloadfilename:* That allows the name of the downloaded file
+   to be specified.
+
+-  *user_agent:* Enable to use a default ``bitbake/${BB_VERSION}`` user-agent
+   which is defined in :term:`BB_FETCH_USER_AGENT`.
+
+Specifying the name of the downloaded file is useful for avoiding
+collisions in :term:`DL_DIR` when dealing with multiple files
+that have the same name. A few HTTP servers block requests with
+the default wget user-agent, in this case specifying a valid
+user-agent can solve this issue.
 
 If a username and password are specified in the ``SRC_URI``, a Basic
 Authorization header will be added to each request, including across redirects.
@@ -239,6 +247,7 @@  Some example URLs are as follows::
    SRC_URI = "http://oe.handhelds.org/not_there.aac"
    SRC_URI = "ftp://oe.handhelds.org/not_there_as_well.aac"
    SRC_URI = "ftp://you@oe.handhelds.org/home/you/secret.plan"
+   SRC_URI = "https://oe.handhelds.org/not_there.aac;user_agent=1"
 
 .. note::
 
diff --git a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
index 899e584f9..3f310bd72 100644
--- a/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
+++ b/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
@@ -699,6 +699,11 @@  overview of their function and contents.
       Within an executing task, this variable holds the hash of the task as
       returned by the currently enabled signature generator.
 
+   :term:`BB_FETCH_USER_AGENT`
+      Specifies a user-agent string which BitBake uses if "user_agent"
+      parameter is enabled for HTTP/FTP wget fetcher. Default value can
+      be found in ``conf/bitbake.conf``.
+
    :term:`BB_VERBOSE_LOGS`
       Controls how verbose BitBake is during builds. If set, shell scripts
       echo commands and shell script output appears on standard out
diff --git a/lib/bb/fetch2/wget.py b/lib/bb/fetch2/wget.py
index 2e9211763..30a56df71 100644
--- a/lib/bb/fetch2/wget.py
+++ b/lib/bb/fetch2/wget.py
@@ -52,11 +52,8 @@  class WgetProgressHandler(bb.progress.LineFilterProgressHandler):
 
 class Wget(FetchMethod):
     """Class to fetch urls via 'wget'"""
-
-    # CDNs like CloudFlare may do a 'browser integrity test' which can fail
-    # with the standard wget/urllib User-Agent, so pretend to be a modern
-    # browser.
-    user_agent = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0"
+    def init(self, d):