mbox series

[RFC,00/15] Make mirror replacement syntax explicit

Message ID 20250205071538.2681-1-stefan.herbrechtsmeier-oss@weidmueller.com
Headers show
Series Make mirror replacement syntax explicit | expand

Message

Stefan Herbrechtsmeier Feb. 5, 2025, 7:15 a.m. UTC
From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>


The mirror replacement syntax contains many implicit transformations.
The path of the URI always contains the base name of the downloaded
filename. This makes it impossible to rename or remove the base name of
the original path. It prevents upstream mirror for SRC_URIS with a
downloadfilename parameter. The base name of the downloaded filename
makes it impossible to use the download mirror for SRC_URIs with
subfolders in the downloadfilename parameter. Altogether the implicit
transformation complicates the understanding of the replacements.

This series adds an additional replacement named DOWNLOADFILENAME. This
replacement contains the relative filename of the downloaded file or
mirror archive for git and hg. This allows the user to explicitly define
the behavior. The usage is equivalent to the PATH replacement for the
sstate mirror from file to https scheme.

git://.*/.*  http://downloads.yoctoproject.org/mirror/sources/DOWNLOADFILENAME
https?://.*/.*  http://downloads.yoctoproject.org/mirror/sources/DOWNLOADFILENAME
file://.*  https://sstate.yoctoproject.org/all/PATH;downloadfilename=PATH

Without a replacement variable the mirror will use the same base name as
the origin SRC_URI. This allows the usage of private package manager
registry together with a downloadfilename parameter or the rename of the
base name.

https://registry.npmjs.org/  https://example.com/npm/registry/
https://example.com/example/1.0.0.tgz  https://example.com/example/example-1.0.0.tgz

The series adds heuristics to keep a backward compatibility to common
styles. Because of the ambiguity of the old style, it is advisable to
remove this compatibility sooner or later to avoid unexpected behavior.


Stefan Herbrechtsmeier (15):
  fetch2: remove unnecessary expand function calls
  fetch2: local: use path variable
  fetch2: remove unnecessary unquote
  fetch2: ssh: use common localpath handling
  fetch2: clearcase: remove double DL_DIR from localfile
  fetch2: remove basepath from FetchData
  fetch2: remove basename from FetchData
  fetch2: use localpath instead of localfile
  fetch2: make DOWNLOADFILENAME and PATH explicit in mirrors
  tests: fetch: unify style of replaceuris in MirrorUriTest
  tests: fetch: fix nonsensical replaceuris in MirrorUriTest
  tests: fetch: reenable replaceuris in MirrorUriTest
  tests: fetch: comment and add replaceuris in MirrorUriTest
  tests: fetch: add npm mirrors replaceuris in MirrorUriTest
  tests: fetch: fix nonsensical mirror uris in FetcherNetworkTest

 lib/bb/fetch2/__init__.py  | 62 ++++++++++++++++++++------------------
 lib/bb/fetch2/az.py        |  8 ++---
 lib/bb/fetch2/clearcase.py |  2 --
 lib/bb/fetch2/gcp.py       |  8 -----
 lib/bb/fetch2/local.py     |  8 ++---
 lib/bb/fetch2/npm.py       |  2 +-
 lib/bb/fetch2/repo.py      |  2 +-
 lib/bb/fetch2/s3.py        |  7 -----
 lib/bb/fetch2/sftp.py      | 10 +-----
 lib/bb/fetch2/ssh.py       |  3 +-
 lib/bb/fetch2/wget.py      | 11 +------
 lib/bb/tests/fetch.py      | 62 ++++++++++++++++++++++++++++++++------
 12 files changed, 98 insertions(+), 87 deletions(-)

Comments

Richard Purdie Feb. 5, 2025, 10:34 a.m. UTC | #1
On Wed, 2025-02-05 at 08:15 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
> The mirror replacement syntax contains many implicit transformations.
> The path of the URI always contains the base name of the downloaded
> filename. This makes it impossible to rename or remove the base name of
> the original path. It prevents upstream mirror for SRC_URIS with a
> downloadfilename parameter. The base name of the downloaded filename
> makes it impossible to use the download mirror for SRC_URIs with
> subfolders in the downloadfilename parameter. Altogether the implicit
> transformation complicates the understanding of the replacements.
> 
> This series adds an additional replacement named DOWNLOADFILENAME. This
> replacement contains the relative filename of the downloaded file or
> mirror archive for git and hg. This allows the user to explicitly define
> the behavior. The usage is equivalent to the PATH replacement for the
> sstate mirror from file to https scheme.
> 
> git://.*/.*  http://downloads.yoctoproject.org/mirror/sources/DOWNLOADFILENAME
> https?://.*/.*  http://downloads.yoctoproject.org/mirror/sources/DOWNLOADFILENAME
> file://.*  https://sstate.yoctoproject.org/all/PATH;downloadfilename=PATH
> 
> Without a replacement variable the mirror will use the same base name as
> the origin SRC_URI. This allows the usage of private package manager
> registry together with a downloadfilename parameter or the rename of the
> base name.
> 
> https://registry.npmjs.org/  https://example.com/npm/registry/
> https://example.com/example/1.0.0.tgz  https://example.com/example/example-1.0.0.tgz
> 
> The series adds heuristics to keep a backward compatibility to common
> styles. Because of the ambiguity of the old style, it is advisable to
> remove this compatibility sooner or later to avoid unexpected behavior.

Thanks for the patches, these look interesting with some good
improvements in there. A lot of the series looks like cleanups and
those look like good fixes to have. It may make sense to split this
series into two, the cleanups/fixes and the behaviour changes.

I'm not entirely "sold" on the naming of DOWNLOADFILENAME. You have to
think about this from the perspective of someone writing a MIRROR or
PREMIRROR entry - would they understand what that means vs some of the
other names?

I'm also a bit nervous about breaking compatibility with the older
syntaxes. It is unclear to me how or when  we'd detect/deprecate older
formats. I do agree we probably do need to remove some support for some
syntax and move to something new though as what we have is turning into
some kind of nightmare. FWIW this is really old code that in many ways
predates my involvement so probably 20+ years old.

I'll continue to give this some thought. If you could confirm which
patches are "cleanup" that would help though, we can see if we can get
those in a bit faster.

Cheers,

Richard
Stefan Herbrechtsmeier Feb. 5, 2025, 12:12 p.m. UTC | #2
Am 05.02.2025 um 11:34 schrieb Richard Purdie:
> On Wed, 2025-02-05 at 08:15 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
>> The mirror replacement syntax contains many implicit transformations.
>> The path of the URI always contains the base name of the downloaded
>> filename. This makes it impossible to rename or remove the base name of
>> the original path. It prevents upstream mirror for SRC_URIS with a
>> downloadfilename parameter. The base name of the downloaded filename
>> makes it impossible to use the download mirror for SRC_URIs with
>> subfolders in the downloadfilename parameter. Altogether the implicit
>> transformation complicates the understanding of the replacements.
>>
>> This series adds an additional replacement named DOWNLOADFILENAME. This
>> replacement contains the relative filename of the downloaded file or
>> mirror archive for git and hg. This allows the user to explicitly define
>> the behavior. The usage is equivalent to the PATH replacement for the
>> sstate mirror from file to https scheme.
>>
>> git://.*/.*http://downloads.yoctoproject.org/mirror/sources/DOWNLOADFILENAME
>> https?://.*/.*http://downloads.yoctoproject.org/mirror/sources/DOWNLOADFILENAME
>> file://.*  https://sstate.yoctoproject.org/all/PATH;downloadfilename=PATH
>>
>> Without a replacement variable the mirror will use the same base name as
>> the origin SRC_URI. This allows the usage of private package manager
>> registry together with a downloadfilename parameter or the rename of the
>> base name.
>>
>> https://registry.npmjs.org/  https://example.com/npm/registry/
>> https://example.com/example/1.0.0.tgz  https://example.com/example/example-1.0.0.tgz
>>
>> The series adds heuristics to keep a backward compatibility to common
>> styles. Because of the ambiguity of the old style, it is advisable to
>> remove this compatibility sooner or later to avoid unexpected behavior.
> Thanks for the patches, these look interesting with some good
> improvements in there. A lot of the series looks like cleanups and
> those look like good fixes to have. It may make sense to split this
> series into two, the cleanups/fixes and the behaviour changes.
>
> I'm not entirely "sold" on the naming of DOWNLOADFILENAME. You have to
> think about this from the perspective of someone writing a MIRROR or
> PREMIRROR entry - would they understand what that means vs some of the
> other names?

I’m open for suggestions. Even ARCHIVE or TARBALL are hard to understand 
because it is only a relative path on the download mirror. Alternative 
we can mark the lines as upstream or download mirror and give the 
replacement different meanings. The path could be the original PATH for 
an upstream mirror or the relative path of the downloaded file for the 
download mirror.

> I'm also a bit nervous about breaking compatibility with the older
> syntaxes. It is unclear to me how or when  we'd detect/deprecate older
> formats. I do agree we probably do need to remove some support for some
> syntax and move to something new though as what we have is turning into
> some kind of nightmare. FWIW this is really old code that in many ways
> predates my involvement so probably 20+ years old.

The only possibility is to mark the new format because it is impossible 
to know if a replacement want to keep the base name of the path.

As example the following replacement is used in the test:

file:///some1where/.* file://some2where/

The plain regular expressions replaces the complete path with a new path 
but the old code appends only the base name of the path. What was the 
desired behavior?

file:///some1where/ file://some2where/
file:///some1where/.* file://some2where/BASENAME_OF_DOWNLOADFILENAME

The main problem is that we have to keep the bugs if we keep the old 
behavior.

> I'll continue to give this some thought. If you could confirm which
> patches are "cleanup" that would help though, we can see if we can get
> those in a bit faster.

The first 8 patches are cleanups.
Stefan Herbrechtsmeier Feb. 5, 2025, 12:30 p.m. UTC | #3
Am 05.02.2025 um 13:12 schrieb Stefan Herbrechtsmeier via 
lists.openembedded.org:
> Am 05.02.2025 um 11:34 schrieb Richard Purdie:
>> On Wed, 2025-02-05 at 08:15 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
>>> The mirror replacement syntax contains many implicit transformations.
>>> The path of the URI always contains the base name of the downloaded
>>> filename. This makes it impossible to rename or remove the base name of
>>> the original path. It prevents upstream mirror for SRC_URIS with a
>>> downloadfilename parameter. The base name of the downloaded filename
>>> makes it impossible to use the download mirror for SRC_URIs with
>>> subfolders in the downloadfilename parameter. Altogether the implicit
>>> transformation complicates the understanding of the replacements.
>>>
>>> This series adds an additional replacement named DOWNLOADFILENAME. This
>>> replacement contains the relative filename of the downloaded file or
>>> mirror archive for git and hg. This allows the user to explicitly define
>>> the behavior. The usage is equivalent to the PATH replacement for the
>>> sstate mirror from file to https scheme.
>>>
>>> git://.*/.*http://downloads.yoctoproject.org/mirror/sources/DOWNLOADFILENAME
>>> https?://.*/.*http://downloads.yoctoproject.org/mirror/sources/DOWNLOADFILENAME
>>> file://.*  https://sstate.yoctoproject.org/all/PATH;downloadfilename=PATH
>>>
>>> Without a replacement variable the mirror will use the same base name as
>>> the origin SRC_URI. This allows the usage of private package manager
>>> registry together with a downloadfilename parameter or the rename of the
>>> base name.
>>>
>>> https://registry.npmjs.org/  https://example.com/npm/registry/
>>> https://example.com/example/1.0.0.tgz  https://example.com/example/example-1.0.0.tgz
>>>
>>> The series adds heuristics to keep a backward compatibility to common
>>> styles. Because of the ambiguity of the old style, it is advisable to
>>> remove this compatibility sooner or later to avoid unexpected behavior.
>> Thanks for the patches, these look interesting with some good
>> improvements in there. A lot of the series looks like cleanups and
>> those look like good fixes to have. It may make sense to split this
>> series into two, the cleanups/fixes and the behaviour changes.
>>
>> I'm not entirely "sold" on the naming of DOWNLOADFILENAME. You have to
>> think about this from the perspective of someone writing a MIRROR or
>> PREMIRROR entry - would they understand what that means vs some of the
>> other names?
>
> I’m open for suggestions. Even ARCHIVE or TARBALL are hard to 
> understand because it is only a relative path on the download mirror. 
> Alternative we can mark the lines as upstream or download mirror and 
> give the replacement different meanings. The path could be the 
> original PATH for an upstream mirror or the relative path of the 
> downloaded file for the download mirror.
>
>> I'm also a bit nervous about breaking compatibility with the older
>> syntaxes. It is unclear to me how or when  we'd detect/deprecate older
>> formats. I do agree we probably do need to remove some support for some
>> syntax and move to something new though as what we have is turning into
>> some kind of nightmare. FWIW this is really old code that in many ways
>> predates my involvement so probably 20+ years old.
>
> The only possibility is to mark the new format because it is 
> impossible to know if a replacement want to keep the base name of the 
> path.
>
> As example the following replacement is used in the test:
>
> file:///some1where/.* file://some2where/
>
> The plain regular expressions replaces the complete path with a new 
> path but the old code appends only the base name of the path. What was 
> the desired behavior?
>
> file:///some1where/ file://some2where/
> file:///some1where/.* file://some2where/BASENAME_OF_DOWNLOADFILENAME
>
The BASENAME_OF_DOWNLOADFILENAME is wrong because the original is a 
local path and not a downloaded file:

file:///some1where/.* file://some2where/BASENAME

> The main problem is that we have to keep the bugs if we keep the old 
> behavior.
>
>> I'll continue to give this some thought. If you could confirm which
>> patches are "cleanup" that would help though, we can see if we can get
>> those in a bit faster.
>
> The first 8 patches are cleanups.
>