mbox series

[RFC,00/15] Make mirror replacement syntax explicit

Message ID 20250205071538.2681-1-stefan.herbrechtsmeier-oss@weidmueller.com
Headers show
Series Make mirror replacement syntax explicit | expand

Message

Stefan Herbrechtsmeier Feb. 5, 2025, 7:15 a.m. UTC
From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>


The mirror replacement syntax contains many implicit transformations.
The path of the URI always contains the base name of the downloaded
filename. This makes it impossible to rename or remove the base name of
the original path. It prevents upstream mirror for SRC_URIS with a
downloadfilename parameter. The base name of the downloaded filename
makes it impossible to use the download mirror for SRC_URIs with
subfolders in the downloadfilename parameter. Altogether the implicit
transformation complicates the understanding of the replacements.

This series adds an additional replacement named DOWNLOADFILENAME. This
replacement contains the relative filename of the downloaded file or
mirror archive for git and hg. This allows the user to explicitly define
the behavior. The usage is equivalent to the PATH replacement for the
sstate mirror from file to https scheme.

git://.*/.*  http://downloads.yoctoproject.org/mirror/sources/DOWNLOADFILENAME
https?://.*/.*  http://downloads.yoctoproject.org/mirror/sources/DOWNLOADFILENAME
file://.*  https://sstate.yoctoproject.org/all/PATH;downloadfilename=PATH

Without a replacement variable the mirror will use the same base name as
the origin SRC_URI. This allows the usage of private package manager
registry together with a downloadfilename parameter or the rename of the
base name.

https://registry.npmjs.org/  https://example.com/npm/registry/
https://example.com/example/1.0.0.tgz  https://example.com/example/example-1.0.0.tgz

The series adds heuristics to keep a backward compatibility to common
styles. Because of the ambiguity of the old style, it is advisable to
remove this compatibility sooner or later to avoid unexpected behavior.


Stefan Herbrechtsmeier (15):
  fetch2: remove unnecessary expand function calls
  fetch2: local: use path variable
  fetch2: remove unnecessary unquote
  fetch2: ssh: use common localpath handling
  fetch2: clearcase: remove double DL_DIR from localfile
  fetch2: remove basepath from FetchData
  fetch2: remove basename from FetchData
  fetch2: use localpath instead of localfile
  fetch2: make DOWNLOADFILENAME and PATH explicit in mirrors
  tests: fetch: unify style of replaceuris in MirrorUriTest
  tests: fetch: fix nonsensical replaceuris in MirrorUriTest
  tests: fetch: reenable replaceuris in MirrorUriTest
  tests: fetch: comment and add replaceuris in MirrorUriTest
  tests: fetch: add npm mirrors replaceuris in MirrorUriTest
  tests: fetch: fix nonsensical mirror uris in FetcherNetworkTest

 lib/bb/fetch2/__init__.py  | 62 ++++++++++++++++++++------------------
 lib/bb/fetch2/az.py        |  8 ++---
 lib/bb/fetch2/clearcase.py |  2 --
 lib/bb/fetch2/gcp.py       |  8 -----
 lib/bb/fetch2/local.py     |  8 ++---
 lib/bb/fetch2/npm.py       |  2 +-
 lib/bb/fetch2/repo.py      |  2 +-
 lib/bb/fetch2/s3.py        |  7 -----
 lib/bb/fetch2/sftp.py      | 10 +-----
 lib/bb/fetch2/ssh.py       |  3 +-
 lib/bb/fetch2/wget.py      | 11 +------
 lib/bb/tests/fetch.py      | 62 ++++++++++++++++++++++++++++++++------
 12 files changed, 98 insertions(+), 87 deletions(-)

Comments

Richard Purdie Feb. 5, 2025, 10:34 a.m. UTC | #1
On Wed, 2025-02-05 at 08:15 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
> The mirror replacement syntax contains many implicit transformations.
> The path of the URI always contains the base name of the downloaded
> filename. This makes it impossible to rename or remove the base name of
> the original path. It prevents upstream mirror for SRC_URIS with a
> downloadfilename parameter. The base name of the downloaded filename
> makes it impossible to use the download mirror for SRC_URIs with
> subfolders in the downloadfilename parameter. Altogether the implicit
> transformation complicates the understanding of the replacements.
> 
> This series adds an additional replacement named DOWNLOADFILENAME. This
> replacement contains the relative filename of the downloaded file or
> mirror archive for git and hg. This allows the user to explicitly define
> the behavior. The usage is equivalent to the PATH replacement for the
> sstate mirror from file to https scheme.
> 
> git://.*/.*  http://downloads.yoctoproject.org/mirror/sources/DOWNLOADFILENAME
> https?://.*/.*  http://downloads.yoctoproject.org/mirror/sources/DOWNLOADFILENAME
> file://.*  https://sstate.yoctoproject.org/all/PATH;downloadfilename=PATH
> 
> Without a replacement variable the mirror will use the same base name as
> the origin SRC_URI. This allows the usage of private package manager
> registry together with a downloadfilename parameter or the rename of the
> base name.
> 
> https://registry.npmjs.org/  https://example.com/npm/registry/
> https://example.com/example/1.0.0.tgz  https://example.com/example/example-1.0.0.tgz
> 
> The series adds heuristics to keep a backward compatibility to common
> styles. Because of the ambiguity of the old style, it is advisable to
> remove this compatibility sooner or later to avoid unexpected behavior.

Thanks for the patches, these look interesting with some good
improvements in there. A lot of the series looks like cleanups and
those look like good fixes to have. It may make sense to split this
series into two, the cleanups/fixes and the behaviour changes.

I'm not entirely "sold" on the naming of DOWNLOADFILENAME. You have to
think about this from the perspective of someone writing a MIRROR or
PREMIRROR entry - would they understand what that means vs some of the
other names?

I'm also a bit nervous about breaking compatibility with the older
syntaxes. It is unclear to me how or when  we'd detect/deprecate older
formats. I do agree we probably do need to remove some support for some
syntax and move to something new though as what we have is turning into
some kind of nightmare. FWIW this is really old code that in many ways
predates my involvement so probably 20+ years old.

I'll continue to give this some thought. If you could confirm which
patches are "cleanup" that would help though, we can see if we can get
those in a bit faster.

Cheers,

Richard
Stefan Herbrechtsmeier Feb. 5, 2025, 12:12 p.m. UTC | #2
Am 05.02.2025 um 11:34 schrieb Richard Purdie:
> On Wed, 2025-02-05 at 08:15 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
>> The mirror replacement syntax contains many implicit transformations.
>> The path of the URI always contains the base name of the downloaded
>> filename. This makes it impossible to rename or remove the base name of
>> the original path. It prevents upstream mirror for SRC_URIS with a
>> downloadfilename parameter. The base name of the downloaded filename
>> makes it impossible to use the download mirror for SRC_URIs with
>> subfolders in the downloadfilename parameter. Altogether the implicit
>> transformation complicates the understanding of the replacements.
>>
>> This series adds an additional replacement named DOWNLOADFILENAME. This
>> replacement contains the relative filename of the downloaded file or
>> mirror archive for git and hg. This allows the user to explicitly define
>> the behavior. The usage is equivalent to the PATH replacement for the
>> sstate mirror from file to https scheme.
>>
>> git://.*/.*http://downloads.yoctoproject.org/mirror/sources/DOWNLOADFILENAME
>> https?://.*/.*http://downloads.yoctoproject.org/mirror/sources/DOWNLOADFILENAME
>> file://.*  https://sstate.yoctoproject.org/all/PATH;downloadfilename=PATH
>>
>> Without a replacement variable the mirror will use the same base name as
>> the origin SRC_URI. This allows the usage of private package manager
>> registry together with a downloadfilename parameter or the rename of the
>> base name.
>>
>> https://registry.npmjs.org/  https://example.com/npm/registry/
>> https://example.com/example/1.0.0.tgz  https://example.com/example/example-1.0.0.tgz
>>
>> The series adds heuristics to keep a backward compatibility to common
>> styles. Because of the ambiguity of the old style, it is advisable to
>> remove this compatibility sooner or later to avoid unexpected behavior.
> Thanks for the patches, these look interesting with some good
> improvements in there. A lot of the series looks like cleanups and
> those look like good fixes to have. It may make sense to split this
> series into two, the cleanups/fixes and the behaviour changes.
>
> I'm not entirely "sold" on the naming of DOWNLOADFILENAME. You have to
> think about this from the perspective of someone writing a MIRROR or
> PREMIRROR entry - would they understand what that means vs some of the
> other names?

I’m open for suggestions. Even ARCHIVE or TARBALL are hard to understand 
because it is only a relative path on the download mirror. Alternative 
we can mark the lines as upstream or download mirror and give the 
replacement different meanings. The path could be the original PATH for 
an upstream mirror or the relative path of the downloaded file for the 
download mirror.

> I'm also a bit nervous about breaking compatibility with the older
> syntaxes. It is unclear to me how or when  we'd detect/deprecate older
> formats. I do agree we probably do need to remove some support for some
> syntax and move to something new though as what we have is turning into
> some kind of nightmare. FWIW this is really old code that in many ways
> predates my involvement so probably 20+ years old.

The only possibility is to mark the new format because it is impossible 
to know if a replacement want to keep the base name of the path.

As example the following replacement is used in the test:

file:///some1where/.* file://some2where/

The plain regular expressions replaces the complete path with a new path 
but the old code appends only the base name of the path. What was the 
desired behavior?

file:///some1where/ file://some2where/
file:///some1where/.* file://some2where/BASENAME_OF_DOWNLOADFILENAME

The main problem is that we have to keep the bugs if we keep the old 
behavior.

> I'll continue to give this some thought. If you could confirm which
> patches are "cleanup" that would help though, we can see if we can get
> those in a bit faster.

The first 8 patches are cleanups.
Stefan Herbrechtsmeier Feb. 5, 2025, 12:30 p.m. UTC | #3
Am 05.02.2025 um 13:12 schrieb Stefan Herbrechtsmeier via 
lists.openembedded.org:
> Am 05.02.2025 um 11:34 schrieb Richard Purdie:
>> On Wed, 2025-02-05 at 08:15 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
>>> The mirror replacement syntax contains many implicit transformations.
>>> The path of the URI always contains the base name of the downloaded
>>> filename. This makes it impossible to rename or remove the base name of
>>> the original path. It prevents upstream mirror for SRC_URIS with a
>>> downloadfilename parameter. The base name of the downloaded filename
>>> makes it impossible to use the download mirror for SRC_URIs with
>>> subfolders in the downloadfilename parameter. Altogether the implicit
>>> transformation complicates the understanding of the replacements.
>>>
>>> This series adds an additional replacement named DOWNLOADFILENAME. This
>>> replacement contains the relative filename of the downloaded file or
>>> mirror archive for git and hg. This allows the user to explicitly define
>>> the behavior. The usage is equivalent to the PATH replacement for the
>>> sstate mirror from file to https scheme.
>>>
>>> git://.*/.*http://downloads.yoctoproject.org/mirror/sources/DOWNLOADFILENAME
>>> https?://.*/.*http://downloads.yoctoproject.org/mirror/sources/DOWNLOADFILENAME
>>> file://.*  https://sstate.yoctoproject.org/all/PATH;downloadfilename=PATH
>>>
>>> Without a replacement variable the mirror will use the same base name as
>>> the origin SRC_URI. This allows the usage of private package manager
>>> registry together with a downloadfilename parameter or the rename of the
>>> base name.
>>>
>>> https://registry.npmjs.org/  https://example.com/npm/registry/
>>> https://example.com/example/1.0.0.tgz  https://example.com/example/example-1.0.0.tgz
>>>
>>> The series adds heuristics to keep a backward compatibility to common
>>> styles. Because of the ambiguity of the old style, it is advisable to
>>> remove this compatibility sooner or later to avoid unexpected behavior.
>> Thanks for the patches, these look interesting with some good
>> improvements in there. A lot of the series looks like cleanups and
>> those look like good fixes to have. It may make sense to split this
>> series into two, the cleanups/fixes and the behaviour changes.
>>
>> I'm not entirely "sold" on the naming of DOWNLOADFILENAME. You have to
>> think about this from the perspective of someone writing a MIRROR or
>> PREMIRROR entry - would they understand what that means vs some of the
>> other names?
>
> I’m open for suggestions. Even ARCHIVE or TARBALL are hard to 
> understand because it is only a relative path on the download mirror. 
> Alternative we can mark the lines as upstream or download mirror and 
> give the replacement different meanings. The path could be the 
> original PATH for an upstream mirror or the relative path of the 
> downloaded file for the download mirror.
>
>> I'm also a bit nervous about breaking compatibility with the older
>> syntaxes. It is unclear to me how or when  we'd detect/deprecate older
>> formats. I do agree we probably do need to remove some support for some
>> syntax and move to something new though as what we have is turning into
>> some kind of nightmare. FWIW this is really old code that in many ways
>> predates my involvement so probably 20+ years old.
>
> The only possibility is to mark the new format because it is 
> impossible to know if a replacement want to keep the base name of the 
> path.
>
> As example the following replacement is used in the test:
>
> file:///some1where/.* file://some2where/
>
> The plain regular expressions replaces the complete path with a new 
> path but the old code appends only the base name of the path. What was 
> the desired behavior?
>
> file:///some1where/ file://some2where/
> file:///some1where/.* file://some2where/BASENAME_OF_DOWNLOADFILENAME
>
The BASENAME_OF_DOWNLOADFILENAME is wrong because the original is a 
local path and not a downloaded file:

file:///some1where/.* file://some2where/BASENAME

> The main problem is that we have to keep the bugs if we keep the old 
> behavior.
>
>> I'll continue to give this some thought. If you could confirm which
>> patches are "cleanup" that would help though, we can see if we can get
>> those in a bit faster.
>
> The first 8 patches are cleanups.
>
Richard Purdie Feb. 20, 2025, 10:22 a.m. UTC | #4
On Wed, 2025-02-05 at 13:12 +0100, Stefan Herbrechtsmeier wrote:
>  Am 05.02.2025 um 11:34 schrieb Richard Purdie:
>  On Wed, 2025-02-05 at 08:15 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
> > > The mirror replacement syntax contains many implicit transformations.
> > > The path of the URI always contains the base name of the downloaded
> > > filename. This makes it impossible to rename or remove the base name of
> > > the original path. It prevents upstream mirror for SRC_URIS with a
> > > downloadfilename parameter. The base name of the downloaded filename
> > > makes it impossible to use the download mirror for SRC_URIs with
> > > subfolders in the downloadfilename parameter. Altogether the implicit
> > > transformation complicates the understanding of the replacements.
> > > 
> > > This series adds an additional replacement named DOWNLOADFILENAME. This
> > > replacement contains the relative filename of the downloaded file or
> > > mirror archive for git and hg. This allows the user to explicitly define
> > > the behavior. The usage is equivalent to the PATH replacement for the
> > > sstate mirror from file to https scheme.
> > > 
> > > git://.*/.*  http://downloads.yoctoproject.org/mirror/sources/DOWNLOADFILENAME
> > > https?://.*/.*  http://downloads.yoctoproject.org/mirror/sources/DOWNLOADFILENAME
> > > file://.*  https://sstate.yoctoproject.org/all/PATH;downloadfilename=PATH
> > > 
> > > Without a replacement variable the mirror will use the same base name as
> > > the origin SRC_URI. This allows the usage of private package manager
> > > registry together with a downloadfilename parameter or the rename of the
> > > base name.
> > > 
> > > https://registry.npmjs.org/  https://example.com/npm/registry/
> > > https://example.com/example/1.0.0.tgz  https://example.com/example/example-1.0.0.tgz
> > > 
> > > The series adds heuristics to keep a backward compatibility to common
> > > styles. Because of the ambiguity of the old style, it is advisable to
> > > remove this compatibility sooner or later to avoid unexpected behavior.
> > >  
> > Thanks for the patches, these look interesting with some good
> > improvements in there. A lot of the series looks like cleanups and
> > those look like good fixes to have. It may make sense to split this
> > series into two, the cleanups/fixes and the behaviour changes.
> > 
> > I'm not entirely "sold" on the naming of DOWNLOADFILENAME. You have to
> > think about this from the perspective of someone writing a MIRROR or
> > PREMIRROR entry - would they understand what that means vs some of the
> > other names?
> >  
> I’m open for suggestions. Even ARCHIVE or TARBALL are hard to
> understand because it is only a relative path on the download mirror.
> Alternative we can mark the lines as upstream or download mirror and
> give the replacement different meanings. The path could be the
> original PATH for an upstream mirror or the relative path of the
> downloaded file for the download mirror.

I've been giving this topic some thought. One idea I wondered about was
to instead markup the mirror urls with how they're expected to work
with a new parameter. For example:

git://.*/.*  http://downloads.yoctoproject.org/mirror/sources/?mirrorformat=mirrortarball

The possible options would be something like:

mirrortarball - mirror tarballs taken from DL_DIR
flattened - copy of DL_DIR so DL_DIR layout (maybe call it dldir?)
upstream - layout is the same as the upstream directory structure so a direct url replacement

If using a mirrortarball mirror url, we'd know to use the values from
urldata.mirrortarballs. We could add parameters to the fetcher to have
two parameters, one will be the DL_DIR path and the other would be the
upstream url path. One key question I have is how we might need to
shorted the url path for some mirror urls to add/remove a path prefix
in the mirroring.

I think that would cover most of our scenarios and make the mirror urls
much more useful/explicit.

It also would give us a migration path as the code can simply error if
it sees a mirror url without a mirrorformat parameter. This may mean we
could have a clean slate for the mirror urls and switch to one clear
format. I think we can make a case for such a breaking architecture
change.

To move it forward we'd need to work out which pieces of mirror syntax
we should drop, which mirror replacement strings we need (I'm not
convinced the current ones are right/useful). We'd probably need a some
proof of concept patches showing it in action and a proposal to the oe-
arch list explaining why the change was needed and what the change
would do/look like.

I think this can be done independently of the other fetcher/vendor
changing but should be able to work for the vendoring issues too.

Thoughts? I'm sure I'm missing things here :/.

Cheers,

Richard
Stefan Herbrechtsmeier Feb. 20, 2025, 11:45 a.m. UTC | #5
Am 20.02.2025 um 11:22 schrieb Richard Purdie via lists.openembedded.org:
> On Wed, 2025-02-05 at 13:12 +0100, Stefan Herbrechtsmeier wrote:
>>   Am 05.02.2025 um 11:34 schrieb Richard Purdie:
>>   On Wed, 2025-02-05 at 08:15 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
>>>> The mirror replacement syntax contains many implicit transformations.
>>>> The path of the URI always contains the base name of the downloaded
>>>> filename. This makes it impossible to rename or remove the base name of
>>>> the original path. It prevents upstream mirror for SRC_URIS with a
>>>> downloadfilename parameter. The base name of the downloaded filename
>>>> makes it impossible to use the download mirror for SRC_URIs with
>>>> subfolders in the downloadfilename parameter. Altogether the implicit
>>>> transformation complicates the understanding of the replacements.
>>>>
>>>> This series adds an additional replacement named DOWNLOADFILENAME. This
>>>> replacement contains the relative filename of the downloaded file or
>>>> mirror archive for git and hg. This allows the user to explicitly define
>>>> the behavior. The usage is equivalent to the PATH replacement for the
>>>> sstate mirror from file to https scheme.
>>>>
>>>> git://.*/.*http://downloads.yoctoproject.org/mirror/sources/DOWNLOADFILENAME
>>>> https?://.*/.*http://downloads.yoctoproject.org/mirror/sources/DOWNLOADFILENAME
>>>> file://.*  https://sstate.yoctoproject.org/all/PATH;downloadfilename=PATH
>>>>
>>>> Without a replacement variable the mirror will use the same base name as
>>>> the origin SRC_URI. This allows the usage of private package manager
>>>> registry together with a downloadfilename parameter or the rename of the
>>>> base name.
>>>>
>>>> https://registry.npmjs.org/  https://example.com/npm/registry/
>>>> https://example.com/example/1.0.0.tgz  https://example.com/example/example-1.0.0.tgz
>>>>
>>>> The series adds heuristics to keep a backward compatibility to common
>>>> styles. Because of the ambiguity of the old style, it is advisable to
>>>> remove this compatibility sooner or later to avoid unexpected behavior.
>>>>   
>>> Thanks for the patches, these look interesting with some good
>>> improvements in there. A lot of the series looks like cleanups and
>>> those look like good fixes to have. It may make sense to split this
>>> series into two, the cleanups/fixes and the behaviour changes.
>>>
>>> I'm not entirely "sold" on the naming of DOWNLOADFILENAME. You have to
>>> think about this from the perspective of someone writing a MIRROR or
>>> PREMIRROR entry - would they understand what that means vs some of the
>>> other names?
>>>   
>> I’m open for suggestions. Even ARCHIVE or TARBALL are hard to
>> understand because it is only a relative path on the download mirror.
>> Alternative we can mark the lines as upstream or download mirror and
>> give the replacement different meanings. The path could be the
>> original PATH for an upstream mirror or the relative path of the
>> downloaded file for the download mirror.
> I've been giving this topic some thought. One idea I wondered about was
> to instead markup the mirror urls with how they're expected to work
> with a new parameter. For example:
>
> git://.*/.*http://downloads.yoctoproject.org/mirror/sources/?mirrorformat=mirrortarball
The ? could be problematic because it is the separator for the query. It 
is unlikely that the user really use this query parameter but it could 
complicate the code because we have to handle additional query parameters.
What does the "?mirrorformat=mirrortarball" mean? Will it work like a 
MIRRORTARBALL replacement?

How does a simple replacement should look like?

http://  https://

Because of the backward compatible this will replace the basename of the 
path.

> The possible options would be something like:
>
> mirrortarball - mirror tarballs taken from DL_DIR
> flattened - copy of DL_DIR so DL_DIR layout (maybe call it dldir?)
> upstream - layout is the same as the upstream directory structure so a direct url replacement
Do you think we have to handle the mirror tarball explicit? The mirror 
tarball is required for a scheme change.

> If using a mirrortarball mirror url, we'd know to use the values from
> urldata.mirrortarballs. We could add parameters to the fetcher to have
> two parameters, one will be the DL_DIR path and the other would be the
> upstream url path.
I don't understand where this is needed, because the mirror tarball and 
downloadfilename are used by different fetchers.

> One key question I have is how we might need to
> shorted the url path for some mirror urls to add/remove a path prefix
> in the mirroring.
What do you mean by this? The downloadfilename could contain a path 
without any problem after my change.

> I think that would cover most of our scenarios and make the mirror urls
> much more useful/explicit.
>
> It also would give us a migration path as the code can simply error if
> it sees a mirror url without a mirrorformat parameter. This may mean we
> could have a clean slate for the mirror urls and switch to one clear
> format. I think we can make a case for such a breaking architecture
> change.
Does this means we doesn't need a backward compatibility?

Alternative we could make some replacement mandatory if a wildcard is 
used to detect obsolete entries.

> To move it forward we'd need to work out which pieces of mirror syntax
> we should drop, which mirror replacement strings we need (I'm not
> convinced the current ones are right/useful). We'd probably need a some
> proof of concept patches showing it in action and a proposal to the oe-
> arch list explaining why the change was needed and what the change
> would do/look like.
>
> I think this can be done independently of the other fetcher/vendor
> changing but should be able to work for the vendoring issues too.
Sure. I can rework my patch as soon as we have an agreement.
Richard Purdie Feb. 20, 2025, 12:21 p.m. UTC | #6
On Thu, 2025-02-20 at 12:45 +0100, Stefan Herbrechtsmeier via
lists.openembedded.org wrote:
>  
> Am 20.02.2025 um 11:22 schrieb Richard Purdie via
> lists.openembedded.org:
> > On Wed, 2025-02-05 at 13:12 +0100, Stefan Herbrechtsmeier wrote:
> > >  Am 05.02.2025 um 11:34 schrieb Richard Purdie:
> > >  On Wed, 2025-02-05 at 08:15 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
> > > I’m open for suggestions. Even ARCHIVE or TARBALL are hard to
> > > understand because it is only a relative path on the download mirror.
> > > Alternative we can mark the lines as upstream or download mirror and
> > > give the replacement different meanings. The path could be the
> > > original PATH for an upstream mirror or the relative path of the
> > > downloaded file for the download mirror.
> > >  
> >  
> > I've been giving this topic some thought. One idea I wondered about was
> > to instead markup the mirror urls with how they're expected to work
> > with a new parameter. For example:
> > 
> > git://.*/.*  http://downloads.yoctoproject.org/mirror/sources/?mirrorformat=mirrortarball
> >  
> The ? could be problematic because it is the separator for the query.
> It is unlikely that the user really use this query parameter but it
> could complicate the code because we have to handle additional query
> parameters.
> What does the "?mirrorformat=mirrortarball" mean? Will it work like a
> MIRRORTARBALL replacement?

The mirrorformat parameter would be used by the mirroring code itself
to understand how to handle the url. It would be dropped from the
modified url so is only therefore our code's use. If there are
additional parameters they would be passed through as they are now.

> How does a simple replacement should look like?
> 
> http://  https://
> 
> Because of the backward compatible this will replace the basename of
> the path.

It would depend how the mirror is laid out. Some mirrors flatten the
urls like DL_DIR is laid out, some potentially don't. The standard
usage would likely have a mirrorformat=dldir parameter added.


> > The possible options would be something like:
> > 
> > mirrortarball - mirror tarballs taken from DL_DIR
> > flattened - copy of DL_DIR so DL_DIR layout (maybe call it dldir?)
> > upstream - layout is the same as the upstream directory structure so a direct url replacement
> >  
> Do you think we have to handle the mirror tarball explicit? The
> mirror tarball is required for a scheme change.

If we do that, we can avoid having to guess at too many urls to test to
figure out a mirror format so I think it would be an improvement on
where we are today.

> > If using a mirrortarball mirror url, we'd know to use the values from
> > urldata.mirrortarballs. We could add parameters to the fetcher to have
> > two parameters, one will be the DL_DIR path and the other would be the
> > upstream url path.
> >  
> I don't understand where this is needed, because the mirror tarball
> and downloadfilename are used by different fetchers.

Please keep in mind that downloadfilename is pretty much a misfeature.
It was added as we couldn't control collisions inside dl_dir but it
creates all kind of other problems. I think we do need to handle that
problem case but it does then mean we have to indicate whether any
given mirror uses "dldir" or "upstream" names and paths.

> > One key question I have is how we might need to
> > shorted the url path for some mirror urls to add/remove a path prefix
> > in the mirroring.
> >  
> What do you mean by this? The downloadfilename could contain a path
> without any problem after my change.

See above, downloadfilename is not something I'm keen to promote and is
creating several of the problems we have by badly trying to hack extra
functionality onto the fetcher without thinking through all the issues
like mirroring.

This is about the fact that you could have:

htttps://some.server/some/deep/multi/level/path/

and a mirror of things there at:

htttps://some.server/shortpath/path/

so the mirror urls need to be able to map

htttps://some.server/some/deep/multi/level/path/a/b/c.tgz

to

htttps://some.server/shortpath/path/a/b/c.tgz

but also:

htttps://some.server/shortpath/path/c.tgz

or possibly

htttps://some.server/shortpath/path.d.tgz

if downloadfilename is in action.

> >  
> > I think that would cover most of our scenarios and make the mirror urls
> > much more useful/explicit.
> > 
> > It also would give us a migration path as the code can simply error if
> > it sees a mirror url without a mirrorformat parameter. This may mean we
> > could have a clean slate for the mirror urls and switch to one clear
> > format. I think we can make a case for such a breaking architecture
> > change.
> >  
> Does this means we doesn't need a backward compatibility?

If we can come up with sufficient justification to break things and can
detect and error correctly for old usage and get this signed off by the
TSC/community, potentially, yes.

> Alternative we could make some replacement mandatory if a wildcard is
> used to detect obsolete entries.

I don't understand that. 

I do think we have too many problems in the existing mirroring url
mapping and we probably need to rework this rather than try and pile
more patches into it and complicate it further.

The question is whether the proposal fixes the issues it needs to and
has enough simplification and benefit to justify making the change.

Cheers,

Richard
Stefan Herbrechtsmeier Feb. 20, 2025, 5:37 p.m. UTC | #7
Am 20.02.2025 um 13:21 schrieb Richard Purdie:
> On Thu, 2025-02-20 at 12:45 +0100, Stefan Herbrechtsmeier via 
> lists.openembedded.org wrote:
>> Am 20.02.2025 um 11:22 schrieb Richard Purdie via lists.openembedded.org:
>>> On Wed, 2025-02-05 at 13:12 +0100, Stefan Herbrechtsmeier wrote:
>>>>   Am 05.02.2025 um 11:34 schrieb Richard Purdie:
>>>>   On Wed, 2025-02-05 at 08:15 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
>>>> I’m open for suggestions. Even ARCHIVE or TARBALL are hard to
>>>> understand because it is only a relative path on the download mirror.
>>>> Alternative we can mark the lines as upstream or download mirror and
>>>> give the replacement different meanings. The path could be the
>>>> original PATH for an upstream mirror or the relative path of the
>>>> downloaded file for the download mirror.
>>> I've been giving this topic some thought. One idea I wondered about was
>>> to instead markup the mirror urls with how they're expected to work
>>> with a new parameter. For example:
>>> git://.*/.*http://downloads.yoctoproject.org/mirror/sources/?mirrorformat=mirrortarball
>> The ? could be problematic because it is the separator for the query. 
>> It is unlikely that the user really use this query parameter but it 
>> could complicate the code because we have to handle additional query 
>> parameters.
>> What does the "?mirrorformat=mirrortarball" mean? Will it work like a 
>> MIRRORTARBALL replacement?
>
> The mirrorformat parameter would be used by the mirroring code itself 
> to understand how to handle the url.

What is the different to a MIRRORTARBALL replacement? The code will 
replace the word with the content.

> It would be dropped from the modified url so is only therefore our 
> code's use. If there are additional parameters they would be passed 
> through as they are now.
>
>> How does a simple replacement should look like?
>>
>> http://  https://
>>
>> Because of the backward compatible this will replace the basename of 
>> the path.
>
> It would depend how the mirror is laid out. Some mirrors flatten the 
> urls like DL_DIR is laid out, some potentially don't. The standard 
> usage would likely have a mirrorformat=dldir parameter added.

How does the user specify an entry that replace the http scheme with 
https and keeps everything else like it is (upstream mirror)?

>>> The possible options would be something like:
>>> mirrortarball - mirror tarballs taken from DL_DIR
>>> flattened - copy of DL_DIR so DL_DIR layout (maybe call it dldir?)
>>> upstream - layout is the same as the upstream directory structure so a direct url replacement
>> Do you think we have to handle the mirror tarball explicit? The 
>> mirror tarball is required for a scheme change.
>
> If we do that, we can avoid having to guess at too many urls to test 
> to figure out a mirror format so I think it would be an improvement on 
> where we are today.

Do you mean we will test if the URL have a mirrortarball and if not skip 
the entry?

>
>>> If using a mirrortarball mirror url, we'd know to use the values from
>>> urldata.mirrortarballs. We could add parameters to the fetcher to have
>>> two parameters, one will be the DL_DIR path and the other would be the
>>> upstream url path.
>> I don't understand where this is needed, because the mirror tarball 
>> and downloadfilename are used by different fetchers.
>
> Please keep in mind that downloadfilename is pretty much a misfeature. 
> It was added as we couldn't control collisions inside dl_dir but it 
> creates all kind of other problems. I think we do need to handle that 
> problem case but it does then mean we have to indicate whether any 
> given mirror uses "dldir" or "upstream" names and paths.

I don't understand the problem. The download mirror will use the 
downloadfilename or its default the basename of the localpath. The 
upstream mirror will use the path. The mirrortarball will use the 
mirrortarball. Why the fetcher need two parameters?

>>> One key question I have is how we might need to
>>> shorted the url path for some mirror urls to add/remove a path prefix
>>> in the mirroring.
>> What do you mean by this? The downloadfilename could contain a path 
>> without any problem after my change.
>
> See above, downloadfilename is not something I'm keen to promote and 
> is creating several of the problems we have by badly trying to hack 
> extra functionality onto the fetcher without thinking through all the 
> issues like mirroring.

What is the desired way to avoid name clashes? The package manager 
fetcher need an generic way to override the basename.

> This is about the fact that you could have:
>
> htttps://some.server/some/deep/multi/level/path/
>
> and a mirror of things there at:
>
> htttps://some.server/shortpath/path/
>
> so the mirror urls need to be able to map
>
> htttps://some.server/some/deep/multi/level/path/a/b/c.tgz

This could be solved if we use my proposed code.

> to
>
> htttps://some.server/shortpath/path/a/b/c.tgz

htttps://some.server/some/deep/multi/level/path/ 
htttps://some.server/shortpath/path/

> but also:
>
> htttps://some.server/shortpath/path/c.tgz

htttps://some.server/some/deep/multi/level/path/.* 
htttps://some.server/shortpath/path/DOWNLOADFILENAME

> or possibly
>
> htttps://some.server/shortpath/path.d.tgz

> if downloadfilename is in action.

Because the fallback for the downloadfilename is the basename the same 
entry works.

htttps://some.server/some/deep/multi/level/path/.* 
htttps://some.server/shortpath/DOWNLOADFILENAME

>>> I think that would cover most of our scenarios and make the mirror urls
>>> much more useful/explicit.
>>> It also would give us a migration path as the code can simply error if
>>> it sees a mirror url without a mirrorformat parameter. This may mean we
>>> could have a clean slate for the mirror urls and switch to one clear
>>> format. I think we can make a case for such a breaking architecture
>>> change.
>> Does this means we doesn't need a backward compatibility?
>
> If we can come up with sufficient justification to break things and 
> can detect and error correctly for old usage and get this signed off 
> by the TSC/community, potentially, yes.

Okay

>> Alternative we could make some replacement mandatory if a wildcard is 
>> used to detect obsolete entries.
>
> I don't understand that.

If we have a wildcard .* in the path we need to know how to replace it. 
This could be the PATH, BASENAME, DOWNLOADFILENAME, MIRRORARCHIVE or re 
group. But in case of the re group this could still be an old entry.

Do support all cases we need a fix prefix or delimiter:

r:http https

http#https
http|https

http?://.*/.*|http://downloads.yoctoproject.org/mirror/sources/DOWNLOADARCHIVE
git://.*/.*|http://downloads.yoctoproject.org/mirror/sources/MIRRORARCHIVE

> I do think we have too many problems in the existing mirroring url 
> mapping and we probably need to rework this rather than try and pile 
> more patches into it and complicate it further.
I have already rework it. If I can remove the backward compatibility and 
replace it with an error this would simplify the code. I only need a 
better name for the DOWNLOADFILENAME (DOWNLOADARCHIVE) and add the 
MIRRORARCHIV.

> The question is whether the proposal fixes the issues it needs to and 
> has enough simplification and benefit to justify making the change.
My patches support folders in the downloadfilename, upstream mirrors and 
renames. I have to rework the MIRROR strings but therefore the commented 
out tests work.
Richard Purdie Feb. 20, 2025, 10 p.m. UTC | #8
On Thu, 2025-02-20 at 18:37 +0100, Stefan Herbrechtsmeier wrote:
>  
> Am 20.02.2025 um 13:21 schrieb Richard Purdie:
> > On Thu, 2025-02-20 at 12:45 +0100, Stefan Herbrechtsmeier via
> > lists.openembedded.org wrote:
> > > Am 20.02.2025 um 11:22 schrieb Richard Purdie via
> > > lists.openembedded.org:
> > > > On Wed, 2025-02-05 at 13:12 +0100, Stefan Herbrechtsmeier wrote:
> > > > >  Am 05.02.2025 um 11:34 schrieb Richard Purdie:
> > > > >  
> > > > >  On Wed, 2025-02-05 at 08:15 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
> > > > >  
> > > > > I’m open for suggestions. Even ARCHIVE or TARBALL are hard to
> > > > >  
> > > > > understand because it is only a relative path on the download mirror.
> > > > >  
> > > > > Alternative we can mark the lines as upstream or download mirror and
> > > > >  
> > > > > give the replacement different meanings. The path could be the
> > > > >  
> > > > > original PATH for an upstream mirror or the relative path of the
> > > > >  
> > > > > downloaded file for the download mirror.
> > > > >  
> > > > >  
> > > > >  
> > > >  
> > > >  
> > > >  
> > > > I've been giving this topic some thought. One idea I wondered about was
> > > >  
> > > > to instead markup the mirror urls with how they're expected to work
> > > >  
> > > > with a new parameter. For example:
> > > >   
> > > > git://.*/.*  http://downloads.yoctoproject.org/mirror/sources/?mirrorformat=mirrortarball
> > > >  
> > > >  
> > > >  
> > >  
> > > The ? could be problematic because it is the separator for the
> > > query. It is unlikely that the user really use this query
> > > parameter but it could complicate the code because we have to
> > > handle additional query parameters.
> > > What does the "?mirrorformat=mirrortarball" mean? Will it work
> > > like a MIRRORTARBALL replacement?
> > >  
> >  
> > 
> >  
> >  
> > The mirrorformat parameter would be used by the mirroring code
> > itself to understand how to handle the url.
> >  
>  
> What is the different to a MIRRORTARBALL replacement? The code will
> replace the word with the content.
Think about this from a usability perspective. We're struggling to even
work out good names for your proposal. Even if we work out the names, I
still don't think users are going to understand how to convert urls
into the new syntax.

The difference with my proposed format is that we're specifying it in a
way which I suspect users will better understand without needing to go
and read the docs every time. We're saying what we're configuring with
the "mirrorformat" key and then the value should be able to describe
the format. 

My proposal also gives us both a way to clearly detect when obsolete
formatting is used and a namespace mechanism to extend, with both being
in a way we can easily and clearly describe in the docs.

I appreciate with your proposal we can add more strings and we can add
docs about how to migrate but I suspect users aren't going to be as
readily/easily able to understand it.


> > It would be dropped from the modified url so is only therefore our
> > code's use. If there are additional parameters they would be passed
> > through as they are now.
> >  
> > 
> >  
> >  
> > >  
> > > How does a simple replacement should look like?
> > > 
> > > http://  https://
> > > 
> > > Because of the backward compatible this will replace the basename
> > > of the path.
> > >  
> >  
> > 
> >  
> >  
> > It would depend how the mirror is laid out. Some mirrors flatten
> > the urls like DL_DIR is laid out, some potentially don't. The
> > standard usage would likely have a mirrorformat=dldir parameter
> > added.
> >  
>  
> How does the user specify an entry that replace the http scheme with
> https and keeps everything else like it is (upstream mirror)?

http://.*/.* https://.*/.*?mirrorformat=upstream

We need to determine the best value for "upstream". I'd also like to
review whether the .* formatting is the best way to handle this if we
are going to change the format.

> > > > The possible options would be something like:
> > > >   
> > > > mirrortarball - mirror tarballs taken from DL_DIR
> > > >  
> > > > flattened - copy of DL_DIR so DL_DIR layout (maybe call it dldir?)
> > > >  
> > > > upstream - layout is the same as the upstream directory structure so a direct url replacement
> > > >  
> > > >  
> > > >  
> > >  
> > > Do you think we have to handle the mirror tarball explicit? The
> > > mirror tarball is required for a scheme change.
> > >  
> >  
> > 
> >  
> >  
> > If we do that, we can avoid having to guess at too many urls to
> > test to figure out a mirror format so I think it would be an
> > improvement on where we are today.
> >  
>  
> Do you mean we will test if the URL have a mirrortarball and if not
> skip the entry?
Correct.

> > > > If using a mirrortarball mirror url, we'd know to use the values from
> > > >  
> > > > urldata.mirrortarballs. We could add parameters to the fetcher to have
> > > >  
> > > > two parameters, one will be the DL_DIR path and the other would be the
> > > >  
> > > > upstream url path.
> > > >  
> > > >  
> > > >  
> > >  
> > > I don't understand where this is needed, because the mirror
> > > tarball and downloadfilename are used by different fetchers.
> > >  
> >  
> > 
> >  
> >  
> > Please keep in mind that downloadfilename is pretty much a
> > misfeature. It was added as we couldn't control collisions inside
> > dl_dir but it creates all kind of other problems. I think we do
> > need to handle that problem case but it does then mean we have to
> > indicate whether any given mirror uses "dldir" or "upstream" names
> > and paths.
> >  
>  
> I don't understand the problem. The download mirror will use the
> downloadfilename or its default the basename of the localpath. The
> upstream mirror will use the path. The mirrortarball will use the
> mirrortarball. Why the fetcher need two parameters?
You are trying to make downloadfilename a supported parameter of every
fetcher. I'm arguing that I wish we'd never added it at all and that
I'd rather not use it or encourage its use. I don't think you
understand the way the fetcher API was written/used and this is why
some of the patches are still on hold in master-next until I can
convince myself they are actually the right thing to do. There have
been too many other misunderstandings to give me confidence they're
going to do the right thing :(. Sadly, I just don't have the time do
the right level of review and everything else being asked of me.

> > > > One key question I have is how we might need to
> > > >  
> > > > shorted the url path for some mirror urls to add/remove a path prefix
> > > >  
> > > > in the mirroring.
> > > >  
> > > >  
> > > >  
> > >  
> > > What do you mean by this? The downloadfilename could contain a
> > > path without any problem after my change.
> > >  
> >  
> > 
> >  
> >  
> > See above, downloadfilename is not something I'm keen to promote
> > and is creating several of the problems we have by badly trying to
> > hack extra functionality onto the fetcher without thinking through
> > all the issues like mirroring.
> >  
>  
> What is the desired way to avoid name clashes? The package manager
> fetcher need an generic way to override the basename.
Why does it need that? Usually we've used the directory layout to avoid
problems where we can for example. I've been hoping we could do similar
here rather than use downloadfilename, which causes so many mirroring
issues in the first place.


> > > Alternative we could make some replacement mandatory if a
> > > wildcard is used to detect obsolete entries.
> > >  
> >  
> > 
> >  
> >  
> > I don't understand that.
> >  
>  
> If we have a wildcard .* in the path we need to know how to replace
> it. This could be the PATH, BASENAME, DOWNLOADFILENAME, MIRRORARCHIVE
> or re group. But in case of the re group this could still be an old
> entry.
I'm not entirely sure we want to keep all the different syntax. One
frustration I have with the current code is the way pattern matches are
restricted to that url component for example and I've wondered if we
could/should do something different instead, if we can make it simpler.

> Do support all cases we need a fix prefix or delimiter:
> 
> r:http https
>  
> http#https
> http|https
> 
> http?://.*/.*|
> http://downloads.yoctoproject.org/mirror/sources/DOWNLOADARCHIVE
> git://.*/.*|
> http://downloads.yoctoproject.org/mirror/sources/MIRRORARCHIVE
> 
>  
> >  
> > I do think we have too many problems in the existing mirroring url
> > mapping and we probably need to rework this rather than try and
> > pile more patches into it and complicate it further.
> >  
> I have already rework it. If I can remove the backward compatibility
> and replace it with an error this would simplify the code. I only
> need a better name for the DOWNLOADFILENAME (DOWNLOADARCHIVE) and add
> the MIRRORARCHIV.

> 
> >  
> > The question is whether the proposal fixes the issues it needs to
> > and has enough simplification and benefit to justify making the
> > change.
> >  
> My patches support folders in the downloadfilename, upstream mirrors
> and renames. I have to rework the MIRROR strings but therefore the
> commented out tests work.
> 


You've created a patch, yes. I don't think it improves usability though
and I think you're also pushing concepts like downloadfilename into
places we might not want to use them too. For me to merge patches like
these, there needs to be a sense of trust and shared understanding.
This simply isn't there, you're just saying your patches are fine as
they are, I disagree. I therefore worry we're at an impasse and are
going to struggle to move beyond this. That does make me quite sad.

Regards,

Richard
Stefan Herbrechtsmeier Feb. 21, 2025, 10:51 a.m. UTC | #9
Am 20.02.2025 um 23:00 schrieb Richard Purdie:
> On Thu, 2025-02-20 at 18:37 +0100, Stefan Herbrechtsmeier wrote:
>> Am 20.02.2025 um 13:21 schrieb Richard Purdie:
>>> On Thu, 2025-02-20 at 12:45 +0100, Stefan Herbrechtsmeier via 
>>> lists.openembedded.org wrote:
>>>> Am 20.02.2025 um 11:22 schrieb Richard Purdie via 
>>>> lists.openembedded.org:
>>>>> On Wed, 2025-02-05 at 13:12 +0100, Stefan Herbrechtsmeier wrote:
>>>>>>   Am 05.02.2025 um 11:34 schrieb Richard Purdie:
>>>>>>   On Wed, 2025-02-05 at 08:15 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
>>>>>> I’m open for suggestions. Even ARCHIVE or TARBALL are hard to
>>>>>> understand because it is only a relative path on the download mirror.
>>>>>> Alternative we can mark the lines as upstream or download mirror and
>>>>>> give the replacement different meanings. The path could be the
>>>>>> original PATH for an upstream mirror or the relative path of the
>>>>>> downloaded file for the download mirror.
>>>>> I've been giving this topic some thought. One idea I wondered about was
>>>>> to instead markup the mirror urls with how they're expected to work
>>>>> with a new parameter. For example:
>>>>> git://.*/.*http://downloads.yoctoproject.org/mirror/sources/?mirrorformat=mirrortarball
>>>> The ? could be problematic because it is the separator for the 
>>>> query. It is unlikely that the user really use this query parameter 
>>>> but it could complicate the code because we have to handle 
>>>> additional query parameters.
>>>> What does the "?mirrorformat=mirrortarball" mean? Will it work like 
>>>> a MIRRORTARBALL replacement?
>>>
>>> The mirrorformat parameter would be used by the mirroring code 
>>> itself to understand how to handle the url.
>>
>> What is the different to a MIRRORTARBALL replacement? The code will 
>> replace the word with the content.
>>
> Think about this from a usability perspective. We're struggling to 
> even work out good names for your proposal. Even if we work out the names,
We need the good names in any case. The ? is misleading because it is 
part of a common URL.

> I still don't think users are going to understand how to convert urls 
> into the new syntax.

git://.*/.* 
http://downloads.yoctoproject.org/mirror/sources/?mirrorformat=mirrortarball
git://.*/.* http://downloads.yoctoproject.org/mirror/sources/MIRRORTARBALL
git://.*/.* http://downloads.yoctoproject.org/mirror/sources/{MIRRORTARBALL}

git://.*/.* 
http://downloads.yoctoproject.org/mirror/sources/download.cgi?filename=?mirrorformat=mirrortarball
git://.*/.* 
http://downloads.yoctoproject.org/mirror/sources/download.cgi?filename=MIRRORTARBALL
git://.*/.* 
http://downloads.yoctoproject.org/mirror/sources/download.cgi?filename={MIRRORTARBALL}

> The difference with my proposed format is that we're specifying it in 
> a way which I suspect users will better understand without needing to 
> go and read the docs every time. We're saying what we're configuring 
> with the "mirrorformat" key and then the value should be able to 
> describe the format.
In this case we should add a prefix or use a parameter

git://.*/.* scmmirror:http://downloads.yoctoproject.org/mirror/sources/
http?://.*/.* 
downloadmirror:https://downloads.yoctoproject.org/mirror/sources/
http://  plain:https://

git://.*/.* 
http://downloads.yoctoproject.org/mirror/sources/;mirrorformat=scm
http?://.*/.* 
downloadmirror:https://downloads.yoctoproject.org/mirror/sources/;mirrorformat=download
http:// https://;mirrorformat=plain

> My proposal also gives us both a way to clearly detect when obsolete 
> formatting is used and a namespace mechanism to extend, with both 
> being in a way we can easily and clearly describe in the docs.
But it makes it impossible to support arbitrary regular expressions.

https://a.com/b/c/d.tar https://x.com/y/z.tar
https://a.com/b/(.*) https://x.com/y/\1

> I appreciate with your proposal we can add more strings and we can add 
> docs about how to migrate but I suspect users aren't going to be as 
> readily/easily able to understand it.

The main difference is that in your case the user have to read the docs 
to learn the pre- or postfix and in my case it has to learn the placeholder.

In any case the user need to understand that there is a difference 
between the download and a plain mirror. Either he has to mark the entry 
and hope that the magic behind work for his use case or he has to place 
the placeholder at the correct position.

git://.*/.* https://downloads.abc.org/mirror/sources/download/MIRRORTARBALL
git://.*/.* 
https://downloads.abc.org/mirror/sources/download.cgi?filename=MIRRORTARBALL
git://(.*)/(.*) 
https://downloads.abc.org/mirror/sources/download/\1/\2/MIRRORTARBALL

>>> It would be dropped from the modified url so is only therefore our 
>>> code's use. If there are additional parameters they would be passed 
>>> through as they are now.
>>>
>>>
>>>> How does a simple replacement should look like?
>>>>
>>>> http://  https://
>>>>
>>>> Because of the backward compatible this will replace the basename 
>>>> of the path.
>>>
>>> It would depend how the mirror is laid out. Some mirrors flatten the 
>>> urls like DL_DIR is laid out, some potentially don't. The standard 
>>> usage would likely have a mirrorformat=dldir parameter added.
>>
>> How does the user specify an entry that replace the http scheme with 
>> https and keeps everything else like it is (upstream mirror)?
>>
>
> http://.*/.* https://.*/.*?mirrorformat=upstream
I assume you mean the following because this doesn't make sense.

http:// https://?mirrorformat=upstream

What happens if the user need to add the query parameter mirror=1:

http:// https://?mirrorformat=upstream?mirror=1

> We need to determine the best value for "upstream". I'd also like to 
> review whether the .* formatting is the best way to handle this if we 
> are going to change the format.
>
>>>>> The possible options would be something like:
>>>>> mirrortarball - mirror tarballs taken from DL_DIR
>>>>> flattened - copy of DL_DIR so DL_DIR layout (maybe call it dldir?)
>>>>> upstream - layout is the same as the upstream directory structure so a direct url replacement
>>>> Do you think we have to handle the mirror tarball explicit? The 
>>>> mirror tarball is required for a scheme change.
>>>
>>> If we do that, we can avoid having to guess at too many urls to test 
>>> to figure out a mirror format so I think it would be an improvement 
>>> on where we are today.
>>
>> Do you mean we will test if the URL have a mirrortarball and if not 
>> skip the entry?
>>
> Correct.
But doesn't the mirrortarball depends on the scheme and we doesn't use 
the entry in anyway? Do you have an example?


>>>>> If using a mirrortarball mirror url, we'd know to use the values from
>>>>> urldata.mirrortarballs. We could add parameters to the fetcher to have
>>>>> two parameters, one will be the DL_DIR path and the other would be the
>>>>> upstream url path.
>>>> I don't understand where this is needed, because the mirror tarball 
>>>> and downloadfilename are used by different fetchers.
>>>
>>> Please keep in mind that downloadfilename is pretty much a 
>>> misfeature. It was added as we couldn't control collisions inside 
>>> dl_dir but it creates all kind of other problems. I think we do need 
>>> to handle that problem case but it does then mean we have to 
>>> indicate whether any given mirror uses "dldir" or "upstream" names 
>>> and paths.
>>
>> I don't understand the problem. The download mirror will use the 
>> downloadfilename or its default the basename of the localpath. The 
>> upstream mirror will use the path. The mirrortarball will use the 
>> mirrortarball. Why the fetcher need two parameters?
>>
> You are trying to make downloadfilename a supported parameter of every 
> fetcher.

No, I simply harmonize the use of the parameter and remove the 
problematic assumption that the filename inside the download folder 
doesn't contain additional folders.

A lot of fetcher already support the downloadfilename. Why should I 
assume that this is wrong and not the desired way.

> I'm arguing that I wish we'd never added it at all and that I'd rather 
> not use it or encourage its use.
What is your alternative to handle name clashes?

At the moment it is unclear if the usage of the basename or the folders 
inside the downloadfilename are the bug.

The problem is, that even the encoded URI as filename doesn't ensure 
name clashes because some projects doesn't encode the version inside the 
URI. The information from the recipe like BP is also problematical.

> I don't think you understand the way the fetcher API was written/used
Based on the code I assume the code was used to implement a download 
mirror. Thereby it assume that the download folder use a flat list of 
files. The problem is that neither the usage of the downloadfilename nor 
the syntax for the MIRROR variables reflect this. The functions contains 
some magic inside it.

Based on the code I assume there is a need for folders inside the 
downloadfilename and for the support of upstream mirrors.

>
>
> and this is why some of the patches are still on hold in master-next 
> until I can convince myself they are actually the right thing to do. 
> There have been too many other misunderstandings to give me confidence 
> they're going to do the right thing :(. Sadly, I just don't have the 
> time do the right level of review and everything else being asked of me.

Again my problem is that I don't know the vision. I look at the code and 
develop a solution which looks reasonable for the different existing use 
cases.

>>>>> One key question I have is how we might need to
>>>>> shorted the url path for some mirror urls to add/remove a path prefix
>>>>> in the mirroring.
>>>> What do you mean by this? The downloadfilename could contain a path 
>>>> without any problem after my change.
>>>
>>> See above, downloadfilename is not something I'm keen to promote and 
>>> is creating several of the problems we have by badly trying to hack 
>>> extra functionality onto the fetcher without thinking through all 
>>> the issues like mirroring.
>>
>> What is the desired way to avoid name clashes? The package manager 
>> fetcher need an generic way to override the basename.
>>
> Why does it need that?

What do we do if the upstream only provide a single URI to get the last 
version?

> Usually we've used the directory layout to avoid problems where we can 
> for example. I've been hoping we could do similar here rather than use 
> downloadfilename, which causes so many mirroring issues in the first 
> place.

Usually? The wget fetcher only use the basename. The npm fetcher use the 
scope and basename. The crate fetcher the name and version together with 
a specific file extension. Only the gomod fetcher use the path.

It looks like the gomod fetcher only works because it use the path. How 
should I know that it is the only correct user of the downloadfilename?

The mirror code only use the basename of the downloadfilename / path. 
The download mirror only works if the entry keeps the upstream path.

>>>> Alternative we could make some replacement mandatory if a wildcard 
>>>> is used to detect obsolete entries.
>>>
>>> I don't understand that.
>>
>> If we have a wildcard .* in the path we need to know how to replace 
>> it. This could be the PATH, BASENAME, DOWNLOADFILENAME, MIRRORARCHIVE 
>> or re group. But in case of the re group this could still be an old 
>> entry.
>>
> I'm not entirely sure we want to keep all the different syntax. One 
> frustration I have with the current code is the way pattern matches 
> are restricted to that url component for example and I've wondered if 
> we could/should do something different instead, if we can make it simpler.

What is the purpose of the MIRROR. Should it be support any use case 
(regex) or should it only support specific use cases?

Maybe I could simple remove the url components. Until now I simple 
minimize the changes to simplify the review.

>> Do support all cases we need a fix prefix or delimiter:
>>
>> r:http https
>>
>> http#https
>> http|https
>>
>> http?://.*/.*|http://downloads.yoctoproject.org/mirror/sources/DOWNLOADARCHIVE
>> git://.*/.*|http://downloads.yoctoproject.org/mirror/sources/MIRRORARCHIVE
>>
>>
>>> I do think we have too many problems in the existing mirroring url 
>>> mapping and we probably need to rework this rather than try and pile 
>>> more patches into it and complicate it further.
>> I have already rework it. If I can remove the backward compatibility 
>> and replace it with an error this would simplify the code. I only 
>> need a better name for the DOWNLOADFILENAME (DOWNLOADARCHIVE) and add 
>> the MIRRORARCHIV.
>
>>
>>> The question is whether the proposal fixes the issues it needs to 
>>> and has enough simplification and benefit to justify making the change.
>> My patches support folders in the downloadfilename, upstream mirrors 
>> and renames. I have to rework the MIRROR strings but therefore the 
>> commented out tests work.
>>
>
>
> You've created a patch, yes. I don't think it improves usability 
> though and I think you're also pushing concepts like downloadfilename 
> into places we might not want to use them too.
Until now I don't know the desired solution. Now I think your desired 
solution is to mimic the upstream in the download folder and this is a 
requirement for the rework of the mirror code. This means we have to fix 
the exiting code first and should document this requirement.

> For me to merge patches like these, there needs to be a sense of trust 
> and shared understanding. This simply isn't there, you're just saying 
> your patches are fine as they are, I disagree.
No. I don't understand what problem your proposed solution like the "?" 
solve. The solutions comes with drawbacks and I want to understand the 
advantages which balance the disadvantage.

> I therefore worry we're at an impasse and are going to struggle to 
> move beyond this. That does make me quite sad.
We haven't an impasse. The problem is the code doesn't reflect your 
concepts and it's hard for me to identify them from your replies.

If I assume that the download folder should mimic the upstream we 
doesn't need the DOWNLOADARCHIVE because it should be the same as the PATH.

But I'm unsure if this works because the PATH isn't enough to unique 
identify a file and a path not always contain a file extension.

https://www.linuxtv.org/hg/dvb-apps/archive/3d43b280298c.tar.bz2;downloadfilename=${BPN}-3d43b280298c.tar.bz2
https://wrapdb.mesonbuild.com/v2/fmt_11.0.2-1/get_patch;downloadfilename=fmt_11.0.2-1_patch.zip
http://miniupnp.tuxfamily.org/files/download.php?file=${BP}.tar.gz;downloadfilename=${BP}.tar.gz
https://selenic.com/repo/${BPN}/archive/${HG_CHANGESET}.tar.bz2;downloadfilename=${BP}.tar.bz2
http://www.netlib.org/benchmark/dhry-c;downloadfilename=dhry-c.shar
https://sourceforge.net/p/giflib/code/ci/d54b45b0240d455bbaedee4be5203d2703e59967/tree/doc/giflib-logo.gif?format=raw;downloadfilename=giflib-logo.gif

Additionally the same PATH could be used on different hosts. If we 
remove the possibility for the user to override the filename inside the 
download folder we have to ensure that every URI leads to an other 
filename inside the download folder.

We need to generate a file path from the whole URI to append it to the 
download mirror URI.

I suspect we have to define the vision for the download folder first 
before we could discuses how the mirror entries should look like.