classes/pypi: update the default UPSTREAM_CHECK_URI to use the simple repo API

Message ID 20241212185619.1599668-1-derek@asterius.io
State Accepted, archived
Commit 10febb0e8193d15aec8bbf80b849ae6732da3c22
Series classes/pypi: update the default UPSTREAM_CHECK_URI to use the simple repo API

Commit Message

Derek Straka Dec. 12, 2024, 6:56 p.m. UTC
Update the UPSTREAM_CHECK_URI to use the simple repo API.  The
project URLs require JavaScript, which breaks the version checking fetch
and subsequent logic.  The simple repo API provides similar
functionality with a well-defined spec which is used by tools such as
pip.  Also update the UPSTREAM_CHECK_REGEX to be compatible with the
information retrieved via the API.

Signed-off-by: Derek Straka <derek@asterius.io>
---
 meta/classes-recipe/pypi.bbclass | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Alexander Kanavin Dec. 13, 2024, 8:28 a.m. UTC | #1
On Thu, 12 Dec 2024 at 19:56, Derek Straka via lists.openembedded.org
<derek=asterius.io@lists.openembedded.org> wrote:

> Update the UPSTREAM_CHECK_URI to leverage the simple repo API.  The
> project URLs require javascript which breaks the version checking fetch
> and subsequent logic.  The simple repo API provides similar
> functionality with a well defined spec which is used by tools such as
> pip.  Also update the UPSTREAM_CHECK_REGEX to be compatible with the
> information retrieved via the API

Unfortunately even with this patch, there's a large amount of
UNKNOWN_BROKEN in the output of 'devtool check-upgrade-status'. Can
you please check those, hopefully it's a simple tweak?

Alex
Derek Straka Dec. 13, 2024, 2:53 p.m. UTC | #2
Hi Alex,

Thanks for your note.  I’m working through the remaining downstream recipe
changes today which should address the rest of the UNKNOWN_BROKEN recipes.

While looking at it yesterday, the download packages come primarily in two
archetypes:
1. Those that replace ‘_’ with ‘-’ in the source archives
2. Those that leave the ‘_’ ONLY in the archives

Given that, I think it’s unlikely there’s a clean fix in the bbclass
without a more invasive change to the upstream check logic.  I can,
however, package all the changes for oe-core into a single patchset and
submit a v2.  That will at least address all the core updates in one fell
swoop.

Does that sound reasonable?

-Derek
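
As a rough illustration of what a separator-tolerant check could look like, here is a minimal Python sketch. It is not what the patch does: the helper name `tolerant_sdist_regex` and the sample archive names are assumptions made for this example. It builds a pattern that accepts `-`, `_` or `.` between name parts and ignores case, so it matches both archetypes.

```python
import re

def tolerant_sdist_regex(pypi_package):
    """Hypothetical helper: match sdist names regardless of separator or case."""
    # Split the name on any run of separators and escape each part,
    # then rejoin the parts with a class that accepts any separator run.
    parts = [re.escape(p) for p in re.split(r"[-_.]+", pypi_package) if p]
    name = r"[-_.]+".join(parts)
    return re.compile(
        name + r"-(?P<pver>\d+(?:[.\-_]\d+)*)\.(?:tar\.gz|tgz)",
        re.IGNORECASE,
    )

# Illustrative archive names only, one per archetype
for filename in ("typing_extensions-4.12.2.tar.gz",
                 "typing-extensions-4.12.2.tar.gz"):
    match = tolerant_sdist_regex("typing_extensions").search(filename)
    print(filename, "->", match.group("pver"))
```

The trade-off is that a looser pattern could also match archives belonging to a similarly named project, which may be part of why a change in the class would be more invasive than per-recipe overrides.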

Alexander Kanavin Dec. 13, 2024, 3:14 p.m. UTC | #3
On Fri, 13 Dec 2024 at 15:53, Derek Straka <derek@asterius.io> wrote:
> Thanks for your note.  I’m working through the remaining downstream recipe changes today which should address the rest of the UNKNOWN_BROKEN recipes.
>
> While looking at it yesterday, the download packages come primarily in two archetypes:
> 1. Those that replace ‘_’ with ‘-’ in the source archives
> 2. Those that leave the ‘_’ ONLY in the archives
>
> Given that, I think it’s unlikely there’s a clean fix in the bbclass without a more invasive change to the upstream check logic.  I can, however, package all the changes for oe-core into a single patchset and submit a v2.  That will at least address all the core updates in one fell swoop.
>
> Does that sound reasonable?

Seems so, yes.

Historically pypi upstream checks have been a pain, as there has been
a constant stream of seemingly random breaking changes, of two types:

1. _ being replaced by - and vice versa
2. CamelCasing being replaced by lowercasing and vice versa.

I haven't been able to figure out any pattern in this, or come up with
a universal check. If you can simply fix up core recipes to not return
UNKNOWN_BROKEN, I'd appreciate it.

Alex
Derek Straka Dec. 13, 2024, 3:50 p.m. UTC | #4
Agreed.  It's painful at times because there isn't a complete
standardization in pypi naming conventions.  I've sent a v2 that is intended
to resolve all of the oe-core UNKNOWN_BROKEN python recipes.  I'll move to
those in meta-python next.

Ross Burton Dec. 13, 2024, 6:07 p.m. UTC | #5
On 13 Dec 2024, at 15:14, Alexander Kanavin via lists.openembedded.org <alex.kanavin=gmail.com@lists.openembedded.org> wrote:
> 
> On Fri, 13 Dec 2024 at 15:53, Derek Straka <derek@asterius.io> wrote:
>> Thanks for your note.  I’m working through the remaining downstream recipe changes today which should address the rest of the UNKNOWN_BROKEN recipes.
>> 
>> While looking at it yesterday, the download packages come primarily in two archetypes:
>> 1. Those that replace ‘_’ with ‘-’ in the source archives
>> 2. Those that leave the ‘_’ ONLY in the archives
>> 
>> Given that, I think it’s unlikely there’s a clean fix in the bbclass without a more invasive change to the upstream check logic.  I can, however, package all the changes for oe-core into a single patchset and submit a v2.  That will at least address all the core updates in one fell swoop.
>> 
>> Does that sound reasonable?
> 
> Seems so, yes.
> 
> Historically pypi upstream checks have been a pain, as there has been
> a constant stream of seemingly random breaking changes, of two types:
> 
> 1. _ being replaced by - and vice versa
> 2. CamelCasing being replaced by lowercasing and vice versa.
> 
> I haven't been able to figure out any pattern in this, or come up with
> a universal check. If you can simply fix up core recipes to not return
> UNKNOWN_BROKEN, I'd appreciate.

I’ve some partial branches that attempt to bring sanity to this but yes, it’s a mess.

The good news is that https://peps.python.org/pep-0625/ says that sdist filenames should be normalised and from what I can tell everything but setuptools does normalise, and the use of setuptools is falling.

The simple repository API says that the project name is normalised, so we can add a little normalise function:

def pypi_normalize(s):
    import re
    return re.sub(r"[-_.]+", "_", s).lower()

And use that to turn the PYPI_PACKAGE into the right thing, surely?
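
For reference, two closely related normalisations are in play here, and the sketch below shows them side by side. The function names `pep503_normalize` and `pep625_normalize` are illustrative, not part of any class. PEP 503 normalises project names with `-` as the separator, which is the form used in simple repository API URLs. PEP 625 uses the same rule with `_` for sdist filenames, which is the form the function above produces.

```python
import re

def pep503_normalize(name):
    # Project-name form used in simple repository API URLs (PEP 503):
    # runs of '-', '_' and '.' collapse to a single '-', then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()

def pep625_normalize(name):
    # sdist filename form (PEP 625): same rule, but runs collapse to '_'.
    return re.sub(r"[-_.]+", "_", name).lower()

for pypi_package in ("Babel", "more-itertools", "typing_extensions"):
    print(pypi_package,
          pep503_normalize(pypi_package),
          pep625_normalize(pypi_package))
```

Under this reading, the `-` form would suit the check URI and the `_` form would suit the archive name in the regex, at least for sdists that actually follow PEP 625.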

Rationalising this is the source of at least three wip branches I have locally, so I’d love to see it sorted.

Ross
Derek Straka Dec. 13, 2024, 6:09 p.m. UTC | #6
Hi Ross,

I'll give that normalization another shot.  I saw a couple older packages
not following the normalized filenames, but those could be outliers.
Thanks for the pointer.

-Derek

Ross Burton Dec. 13, 2024, 6:09 p.m. UTC | #7
On 13 Dec 2024, at 18:06, Ross Burton <Ross.Burton@arm.com> wrote:
> Rationalising this is the source of at least three wip branches I have locally, so I’d love to see it sorted.

Hit send too early.

One of my WIP branches basically made some semantic changes, where PYPI_PACKAGE is the canonical package name (as in, what you type when pip-installing) and PYPI_ARCHIVE_NAME is the normalised name (for sdist, etc). The fun is then building everything and setting variables where the package name can’t be derived from the recipe name (eg python3-babel has a PYPI_PACKAGE of Babel) or where the archive isn’t actually normalised (eg more-itertools has a more-itertools sdist but more_itertools wheel).
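
A minimal sketch of that split, written as plain Python rather than bbclass code. The `recipes` table and both helper names are hypothetical. The default archive name is assumed to be the PEP 625 style normalisation of PYPI_PACKAGE, with an explicit PYPI_ARCHIVE_NAME taking precedence where the sdist is not normalised.

```python
import re

def default_archive_name(pypi_package):
    # Assumed default: PEP 625 style normalisation of the canonical name
    return re.sub(r"[-_.]+", "_", pypi_package).lower()

# Hypothetical per-recipe settings, standing in for recipe variables
recipes = {
    # Canonical name cannot be derived from the recipe name
    "python3-babel": {"PYPI_PACKAGE": "Babel"},
    # sdist is not normalised, so the archive name is set explicitly
    "python3-more-itertools": {
        "PYPI_PACKAGE": "more-itertools",
        "PYPI_ARCHIVE_NAME": "more-itertools",
    },
}

def archive_name(recipe):
    settings = recipes[recipe]
    # An explicit override wins; otherwise fall back to the assumed default
    return settings.get("PYPI_ARCHIVE_NAME",
                        default_archive_name(settings["PYPI_PACKAGE"]))

for recipe in recipes:
    print(recipe, "->", archive_name(recipe))
```

The value printed for python3-babel is only what the assumed default would produce, not a statement about what PyPI actually serves for that project.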

I _think_ this was the right direction to take at least…

Ross
Derek Straka Dec. 13, 2024, 7:39 p.m. UTC | #8
Unfortunately, there's a large swath of packages in both oe-core (>30) and
meta-python (>70) that do not follow PEP 625.  I'll send a v3 patchset that
normalizes the URLs and sdist filenames going forward (assuming folks
standardize on PEP 625 in the future), but we'll still carry a large
number of special cases in the interim.


Patch

diff --git a/meta/classes-recipe/pypi.bbclass b/meta/classes-recipe/pypi.bbclass
index c6bbe8119a..15172e97b4 100644
--- a/meta/classes-recipe/pypi.bbclass
+++ b/meta/classes-recipe/pypi.bbclass
@@ -37,7 +37,11 @@  S = "${WORKDIR}/${PYPI_PACKAGE}-${PV}"
 
 # Replace any '_' characters in the pypi URI with '-'s to follow the PyPi website naming conventions
 UPSTREAM_CHECK_PYPI_PACKAGE ?= "${@d.getVar('PYPI_PACKAGE').replace('_', '-')}"
-UPSTREAM_CHECK_URI ?= "https://pypi.org/project/${UPSTREAM_CHECK_PYPI_PACKAGE}/"
-UPSTREAM_CHECK_REGEX ?= "/${UPSTREAM_CHECK_PYPI_PACKAGE}/(?P<pver>(\d+[\.\-_]*)+)/"
+
+# Use the simple repository API rather than the potentially unstable project URL
+# More information on the pypi API specification is available here:
+# https://packaging.python.org/en/latest/specifications/simple-repository-api/
+UPSTREAM_CHECK_URI ?= "https://pypi.org/simple/${UPSTREAM_CHECK_PYPI_PACKAGE}/"
+UPSTREAM_CHECK_REGEX ?= "${UPSTREAM_CHECK_PYPI_PACKAGE}-(?P<pver>(\d+[\.\-_]*)+).(tar\.gz|tgz)"
 
 CVE_PRODUCT ?= "python:${PYPI_PACKAGE}"
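
To see how the new default regex behaves, the following Python sketch applies it to a hand-written fragment shaped like a simple repository index page. The package name `requests` and the sample anchors are assumptions chosen for illustration, not fetched data. Note that the `.` before the extension group is unescaped in the patch, so it matches any single character; the sketch keeps it as-is.

```python
import re

package = "requests"  # hypothetical UPSTREAM_CHECK_PYPI_PACKAGE value

# Same pattern as the new UPSTREAM_CHECK_REGEX default, with the
# package name substituted in by hand.
regex = re.compile(package + r"-(?P<pver>(\d+[\.\-_]*)+).(tar\.gz|tgz)")

# Hand-written sample, loosely shaped like a simple index page
sample_index = """
<a href="requests-2.31.0.tar.gz">requests-2.31.0.tar.gz</a>
<a href="requests-2.32.3-py3-none-any.whl">requests-2.32.3-py3-none-any.whl</a>
<a href="requests-2.32.3.tar.gz">requests-2.32.3.tar.gz</a>
"""

# Collect the distinct versions found; wheels do not match the pattern.
# Sorting here is plain string order, not version-aware ordering.
versions = sorted({m.group("pver") for m in regex.finditer(sample_index)})
print(versions)
```

Only the sdist entries contribute versions, which is why a recipe whose sdist name differs from UPSTREAM_CHECK_PYPI_PACKAGE in separator or case would find nothing with this default.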