Message ID | 20241212185619.1599668-1-derek@asterius.io |
---|---|
State | Accepted, archived |
Commit | 10febb0e8193d15aec8bbf80b849ae6732da3c22 |
Series | classes/pypi: update the default UPSTREAM_CHECK_URI to use the simple repo API |
On Thu, 12 Dec 2024 at 19:56, Derek Straka via lists.openembedded.org <derek=asterius.io@lists.openembedded.org> wrote:
> Update the UPSTREAM_CHECK_URI to leverage the simple repo API. The
> project URLs require javascript which breaks the version checking fetch
> and subsequent logic. The simple repo API provides similar
> functionality with a well defined spec which is used by tools such as
> pip. Also update the UPSTREAM_CHECK_REGEX to be compatible with the
> information retrieved via the API

Unfortunately, even with this patch there are a large number of UNKNOWN_BROKEN results in the output of 'devtool check-upgrade-status'. Can you please check those? Hopefully it's a simple tweak.

Alex
Hi Alex,

Thanks for your note. I’m working through the remaining downstream recipe changes today, which should address the rest of the UNKNOWN_BROKEN recipes.

While looking at it yesterday, I found that the download packages come primarily in two archetypes:
1. Those that replace ‘_’ with ‘-’ in the source archives
2. Those that leave the ‘_’ ONLY in the archives

Given that, I think it’s unlikely there’s a clean fix in the bbclass without a more invasive change to the upstream check logic. I can, however, package all the changes for oe-core into a single patchset and submit a v2. That will at least address all the core updates in one fell swoop.

Does that sound reasonable?

-Derek
On Fri, 13 Dec 2024 at 15:53, Derek Straka <derek@asterius.io> wrote:
> Thanks for your note. I’m working through the remaining downstream recipe changes today which should address the rest of the UNKNOWN_BROKEN recipes.
>
> While looking at it yesterday, the download packages come primarily in two archetypes:
> 1. Those that replace ‘_’ with ‘-’ in the source archives
> 2. Those that leave the ‘_’ ONLY in the archives
>
> Given that, I think it’s unlikely there’s a clean fix in the bbclass without a more invasive change to the upstream check logic. I can, however, package all the changes for oe-core into a single patchset and submit a v2. That will at least address all the core updates in one fell swoop.
>
> Does that sound reasonable?

Seems so, yes.

Historically pypi upstream checks have been a pain, as there has been a constant stream of seemingly random breaking changes, of two types:

1. _ being replaced by - and vice versa
2. CamelCasing being replaced by lowercasing and vice versa.

I haven't been able to figure out any pattern in this, or come up with a universal check. If you can simply fix up core recipes to not return UNKNOWN_BROKEN, I'd appreciate it.

Alex
Agreed. It's painful at times because pypi naming conventions aren't fully standardized. I've sent a v2 that is intended to resolve all of the oe-core UNKNOWN_BROKEN python recipes. I'll move on to those in meta-python next.
On 13 Dec 2024, at 15:14, Alexander Kanavin via lists.openembedded.org <alex.kanavin=gmail.com@lists.openembedded.org> wrote:
> Historically pypi upstream checks have been a pain, as there has been
> a constant stream of seemingly random breaking changes, of two types:
>
> 1. _ being replaced by - and vice versa
> 2. CamelCasing being replaced by lowercasing and vice versa.
>
> I haven't been able to figure out any pattern in this, or come up with
> a universal check. If you can simply fix up core recipes to not return
> UNKNOWN_BROKEN, I'd appreciate it.

I’ve some partial branches that attempt to bring sanity to this but yes, it’s a mess.

The good news is that https://peps.python.org/pep-0625/ says that sdist filenames should be normalised, and from what I can tell everything but setuptools does normalise, and the use of setuptools is falling.

The simple repository API says that the project name is normalised, so we can add a little normalise function:

    def pypi_normalize(s):
        import re
        return re.sub(r"[-_.]+", "_", s).lower()

And use that to turn the PYPI_PACKAGE into the right thing, surely?

Rationalising this is the source of at least three wip branches I have locally, so I’d love to see it sorted.

Ross
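As a rough illustration of the suggestion above (this is not code from any posted patch), the sketch below applies the proposed pypi_normalize to a few names and contrasts it with the hyphenated form used in simple repository API project URLs. The helper simple_api_name and the choice of sample names are assumptions made for illustration only.

```python
import re


def pypi_normalize(s):
    # As proposed in the thread: collapse runs of '-', '_' and '.' into a
    # single '_' and lowercase the result (the sdist filename form).
    return re.sub(r"[-_.]+", "_", s).lower()


def simple_api_name(s):
    # Hypothetical helper (not from the thread): the same collapsing, but
    # using '-' as the separator, as in simple repository API project URLs.
    return re.sub(r"[-_.]+", "-", s).lower()


if __name__ == "__main__":
    for name in ("Babel", "more-itertools", "ruamel.yaml", "typing_extensions"):
        print(name, pypi_normalize(name), simple_api_name(name))
```

The two forms differ only in the separator, so a class could plausibly derive both the check URL and the expected archive name from a single PYPI_PACKAGE value, provided the upstream archive really is normalised.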
Hi Ross,

I'll give that normalization another shot. I saw a couple of older packages not following the normalized filenames, but those could be outliers. Thanks for the pointer.

-Derek
On 13 Dec 2024, at 18:06, Ross Burton <Ross.Burton@arm.com> wrote:
> Rationalising this is the source of at least three wip branches I have locally, so I’d love to see it sorted.
Hit send too early.
One of my WIP branches basically made some semantic changes, where PYPI_PACKAGE is the canonical package name (as in, what you type when pip-installing) and PYPI_ARCHIVE_NAME is the normalised name (for sdist, etc). The fun is then building everything and setting variables where the package name can’t be derived from the recipe name (eg python3-babel has a PYPI_PACKAGE of Babel) or where the archive isn’t actually normalised (eg more-itertools has a more-itertools sdist but more_itertools wheel).
I _think_ this was the right direction to take at least…
Ross
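To make the PYPI_PACKAGE / PYPI_ARCHIVE_NAME split described above more concrete, here is a minimal sketch of how the two names might be derived with per-recipe exceptions. It is not taken from any of the WIP branches: the override dictionaries, the function names, and the prefix-stripping default are all assumptions for illustration, and a real implementation would set BitBake variables in recipes rather than use Python dictionaries.

```python
import re

# Hypothetical overrides for recipes whose canonical package name cannot be
# derived from the recipe name (example taken from the thread).
PYPI_PACKAGE_OVERRIDES = {"python3-babel": "Babel"}

# Hypothetical overrides for packages whose sdist name is not normalised
# (example taken from the thread).
PYPI_ARCHIVE_NAME_OVERRIDES = {"more-itertools": "more-itertools"}


def pypi_package(recipe_name):
    # Assumed default: strip a leading "python3-" from the recipe name.
    prefix = "python3-"
    default = recipe_name[len(prefix):] if recipe_name.startswith(prefix) else recipe_name
    return PYPI_PACKAGE_OVERRIDES.get(recipe_name, default)


def pypi_archive_name(package):
    # Default to the normalised form unless the package is a known exception.
    normalised = re.sub(r"[-_.]+", "_", package).lower()
    return PYPI_ARCHIVE_NAME_OVERRIDES.get(package, normalised)


if __name__ == "__main__":
    for recipe in ("python3-babel", "python3-more-itertools", "python3-typing-extensions"):
        pkg = pypi_package(recipe)
        print(recipe, pkg, pypi_archive_name(pkg))
```

Under this scheme only the exceptions need explicit settings, which is where the recipe-by-recipe work mentioned above would go.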
Unfortunately, there's a large swath of packages in both oe-core (>30) and meta-python (>70) that do not follow PEP 625. I'll send a v3 patchset that normalizes the URLs and sdist filenames going forward (assuming folks eventually standardize on PEP 625), but we'll still carry a large number of special cases in the interim.
diff --git a/meta/classes-recipe/pypi.bbclass b/meta/classes-recipe/pypi.bbclass
index c6bbe8119a..15172e97b4 100644
--- a/meta/classes-recipe/pypi.bbclass
+++ b/meta/classes-recipe/pypi.bbclass
@@ -37,7 +37,11 @@ S = "${WORKDIR}/${PYPI_PACKAGE}-${PV}"
 
 # Replace any '_' characters in the pypi URI with '-'s to follow the PyPi website naming conventions
 UPSTREAM_CHECK_PYPI_PACKAGE ?= "${@d.getVar('PYPI_PACKAGE').replace('_', '-')}"
-UPSTREAM_CHECK_URI ?= "https://pypi.org/project/${UPSTREAM_CHECK_PYPI_PACKAGE}/"
-UPSTREAM_CHECK_REGEX ?= "/${UPSTREAM_CHECK_PYPI_PACKAGE}/(?P<pver>(\d+[\.\-_]*)+)/"
+
+# Use the simple repository API rather than the potentially unstable project URL
+# More information on the pypi API specification is avaialble here:
+# https://packaging.python.org/en/latest/specifications/simple-repository-api/
+UPSTREAM_CHECK_URI ?= "https://pypi.org/simple/${UPSTREAM_CHECK_PYPI_PACKAGE}/"
+UPSTREAM_CHECK_REGEX ?= "${UPSTREAM_CHECK_PYPI_PACKAGE}-(?P<pver>(\d+[\.\-_]*)+).(tar\.gz|tgz)"
 
 CVE_PRODUCT ?= "python:${PYPI_PACKAGE}"
Update the UPSTREAM_CHECK_URI to leverage the simple repo API. The
project URLs require javascript which breaks the version checking fetch
and subsequent logic. The simple repo API provides similar
functionality with a well defined spec which is used by tools such as
pip. Also update the UPSTREAM_CHECK_REGEX to be compatible with the
information retrieved via the API

Signed-off-by: Derek Straka <derek@asterius.io>
---
 meta/classes-recipe/pypi.bbclass | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
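The following sketch tries the new default UPSTREAM_CHECK_REGEX against a few sdist filenames using plain Python, to show why the underscore/hyphen and casing mismatches discussed in the thread can leave the check with nothing to match. The package names and filenames are made up, the helper names check_regex and versions are hypothetical, and the real upstream check runs inside BitBake's fetcher, which may behave differently from this standalone approximation.

```python
import re


def check_regex(pypi_package):
    # Mirror the patched defaults: '_' in PYPI_PACKAGE becomes '-', and the
    # result is interpolated, unescaped, into the version regex.
    upstream_name = pypi_package.replace("_", "-")
    return re.compile(upstream_name + r"-(?P<pver>(\d+[\.\-_]*)+).(tar\.gz|tgz)")


def versions(pypi_package, filenames):
    # Collect the version captured from every filename the regex matches.
    pattern = check_regex(pypi_package)
    found = []
    for filename in filenames:
        match = pattern.search(filename)
        if match:
            found.append(match.group("pver"))
    return found


if __name__ == "__main__":
    # Hyphenated archive name: the default regex finds the version.
    print(versions("sample_pkg", ["sample-pkg-1.2.3.tar.gz"]))
    # Underscore kept in the archive name: no match.
    print(versions("sample_pkg", ["sample_pkg-1.2.3.tar.gz"]))
    # Casing differs between PYPI_PACKAGE and the archive name: no match.
    print(versions("SamplePkg", ["samplepkg-1.2.3.tar.gz"]))
```

In the second and third cases a recipe would need its own UPSTREAM_CHECK_PYPI_PACKAGE or UPSTREAM_CHECK_REGEX override, which matches the per-recipe fixes described in the discussion.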