[v2] convert-variables: Script for Inclusive Language variable renames

Message ID 20220217003248.581903-1-saul.wold@windriver.com
State New
Headers show
Series [v2] convert-variables: Script for Inclusive Language variable renames | expand

Commit Message

Saul Wold Feb. 17, 2022, 12:32 a.m. UTC
From: Saul Wold <Saul.Wold@windriver.com>

This script searches for a list of variable that have been renamed
and converts them to their more descriptive names. It also searches
for a list of variables that have been removed or deprecated and
prints a message.

It will print a message to inform the user that there are terms that
need to be updated in their files. Many of these changes are context
sensitive and may not be modified as they might be existing calls to
other libraries. This message is informational only.

I have tested this on poky and meta-openembedded so far.

(From OE-Core rev: 50fe7ba8dba05a9681c9095506f798796cfc2750)

Signed-off-by: Saul Wold <saul.wold@windriver.com>
---
v2: renamed script, removed bitbake internal vars, added WHITELIST_ option

 scripts/contrib/convert-variables.py | 110 +++++++++++++++++++++++++++
 1 file changed, 110 insertions(+)
 create mode 100755 scripts/contrib/convert-variables.py

Comments

Richard Purdie Feb. 17, 2022, 9:49 a.m. UTC | #1
On Wed, 2022-02-16 at 16:32 -0800, Saul Wold wrote:
> From: Saul Wold <Saul.Wold@windriver.com>
> 
> This script searches for a list of variable that have been renamed
> and converts them to their more descriptive names. It also searches
> for a list of variables that have been removed or deprecated and
> prints a message.
> 
> It will print a message to inform the user that there are terms that
> need to be updated in their files. Many of these changes are context
> sensitive and may not be modified as they might be existing calls to
> other libraries. This message is informational only.
> 
> I have tested this on poky and meta-openembedded so far.
> 
> (From OE-Core rev: 50fe7ba8dba05a9681c9095506f798796cfc2750)
> 
> Signed-off-by: Saul Wold <saul.wold@windriver.com>
> ---
> v2: renamed script, removed bitbake internal vars, added WHITELIST_ option
> 
>  scripts/contrib/convert-variables.py | 110 +++++++++++++++++++++++++++
>  1 file changed, 110 insertions(+)
>  create mode 100755 scripts/contrib/convert-variables.py
> 
> diff --git a/scripts/contrib/convert-variables.py b/scripts/contrib/convert-variables.py
> new file mode 100755
> index 0000000000..a632fd4d5c
> --- /dev/null
> +++ b/scripts/contrib/convert-variables.py
> @@ -0,0 +1,110 @@
> +#!/usr/bin/env python3
> +#
> +# Conversion script to rename variables with more descriptive terms
> +#
> +#
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +
> +import re
> +import os
> +import sys
> +import tempfile
> +import shutil
> +import mimetypes
> +
> +if len(sys.argv) < 2:
> +    print("Please specify a directory to run the conversion script against.")
> +    sys.exit(1)
> +
> +renames = {
> +"BB_ENV_WHITELIST":"BB_ENV_PASSTHROUGH",
> +"BB_ENV_EXTRAWHITE":"BB_ENV_PASSTHROUGH_ADDITIONS",
> +"BB_HASHCONFIG_WHITELIST":"BB_HASHCONFIG_IGNORE_VARS",
> +"BB_SETSCENE_ENFORCE_WHITELIST":"BB_SETSCENE_ENFORCE_IGNORE_TASKS",
> +"BB_HASHBASE_WHITELIST":"BB_BASEHASH_IGNORE_VARS",
> +"BB_HASHTASK_WHITELIST":"BB_TASKHASH_IGNORE_TASKS",
> +"CVE_CHECK_PN_WHITELIST":"CVE_CHECK_SKIP_RECIPE",
> +"CVE_CHECK_WHITELIST":"CVE_CHECK_IGNORE",
> +"MULTI_PROVIDER_WHITELIST":"BB_MULTI_PROVIDER_ALLOWED",
> +"PNBLACKLIST":"SKIP_RECIPE",
> +"SDK_LOCAL_CONF_BLACKLIST":"ESDK_LOCAL_CONF_ALLOW",
> +"SDK_LOCAL_CONF_WHITELIST":"ESDK_LOCAL_CONF_REMOVE",
> +"SDK_INHERIT_BLACKLIST":"ESDK_CLASS_INHERIT_DISABLE",
> +"SSTATE_DUPWHITELIST":"SSTATE_ALLOW_OVERLAP_FILES",
> +"SYSROOT_DIRS_BLACKLIST":"SYSROOT_DIRS_IGNORE",
> +"UNKNOWN_CONFIGURE_WHITELIST":"UNKNOWN_CONFIGURE_OPT_IGNORE",
> +"WHITELIST_":"INCOMPATIBLE_LICENSE_ALLOWED_RECIPE_",
> +}
> +
> +removed_list = [
> +"BB_STAMP_WHITELIST",
> +"BB_STAMP_POLICY",
> +"ICECC_USER_CLASS_BL",
> +"ICECC_USER_PACKAGE_BL",
> +"ICECC_USER_PACKAGE_WL",
> +"INHERIT_BLACKLIST",
> +"TUNEABI_WHITELIST",
> +]
> +

Thanks Saul. The key to making this a success is in the detail and I know you're
doing your best with it but there are some things we need to get right.

a) the script name or description at the top of it still doesn't sit right with
me. "convert-variables" doesn't really tell me what it is for in a years time.
I've tweaked the patch to "convert-variable-renames.py" locally which mentions
the fact it is renaming things and I tweaked the description too.

b) ESDK_LOCAL_CONF_ALLOW and ESDK_LOCAL_CONF_REMOVE are backwards above. The
wiki also used LOCALCONF in one case and LOCAL_CONF in the other. I prefer
LOCALCONF and that was what the original discussion said so I've fixed
everything to match that. I've updated things in master-next.

c) The WHITELIST_ change isn't what was discussed. That variable needs removing
somehow, at least the license addition bit to it as it is horrible. A straight
conversion is not appropriate. We can patch this into the conversion script when
it is ready, IMO it isn't there yet.

d) The subtleties of the ICECC_USER* translations have gotten lost in the
system. The idea was to merge several variables together.

The final piece of all this is communication. We need regular and clear
communication with the community about what is happening, when things are
merging, what work remains, the current work items and so on.

I guess my next best thing to work on is going to be to communicate where things
sit as I see them and what remains.

Cheers,

Richard
Richard Purdie Feb. 17, 2022, 9:54 a.m. UTC | #2
On Wed, 2022-02-16 at 16:32 -0800, Saul Wold wrote:
> +def processfile(fn):
> +
> +    print(f"processing file '{fn}'")

The fact that someone took the original conversion scripts, dropped the
copyright header and then added f strings to it, which the project doesn't yet
officially support and are an issue of contention, *really* could annoy me.

Cheers,

Richard
Enrico Scholz Feb. 18, 2022, 1:40 p.m. UTC | #3
"Saul Wold" <Saul.Wold@windriver.com> writes:

> From: Saul Wold <Saul.Wold@windriver.com>
>
> This script searches for a list of variable that have been renamed
> and converts them to their more descriptive names.

again: most of these renamings make identifiers much less descriptive
because they now sound like boolean flags instead of lists


> +"BB_SETSCENE_ENFORCE_WHITELIST":"BB_SETSCENE_ENFORCE_IGNORE_TASKS",
> +"BB_HASHBASE_WHITELIST":"BB_BASEHASH_IGNORE_VARS",
> +"BB_HASHTASK_WHITELIST":"BB_TASKHASH_IGNORE_TASKS",
> +"CVE_CHECK_PN_WHITELIST":"CVE_CHECK_SKIP_RECIPE",
> +"CVE_CHECK_WHITELIST":"CVE_CHECK_IGNORE",
> +"MULTI_PROVIDER_WHITELIST":"BB_MULTI_PROVIDER_ALLOWED",
> +"PNBLACKLIST":"SKIP_RECIPE",
> +"SDK_LOCAL_CONF_BLACKLIST":"ESDK_LOCAL_CONF_ALLOW",
> +"SDK_LOCAL_CONF_WHITELIST":"ESDK_LOCAL_CONF_REMOVE",
> +"SDK_INHERIT_BLACKLIST":"ESDK_CLASS_INHERIT_DISABLE",
> +"SSTATE_DUPWHITELIST":"SSTATE_ALLOW_OVERLAP_FILES",
> +"SYSROOT_DIRS_BLACKLIST":"SYSROOT_DIRS_IGNORE",
> +"UNKNOWN_CONFIGURE_WHITELIST":"UNKNOWN_CONFIGURE_OPT_IGNORE",



Enrico
Richard Purdie Feb. 18, 2022, 1:55 p.m. UTC | #4
On Fri, 2022-02-18 at 14:40 +0100, Enrico Scholz via lists.openembedded.org
wrote:
> "Saul Wold" <Saul.Wold@windriver.com> writes:
> 
> > From: Saul Wold <Saul.Wold@windriver.com>
> > 
> > This script searches for a list of variable that have been renamed
> > and converts them to their more descriptive names.
> 
> again: most of these renamings make identifiers much less descriptive
> because they now sound like boolean flags instead of lists

Whilst you could say they sound like booleans, they clearly can't be as that
wouldn't make sense, e.g. "Ignoring variables in basehash" makes no sense.

Secondly, most metadata variables are lists of things. Adding "LIST" to every
variable that is a list of items would just make the variable names much longer
with little end resulting readability gain.

Unfortunately there likely isn't a perfect rename and these were the best we
were able to come up with.

Cheers,

Richard
Enrico Scholz Feb. 18, 2022, 2:27 p.m. UTC | #5
Richard Purdie <richard.purdie@linuxfoundation.org> writes:

>> > This script searches for a list of variable that have been renamed
>> > and converts them to their more descriptive names.

s/descriptive/politically correct/


>> again: most of these renamings make identifiers much less descriptive
>> because they now sound like boolean flags instead of lists
>
> Whilst you could say they sound like booleans, they clearly can't be
> as that wouldn't make sense, e.g. "Ignoring variables in basehash"
> makes no sense.

True; because it does not make sense, these variables should not be
named in this way.


> Secondly, most metadata variables are lists of things.

The renamed ones; yes.  But when I look in bitbake.conf, I can not say
that most variables are lists.


> Adding "LIST" to every variable that is a list of items would just make
> the variable names much longer with little end resulting readability
> gain.

True; explicit type annotations like in the hungarian notation are
nonsense.  But "whitelist" and "blacklist" are words with a meaning;
the "list" is not used as a type annotation there.

Variable names were perfectly readable.  A change should make things
better, not worse.


> Unfortunately there likely isn't a perfect rename and these were the
> best we were able to come up with.

Why is the rename done at all?  It makes the product just worse due to a
less usable api and because it causes a lot of work for the users of the
api (I shudder when I think about updating my CI to work with new and
old branches).



Enrico
Alexander Kanavin Feb. 18, 2022, 2:32 p.m. UTC | #6
On Fri, 18 Feb 2022 at 15:27, Enrico Scholz via lists.openembedded.org
<enrico.scholz=sigma-chemnitz.de@lists.openembedded.org> wrote:
> > Unfortunately there likely isn't a perfect rename and these were the
> > best we were able to come up with.
>
> Why is the rename done at all?  It makes the product just worse due to a
> less usable api and because it causes a lot of work for the users of the
> api (I shudder when I think about updating my CI to work with new and
> old branches).

Enrico, cut this out please. Every time you start with a specific,
kind of valid, question, then turn it into a rant about how the whole
effort is useless or worse. People simply will stop answering you if
this carries on.

Alex
Richard Purdie Feb. 18, 2022, 2:51 p.m. UTC | #7
On Fri, 2022-02-18 at 15:27 +0100, Enrico Scholz wrote:
> Richard Purdie <richard.purdie@linuxfoundation.org> writes:
> 
> > > > This script searches for a list of variable that have been renamed
> > > > and converts them to their more descriptive names.
> 
> s/descriptive/politically correct/

We did try and make some of the names better describe what the variables do and
make the variables more consistent.

For example, a reference to HASHBASE was changed to BASEHASH which better
matches every other reference to that thing in the code.

The BB_ENV variables were also traditionally very confusing. Once you know, you
know but for new users they weren't great.

Also see the discussion about the license variable issues. I'm very consciously
blocking the "simple" change as I want to see things improve.

> Variable names were perfectly readable.  A change should make things
> better, not worse.

See above, there are at least some changes which do. I'm sad you don't agree.

> > Unfortunately there likely isn't a perfect rename and these were the
> > best we were able to come up with.
> 
> Why is the rename done at all?  It makes the product just worse due to a
> less usable api and because it causes a lot of work for the users of the
> api (I shudder when I think about updating my CI to work with new and
> old branches).

For better or worse some of the terminology we have is offensive to some people.
Obviously (and sadly) there will probably always be some element of something
which someone could be offended by but these issues have come under a particular
spotlight. People with far more experience of this than us have produced
guidelines on the key issues, the priority of dealing with certain language and
so on.

We do have people wanting to try and improve things. Projects do need to be open
to and able to change. If we don't try and take this input and help those people
make such changes where appropriate, we'll just alienate and frustrate more
users even if those users were not personally offended by the language, a kind
of "rot" can set in to our community. I do not want to see that.

I have advised caution all the way through this. Should the project get this
wrong, the potential for negative social media feedback against the project is
huge. For me personally, there is also a huge risk should I "misstep" in
handling this. The potential to break things for users is also really high, I
realise that. I have done a lot of work to try and make sure issues are at least
clearly reported to users with some idea of how to resolve them so the
transition is less painful. I do have some experience of making changes to the
project and am trying my best to use that to our benefit.

At this stage, rightly or wrongly (I make no judgement) the project cannot
afford not to make these changes. The best thing we can do is to try and reduce
the impact on users as best we can.

In doing so, we also make the project's values very clear. We do not mean any
offence by the language we've had, we just didn't realise the issues. Now we are
aware, we're doing our best to correct it.

Cheers,

Richard
Enrico Scholz Feb. 18, 2022, 6:31 p.m. UTC | #8
Richard Purdie <richard.purdie@linuxfoundation.org> writes:

>> > > > This script searches for a list of variable that have been renamed
>> > > > and converts them to their more descriptive names.
>> 
>> s/descriptive/politically correct/
>
> We did try and make some of the names better describe what the variables
> do and make the variables more consistent.
>
> For example, a reference to HASHBASE was changed to BASEHASH which
> better matches every other reference to that thing in the code.

When you really feel that this Yoda style naming must be fixed; then ok.
But for me, the gain would not be large enough to break the api.


> The BB_ENV variables were also traditionally very confusing. Once you
> know, you know but for new users they weren't great.

The new naming is much more confusing.  While this kind of operation
(exclude or include something) was previously named by "whitelist" +
"blacklist" more or less consistently, it is scattered around across
multiple variants now.

Especially for the hash related lists, searching for "whitelist" or
"blacklist" in the sources revealed often the right way how to do
things.


> Also see the discussion about the license variable issues. I'm very
> consciously blocking the "simple" change as I want to see things
> improve.
>
>> Variable names were perfectly readable.  A change should make things
>> better, not worse.
>
> See above, there are at least some changes which do. I'm sad you don't
> agree.

ok; some changes are ok.  E.g.

| "PNBLACKLIST":"SKIP_RECIPE",

because former name is misleading (it is not really a list/set but has
some specially addressing by varflags).


>> Why is the rename done at all?  It makes the product just worse due to a
>> less usable api and because it causes a lot of work for the users of the
>> api (I shudder when I think about updating my CI to work with new and
>> old branches).
>
> For better or worse some of the terminology we have is offensive to
> some people.

Is there any real-world evidence that this is really the case?  Especially
the embedded sector is filled with blacklisted terms (i2c master/slave,
spi mosi/miso) so I do not think that somebody is really offended.

In the opposite, I feel offended when such changes are labeled as
(technical) improvements ("more descriptive names").


> Obviously (and sadly) there will probably always be some element of
> something which someone could be offended by but these issues have
> come under a particular spotlight. People with far more experience of
> this than us have produced guidelines on the key issues,

In my experience, such people have too much spare time and invented a
problem only to come up with a solution later.


> We do have people wanting to try and improve things. Projects do
> need to be open to and able to change. If we don't try and take this
> input and help those people make such changes where appropriate,
> we'll just alienate and frustrate more users even if those users
> were not personally offended by the language, a kind of "rot" can
> set in to our community. I do not want to see that

I see a much higher risk when significant changes are done due to
political reasons rather than technical ones.


> I have advised caution all the way through this. Should the project
> get this wrong, the potential for negative social media feedback
> against the project is huge.

Does such negative social media feedback really exist resp. does it
exist in channels that are relevant to us?

Linux kernel has not "cleaned" its api either and just said that future
development should avoid certain terms. I can not remember any negative
feedback.

Just do the same here: avoid some terms in future development and when
there are really technical reasons to touch existing variables, then
rename them.


> For me personally, there is also a huge risk should I "misstep" in
> handling this. The potential to break things for users is also really
> high, I realise that. I have done a lot of work to try and make sure
> issues are at least clearly reported to users with some idea of how
> to resolve them so the transition is less painful. I do have some
> experience of making changes to the project and am trying my best to
> use that to our benefit.

You are doing a great job, but the project evolved beautiful over the
last >15 years with the old identifier and I can not remember any
complaints about them.



Enrico

Patch

diff --git a/scripts/contrib/convert-variables.py b/scripts/contrib/convert-variables.py
new file mode 100755
index 0000000000..a632fd4d5c
--- /dev/null
+++ b/scripts/contrib/convert-variables.py
@@ -0,0 +1,110 @@ 
+#!/usr/bin/env python3
+#
+# Conversion script to rename variables with more descriptive terms
+#
+#
+# SPDX-License-Identifier: GPL-2.0-only
+#
+
+import re
+import os
+import sys
+import tempfile
+import shutil
+import mimetypes
+
+if len(sys.argv) < 2:
+    print("Please specify a directory to run the conversion script against.")
+    sys.exit(1)
+
+renames = {
+"BB_ENV_WHITELIST":"BB_ENV_PASSTHROUGH",
+"BB_ENV_EXTRAWHITE":"BB_ENV_PASSTHROUGH_ADDITIONS",
+"BB_HASHCONFIG_WHITELIST":"BB_HASHCONFIG_IGNORE_VARS",
+"BB_SETSCENE_ENFORCE_WHITELIST":"BB_SETSCENE_ENFORCE_IGNORE_TASKS",
+"BB_HASHBASE_WHITELIST":"BB_BASEHASH_IGNORE_VARS",
+"BB_HASHTASK_WHITELIST":"BB_TASKHASH_IGNORE_TASKS",
+"CVE_CHECK_PN_WHITELIST":"CVE_CHECK_SKIP_RECIPE",
+"CVE_CHECK_WHITELIST":"CVE_CHECK_IGNORE",
+"MULTI_PROVIDER_WHITELIST":"BB_MULTI_PROVIDER_ALLOWED",
+"PNBLACKLIST":"SKIP_RECIPE",
+"SDK_LOCAL_CONF_BLACKLIST":"ESDK_LOCAL_CONF_ALLOW",
+"SDK_LOCAL_CONF_WHITELIST":"ESDK_LOCAL_CONF_REMOVE",
+"SDK_INHERIT_BLACKLIST":"ESDK_CLASS_INHERIT_DISABLE",
+"SSTATE_DUPWHITELIST":"SSTATE_ALLOW_OVERLAP_FILES",
+"SYSROOT_DIRS_BLACKLIST":"SYSROOT_DIRS_IGNORE",
+"UNKNOWN_CONFIGURE_WHITELIST":"UNKNOWN_CONFIGURE_OPT_IGNORE",
+"WHITELIST_":"INCOMPATIBLE_LICENSE_ALLOWED_RECIPE_",
+}
+
+removed_list = [
+"BB_STAMP_WHITELIST",
+"BB_STAMP_POLICY",
+"ICECC_USER_CLASS_BL",
+"ICECC_USER_PACKAGE_BL",
+"ICECC_USER_PACKAGE_WL",
+"INHERIT_BLACKLIST",
+"TUNEABI_WHITELIST",
+]
+
+context_check_list = [
+"blacklist",
+"whitelist",
+"abort",
+]
+
+def processfile(fn):
+
+    print(f"processing file '{fn}'")
+    try:
+        fh, abs_path = tempfile.mkstemp()
+        modified = False
+        with os.fdopen(fh, 'w') as new_file:
+            with open(fn, "r") as old_file:
+                lineno = 0
+                for line in old_file:
+                    lineno += 1
+                    if not line or "BB_RENAMED_VARIABLE" in line:
+                        continue
+                    # Do the renames
+                    for old_name, new_name in renames.items():
+                        if old_name in line:
+                            line = line.replace(old_name, new_name)
+                            modified = True
+                    # Find removed names
+                    for removed_name in removed_list:
+                        if removed_name in line:
+                            print(f"{fn} needs further work at line {lineno} because {removed_name} has been deprecated")
+                    for check_word in context_check_list:
+                        if re.search(check_word, line, re.IGNORECASE):
+                            print(f"{fn} needs further work at line {lineno} since it contains {check_word}")
+                    new_file.write(line)
+            if modified:
+                print(f"*** Modified file '{fn}'")
+                shutil.copymode(fn, abs_path)
+                os.remove(fn)
+                shutil.move(abs_path, fn)
+    except UnicodeDecodeError:
+        pass
+
+ourname = os.path.basename(sys.argv[0])
+ourversion = "0.1"
+
+if os.path.isfile(sys.argv[1]):
+    processfile(sys.argv[1])
+    sys.exit(0)
+
+for targetdir in sys.argv[1:]:
+    print("processing directory '%s'" % targetdir)
+    for root, dirs, files in os.walk(targetdir):
+        for name in files:
+            if name == ourname:
+                continue
+            fn = os.path.join(root, name)
+            if os.path.islink(fn):
+                continue
+            if "ChangeLog" in fn or "/.git/" in fn or fn.endswith(".html") or fn.endswith(".patch") or fn.endswith(".m4") or fn.endswith(".diff") or fn.endswith(".orig"):
+                continue
+            processfile(fn)
+
+print("All files processed with version %s" % ourversion)