diff mbox series

parse: Forbid ambiguous assignments to ${.}, ${+}, and ${:} variables

Message ID 20250129185531.3752682-1-n.merinov@inango-systems.com
State Accepted, archived
Commit 93059aad13a12cd69d86368795c88e5349197d5d
Headers show
Series parse: Forbid ambiguous assignments to ${.}, ${+}, and ${:} variables | expand

Commit Message

Nikolai Merinov Jan. 29, 2025, 6:55 p.m. UTC
Old code that parse variable names in assignment commands behave differntly for
variables that ends with special symbol for single-character variable names and
multi-character variable names. For example:

  A+="1"   # Change variable ${A}, '+' glued to '='
  A+ = "1" # Change variable ${A+}

  +="1"    # Change variable ${+}, the '+' symbol not part of assignment operator
  + = "1"  # Change variable ${+}

New code would always assume that '.=', '+=', and ':=' is assignment operator.
As result code like the following would raise parsing error

  +="value"

While code with extra spaces would work as before

  + = "value" # Change variable ${+}

This change allow to catch issues in code that generate bitbake configuration
files in a manner like "echo ${VARNAME}+=${VALUE} >> conf/local.conf"

Signed-off-by: Nikolai Merinov <n.merinov@inango-systems.com>
---
 lib/bb/parse/parse_py/ConfHandler.py                       | 2 +-
 lib/toaster/toastermain/management/commands/buildimport.py | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Comments

Peter Kjellerstedt Jan. 31, 2025, 8:40 p.m. UTC | #1
> -----Original Message-----
> From: bitbake-devel@lists.openembedded.org <bitbake-devel@lists.openembedded.org> On Behalf Of Nikolay Merinov via lists.openembedded.org
> Sent: den 29 januari 2025 19:56
> To: bitbake-devel@lists.openembedded.org
> Cc: Nikolai Merinov <n.merinov@inango-systems.com>
> Subject: [bitbake-devel] [PATCH] parse: Forbid ambiguous assignments to ${.}, ${+}, and ${:} variables
> 
> Old code that parse variable names in assignment commands behave differntly for

differntly -> differently 

> variables that ends with special symbol for single-character variable names and
> multi-character variable names. For example:
> 
>   A+="1"   # Change variable ${A}, '+' glued to '='
>   A+ = "1" # Change variable ${A+}
> 
>   +="1"    # Change variable ${+}, the '+' symbol not part of assignment operator
>   + = "1"  # Change variable ${+}
> 
> New code would always assume that '.=', '+=', and ':=' is assignment operator.
> As result code like the following would raise parsing error
> 
>   +="value"
> 
> While code with extra spaces would work as before
> 
>   + = "value" # Change variable ${+}
> 
> This change allow to catch issues in code that generate bitbake configuration
> files in a manner like "echo ${VARNAME}+=${VALUE} >> conf/local.conf"
> 
> Signed-off-by: Nikolai Merinov <n.merinov@inango-systems.com>
> ---
>  lib/bb/parse/parse_py/ConfHandler.py                       | 2 +-
>  lib/toaster/toastermain/management/commands/buildimport.py | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/bb/parse/parse_py/ConfHandler.py b/lib/bb/parse/parse_py/ConfHandler.py
> index 24f81f7e9..bdac66004 100644
> --- a/lib/bb/parse/parse_py/ConfHandler.py
> +++ b/lib/bb/parse/parse_py/ConfHandler.py
> @@ -20,7 +20,7 @@ from bb.parse import ParseError, resolve_file, ast, logger, handle
>  __config_regexp__  = re.compile( r"""
>      ^
>      (?P<exp>export\s+)?
> -    (?P<var>[a-zA-Z0-9\-_+.${}/~:]+?)
> +    (?P<var>([a-zA-Z0-9\-_${}/~] | [+.:](?!=))+?)
>      (\[(?P<flag>[a-zA-Z0-9\-_+.][a-zA-Z0-9\-_+.@/]*)\])?
> 
>      \s* (
> diff --git a/lib/toaster/toastermain/management/commands/buildimport.py b/lib/toaster/toastermain/management/commands/buildimport.py
> index f7139aa04..37d27a445 100644
> --- a/lib/toaster/toastermain/management/commands/buildimport.py
> +++ b/lib/toaster/toastermain/management/commands/buildimport.py
> @@ -60,7 +60,7 @@ def _log(msg):
>  __config_regexp__  = re.compile( r"""
>      ^
>      (?P<exp>export\s+)?
> -    (?P<var>[a-zA-Z0-9\-_+.${}/~]+?)
> +    (?P<var>([a-zA-Z0-9\-_${}/~] | [.+](?!=))+?)
>      (\[(?P<flag>[a-zA-Z0-9\-_+.]+)\])?
> 
>      \s* (
> --
> 2.34.1
> 

The above made me ponder on the existence of $, { and } in the regular 
expression for the variable name and what that can be (ab)used for. I 
realized that it allows you do define really odd variables such as:

${} = "foobar"
} = "foobar"
$ = "foobar"

And while you cannot access them directly using, e.g., "${$}", you can 
access them using, e.g., "${@d.getVar('$')"...

This also allows defining variables that do not match the syntax at all, 
e.g.:

EQUAL = "="
${EQUAL} = "bar"

which will define the variable "=" and where "${@d.getVar('=')}" then
returns "bar" as expected...

I am not sure this is a problem or working as intended, but it is a bit 
odd.

//Peter
Richard Purdie Feb. 1, 2025, 1:29 p.m. UTC | #2
On Wed, 2025-01-29 at 20:55 +0200, Nikolay Merinov via lists.openembedded.org wrote:
> Old code that parse variable names in assignment commands behave differntly for
> variables that ends with special symbol for single-character variable names and
> multi-character variable names. For example:
> 
>   A+="1"   # Change variable ${A}, '+' glued to '='
>   A+ = "1" # Change variable ${A+}
> 
>   +="1"    # Change variable ${+}, the '+' symbol not part of assignment operator
>   + = "1"  # Change variable ${+}
> 
> New code would always assume that '.=', '+=', and ':=' is assignment operator.
> As result code like the following would raise parsing error
> 
>   +="value"
> 
> While code with extra spaces would work as before
> 
>   + = "value" # Change variable ${+}
> 
> This change allow to catch issues in code that generate bitbake configuration
> files in a manner like "echo ${VARNAME}+=${VALUE} >> conf/local.conf"
> 
> Signed-off-by: Nikolai Merinov <n.merinov@inango-systems.com>
> ---
>  lib/bb/parse/parse_py/ConfHandler.py                       | 2 +-
>  lib/toaster/toastermain/management/commands/buildimport.py | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)

This is an interesting bug, thanks for reporting it and sending a patch.

I'm a little torn on what the right thing to do here is. I'm wondering
if it is time to say that whitespace around the assignment operator is
required. That will break metadata and I've found a few hundred places
it would break. Equally, those can be fixed and we'd end up with a
better end result as the whitespace does make recipe files more
readable. I have a patch to fix most of the references in OE-Core.

I'm worried that if we do what you patch does, we're complicating the
code and potentially setting ourselves up for other new interesting
future problems.

I'd also love to see some test cases adding to lib/bb/tests/parse.py so
that "bitbake-selftest bb.tests.parse" would detect these problems if
we ever introduced them in future.

Cheers,

Richard
Nikolai Merinov Feb. 3, 2025, 4:42 p.m. UTC | #3
Hi Richard,

> I'm worried that if we do what you patch does, we're complicating the
> code and potentially setting ourselves up for other new interesting
> future problems.

How about moving complicated part out of regular expression?

Please take a look to the second version of the patch.

Regards,
Nikolai
Richard Purdie Feb. 3, 2025, 5:40 p.m. UTC | #4
Hi Nikolai,

On Mon, 2025-02-03 at 18:42 +0200, Nikolai Merinov wrote:
> Hi Richard,
> 
> > I'm worried that if we do what you patch does, we're complicating
> > the
> > code and potentially setting ourselves up for other new interesting
> > future problems.
> 
> How about moving complicated part out of regular expression?
> 
> Please take a look to the second version of the patch.

I prefer that version thanks!

If we can tweak the test name, I think we can merge that and I'll
continue to think about whether we should make the whitespace parsing
stricter.

Cheers,

Richard
Richard Purdie Feb. 3, 2025, 5:42 p.m. UTC | #5
On Fri, 2025-01-31 at 20:40 +0000, Peter Kjellerstedt via lists.openembedded.org wrote:
> > -----Original Message-----
> > From: bitbake-devel@lists.openembedded.org <bitbake-devel@lists.openembedded.org> On Behalf Of Nikolay Merinov via lists.openembedded.org
> > Sent: den 29 januari 2025 19:56
> > To: bitbake-devel@lists.openembedded.org
> > Cc: Nikolai Merinov <n.merinov@inango-systems.com>
> > Subject: [bitbake-devel] [PATCH] parse: Forbid ambiguous assignments to ${.}, ${+}, and ${:} variables
> > 
> > Old code that parse variable names in assignment commands behave differntly for
> 
> differntly -> differently 
> 
> > variables that ends with special symbol for single-character variable names and
> > multi-character variable names. For example:
> > 
> >   A+="1"   # Change variable ${A}, '+' glued to '='
> >   A+ = "1" # Change variable ${A+}
> > 
> >   +="1"    # Change variable ${+}, the '+' symbol not part of assignment operator
> >   + = "1"  # Change variable ${+}
> > 
> > New code would always assume that '.=', '+=', and ':=' is assignment operator.
> > As result code like the following would raise parsing error
> > 
> >   +="value"
> > 
> > While code with extra spaces would work as before
> > 
> >   + = "value" # Change variable ${+}
> > 
> > This change allow to catch issues in code that generate bitbake configuration
> > files in a manner like "echo ${VARNAME}+=${VALUE} >> conf/local.conf"
> > 
> > Signed-off-by: Nikolai Merinov <n.merinov@inango-systems.com>
> > ---
> >  lib/bb/parse/parse_py/ConfHandler.py                       | 2 +-
> >  lib/toaster/toastermain/management/commands/buildimport.py | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/lib/bb/parse/parse_py/ConfHandler.py b/lib/bb/parse/parse_py/ConfHandler.py
> > index 24f81f7e9..bdac66004 100644
> > --- a/lib/bb/parse/parse_py/ConfHandler.py
> > +++ b/lib/bb/parse/parse_py/ConfHandler.py
> > @@ -20,7 +20,7 @@ from bb.parse import ParseError, resolve_file, ast, logger, handle
> >  __config_regexp__  = re.compile( r"""
> >      ^
> >      (?P<exp>export\s+)?
> > -    (?P<var>[a-zA-Z0-9\-_+.${}/~:]+?)
> > +    (?P<var>([a-zA-Z0-9\-_${}/~] | [+.:](?!=))+?)
> >      (\[(?P<flag>[a-zA-Z0-9\-_+.][a-zA-Z0-9\-_+.@/]*)\])?
> > 
> >      \s* (
> > diff --git a/lib/toaster/toastermain/management/commands/buildimport.py b/lib/toaster/toastermain/management/commands/buildimport.py
> > index f7139aa04..37d27a445 100644
> > --- a/lib/toaster/toastermain/management/commands/buildimport.py
> > +++ b/lib/toaster/toastermain/management/commands/buildimport.py
> > @@ -60,7 +60,7 @@ def _log(msg):
> >  __config_regexp__  = re.compile( r"""
> >      ^
> >      (?P<exp>export\s+)?
> > -    (?P<var>[a-zA-Z0-9\-_+.${}/~]+?)
> > +    (?P<var>([a-zA-Z0-9\-_${}/~] | [.+](?!=))+?)
> >      (\[(?P<flag>[a-zA-Z0-9\-_+.]+)\])?
> > 
> >      \s* (
> > --
> > 2.34.1
> > 
> 
> The above made me ponder on the existence of $, { and } in the regular 
> expression for the variable name and what that can be (ab)used for. I
> realized that it allows you do define really odd variables such as:
> 
> ${} = "foobar"
> } = "foobar"
> $ = "foobar"
> 
> And while you cannot access them directly using, e.g., "${$}", you can 
> access them using, e.g., "${@d.getVar('$')"...
> 
> This also allows defining variables that do not match the syntax at all, 
> e.g.:
> 
> EQUAL = "="
> ${EQUAL} = "bar"
> 
> which will define the variable "=" and where "${@d.getVar('=')}" then
> returns "bar" as expected...
> 
> I am not sure this is a problem or working as intended, but it is a bit 
> odd.
> 

The set/getVar API doesn't really sanity check the variable names so it
is behaving much as I'd have expected. Whether it should be a bit
stricter about variable names is a good question...

Cheers,

Richard
Nikolai Merinov Feb. 4, 2025, 7:13 a.m. UTC | #6
Hi Richard,

> The test name needs changing from test_parse_exports though else it
> will redefinine one of the other tests.

Thank you for noticing this! I checked that I got test failure without
rest part of my patch and immediately forget to ajust test name.

Feel free to fix naming if it looks incorrectfor you.

Regards,
Nikolai
Nikolai Merinov Feb. 4, 2025, 11:11 a.m. UTC | #7
Hi Peter,

> The above made me ponder on the existence of $, { and } in the regular
> expression for the variable name and what that can be (ab)used for. I
> realized that it allows you do define really odd variables such as:
> 
> ${} = "foobar"
> } = "foobar"
> $ = "foobar"
> 
> And while you cannot access them directly using, e.g., "${$}", you can
> access them using, e.g., "${@d.getVar('$')"...
> 
> This also allows defining variables that do not match the syntax at all,
> e.g.:
> 
> EQUAL = "="
> ${EQUAL} = "bar"
> 
> which will define the variable "=" and where "${@d.getVar('=')}" then
> returns "bar" as expected...
> 
> I am not sure this is a problem or working as intended, but it is a bit
> odd.

I did not aware of bitbake development history, but I believe that there
can be only one reason to do it: Makefile allow same kind of bizarre things. 

    $ cat Makefile 
    name = ==არ გააკეთო==
    $(name) = Hello World
    all: ; echo $(==არ გააკეთო==)
    $ make
    echo Hello World
    Hello World
    $

Regards,
Nikolai
diff mbox series

Patch

diff --git a/lib/bb/parse/parse_py/ConfHandler.py b/lib/bb/parse/parse_py/ConfHandler.py
index 24f81f7e9..bdac66004 100644
--- a/lib/bb/parse/parse_py/ConfHandler.py
+++ b/lib/bb/parse/parse_py/ConfHandler.py
@@ -20,7 +20,7 @@  from bb.parse import ParseError, resolve_file, ast, logger, handle
 __config_regexp__  = re.compile( r"""
     ^
     (?P<exp>export\s+)?
-    (?P<var>[a-zA-Z0-9\-_+.${}/~:]+?)
+    (?P<var>([a-zA-Z0-9\-_${}/~] | [+.:](?!=))+?)
     (\[(?P<flag>[a-zA-Z0-9\-_+.][a-zA-Z0-9\-_+.@/]*)\])?
 
     \s* (
diff --git a/lib/toaster/toastermain/management/commands/buildimport.py b/lib/toaster/toastermain/management/commands/buildimport.py
index f7139aa04..37d27a445 100644
--- a/lib/toaster/toastermain/management/commands/buildimport.py
+++ b/lib/toaster/toastermain/management/commands/buildimport.py
@@ -60,7 +60,7 @@  def _log(msg):
 __config_regexp__  = re.compile( r"""
     ^
     (?P<exp>export\s+)?
-    (?P<var>[a-zA-Z0-9\-_+.${}/~]+?)
+    (?P<var>([a-zA-Z0-9\-_${}/~] | [.+](?!=))+?)
     (\[(?P<flag>[a-zA-Z0-9\-_+.]+)\])?
 
     \s* (