| Message ID | 20260115132223.2034792-1-daiane.angolini@foundries.io |
|---|---|
| State | New |
| Series | bitbake-setup: verify if a configuration file is html |

On Thu, 15 Jan 2026 at 14:22, Daiane Angolini via lists.openembedded.org
<daiane.angolini=foundries.io@lists.openembedded.org> wrote:
> If the file is a html and not a json, lets notify the user.

The commit message needs a bit of additional context: when a user is
copy-pasting a URI from a browser, they need the URI to the raw JSON
data, not an HTML representation of it (as is common with web UIs to
git repositories for example). JSON decoding errors are not helpful in
that situation, so let's provide a better message.

>
> Signed-off-by: Daiane Angolini <daiane.angolini@foundries.io>
> ---
>  bin/bitbake-setup | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/bin/bitbake-setup b/bin/bitbake-setup
> index abe7614c8..1a1f0cdd4 100755
> --- a/bin/bitbake-setup
> +++ b/bin/bitbake-setup
> @@ -528,7 +528,10 @@ def obtain_config(top_dir, settings, args, source_overrides, d):
>          logger.info("Reading configuration from network URI\n {}".format(config_id))
>          import urllib.request
>          with urllib.request.urlopen(config_id) as f:
> -            upstream_config = {'type':'network','uri':config_id,'name':get_config_name(config_id),'data':json.load(f)}
> +            content = f.read().decode('utf-8')
> +            if content.lstrip().startswith('<!') or content.lstrip().lower().startswith('<html'):
> +                raise Exception("Invalid configuration file: received HTML instead of JSON from {}".format(config_id))
> +            upstream_config = {'type':'network','uri':config_id,'name':get_config_name(config_id),'data':json.loads(f)}

if f contains valid json, then f.read() is followed by json.loads(f).
Will the second call still obtain the needed json data from f, once
it's already been read previously?

I'd suggest simply:

try:
    json_data = json.loads(f)
except json.JSONDecodeError as e:
    ... (do stuff with e.doc per
https://docs.python.org/3/library/json.html#json.JSONDecodeError)

Thanks,
Alex
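
The stream question raised above can be checked in isolation. The following is a minimal sketch, not code from bitbake-setup: it assumes `io.BytesIO` as a stand-in for the response object returned by `urllib.request.urlopen()`, since both are binary streams that are consumed by `read()`. The JSON payload is made up for illustration.

```python
import io
import json

# Assumption: io.BytesIO stands in for the HTTP response object;
# the payload below is illustrative only.
stream = io.BytesIO(b'{"version": "1.0"}')

content = stream.read().decode("utf-8")  # consumes the whole stream
leftover = stream.read()                 # a second read finds nothing

# json.loads() accepts str, bytes or bytearray, not a stream object,
# so passing the stream itself fails independently of what was read.
try:
    json.loads(stream)
    loads_error = None
except TypeError as e:
    loads_error = e

# Parsing the text that was already read works as expected.
data = json.loads(content)
```

In other words, once `content` has been read, it is `content` that should be handed to `json.loads()`, while `json.load()` (without the `s`) is the variant that takes a file-like object.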

On Thu, Jan 15, 2026 at 11:14 AM Alexander Kanavin
<alex.kanavin@gmail.com> wrote:
>
> On Thu, 15 Jan 2026 at 14:22, Daiane Angolini via
> lists.openembedded.org
> <daiane.angolini=foundries.io@lists.openembedded.org> wrote:
> > If the file is a html and not a json, lets notify the user.
>
> The commit message needs a bit of additional context: when a user is
> copy-pasting a URI from a browser, they need the URI to the raw JSON
> data, not an HTML representation of it (as is common with web UIs to
> git repositories for example). JSON decoding errors are not helpful in
> that situation, so let's provide a better message.
>

ok

> >
> > Signed-off-by: Daiane Angolini <daiane.angolini@foundries.io>
> > ---
> >  bin/bitbake-setup | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/bin/bitbake-setup b/bin/bitbake-setup
> > index abe7614c8..1a1f0cdd4 100755
> > --- a/bin/bitbake-setup
> > +++ b/bin/bitbake-setup
> > @@ -528,7 +528,10 @@ def obtain_config(top_dir, settings, args, source_overrides, d):
> >          logger.info("Reading configuration from network URI\n {}".format(config_id))
> >          import urllib.request
> >          with urllib.request.urlopen(config_id) as f:
> > -            upstream_config = {'type':'network','uri':config_id,'name':get_config_name(config_id),'data':json.load(f)}
> > +            content = f.read().decode('utf-8')
> > +            if content.lstrip().startswith('<!') or content.lstrip().lower().startswith('<html'):
> > +                raise Exception("Invalid configuration file: received HTML instead of JSON from {}".format(config_id))
> > +            upstream_config = {'type':'network','uri':config_id,'name':get_config_name(config_id),'data':json.loads(f)}
>
> if f contains valid json, then f.read() is followed by json.loads(f).
> Will the second call still obtain the needed json data from f, once
> it's already been read previously?

I've decided to go with the flow, let it fail and only raise the exception.

>
> I'd suggest simply:
>
> try:
>     json_data = json.loads(f)
> except json.JSONDecodeError as e:
>     ... (do stuff with e.doc per
> https://docs.python.org/3/library/json.html#json.JSONDecodeError)

Thanks, will do that!

Do you think it worth trying to guess the raw/plain URI and use the
guessing instead? I know how to guess for git.openembedded and
github.com, but I cannot imagine how to guess for every single
possibility in the world (that's why I haven't tried to guess)

Daiane

>
>
> Thanks,
> Alex

On Thu, 15 Jan 2026 at 15:49, Daiane Angolini
<daiane.angolini@foundries.io> wrote:
> > if f contains valid json, then f.read() is followed by json.loads(f).
> > Will the second call still obtain the needed json data from f, once
> > it's already been read previously?
>
> I've decided to go with the flow, let it fail and only raise the exception.

That does not confirm that the code works when it is reading valid json.

Indeed, getting valid json over http works without the patch, and with
the patch it breaks like this:
===========
$ /srv/work/alex/bitbake/bin//bitbake-setup init
https://raw.githubusercontent.com/kanavin/bitbake/refs/heads/akanavin/bitbake-setup/default-registry/configurations/poky-master.conf.json
NOTE: Reading configuration from network URI
 https://raw.githubusercontent.com/kanavin/bitbake/refs/heads/akanavin/bitbake-setup/default-registry/configurations/poky-master.conf.json
Traceback (most recent call last):
  File "/srv/work/alex/bitbake/bin//bitbake-setup", line 1140, in <module>
    ...
TypeError: the JSON object must be str, bytes or bytearray, not HTTPResponse
===========

Even though the logic will be rewritten, please try to test all scenarios.

> Do you think it worth trying to guess the raw/plain URI and use the
> guessing instead? I know how to guess for git.openembedded and
> github.com, but I cannot imagine how to guess for every single
> possibility in the world (that's why I haven't tried to guess)

I wonder if it should even try to guess whether it's html. Just say
that the data at that URI cannot be decoded as it is not valid json,
and suggest that it might be a html page that shows the data, not the
raw data itself, and ask the user to find the raw data link on the
page. There can be all kinds of bogus data coming from the network,
and this would cover all possibilities.

Cheers,
Alex
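
One way the approach suggested above could look, as a sketch under stated assumptions rather than the actual v2 patch: `parse_config` and `ConfigDecodeError` are hypothetical names that do not exist in bitbake-setup, and the message wording is only illustrative. The function makes no attempt to detect HTML; it reports any JSON decoding failure and hints at the likely cause.

```python
import json


class ConfigDecodeError(Exception):
    """Hypothetical error type, used here for illustration only."""


def parse_config(raw, uri):
    """Decode raw bytes fetched from uri as JSON, or fail with a hint.

    Hypothetical helper: not part of bitbake-setup.
    """
    try:
        # json.loads() accepts bytes directly and decodes them itself.
        return json.loads(raw)
    except json.JSONDecodeError as e:
        # e.doc holds the text that failed to parse; show only its start.
        snippet = e.doc[:40]
        raise ConfigDecodeError(
            "The data at {} cannot be decoded as it is not valid json "
            "(it starts with {!r}). It might be a html page that shows "
            "the data, not the raw data itself; please look for the raw "
            "data link on that page.".format(uri, snippet)
        ) from e
```

With the network fetch kept separate, the caller would pass in the bytes returned by `f.read()`, which also makes the valid and invalid cases easy to test without a network connection.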

On Thu, Jan 15, 2026 at 2:32 PM Alexander Kanavin
<alex.kanavin@gmail.com> wrote:
>
> On Thu, 15 Jan 2026 at 15:49, Daiane Angolini
> <daiane.angolini@foundries.io> wrote:
> > > if f contains valid json, then f.read() is followed by json.loads(f).
> > > Will the second call still obtain the needed json data from f, once
> > > it's already been read previously?
> >
> > I've decided to go with the flow, let it fail and only raise the exception.
>
> That does not confirm that the code works when it is reading valid json.
>
> Indeed, getting valid json over http works without the patch, and with
> the patch it breaks like this:
> ===========
> $ /srv/work/alex/bitbake/bin//bitbake-setup init
> https://raw.githubusercontent.com/kanavin/bitbake/refs/heads/akanavin/bitbake-setup/default-registry/configurations/poky-master.conf.json
> NOTE: Reading configuration from network URI
>  https://raw.githubusercontent.com/kanavin/bitbake/refs/heads/akanavin/bitbake-setup/default-registry/configurations/poky-master.conf.json
> Traceback (most recent call last):
>   File "/srv/work/alex/bitbake/bin//bitbake-setup", line 1140, in <module>
>     ...
> TypeError: the JSON object must be str, bytes or bytearray, not HTTPResponse
> ===========
>
> Even though the logic will be rewritten, please try to test all scenarios.

v2 sent

This time I tested with working/not working, but still only github.com

there was doubts from my side how do you like it, but after you review
I change again.

I appreciate your attention in advance.

>
>
> > Do you think it worth trying to guess the raw/plain URI and use the
> > guessing instead? I know how to guess for git.openembedded and
> > github.com, but I cannot imagine how to guess for every single
> > possibility in the world (that's why I haven't tried to guess)
>
> I wonder if it should even try to guess whether it's html. Just say
> that the data at that URI cannot be decoded as it is not valid json,
> and suggest that it might be a html page that shows the data, not the
> raw data itself, and ask the user to find the raw data link on the
> page. There can be all kinds of bogus data coming from the network,
> and this would cover all possibilities.

ok (meaning ok, let it be like this for now)

>
> Cheers,
> Alex

diff --git a/bin/bitbake-setup b/bin/bitbake-setup
index abe7614c8..1a1f0cdd4 100755
--- a/bin/bitbake-setup
+++ b/bin/bitbake-setup
@@ -528,7 +528,10 @@ def obtain_config(top_dir, settings, args, source_overrides, d):
         logger.info("Reading configuration from network URI\n {}".format(config_id))
         import urllib.request
         with urllib.request.urlopen(config_id) as f:
-            upstream_config = {'type':'network','uri':config_id,'name':get_config_name(config_id),'data':json.load(f)}
+            content = f.read().decode('utf-8')
+            if content.lstrip().startswith('<!') or content.lstrip().lower().startswith('<html'):
+                raise Exception("Invalid configuration file: received HTML instead of JSON from {}".format(config_id))
+            upstream_config = {'type':'network','uri':config_id,'name':get_config_name(config_id),'data':json.loads(f)}
     else:
         logger.info("Looking up config {} in configuration registry".format(config_id))
         registry_path = update_registry(settings["default"]["registry"], cache_dir(top_dir), d)

If the file is a html and not a json, lets notify the user.

Signed-off-by: Daiane Angolini <daiane.angolini@foundries.io>
---
 bin/bitbake-setup | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)