[1/5] Add a script to validate documentation glossaries

Message ID	20250729-variables-checks-v1-1-5ad6ca9b9386@bootlin.com
State	New
Headers
Return-Path: <antonin.godard@bootlin.com>
From: Antonin Godard <antonin.godard@bootlin.com>
Date: Tue, 29 Jul 2025 13:30:02 +0200
Subject: [PATCH 1/5] Add a script to validate documentation glossaries
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Message-Id: <20250729-variables-checks-v1-1-5ad6ca9b9386@bootlin.com>
References: <20250729-variables-checks-v1-0-5ad6ca9b9386@bootlin.com>
In-Reply-To: <20250729-variables-checks-v1-0-5ad6ca9b9386@bootlin.com>
To: docs@lists.yoctoproject.org
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>,
 Antonin Godard <antonin.godard@bootlin.com>
Series	Add checks for glossaries correctness
[0/5] Add checks for glossaries correctness
[1/5] Add a script to validate documentation glossaries
[2/5] Makefile: add a checks rule
[3/5] ref-manual/variables.rst: fix the glossary
[4/5] ref-manual/variables.rst: sort variables
[5/5] ref-manual/terms.rst: sort entries

Commit Message

Antonin Godard July 29, 2025, 11:30 a.m. UTC

Instead of maintaining the glossary index manually, add a small script
that checks that each letter in the index links to an existing entry
and that this entry is the first one for that letter.

Add two comments marking the start and end of the glossary index so
the script knows where it is located.

The script also checks that the glossary entries are properly sorted.
It uses difflib and prints the diff if the original list differs from
the sorted list.

Messages beginning with "WARNING:" are reported by the Autobuilder,
which is the reason for this format.

Signed-off-by: Antonin Godard <antonin.godard@bootlin.com>
---
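Note for reviewers: the sorting check described in the commit message can
be tried in isolation. The snippet below is only a minimal sketch of that
idea, not the script added by this patch. The helper name sorted_diff and
the sample entries are hypothetical, and lineterm="" is used here so the
diff header lines carry no trailing newline.

```python
import difflib


def sorted_diff(entries):
    """Return unified-diff lines between entries and their case-insensitive sort."""
    sorted_entries = sorted(entries, key=str.lower)
    return list(difflib.unified_diff(entries, sorted_entries,
                                     fromfile="original_list",
                                     tofile="sorted_list",
                                     lineterm=""))


# Why the sort key lowers the strings: in ASCII, "_" sits between the
# uppercase and lowercase letters, so a plain sort puts "BA" before "B_X".
plain_order = sorted(["BA", "B_X"])
lowered_order = sorted(["BA", "B_X"], key=str.lower)

# Hypothetical sample: B_SECOND is listed before B, so a diff is reported.
sample = ["ABIEXTENSION", "B_SECOND", "B", "CACHE"]
for line in sorted_diff(sample):
    print(line)
```

An already sorted list yields an empty diff, which is the case where the
script stays silent.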
 documentation/ref-manual/variables.rst |  6 +++
 documentation/tools/check-glossaries   | 90 ++++++++++++++++++++++++++++++++++
 2 files changed, 96 insertions(+)

diff --git a/documentation/ref-manual/variables.rst b/documentation/ref-manual/variables.rst
index e4d5a9c97..6c2344950 100644
--- a/documentation/ref-manual/variables.rst
+++ b/documentation/ref-manual/variables.rst
@@ -7,6 +7,9 @@  Variables Glossary
 This chapter lists common variables used in the OpenEmbedded build
 system and gives an overview of their function and contents.
 
+..
+   check_glossary_begin
+
 :term:`A <ABIEXTENSION>` :term:`B` :term:`C <CACHE>`
 :term:`D` :term:`E <EFI_PROVIDER>` :term:`F <FAKEROOT>`
 :term:`G <GCCPIE>` :term:`H <HGDIR>` :term:`I <IMAGE_BASENAME>`
@@ -16,6 +19,9 @@  system and gives an overview of their function and contents.
 :term:`U <UBOOT_BINARY>` :term:`V <VIRTUAL-RUNTIME>`
 :term:`W <WARN_QA>` :term:`X <XSERVER>` :term:`Z <ZSTD_THREADS>`
 
+..
+   check_glossary_end
+
 .. glossary::
    :sorted:
 
diff --git a/documentation/tools/check-glossaries b/documentation/tools/check-glossaries
new file mode 100755
index 000000000..b5dfe834e
--- /dev/null
+++ b/documentation/tools/check-glossaries
@@ -0,0 +1,90 @@ 
+#!/usr/bin/env python3
+
+import argparse
+import difflib
+import os
+import re
+
+from pathlib import Path
+
+
+def parse_arguments() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description="Validate documentation glossaries")
+
+    parser.add_argument("-d", "--docs-dir",
+                        type=Path,
+                        default=Path(os.path.dirname(os.path.realpath(__file__))).parent,
+                        help="Path to documentation/ directory in yocto-docs")
+
+    return parser.parse_args()
+
+
+glossaries = (
+    'ref-manual/variables.rst',
+    'ref-manual/terms.rst',
+)
+
+
+def main():
+
+    args = parse_arguments()
+    in_glossary = False
+    # Pattern to match:
+    # :term:`A <ABIEXTENSION>` :term:`B` :term:`C <CACHE>`
+    glossary_re = re.compile(r":term:`(?P<letter>[A-Z])( <(?P<varname>[A-Z0-9_-]+)>)?`")
+    entry_re = re.compile(r"^   :term:`(?P<entry>.+)`\s*$")
+
+    for rst in glossaries:
+
+        glossary = {}
+        rst_path = Path(args.docs_dir) / rst
+
+        with open(rst_path, "r") as f:
+            for line in f.readlines():
+                if "check_glossary_begin" in line:
+                    in_glossary = True
+                    continue
+                if in_glossary:
+                    for m in re.finditer(glossary_re, line.strip()):
+                        letter = m.group("letter")
+                        varname = m.group("varname")
+                        if varname is None:
+                            varname = letter
+                        glossary[letter] = varname
+                if "check_glossary_end" in line:
+                    in_glossary = False
+                    break
+
+        entries = []
+
+        with open(rst_path, "r") as f:
+            for line in f.readlines():
+                m = re.match(entry_re, line)
+                if m:
+                    entries.append(m.group("entry"))
+
+        # Sort case-insensitively: in ASCII, an underscore (_) sorts before
+        # lowercase letters (the natural order) but after uppercase letters.
+        sorted_entries = sorted(entries, key=lambda t: t.lower())
+        diffs = list(difflib.unified_diff(entries,
+                                          sorted_entries,
+                                          fromfile="original_list",
+                                          tofile="sorted_list"))
+
+        if diffs:
+            print(f"WARNING: {rst}: entries are not properly sorted:")
+            print('\n'.join(diffs))
+
+        for letter, varname in glossary.items():
+            try:
+                index = entries.index(varname)
+            except ValueError:
+                print(f"WARNING: {rst}: variable {varname} in glossary does not exist")
+                continue
+            if index > 0 and entries[index - 1].startswith(letter):
+                print(f"WARNING: {rst}: The variable {varname} shouldn't be in "
+                      "the glossary.")
+
+
+if __name__ == "__main__":
+    main()
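
The index check at the end of main() can also be exercised on its own. The
following is a minimal sketch under the assumption that entries are given
in file order. The function name check_index and the sample data are
hypothetical and not part of the patch.

```python
def check_index(index, entries):
    """Yield a message for each index link that is missing or not first for its letter."""
    for letter, name in index.items():
        if name not in entries:
            yield f"{name} in glossary does not exist"
            continue
        pos = entries.index(name)
        # The linked entry should be the first one starting with this letter,
        # so the entry just before it must not start with the same letter.
        if pos > 0 and entries[pos - 1].startswith(letter):
            yield f"{name} shouldn't be in the glossary"


# Hypothetical data: "A" links to the second A entry, and "D" links to
# an entry that does not exist.
entries = ["ABIEXTENSION", "ALLOW_EMPTY", "B", "CACHE"]
index = {"A": "ALLOW_EMPTY", "B": "B", "D": "D"}
for message in check_index(index, entries):
    print(message)
```

Skipping to the next letter when the entry is missing avoids reusing a
position from a previous iteration.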
