
conf.py: improve SearchEnglish to handle terms with dots

Message ID 20250703102301.3566750-1-ejo@pengutronix.de
State New
Series conf.py: improve SearchEnglish to handle terms with dots

Commit Message

Enrico Jörns July 3, 2025, 10:23 a.m. UTC
Search queries already handled words containing hyphens correctly, but
not words containing dots.

To fix this, we:

- enhance the word tokenizer to treat both dots ('.') and hyphens ('-')
  as valid characters within words.
  (For robustness, explicitly exclude dots/hyphens at the start or end
  of a word from indexing.)
- adjust the JavaScript query splitter so that it no longer splits
  search input on dots.

Search queries now correctly match terms such as 'local.conf',
'site.conf', and similar ones.
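A minimal Python sketch (not part of the patch) comparing the old and new
word patterns on a sample sentence. It assumes that the index tokenizer
applies _word_re via findall(), as Sphinx's SearchLanguage.split() does;
the helper name tokenize and the sample text are illustrative only.

```python
import re

# Tokenizer pattern before the patch: any run of word characters and hyphens.
OLD_WORD_RE = re.compile(r'[\w\-]+')
# Tokenizer pattern after the patch: runs of word characters joined by single
# inner dots or hyphens; a separator must be followed by a word character.
NEW_WORD_RE = re.compile(r'[\w]+(?:[\.\-][\w]+)*')


def tokenize(pattern, text):
    """Return the terms that `pattern` extracts from `text` (hypothetical helper)."""
    return pattern.findall(text)


if __name__ == "__main__":
    sample = "Edit local.conf and site.conf. Run bitbake-layers, not --help-."
    print(tokenize(OLD_WORD_RE, sample))
    # ['Edit', 'local', 'conf', 'and', 'site', 'conf', 'Run', 'bitbake-layers', 'not', '--help-']
    print(tokenize(NEW_WORD_RE, sample))
    # ['Edit', 'local.conf', 'and', 'site.conf', 'Run', 'bitbake-layers', 'not', 'help']
```

Because the group in the new pattern is non-capturing, findall() returns
whole terms rather than group contents. The trailing dot of 'site.conf.'
and the dangling hyphens of '--help-' are not indexed, which matches the
"inner" dots/hyphens rule described above.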

Fixes: [YOCTO #14534]

Signed-off-by: Enrico Jörns <ejo@pengutronix.de>
---
 documentation/conf.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Patch

diff --git a/documentation/conf.py b/documentation/conf.py
index 1eca8756a..c07b6c419 100644
--- a/documentation/conf.py
+++ b/documentation/conf.py
@@ -179,13 +179,13 @@  from sphinx.search import SearchEnglish
 from sphinx.search import languages
 class DashFriendlySearchEnglish(SearchEnglish):
 
-    # Accept words that can include hyphens
-    _word_re = re.compile(r'[\w\-]+')
+    # Accept words that can include 'inner' hyphens or dots
+    _word_re = re.compile(r'[\w]+(?:[\.\-][\w]+)*')
 
     js_splitter_code = r"""
 function splitQuery(query) {
     return query
-        .split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}-]+/gu)
+        .split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}\-\.]+/gu)
         .filter(term => term.length > 0);
 }
 """