
Speeding up Checking sstate mirror object availability : Add async filtering SSTATE_MIRRORS

Message ID 20250506131751.93175-1-sider123456789101112131415@gmail.com
State New

Commit Message

Константин May 6, 2025, 1:17 p.m. UTC
As the number of addresses specified in the SSTATE_MIRRORS variable increases, the duration of the "Checking sstate mirror object availability" process also increases.
This is logical, as the number of addresses to probe grows. However, it was observed that the "Checking sstate mirror object availability" process time increases even when SSTATE_MIRRORS contains one valid address and multiple unreachable addresses.

A patch has been implemented that, during the parsing of the local.conf configuration file, immediately after reading the SSTATE_MIRRORS variable, attempts to asynchronously establish TCP connections for each address listed in SSTATE_MIRRORS. If a connection is successfully established, the address remains in SSTATE_MIRRORS; if the connection attempt fails, the address is removed from the variable. This ensures that any addresses unreachable at the TCP level are excluded from subsequent processing.
These modifications are activated only if the FILTER_SSTATE_MIRRORS variable is set to 1 in local.conf.

Signed-off-by: KonstantinKondratenko <sider123456789101112131415@gmail.com>
---
 lib/bb/cookerdata.py | 67 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

Comments

Константин May 6, 2025, 1:24 p.m. UTC | #1
Hello dear colleagues!

Following up on the patch email, here is more detail on the problem we
found, how we investigated it, how we fixed it, and the results of the fix.


As the number of addresses specified in the SSTATE_MIRRORS variable
increases, the duration of the "Checking sstate mirror object availability"
process also increases. This is logical, as the number of addresses to
probe grows. However, it was observed that the "Checking sstate mirror
object availability" process time increases even when SSTATE_MIRRORS
contains one valid address and multiple unreachable addresses.
To validate this observation, a testing system was developed. Two computers
were connected in a local network: one performs the build, while the second
shares the sstate-cache. On the second computer, the sstate-cache is hosted
on a single port. Iterative builds were performed on Computer 1, with
SSTATE_MIRRORS configured to include one valid port and a varying number
(from 1 to 48) of "non-working" servers — addresses pointing to Computer 2
but with closed ports. Below is an example configuration:
SSTATE_MIRRORS ?= " \
file://.* http://10.138.70.7:8150/sstate-cache/PATH;downloadfilename=PATH \
file://.* http://10.138.70.7:8151/sstate-cache/PATH;downloadfilename=PATH \
file://.* http://10.138.70.7:8152/sstate-cache/PATH;downloadfilename=PATH \
file://.* http://10.138.70.7:9999/sstate-cache/PATH;downloadfilename=PATH"
Here, http://10.138.70.7:9999 hosts a valid sstate-cache server, while
http://10.138.70.7:8150, http://10.138.70.7:8151, and
http://10.138.70.7:8152 are closed ports. The time from the start of the
build to the completion of the "Checking sstate mirror object availability"
process was measured, and the results are visualized in the figure:
https://drive.google.com/file/d/1uFwzDSnD0I_WGAeUyvNXTRb9qIxQBXqN/view?usp=sharing
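
For reference, the configuration above is a whitespace-separated list of
pattern/URL pairs. The snippet below is only an illustrative sketch of how
such a value can be split and how the host and port of each mirror can be
read; split_mirrors and host_and_port are hypothetical helper names, not
part of BitBake, and the sample value is shortened from the example above
with the backslash line continuations already removed.

```python
from urllib.parse import urlparse


def split_mirrors(value):
    """Split an SSTATE_MIRRORS-style string into (pattern, url) pairs."""
    tokens = value.split()
    if len(tokens) % 2:
        # An odd token count means a pattern without a URL (or vice versa).
        raise ValueError("expected whitespace-separated pattern/URL pairs")
    return list(zip(tokens[0::2], tokens[1::2]))


def host_and_port(url):
    """Return (hostname, port) as parsed from the URL; either may be None."""
    parsed = urlparse(url)
    return parsed.hostname, parsed.port


# Shortened copy of the test configuration (hypothetical sample input).
example = """
file://.* http://10.138.70.7:8150/sstate-cache/PATH;downloadfilename=PATH
file://.* http://10.138.70.7:9999/sstate-cache/PATH;downloadfilename=PATH
"""

pairs = split_mirrors(example)
endpoints = [host_and_port(url) for _, url in pairs]
```

Each entry in endpoints is what a TCP-level reachability check would need
to connect to.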

Thus, it can be concluded that increasing the number of addresses in
SSTATE_MIRRORS leads to longer "Checking sstate mirror object availability"
process times, even when those addresses are unreachable.
It is also worth noting that servers can be unavailable for various
reasons, ranging from internal errors to network problems or DDoS attacks.
Regardless of the reason, listing unavailable servers in local.conf leads
to longer build times.

A patch has been implemented that, during the parsing of the local.conf
configuration file, immediately after reading the SSTATE_MIRRORS variable,
attempts to asynchronously establish TCP connections for each address
listed in SSTATE_MIRRORS. If a connection is successfully established, the
address remains in SSTATE_MIRRORS; if the connection attempt fails, the
address is removed from the variable. This ensures that any addresses
unreachable at the TCP level are excluded from subsequent processing.
These modifications are activated only if the FILTER_SSTATE_MIRRORS
variable is set to 1 in local.conf.
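
The filtering idea can be sketched outside of BitBake as follows. This is a
minimal standalone illustration under stated assumptions, not the code from
the patch: the names DEFAULT_PORTS, probe, filter_mirrors and
filter_mirrors_sync are hypothetical. Unlike the patch, which only probes
entries that yield a host and port, this sketch returns one result per
entry and keeps entries without a network host (for example file://
mirrors) unconditionally; that behaviour is an assumption about what is
desirable.

```python
import asyncio
from urllib.parse import urlparse

# Assumed default ports for URLs that do not specify one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "ftps": 990}


async def is_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        _, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout)
    except (asyncio.TimeoutError, OSError):
        return False
    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        pass  # The connection was established; errors on close are ignored.
    return True


async def probe(pair, timeout):
    """Return (pair, reachable) so every result stays tied to its entry."""
    parsed = urlparse(pair[1])
    port = parsed.port or DEFAULT_PORTS.get(parsed.scheme)
    if not parsed.hostname or port is None:
        # Nothing to probe over TCP (e.g. file://): keep the entry (assumption).
        return pair, True
    return pair, await is_port_open(parsed.hostname, port, timeout)


async def filter_mirrors(value, timeout=5.0):
    """Return the mirror string with TCP-unreachable entries removed."""
    tokens = value.split()
    pairs = list(zip(tokens[0::2], tokens[1::2]))
    # All probes run concurrently, so the wall time is bounded by roughly
    # one timeout rather than growing with the number of dead mirrors.
    results = await asyncio.gather(*(probe(p, timeout) for p in pairs))
    return " ".join(f"{pat} {url}" for (pat, url), ok in results if ok)


def filter_mirrors_sync(value, timeout=5.0):
    """Blocking wrapper for callers that are not running an event loop."""
    return asyncio.run(filter_mirrors(value, timeout))
```

A caller would pass the current SSTATE_MIRRORS value to
filter_mirrors_sync() and write the returned string back to the variable.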

The experiment described above was conducted after applying the patch. The
results were recorded and visualized in:
https://drive.google.com/file/d/1vlkjbK20iCVLpYTWhZb08gNbDpXm2_ur/view?usp=sharing
As shown in the graph, the address filtering works: increasing the number
of non-working servers no longer lengthens the "Checking sstate mirror
object availability" process.
For clarity, the data before and after applying the patch are also
plotted on a logarithmic scale:
https://drive.google.com/file/d/1BKgWtMAlI-39VnTImI8NjNOpE6rIhG-7/view?usp=sharing



Best regards,
Konstantin E. Kondratenko


Patch

diff --git a/lib/bb/cookerdata.py b/lib/bb/cookerdata.py
index 1f447d30c2..8b642bdd90 100644
--- a/lib/bb/cookerdata.py
+++ b/lib/bb/cookerdata.py
@@ -305,6 +305,73 @@  class CookerDataBuilder(object):
 
         bb.codeparser.update_module_dependencies(self.data)
 
+        if self.data.getVar('FILTER_SSTATE_MIRRORS') == '1':
+            try:
+                import asyncio
+                from urllib.parse import urlparse
+
+                mirrors = self.data.getVar('SSTATE_MIRRORS') or None
+                if mirrors:
+                    parts = mirrors.split()
+                    mirrors_list = [' '.join(parts[i:i+2]) for i in range(0, len(parts), 2)]
+
+                def get_default_port(scheme):
+                    if scheme == 'http':
+                        return 80
+                    elif scheme == 'https':
+                        return 443
+                    elif scheme == 'ftp':
+                        return 21
+                    elif scheme == 'ftps':
+                        return 990
+                    else:
+                        return 8888  # Default value for unknown schemes
+
+                async def is_port_open(address, port, timeout=5):
+                    try:
+                        reader, writer = await asyncio.wait_for(asyncio.open_connection(address, port), timeout)
+                        writer.close()
+                        await writer.wait_closed()
+                        return True
+                    except (asyncio.TimeoutError, OSError):
+                        return False
+
+                def extract_address_and_port(url):
+                    parsed_url = urlparse(url)
+                    address = parsed_url.hostname
+                    port = parsed_url.port if parsed_url.port else get_default_port(parsed_url.scheme)
+                    return address, port
+
+
+                async def async_mirrors_filter(mirrors_list, data):
+                    tasks = []
+
+                    for item in mirrors_list:
+                        url = item.split()[1]
+                        address, port = extract_address_and_port(url)
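+                        # Entries without a host (e.g. file://) are kept, so results stay aligned with mirrors_list
+                        tasks.append(is_port_open(address, port) if address and port else asyncio.sleep(0, True))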
+
+                    results = await asyncio.gather(*tasks)
+
+                    if False in results:
+                        output_string = " ".join(
+                            mirror
+                            for mirror, result in zip(mirrors_list, results)
+                            if result
+                            )
+                        if not output_string:
+                            logger.warning("All SSTATE_MIRRORS are not available")
+                        else:
+                            logger.warning(f'Several SSTATE_MIRRORS are not available, using: {str(output_string)}!')
+                        data.setVar('SSTATE_MIRRORS', output_string)
+
+                if mirrors:
+                    asyncio.run(async_mirrors_filter(mirrors_list, self.data))
+
+            except Exception as e:
+                logger.error(f"Error checking SSTATE_MIRRORS availability: {e}")
+
         # Handle obsolete variable names
         d = self.data
         renamedvars = d.getVarFlags('BB_RENAMED_VARIABLES') or {}