diff mbox series

insane: avoid race condition when DEBIAN/CONTROL entries are removed

Message ID 20241014183815.537177-1-ross.burton@arm.com
State New
Headers show
Series insane: avoid race condition when DEBIAN/CONTROL entries are removed | expand

Commit Message

Ross Burton Oct. 14, 2024, 6:38 p.m. UTC
There is a race condition when iterating directories which are being
altered whilst iterating, which is something that can and does happen
when do_package_qa runs at the same time as eg do_package_write_ipkg
(the opkg metadata is written inside the build tree). The race is that
naive code will list a directory contents and then stat() each name to
determine if its a directory or file.  The classic failure that we see
is that CONTROL/ is found on a listdir but deleted by the time the stat
happens, so is incorrectly listed as a file (because it is not a
directory).

Since Python 3.5, os.walk() uses scandir() instead of listdir() which
mitigates this race by returning the file type alongside the name, so
a stat is no longer needed to identify the type.

However, cachedpath.walk() was copied from Python before this, so it
uses listdir() and has this race condition. Since I changed insane to
use cachedpath.walk()[1] I inadvertently reintroduced this race.

I believe there's actually no need to use cachedpath.walk() and a
logical fix is to simply use os.walk():

With os.walk() each directory is listed and categorised in a single
os.scandir() as the underlying syscall, getdents64, returns the type.

However, cachedpath.walk() uses os.listdir() which ignores the type
field returned and has to do a stat() on every file to determine the
type.

Thus, we should switch users of cachedpath.walk() to os.walk(): there's
no real gain in what is effectively just a prefetch for the stat cache,
but depending on what the calling code does may result in more stat()
calls than needed.

In the future we may want to redesign cachedpath to reimplement walk so
that it can also cache the DirEntry instances as returned by scandir()
as that will avoid needing to call stat() at all in many cases. However
I believe we should instead use a caching pathlib.Path instance instead.

[1] cad3c8 insane: use oe.cachedpath.CachedPath instead of os.path

Signed-off-by: Ross Burton <ross.burton@arm.com>
---
 meta/classes-global/insane.bbclass | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff mbox series

Patch

diff --git a/meta/classes-global/insane.bbclass b/meta/classes-global/insane.bbclass
index 36b139a20d0..4344a52560d 100644
--- a/meta/classes-global/insane.bbclass
+++ b/meta/classes-global/insane.bbclass
@@ -1133,7 +1133,7 @@  python do_package_qa () {
     for pkg in packages:
         pkgdir = os.path.join(pkgdest, pkg)
         pkgfiles[pkg] = []
-        for walkroot, dirs, files in cpath.walk(pkgdir):
+        for walkroot, dirs, files in os.walk(pkgdir):
             # Don't walk into top-level CONTROL or DEBIAN directories as these
             # are temporary directories created by do_package.
             if walkroot == pkgdir: