Message ID | 20240307091925.2324637-1-changqing.li@windriver.com |
---|---|
State | New |
Headers | show |
Series | [V2] dnf: remove log_lock.pid before exit | expand |
On Thu, 7 Mar 2024 at 10:19, Changqing Li
<changqing.li@eng.windriver.com> wrote:
> +Upstream-Status: Inappropriate [oe specific workaround]
You *just* said you will send this upstream as requested. Why is it
suddenly Inappropriate?
Alex
On Thu, 7 Mar 2024 at 10:19, Changqing Li <changqing.li@eng.windriver.com> wrote: > ++ for arg in args: > ++ if arg.startswith("--installroot="): > ++ root=arg.split("=")[1] > ++ if os.path.exists(os.path.join(root, "log_lock.pid")): > ++ os.unlink(os.path.join(root, "log_lock.pid")) > ++ break Apologies, but after looking at this more carefully I have to say no once more. The obstacle is that the code checks for existence before removing the lock, which means the issue of why the lock is left in place to begin with is still unresolved. This doesn't always happen, and we need to get to the bottom of *why*. Alex
Hi Alex, You can see dnf's solution is: https://github.com/rpm-software-management/dnf/blob/master/etc/tmpfiles.d/dnf.conf I don't think dnf community will look into this issue. And I would expect it to be a complicated one. Because dnf's own solution looks like more of a workaround. At the same time, yocto based systems will sometimes have log_lock.pid left in target filesystems. Users typing 'ls /' will notice it. We have an OE specific patch: https://git.openembedded.org/openembedded-core/commit/?id=742a1b71249f4da1c8d8e13e270b0eb6128a3f66 I can see that this OE specific patch, 0001-dnf-write-the-log-lock-to-root.patch, does solve some issue. Unfortunately, at the same time, it introduces this specific issue, a easy-to-notice one. My suggestion is that we merge this fix into 0001-dnf-write-the-log-lock-to-root.patch, because they really belong together. I guess we can't expect dnf community to devote time into this, as they've already got a solution. And I'm not sure if anyone in OE community is familiar with dnf enough to solve this issue. So let's make things a little better, avoiding users of Yocto systems see this log_lock.pid file when they type 'ls /'. If you have any other suggestion, please let us know. Regards, Qi -----Original Message----- From: openembedded-core@lists.openembedded.org <openembedded-core@lists.openembedded.org> On Behalf Of Alexander Kanavin via lists.openembedded.org Sent: Thursday, March 7, 2024 5:57 PM To: Li, Changqing <Changqing.Li@windriver.com>; Richard Purdie <richard.purdie@linuxfoundation.org> Cc: openembedded-core@lists.openembedded.org Subject: Re: [OE-core] [PATCH V2] dnf: remove log_lock.pid before exit On Thu, 7 Mar 2024 at 10:19, Changqing Li <changqing.li@eng.windriver.com> wrote: > ++ for arg in args: > ++ if arg.startswith("--installroot="): > ++ root=arg.split("=")[1] > ++ if os.path.exists(os.path.join(root, "log_lock.pid")): > ++ os.unlink(os.path.join(root, "log_lock.pid")) > ++ break Apologies, but after looking at this more carefully I have to say no once more. The obstacle is that the code checks for existence before removing the lock, which means the issue of why the lock is left in place to begin with is still unresolved. This doesn't always happen, and we need to get to the bottom of *why*. Alex
On Thu, 7 Mar 2024 at 11:21, Chen Qi via lists.openembedded.org <Qi.Chen=windriver.com@lists.openembedded.org> wrote: > You can see dnf's solution is: https://github.com/rpm-software-management/dnf/blob/master/etc/tmpfiles.d/dnf.conf > > I don't think dnf community will look into this issue. And I would expect it to be a complicated one. Because dnf's own solution looks like more of a workaround. At the same time, yocto based systems will sometimes have log_lock.pid left in target filesystems. Users typing 'ls /' will notice it. > > We have an OE specific patch: https://git.openembedded.org/openembedded-core/commit/?id=742a1b71249f4da1c8d8e13e270b0eb6128a3f66 > I can see that this OE specific patch, 0001-dnf-write-the-log-lock-to-root.patch, does solve some issue. Unfortunately, at the same time, it introduces this specific issue, a easy-to-notice one. > > My suggestion is that we merge this fix into 0001-dnf-write-the-log-lock-to-root.patch, because they really belong together. I guess we can't expect dnf community to devote time into this, as they've already got a solution. And I'm not sure if anyone in OE community is familiar with dnf enough to solve this issue. So let's make things a little better, avoiding users of Yocto systems see this log_lock.pid file when they type 'ls /'. > > If you have any other suggestion, please let us know. We need to find out why it happens, and why it happens only sometimes. If lock file does not get properly deleted that is quite possibly masking a different issue which needs fixing, and the proposed patch just sweeps it all under the carpet in the name of giving users an aesthetically pleasing rootfs. So the answer is still no. If you are having difficulty debugging the situation where the lock file does get left behind, can you provide a way for me to reproduce the issue? I just build core-image-minimal, and I'm not seeing it: alex@Zen2:/srv/storage/alex/yocto/build-64-alt$ ls tmp/work/qemux86_64-poky-linux/core-image-minimal/1.0/rootfs/ bin boot dev etc home lib media mnt proc root run sbin srv sys tmp usr var If you can't provide that, then you can certainly still do more on your side: build core-image-minimal with plain poky, ensure the lock file gets deleted, then build your own private things where the file does not get deleted, then investigate why the code path that deletes the lock file isn't being taken in the latter case. You can see the code where the lock log (self.rotate_lock) is used in logging.py: def emit(self, record): while True: try: if self.shouldRollover(record): with self.rotate_lock: # Do rollover while preserving the mode of the new log file mode = os.stat(self.baseFilename).st_mode self.doRollover() os.chmod(self.baseFilename, mode) logging.FileHandler.emit(self, record) return except (dnf.exceptions.ProcessLockError, dnf.exceptions.ThreadLockError): time.sleep(0.01) except Exception: self.handleError(record) return And the deletion happens in lock.py: def __exit__(self, *exc_args): if self.count == 1: os.unlink(self.target) self._unlock_thread() Alex
> -----Original Message----- > From: openembedded-core@lists.openembedded.org <openembedded- > core@lists.openembedded.org> On Behalf Of Alexander Kanavin > Sent: den 7 mars 2024 12:42 > To: Qi.Chen@windriver.com > Cc: Li, Changqing <Changqing.Li@windriver.com>; Richard Purdie > <richard.purdie@linuxfoundation.org>; openembedded- > core@lists.openembedded.org > Subject: Re: [OE-core] [PATCH V2] dnf: remove log_lock.pid before exit > > On Thu, 7 Mar 2024 at 11:21, Chen Qi via lists.openembedded.org > <Qi.Chen=windriver.com@lists.openembedded.org> wrote: > > You can see dnf's solution is: https://github.com/rpm-software- > management/dnf/blob/master/etc/tmpfiles.d/dnf.conf > > > > I don't think dnf community will look into this issue. And I would > expect it to be a complicated one. Because dnf's own solution looks like > more of a workaround. At the same time, yocto based systems will sometimes > have log_lock.pid left in target filesystems. Users typing 'ls /' will > notice it. > > > > We have an OE specific patch: https://git.openembedded.org/openembedded- > core/commit/?id=742a1b71249f4da1c8d8e13e270b0eb6128a3f66 > > I can see that this OE specific patch, 0001-dnf-write-the-log-lock-to- > root.patch, does solve some issue. Unfortunately, at the same time, it > introduces this specific issue, a easy-to-notice one. > > > > My suggestion is that we merge this fix into 0001-dnf-write-the-log- > lock-to-root.patch, because they really belong together. I guess we can't > expect dnf community to devote time into this, as they've already got a > solution. And I'm not sure if anyone in OE community is familiar with dnf > enough to solve this issue. So let's make things a little better, avoiding > users of Yocto systems see this log_lock.pid file when they type 'ls /'. > > > > If you have any other suggestion, please let us know. > > We need to find out why it happens, and why it happens only sometimes. > If lock file does not get properly deleted that is quite possibly > masking a different issue which needs fixing, and the proposed patch > just sweeps it all under the carpet in the name of giving users an > aesthetically pleasing rootfs. So the answer is still no. > > If you are having difficulty debugging the situation where the lock > file does get left behind, can you provide a way for me to reproduce > the issue? I just build core-image-minimal, and I'm not seeing it: > > alex@Zen2:/srv/storage/alex/yocto/build-64-alt$ ls > tmp/work/qemux86_64-poky-linux/core-image-minimal/1.0/rootfs/ > bin boot dev etc home lib media mnt proc root run sbin srv > sys tmp usr var > > If you can't provide that, then you can certainly still do more on > your side: build core-image-minimal with plain poky, ensure the lock > file gets deleted, then build your own private things where the file > does not get deleted, then investigate why the code path that deletes > the lock file isn't being taken in the latter case. You can see the > code where the lock log (self.rotate_lock) is used in logging.py: > > def emit(self, record): > while True: > try: > if self.shouldRollover(record): > with self.rotate_lock: > # Do rollover while preserving the mode of the new log file > mode = os.stat(self.baseFilename).st_mode > self.doRollover() > os.chmod(self.baseFilename, mode) > logging.FileHandler.emit(self, record) > return > except (dnf.exceptions.ProcessLockError, dnf.exceptions.ThreadLockError): > time.sleep(0.01) > except Exception: > self.handleError(record) > return > > And the deletion happens in lock.py: > > def __exit__(self, *exc_args): > if self.count == 1: > os.unlink(self.target) > self._unlock_thread() > > > > Alex I did try to find a solution to this problem quite a while back, but unfortunately never managed to get to the bottom of it. What I did figure out was that the problem is related to whether dnf thinks it is running as root or not as that results in different paths being used. I.e., it may create the file in one place and then try to delete it in another. What I never could figure out is how/where dnf switches between root/not root as it seems to happen mid-run. //Peter
On 3/7/24 19:42, Alexander Kanavin wrote: > CAUTION: This email comes from a non Wind River email account! > Do not click links or open attachments unless you recognize the sender and know the content is safe. > > On Thu, 7 Mar 2024 at 11:21, Chen Qi via lists.openembedded.org > <Qi.Chen=windriver.com@lists.openembedded.org> wrote: >> You can see dnf's solution is:https://github.com/rpm-software-management/dnf/blob/master/etc/tmpfiles.d/dnf.conf >> >> I don't think dnf community will look into this issue. And I would expect it to be a complicated one. Because dnf's own solution looks like more of a workaround. At the same time, yocto based systems will sometimes have log_lock.pid left in target filesystems. Users typing 'ls /' will notice it. >> >> We have an OE specific patch:https://git.openembedded.org/openembedded-core/commit/?id=742a1b71249f4da1c8d8e13e270b0eb6128a3f66 >> I can see that this OE specific patch, 0001-dnf-write-the-log-lock-to-root.patch, does solve some issue. Unfortunately, at the same time, it introduces this specific issue, a easy-to-notice one. >> >> My suggestion is that we merge this fix into 0001-dnf-write-the-log-lock-to-root.patch, because they really belong together. I guess we can't expect dnf community to devote time into this, as they've already got a solution. And I'm not sure if anyone in OE community is familiar with dnf enough to solve this issue. So let's make things a little better, avoiding users of Yocto systems see this log_lock.pid file when they type 'ls /'. >> >> If you have any other suggestion, please let us know. > We need to find out why it happens, and why it happens only sometimes. > If lock file does not get properly deleted that is quite possibly > masking a different issue which needs fixing, and the proposed patch > just sweeps it all under the carpet in the name of giving users an > aesthetically pleasing rootfs. So the answer is still no. > > If you are having difficulty debugging the situation where the lock > file does get left behind, can you provide a way for me to reproduce > the issue? I just build core-image-minimal, and I'm not seeing it: > > alex@Zen2:/srv/storage/alex/yocto/build-64-alt$ ls > tmp/work/qemux86_64-poky-linux/core-image-minimal/1.0/rootfs/ > bin boot dev etc home lib media mnt proc root run sbin srv > sys tmp usr var This issue only happened when dnf log rotate is happened. maybe install "a lot" packages can trigger reproduce this issue more quickly. > > If you can't provide that, then you can certainly still do more on > your side: build core-image-minimal with plain poky, ensure the lock > file gets deleted, then build your own private things where the file > does not get deleted, then investigate why the code path that deletes > the lock file isn't being taken in the latter case. You can see the > code where the lock log (self.rotate_lock) is used in logging.py: > > def emit(self, record): > while True: > try: > if self.shouldRollover(record): > with self.rotate_lock: > # Do rollover while preserving the mode of the > new log file > mode = os.stat(self.baseFilename).st_mode > self.doRollover() > os.chmod(self.baseFilename, mode) > logging.FileHandler.emit(self, record) > return > except (dnf.exceptions.ProcessLockError, > dnf.exceptions.ThreadLockError): > time.sleep(0.01) > except Exception: > self.handleError(record) > return > > And the deletion happens in lock.py: > > def __exit__(self, *exc_args): > if self.count == 1: > os.unlink(self.target) > self._unlock_thread() Hi, Thanks for trying to help. I had try to find the root cause when I opened the upstream issue, but unfortunately, not got the root cause, I thought it should be a complicated one. After do more investigating recently, the root cause should be found, and I have send pull request [1] to upsteam dnf , will backport the patch if upstream accept the fix. [1] https://github.com/rpm-software-management/dnf/pull/2065 //Changqing > > > > Alex > > -=-=-=-=-=-=-=-=-=-=-=- > Links: You receive all messages sent to this group. > View/Reply Online (#196785):https://lists.openembedded.org/g/openembedded-core/message/196785 > Mute This Topic:https://lists.openembedded.org/mt/104784184/3616873 > Group Owner:openembedded-core+owner@lists.openembedded.org > Unsubscribe:https://lists.openembedded.org/g/openembedded-core/unsub [changqing.li@eng.windriver.com] > -=-=-=-=-=-=-=-=-=-=-=- >
On Fri, 15 Mar 2024 at 02:43, Changqing Li <changqing.li@windriver.com> wrote: > This issue only happened when dnf log rotate is happened. maybe install > "a lot" packages can trigger reproduce this issue more quickly. Maybe logrotate threshold can as well be adjusted to happen on small, easily reproducible installations? > Thanks for trying to help. I had try to find the root cause when I opened the upstream issue, but unfortunately, not got the root cause, I thought > > it should be a complicated one. After do more investigating recently, the root cause should be found, and I have send pull request [1] to upsteam dnf , > > will backport the patch if upstream accept the fix. > > [1] https://github.com/rpm-software-management/dnf/pull/2065 Thanks, that looks roughly correct to me, let's see what upstream says. Alex
diff --git a/meta/recipes-devtools/dnf/dnf/0001-main.py-remove-log_lock.pid-before-exit.patch b/meta/recipes-devtools/dnf/dnf/0001-main.py-remove-log_lock.pid-before-exit.patch new file mode 100644 index 0000000000..2dd4969d57 --- /dev/null +++ b/meta/recipes-devtools/dnf/dnf/0001-main.py-remove-log_lock.pid-before-exit.patch @@ -0,0 +1,41 @@ +From bdf17281385cf33ad59267fe75a9e427e6e2ffc4 Mon Sep 17 00:00:00 2001 +From: Changqing Li <changqing.li@windriver.com> +Date: Thu, 7 Mar 2024 14:03:07 +0800 +Subject: [PATCH] main.py: remove log_lock.pid before exit + +dnf has a bug, refer [1], it causes that log_lock.pid may not removed +after dnf exit. And for native dnf, since we change the lock file path +to /, it will never be removed, and make the rootfs not clean,refer +[2][3]. This patch is a workaround to fix above issue. + +[1] https://github.com/rpm-software-management/dnf/issues/1963 +[2] https://git.openembedded.org/openembedded-core/commit/?id=742a1b71249f4da1c8d8e13e270b0eb6128a3f66 +[3] https://github.com/rpm-software-management/dnf/blob/master/etc/tmpfiles.d/dnf.conf + +Upstream-Status: Inappropriate [oe specific workaround] + +Signed-off-by: Changqing Li <changqing.li@windriver.com> +--- + dnf/cli/main.py | 6 ++++++ + 1 file changed, 6 insertions(+) + +diff --git a/dnf/cli/main.py b/dnf/cli/main.py +index 2a7f92d54..2f672cff2 100644 +--- a/dnf/cli/main.py ++++ b/dnf/cli/main.py +@@ -200,6 +200,12 @@ def user_main(args, exit_code=False): + + errcode = main(args) + if exit_code: ++ for arg in args: ++ if arg.startswith("--installroot="): ++ root=arg.split("=")[1] ++ if os.path.exists(os.path.join(root, "log_lock.pid")): ++ os.unlink(os.path.join(root, "log_lock.pid")) ++ break + sys.exit(errcode) + return errcode + +-- +2.25.1 + diff --git a/meta/recipes-devtools/dnf/dnf_4.18.2.bb b/meta/recipes-devtools/dnf/dnf_4.18.2.bb index dc0c18be5e..7770c9d7e0 100644 --- a/meta/recipes-devtools/dnf/dnf_4.18.2.bb +++ b/meta/recipes-devtools/dnf/dnf_4.18.2.bb @@ -17,7 +17,8 @@ SRC_URI = "git://github.com/rpm-software-management/dnf.git;branch=master;protoc file://0001-set-python-path-for-completion_helper.patch \ " -SRC_URI:append:class-native = "file://0001-dnf-write-the-log-lock-to-root.patch" +SRC_URI:append:class-native = "file://0001-dnf-write-the-log-lock-to-root.patch \ + file://0001-main.py-remove-log_lock.pid-before-exit.patch" SRCREV = "1c43d0999178d492381ad0b43917ffd9c74016f8" UPSTREAM_CHECK_GITTAGREGEX = "(?P<pver>\d+(\.\d+)+)"