Message ID | 20240423182729.226345-1-JPEWhacker@gmail.com |
---|---|
State | New |
Headers | show |
Series | [bitbake-devel,RFC,v2] prserv: Enable database sharing | expand |
Hi Joshua On 4/23/24 at 20:27, Joshua Watt wrote: > sqlite3 can allow multiple processes to access the database > simultaneously, but it must be opened correctly. The key change is that > the database is no longer opened in "exclusive" mode (defaulting to > shared mode). In addition, the journal is set to "WAL" mode, as this is > the most efficient for dealing with simultaneous access between > different processes. In order to keep the database performance, > synchronous mode is set to "off". The WAL journal will protect against > incomplete transactions in any given client, however the database will > not be protected against unexpected power loss from the OS (which is a > fine trade off for performance, and also the same as the previous > implementation). Thanks, I also made experiments with the "WAL" journal mode, but didn't go on as it didn't solve my empty database issues. However, I agree it seems like the way to go! > > Finally, the _execute() API now uses a database cursor. The cursor > automatically makes sure that the query happens in an atomic transaction > and commits when finished. > > NOTE: THIS MAY BE INCOMPLETE; All APIs need to be evaluated to see if > the transaction (cursor) needs to cover mode than one "execute" > statement. > > Signed-off-by: Joshua Watt <JPEWhacker@gmail.com> > --- > bitbake/lib/prserv/db.py | 19 ++++++++----------- > 1 file changed, 8 insertions(+), 11 deletions(-) Which branch did you use to generate your patch? I can't manage to apply it to either "master", "master-next" or the branch I shared recently... Ah, just found it. You used "nanbield". I'll test it in my branch and "torture" it with regular tests and my new bitbake selftests. Thanks, Michael.
Hi Joshua On 4/24/24 at 10:15, Michael Opdenacker via lists.openembedded.org wrote: > Which branch did you use to generate your patch? I can't manage to > apply it to either "master", "master-next" or the branch I shared > recently... > > Ah, just found it. You used "nanbield". > > I'll test it in my branch and "torture" it with regular tests and my > new bitbake selftests. Oops, the tests are failing: ====================================================================== ERROR: test_0b_db (prserv.tests.FunctionTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/mike/work/yocto/poky/bitbake/lib/prserv/tests.py", line 114, in test_0b_db self.assertTrue(self.table.test_package(version, pkgarch)) File "/home/mike/work/yocto/poky/bitbake/lib/prserv/db.py", line 76, in test_package row=data.fetchone() sqlite3.ProgrammingError: Cannot operate on a closed cursor. Let me send you my patch series with your change included. I'll send them through a private e-mail. Don't want to spam the list at this stage. Thanks Michael.
On Wed, Apr 24, 2024 at 3:20 AM Michael Opdenacker <michael.opdenacker@bootlin.com> wrote: > > Hi Joshua > > On 4/24/24 at 10:15, Michael Opdenacker via lists.openembedded.org wrote: > > Which branch did you use to generate your patch? I can't manage to > > apply it to either "master", "master-next" or the branch I shared > > recently... > > > > Ah, just found it. You used "nanbield". > > > > I'll test it in my branch and "torture" it with regular tests and my > > new bitbake selftests. > > Oops, the tests are failing: > > ====================================================================== > ERROR: test_0b_db (prserv.tests.FunctionTests) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/home/mike/work/yocto/poky/bitbake/lib/prserv/tests.py", line > 114, in test_0b_db > self.assertTrue(self.table.test_package(version, pkgarch)) > File "/home/mike/work/yocto/poky/bitbake/lib/prserv/db.py", line 76, > in test_package > row=data.fetchone() > sqlite3.ProgrammingError: Cannot operate on a closed cursor. Ah right. Ok so instead of doing the cursor operation in the _exectue(), you'll need to do it in each API call. The general strategy is to do: def _importHist(): if self.read_only: return None with closing(self.conn.cursor()) as cursor: # Use cursor.execute() for all statements, e.g. row = cursor.execute("SELECT value ....") data = row.fetchone() You can use the same cursor for multiple queries, so you only need the one for each API call (e.g. you don't need to make a new one for each execute statement). With this strategy I don't think you'll ever get DB locking errors, so you should be able to eliminate the _execute() wrapper Just make sure you _always_ use a cursor to execute any database queries. It will ruin your day if you combine self.conn.execute() with cursor queries :) The cursor makes your database modifications atomic, that is any changes you make to the database aren't applied until the cursor is closed, and then they are applied atomically so either all the changes you made are commited, or none of them. Keep in mind that this does _not_ mean that your database reads are also atomic with your writes; if another process commits to the database between your read and write (or even between two reads) it's possible you will be dealing with a stale understanding of the database when you commit. Sorry about the nanbield branch, I wasn't paying attention > > Let me send you my patch series with your change included. I'll send > them through a private e-mail. Don't want to spam the list at this stage. > Thanks > Michael. > > -- > Michael Opdenacker, Bootlin > Embedded Linux and Kernel engineering > https://bootlin.com >
Hi Joshua Many thanks for the tips! On 4/24/24 at 16:23, Joshua Watt wrote: > On Wed, Apr 24, 2024 at 3:20 AM Michael Opdenacker > <michael.opdenacker@bootlin.com> wrote: >> Hi Joshua >> >> On 4/24/24 at 10:15, Michael Opdenacker via lists.openembedded.org wrote: >>> Which branch did you use to generate your patch? I can't manage to >>> apply it to either "master", "master-next" or the branch I shared >>> recently... >>> >>> Ah, just found it. You used "nanbield". >>> >>> I'll test it in my branch and "torture" it with regular tests and my >>> new bitbake selftests. >> Oops, the tests are failing: >> >> ====================================================================== >> ERROR: test_0b_db (prserv.tests.FunctionTests) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "/home/mike/work/yocto/poky/bitbake/lib/prserv/tests.py", line >> 114, in test_0b_db >> self.assertTrue(self.table.test_package(version, pkgarch)) >> File "/home/mike/work/yocto/poky/bitbake/lib/prserv/db.py", line 76, >> in test_package >> row=data.fetchone() >> sqlite3.ProgrammingError: Cannot operate on a closed cursor. > Ah right. Ok so instead of doing the cursor operation in the > _exectue(), you'll need to do it in each API call. The general > strategy is to do: > > def _importHist(): > if self.read_only: > return None > with closing(self.conn.cursor()) as cursor: > # Use cursor.execute() for all statements, e.g. > row = cursor.execute("SELECT value ....") > data = row.fetchone() > > You can use the same cursor for multiple queries, so you only need the > one for each API call (e.g. you don't need to make a new one for each > execute statement). > > With this strategy I don't think you'll ever get DB locking errors, so > you should be able to eliminate the _execute() wrapper > > Just make sure you _always_ use a cursor to execute any database > queries. It will ruin your day if you combine self.conn.execute() with > cursor queries :) > > The cursor makes your database modifications atomic, that is any > changes you make to the database aren't applied until the cursor is > closed, and then they are applied atomically so either all the changes > you made are commited, or none of them. Keep in mind that this does > _not_ mean that your database reads are also atomic with your writes; > if another process commits to the database between your read and write > (or even between two reads) it's possible you will be dealing with a > stale understanding of the database when you commit. > > Sorry about the nanbield branch, I wasn't paying attention What you suggest seems to work fine. By the way, I still had empty databases after leaving the tests, but adding: self.conn.commit() after a group or INSERT or UPDATE commands did the trick. What's strange is that the hashserver code does without doing this. Anyway, let me test that two servers can access the same database now. If everything works, I'll be able to send a V5 soon! Cheers Michael.
On 4/26/24 at 14:53, Michael Opdenacker via lists.openembedded.org wrote: > By the way, I still had empty databases after leaving the tests, but > adding: > self.conn.commit() > after a group or INSERT or UPDATE commands did the trick. For the record, my database was created with the below settings: self.connection.execute("PRAGMA synchronous = NORMAL;") self.connection.execute("PRAGMA journal_mode = WAL;") Cheers Michael.
On 4/26/24 at 14:56, Michael Opdenacker via lists.openembedded.org wrote: > > On 4/26/24 at 14:53, Michael Opdenacker via lists.openembedded.org wrote: >> By the way, I still had empty databases after leaving the tests, but >> adding: >> self.conn.commit() >> after a group or INSERT or UPDATE commands did the trick. > > > For the record, my database was created with the below settings: > > self.connection.execute("PRAGMA synchronous = NORMAL;") > self.connection.execute("PRAGMA journal_mode = WAL;") Oh, forget it, it's gone again (databases being empty again). Maybe, I looked too quickly. I'll keep investigating and will keep you posted. Michael.
On Fri, Apr 26, 2024 at 7:50 AM Michael Opdenacker <michael.opdenacker@bootlin.com> wrote: > > > On 4/26/24 at 14:56, Michael Opdenacker via lists.openembedded.org wrote: > > > > On 4/26/24 at 14:53, Michael Opdenacker via lists.openembedded.org wrote: > >> By the way, I still had empty databases after leaving the tests, but > >> adding: > >> self.conn.commit() > >> after a group or INSERT or UPDATE commands did the trick. > > > > > > For the record, my database was created with the below settings: > > > > self.connection.execute("PRAGMA synchronous = NORMAL;") > > self.connection.execute("PRAGMA journal_mode = WAL;") > > Oh, forget it, it's gone again (databases being empty again). Maybe, I > looked too quickly. > I'll keep investigating and will keep you posted. I'm thinking something else is wrong then (maybe in the test setup?). The original code as I read it would probably have had some sort of database error due to the exclusive lock instead of just being empty, and I can't think of any case in the new code where the database would just appear empty if both servers are actually pointing to the same database > Michael. > > -- > Michael Opdenacker, Bootlin > Embedded Linux and Kernel engineering > https://bootlin.com >
On Fri, Apr 26, 2024 at 6:53 AM Michael Opdenacker <michael.opdenacker@bootlin.com> wrote: > > Hi Joshua > > Many thanks for the tips! > > On 4/24/24 at 16:23, Joshua Watt wrote: > > On Wed, Apr 24, 2024 at 3:20 AM Michael Opdenacker > > <michael.opdenacker@bootlin.com> wrote: > >> Hi Joshua > >> > >> On 4/24/24 at 10:15, Michael Opdenacker via lists.openembedded.org wrote: > >>> Which branch did you use to generate your patch? I can't manage to > >>> apply it to either "master", "master-next" or the branch I shared > >>> recently... > >>> > >>> Ah, just found it. You used "nanbield". > >>> > >>> I'll test it in my branch and "torture" it with regular tests and my > >>> new bitbake selftests. > >> Oops, the tests are failing: > >> > >> ====================================================================== > >> ERROR: test_0b_db (prserv.tests.FunctionTests) > >> ---------------------------------------------------------------------- > >> Traceback (most recent call last): > >> File "/home/mike/work/yocto/poky/bitbake/lib/prserv/tests.py", line > >> 114, in test_0b_db > >> self.assertTrue(self.table.test_package(version, pkgarch)) > >> File "/home/mike/work/yocto/poky/bitbake/lib/prserv/db.py", line 76, > >> in test_package > >> row=data.fetchone() > >> sqlite3.ProgrammingError: Cannot operate on a closed cursor. > > Ah right. Ok so instead of doing the cursor operation in the > > _exectue(), you'll need to do it in each API call. The general > > strategy is to do: > > > > def _importHist(): > > if self.read_only: > > return None > > with closing(self.conn.cursor()) as cursor: > > # Use cursor.execute() for all statements, e.g. > > row = cursor.execute("SELECT value ....") > > data = row.fetchone() > > > > You can use the same cursor for multiple queries, so you only need the > > one for each API call (e.g. you don't need to make a new one for each > > execute statement). > > > > With this strategy I don't think you'll ever get DB locking errors, so > > you should be able to eliminate the _execute() wrapper > > > > Just make sure you _always_ use a cursor to execute any database > > queries. It will ruin your day if you combine self.conn.execute() with > > cursor queries :) > > > > The cursor makes your database modifications atomic, that is any > > changes you make to the database aren't applied until the cursor is > > closed, and then they are applied atomically so either all the changes > > you made are commited, or none of them. Keep in mind that this does > > _not_ mean that your database reads are also atomic with your writes; > > if another process commits to the database between your read and write > > (or even between two reads) it's possible you will be dealing with a > > stale understanding of the database when you commit. > > > > Sorry about the nanbield branch, I wasn't paying attention > > What you suggest seems to work fine. > > By the way, I still had empty databases after leaving the tests, but adding: > self.conn.commit() > after a group or INSERT or UPDATE commands did the trick. > > What's strange is that the hashserver code does without doing this. It has commit() calls also, look in sqlite.py. I frequently forget that the cursor doesn't implicitly commit when closed (unlike SQL Alchemy, which _does_ do that) :) > > Anyway, let me test that two servers can access the same database now. > If everything works, I'll be able to send a V5 soon! > > Cheers > Michael. > > -- > Michael Opdenacker, Bootlin > Embedded Linux and Kernel engineering > https://bootlin.com >
On Fri, Apr 26, 2024 at 6:56 AM Michael Opdenacker <michael.opdenacker@bootlin.com> wrote: > > > On 4/26/24 at 14:53, Michael Opdenacker via lists.openembedded.org wrote: > > By the way, I still had empty databases after leaving the tests, but > > adding: > > self.conn.commit() > > after a group or INSERT or UPDATE commands did the trick. > > > For the record, my database was created with the below settings: > > self.connection.execute("PRAGMA synchronous = NORMAL;") synchronous only affects when sqlite makes the sync() syscall, and therefore should have no impact on multiple processes using the database. Setting it NORMAL will cause some sync() calls to occur such that if power loss occurs, there is no (or very minimal) chance of database corruption (but perhaps lost data). Since sync() generally has a higher than normal impact on Yocto builds, the hash server defaults to OFF, so that there are no sync() calls. The data is correct and fine, as long as no power loss occurs and can be shared between multiple processes and doesn't affect build performance. > self.connection.execute("PRAGMA journal_mode = WAL;") > > Cheers > Michael. > > -- > Michael Opdenacker, Bootlin > Embedded Linux and Kernel engineering > https://bootlin.com >
Hi Joshua, On 4/26/24 at 19:55, Joshua Watt wrote: > It has commit() calls also, look in sqlite.py. I frequently forget > that the cursor doesn't implicitly commit when closed (unlike SQL > Alchemy, which _does_ do that) :) Correct, I must have checked the wrong file. Actually, my code should have all the commit() calls it needs. I put one every time a change is made to the database. However, I still have this very weird issue: * NOK: in bitbake-selftests, the tests that use the PR server run successfully (except test_3b_readonly, because of the issue), but the PR table is empty in the database file (if opened through the sqlite3 command line). It seems that the table is only updated in memory. It must be so, otherwise the tests wouldn't pass. * OK: the only bitbake-selftests that use the DB functions directly don't have this issue * OK: creating an image with BitBake does create a non empty table in the PR database file (cache/prserv.sqlite3) Would anyone have any clue why? I've just submitted the latest version of my code if you want to have a look or test by yourself: https://lore.kernel.org/bitbake-devel/20240429201418.657042-1-michael.opdenacker@bootlin.com/T/#t You can run the bitbake selftests as follows: bitbake-selftest prserv.tests Thanks in advance Cheers Michael.
On Mon, Apr 29, 2024 at 2:27 PM Michael Opdenacker <michael.opdenacker@bootlin.com> wrote: > > Hi Joshua, > > On 4/26/24 at 19:55, Joshua Watt wrote: > > It has commit() calls also, look in sqlite.py. I frequently forget > > that the cursor doesn't implicitly commit when closed (unlike SQL > > Alchemy, which _does_ do that) :) > > Correct, I must have checked the wrong file. > > Actually, my code should have all the commit() calls it needs. I put one > every time a change is made to the database. > > However, I still have this very weird issue: > > * NOK: in bitbake-selftests, the tests that use the PR server run > successfully (except test_3b_readonly, because of the issue), but > the PR table is empty in the database file (if opened through the > sqlite3 command line). It seems that the table is only updated in > memory. It must be so, otherwise the tests wouldn't pass. > * OK: the only bitbake-selftests that use the DB functions directly > don't have this issue > * OK: creating an image with BitBake does create a non empty table in > the PR database file (cache/prserv.sqlite3) > > Would anyone have any clue why? I think it might actually be returning the correct value ("0"). Remember that each test case starts with a blank database, and I don't see any code in that populates anything into the database in that test.... so "0" seems like it would be the correct PR value? > > I've just submitted the latest version of my code if you want to have a > look or test by yourself: > https://lore.kernel.org/bitbake-devel/20240429201418.657042-1-michael.opdenacker@bootlin.com/T/#t > > You can run the bitbake selftests as follows: > bitbake-selftest prserv.tests > > Thanks in advance > Cheers > Michael. > > -- > Michael Opdenacker, Bootlin > Embedded Linux and Kernel engineering > https://bootlin.com >
Hi Joshua, On 4/29/24 at 23:03, Joshua Watt wrote: > On Mon, Apr 29, 2024 at 2:27 PM Michael Opdenacker > <michael.opdenacker@bootlin.com> wrote: >> Hi Joshua, >> >> On 4/26/24 at 19:55, Joshua Watt wrote: >>> It has commit() calls also, look in sqlite.py. I frequently forget >>> that the cursor doesn't implicitly commit when closed (unlike SQL >>> Alchemy, which _does_ do that) :) >> Correct, I must have checked the wrong file. >> >> Actually, my code should have all the commit() calls it needs. I put one >> every time a change is made to the database. >> >> However, I still have this very weird issue: >> >> * NOK: in bitbake-selftests, the tests that use the PR server run >> successfully (except test_3b_readonly, because of the issue), but >> the PR table is empty in the database file (if opened through the >> sqlite3 command line). It seems that the table is only updated in >> memory. It must be so, otherwise the tests wouldn't pass. >> * OK: the only bitbake-selftests that use the DB functions directly >> don't have this issue >> * OK: creating an image with BitBake does create a non empty table in >> the PR database file (cache/prserv.sqlite3) >> >> Would anyone have any clue why? > I think it might actually be returning the correct value ("0"). > Remember that each test case starts with a blank database, and I don't > see any code in that populates anything into the database in that > test.... so "0" seems like it would be the correct PR value? Wow, your remark made me understand that the setUp() function is run before *every test*, and not only once for each class of tests! If I merge my read-only tests with the previous ones, making only one test, they now work as expected (except for the very last one, maybe a bug in my code). Many thanks! You saved my day :) Cheers Michael.
diff --git a/bitbake/lib/prserv/db.py b/bitbake/lib/prserv/db.py index b4bda7078cd..8ee8d458194 100644 --- a/bitbake/lib/prserv/db.py +++ b/bitbake/lib/prserv/db.py @@ -9,6 +9,7 @@ import os.path import errno import prserv import time +from contextlib import closing try: import sqlite3 @@ -62,21 +63,18 @@ class PRTable(object): end = start + 20 while True: try: - return self.conn.execute(*query) + with closing(self.conn.cursor()) as cursor: + return cursor.execute(*query) except sqlite3.OperationalError as exc: if 'is locked' in str(exc) and end > time.time(): continue raise exc def sync(self): - if not self.read_only: - self.conn.commit() - self._execute("BEGIN EXCLUSIVE TRANSACTION") + pass def sync_if_dirty(self): - if self.dirty: - self.sync() - self.dirty = False + pass def _getValueHist(self, version, pkgarch, checksum): data=self._execute("SELECT value FROM %s WHERE version=? AND pkgarch=? AND checksum=?;" % self.table, @@ -292,11 +290,10 @@ class PRData(object): raise e uri = "file:%s%s" % (self.filename, "?mode=ro" if self.read_only else "") logger.debug("Opening PRServ database '%s'" % (uri)) - self.connection=sqlite3.connect(uri, uri=True, isolation_level="EXCLUSIVE", check_same_thread = False) + self.connection=sqlite3.connect(uri, uri=True) self.connection.row_factory=sqlite3.Row - if not self.read_only: - self.connection.execute("pragma synchronous = off;") - self.connection.execute("PRAGMA journal_mode = MEMORY;") + self.connection.execute("pragma synchronous = off;") + self.connection.execute("PRAGMA journal_mode = WAL;") self._tables={} def disconnect(self):
sqlite3 can allow multiple processes to access the database simultaneously, but it must be opened correctly. The key change is that the database is no longer opened in "exclusive" mode (defaulting to shared mode). In addition, the journal is set to "WAL" mode, as this is the most efficient for dealing with simultaneous access between different processes. In order to keep the database performance, synchronous mode is set to "off". The WAL journal will protect against incomplete transactions in any given client, however the database will not be protected against unexpected power loss from the OS (which is a fine trade off for performance, and also the same as the previous implementation). Finally, the _execute() API now uses a database cursor. The cursor automatically makes sure that the query happens in an atomic transaction and commits when finished. NOTE: THIS MAY BE INCOMPLETE; All APIs need to be evaluated to see if the transaction (cursor) needs to cover mode than one "execute" statement. Signed-off-by: Joshua Watt <JPEWhacker@gmail.com> --- bitbake/lib/prserv/db.py | 19 ++++++++----------- 1 file changed, 8 insertions(+), 11 deletions(-)