[bitbake-devel,RFC,v2] prserv: Enable database sharing

Message ID	20240423182729.226345-1-JPEWhacker@gmail.com
State	New
Headers	show Return-Path: <JPEWhacker@gmail.com> ip: 209.85.166.48, mailfrom: jpewhacker@gmail.com) From: Joshua Watt <jpewhacker@gmail.com> To: bitbake-devel@lists.openembedded.org Cc: michael.opdenacker@bootlin.com, Joshua Watt <JPEWhacker@gmail.com> Subject: [bitbake-devel][RFC v2] prserv: Enable database sharing Date: Tue, 23 Apr 2024 12:27:29 -0600 Message-Id: <20240423182729.226345-1-JPEWhacker@gmail.com> In-Reply-To: <20240423182543.226113-1-JPEWhacker@gmail.com> References: <20240423182543.226113-1-JPEWhacker@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	[bitbake-devel,RFC,v2] prserv: Enable database sharing \| expand [bitbake-devel,RFC,v2] prserv: Enable database sharing

Joshua Watt April 23, 2024, 6:27 p.m. UTC

sqlite3 can allow multiple processes to access the database
simultaneously, but it must be opened correctly. The key change is that
the database is no longer opened in "exclusive" mode (defaulting to
shared mode). In addition, the journal is set to "WAL" mode, as this is
the most efficient for dealing with simultaneous access between
different processes. In order to keep the database performance,
synchronous mode is set to "off". The WAL journal will protect against
incomplete transactions in any given client, however the database will
not be protected against unexpected power loss from the OS (which is a
fine trade off for performance, and also the same as the previous
implementation).

Finally, the _execute() API now uses a database cursor. The cursor
automatically makes sure that the query happens in an atomic transaction
and commits when finished.

NOTE: THIS MAY BE INCOMPLETE; All APIs need to be evaluated to see if
the transaction (cursor) needs to cover mode than one "execute"
statement.

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/prserv/db.py | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

Michael Opdenacker April 24, 2024, 8:15 a.m. UTC | #1

Hi Joshua

On 4/23/24 at 20:27, Joshua Watt wrote:
> sqlite3 can allow multiple processes to access the database
> simultaneously, but it must be opened correctly. The key change is that
> the database is no longer opened in "exclusive" mode (defaulting to
> shared mode). In addition, the journal is set to "WAL" mode, as this is
> the most efficient for dealing with simultaneous access between
> different processes. In order to keep the database performance,
> synchronous mode is set to "off". The WAL journal will protect against
> incomplete transactions in any given client, however the database will
> not be protected against unexpected power loss from the OS (which is a
> fine trade off for performance, and also the same as the previous
> implementation).

Thanks, I also made experiments with the "WAL" journal mode, but didn't 
go on as it didn't solve my empty database issues. However, I agree it 
seems like the way to go!

>
> Finally, the _execute() API now uses a database cursor. The cursor
> automatically makes sure that the query happens in an atomic transaction
> and commits when finished.
>
> NOTE: THIS MAY BE INCOMPLETE; All APIs need to be evaluated to see if
> the transaction (cursor) needs to cover mode than one "execute"
> statement.
>
> Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
> ---
>   bitbake/lib/prserv/db.py | 19 ++++++++-----------
>   1 file changed, 8 insertions(+), 11 deletions(-)

Which branch did you use to generate your patch? I can't manage to apply 
it to either "master", "master-next" or the branch I shared recently...

Ah, just found it. You used "nanbield".

I'll test it in my branch and "torture" it with regular tests and my new 
bitbake selftests.
Thanks,
Michael.

Michael Opdenacker April 24, 2024, 9:20 a.m. UTC | #2

Hi Joshua

On 4/24/24 at 10:15, Michael Opdenacker via lists.openembedded.org wrote:
> Which branch did you use to generate your patch? I can't manage to 
> apply it to either "master", "master-next" or the branch I shared 
> recently...
>
> Ah, just found it. You used "nanbield".
>
> I'll test it in my branch and "torture" it with regular tests and my 
> new bitbake selftests.

Oops, the tests are failing:

======================================================================
ERROR: test_0b_db (prserv.tests.FunctionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
   File "/home/mike/work/yocto/poky/bitbake/lib/prserv/tests.py", line 
114, in test_0b_db
     self.assertTrue(self.table.test_package(version, pkgarch))
   File "/home/mike/work/yocto/poky/bitbake/lib/prserv/db.py", line 76, 
in test_package
     row=data.fetchone()
sqlite3.ProgrammingError: Cannot operate on a closed cursor.

Let me send you my patch series with your change included. I'll send 
them through a private e-mail. Don't want to spam the list at this stage.
Thanks
Michael.

Joshua Watt April 24, 2024, 2:23 p.m. UTC | #3

On Wed, Apr 24, 2024 at 3:20 AM Michael Opdenacker
<michael.opdenacker@bootlin.com> wrote:
>
> Hi Joshua
>
> On 4/24/24 at 10:15, Michael Opdenacker via lists.openembedded.org wrote:
> > Which branch did you use to generate your patch? I can't manage to
> > apply it to either "master", "master-next" or the branch I shared
> > recently...
> >
> > Ah, just found it. You used "nanbield".
> >
> > I'll test it in my branch and "torture" it with regular tests and my
> > new bitbake selftests.
>
> Oops, the tests are failing:
>
> ======================================================================
> ERROR: test_0b_db (prserv.tests.FunctionTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>    File "/home/mike/work/yocto/poky/bitbake/lib/prserv/tests.py", line
> 114, in test_0b_db
>      self.assertTrue(self.table.test_package(version, pkgarch))
>    File "/home/mike/work/yocto/poky/bitbake/lib/prserv/db.py", line 76,
> in test_package
>      row=data.fetchone()
> sqlite3.ProgrammingError: Cannot operate on a closed cursor.

Ah right. Ok so instead of doing the cursor operation in the
_exectue(), you'll need to do it in each API call. The general
strategy is to do:

 def _importHist():
     if self.read_only:
         return None
     with closing(self.conn.cursor()) as cursor:
         # Use cursor.execute() for all statements, e.g.
         row = cursor.execute("SELECT value ....")
         data = row.fetchone()

You can use the same cursor for multiple queries, so you only need the
one for each API call (e.g. you don't need to make a new one for each
execute statement).

With this strategy I don't think you'll ever get DB locking errors, so
you should be able to eliminate the _execute() wrapper

Just make sure you _always_ use a cursor to execute any database
queries. It will ruin your day if you combine self.conn.execute() with
cursor queries :)

The cursor makes your database modifications atomic, that is any
changes you make to the database aren't applied until the cursor is
closed, and then they are applied atomically so either all the changes
you made are commited, or none of them. Keep in mind that this does
_not_ mean that your database reads are also atomic with your writes;
if another process commits to the database between your read and write
(or even between two reads) it's possible you will be dealing with a
stale understanding of the database when you commit.

Sorry about the nanbield branch, I wasn't paying attention

>
> Let me send you my patch series with your change included. I'll send
> them through a private e-mail. Don't want to spam the list at this stage.
> Thanks
> Michael.
>
> --
> Michael Opdenacker, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com
>

Michael Opdenacker April 26, 2024, 12:53 p.m. UTC | #4

Hi Joshua

Many thanks for the tips!

On 4/24/24 at 16:23, Joshua Watt wrote:
> On Wed, Apr 24, 2024 at 3:20 AM Michael Opdenacker
> <michael.opdenacker@bootlin.com> wrote:
>> Hi Joshua
>>
>> On 4/24/24 at 10:15, Michael Opdenacker via lists.openembedded.org wrote:
>>> Which branch did you use to generate your patch? I can't manage to
>>> apply it to either "master", "master-next" or the branch I shared
>>> recently...
>>>
>>> Ah, just found it. You used "nanbield".
>>>
>>> I'll test it in my branch and "torture" it with regular tests and my
>>> new bitbake selftests.
>> Oops, the tests are failing:
>>
>> ======================================================================
>> ERROR: test_0b_db (prserv.tests.FunctionTests)
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>>     File "/home/mike/work/yocto/poky/bitbake/lib/prserv/tests.py", line
>> 114, in test_0b_db
>>       self.assertTrue(self.table.test_package(version, pkgarch))
>>     File "/home/mike/work/yocto/poky/bitbake/lib/prserv/db.py", line 76,
>> in test_package
>>       row=data.fetchone()
>> sqlite3.ProgrammingError: Cannot operate on a closed cursor.
> Ah right. Ok so instead of doing the cursor operation in the
> _exectue(), you'll need to do it in each API call. The general
> strategy is to do:
>
>   def _importHist():
>       if self.read_only:
>           return None
>       with closing(self.conn.cursor()) as cursor:
>           # Use cursor.execute() for all statements, e.g.
>           row = cursor.execute("SELECT value ....")
>           data = row.fetchone()
>
> You can use the same cursor for multiple queries, so you only need the
> one for each API call (e.g. you don't need to make a new one for each
> execute statement).
>
> With this strategy I don't think you'll ever get DB locking errors, so
> you should be able to eliminate the _execute() wrapper
>
> Just make sure you _always_ use a cursor to execute any database
> queries. It will ruin your day if you combine self.conn.execute() with
> cursor queries :)
>
> The cursor makes your database modifications atomic, that is any
> changes you make to the database aren't applied until the cursor is
> closed, and then they are applied atomically so either all the changes
> you made are commited, or none of them. Keep in mind that this does
> _not_ mean that your database reads are also atomic with your writes;
> if another process commits to the database between your read and write
> (or even between two reads) it's possible you will be dealing with a
> stale understanding of the database when you commit.
>
> Sorry about the nanbield branch, I wasn't paying attention

What you suggest seems to work fine.

By the way, I still had empty databases after leaving the tests, but adding:
self.conn.commit()
after a group or INSERT or UPDATE commands did the trick.

What's strange is that the hashserver code does without doing this.

Anyway, let me test that two servers can access the same database now. 
If everything works, I'll be able to send a V5 soon!

Cheers
Michael.

Michael Opdenacker April 26, 2024, 12:56 p.m. UTC | #5

On 4/26/24 at 14:53, Michael Opdenacker via lists.openembedded.org wrote:
> By the way, I still had empty databases after leaving the tests, but 
> adding:
> self.conn.commit()
> after a group or INSERT or UPDATE commands did the trick.


For the record, my database was created with the below settings:

self.connection.execute("PRAGMA synchronous = NORMAL;")
self.connection.execute("PRAGMA journal_mode = WAL;")

Cheers
Michael.

Michael Opdenacker April 26, 2024, 1:50 p.m. UTC | #6

On 4/26/24 at 14:56, Michael Opdenacker via lists.openembedded.org wrote:
>
> On 4/26/24 at 14:53, Michael Opdenacker via lists.openembedded.org wrote:
>> By the way, I still had empty databases after leaving the tests, but 
>> adding:
>> self.conn.commit()
>> after a group or INSERT or UPDATE commands did the trick.
>
>
> For the record, my database was created with the below settings:
>
> self.connection.execute("PRAGMA synchronous = NORMAL;")
> self.connection.execute("PRAGMA journal_mode = WAL;")

Oh, forget it, it's gone again (databases being empty again). Maybe, I 
looked too quickly.
I'll keep investigating and will keep you posted.
Michael.

Joshua Watt April 26, 2024, 5:52 p.m. UTC | #7

On Fri, Apr 26, 2024 at 7:50 AM Michael Opdenacker
<michael.opdenacker@bootlin.com> wrote:
>
>
> On 4/26/24 at 14:56, Michael Opdenacker via lists.openembedded.org wrote:
> >
> > On 4/26/24 at 14:53, Michael Opdenacker via lists.openembedded.org wrote:
> >> By the way, I still had empty databases after leaving the tests, but
> >> adding:
> >> self.conn.commit()
> >> after a group or INSERT or UPDATE commands did the trick.
> >
> >
> > For the record, my database was created with the below settings:
> >
> > self.connection.execute("PRAGMA synchronous = NORMAL;")
> > self.connection.execute("PRAGMA journal_mode = WAL;")
>
> Oh, forget it, it's gone again (databases being empty again). Maybe, I
> looked too quickly.
> I'll keep investigating and will keep you posted.

I'm thinking something else is wrong then (maybe in the test setup?).
The original code as I read it would probably have had some sort of
database error due to the exclusive lock instead of just being empty,
and I can't think of any case in the new code where the database would
just appear empty if both servers are actually pointing to the same
database

> Michael.
>
> --
> Michael Opdenacker, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com
>

Joshua Watt April 26, 2024, 5:55 p.m. UTC | #8

On Fri, Apr 26, 2024 at 6:53 AM Michael Opdenacker
<michael.opdenacker@bootlin.com> wrote:
>
> Hi Joshua
>
> Many thanks for the tips!
>
> On 4/24/24 at 16:23, Joshua Watt wrote:
> > On Wed, Apr 24, 2024 at 3:20 AM Michael Opdenacker
> > <michael.opdenacker@bootlin.com> wrote:
> >> Hi Joshua
> >>
> >> On 4/24/24 at 10:15, Michael Opdenacker via lists.openembedded.org wrote:
> >>> Which branch did you use to generate your patch? I can't manage to
> >>> apply it to either "master", "master-next" or the branch I shared
> >>> recently...
> >>>
> >>> Ah, just found it. You used "nanbield".
> >>>
> >>> I'll test it in my branch and "torture" it with regular tests and my
> >>> new bitbake selftests.
> >> Oops, the tests are failing:
> >>
> >> ======================================================================
> >> ERROR: test_0b_db (prserv.tests.FunctionTests)
> >> ----------------------------------------------------------------------
> >> Traceback (most recent call last):
> >>     File "/home/mike/work/yocto/poky/bitbake/lib/prserv/tests.py", line
> >> 114, in test_0b_db
> >>       self.assertTrue(self.table.test_package(version, pkgarch))
> >>     File "/home/mike/work/yocto/poky/bitbake/lib/prserv/db.py", line 76,
> >> in test_package
> >>       row=data.fetchone()
> >> sqlite3.ProgrammingError: Cannot operate on a closed cursor.
> > Ah right. Ok so instead of doing the cursor operation in the
> > _exectue(), you'll need to do it in each API call. The general
> > strategy is to do:
> >
> >   def _importHist():
> >       if self.read_only:
> >           return None
> >       with closing(self.conn.cursor()) as cursor:
> >           # Use cursor.execute() for all statements, e.g.
> >           row = cursor.execute("SELECT value ....")
> >           data = row.fetchone()
> >
> > You can use the same cursor for multiple queries, so you only need the
> > one for each API call (e.g. you don't need to make a new one for each
> > execute statement).
> >
> > With this strategy I don't think you'll ever get DB locking errors, so
> > you should be able to eliminate the _execute() wrapper
> >
> > Just make sure you _always_ use a cursor to execute any database
> > queries. It will ruin your day if you combine self.conn.execute() with
> > cursor queries :)
> >
> > The cursor makes your database modifications atomic, that is any
> > changes you make to the database aren't applied until the cursor is
> > closed, and then they are applied atomically so either all the changes
> > you made are commited, or none of them. Keep in mind that this does
> > _not_ mean that your database reads are also atomic with your writes;
> > if another process commits to the database between your read and write
> > (or even between two reads) it's possible you will be dealing with a
> > stale understanding of the database when you commit.
> >
> > Sorry about the nanbield branch, I wasn't paying attention
>
> What you suggest seems to work fine.
>
> By the way, I still had empty databases after leaving the tests, but adding:
> self.conn.commit()
> after a group or INSERT or UPDATE commands did the trick.
>
> What's strange is that the hashserver code does without doing this.

It has commit() calls also, look in sqlite.py. I frequently forget
that the cursor doesn't implicitly commit when closed (unlike SQL
Alchemy, which _does_ do that) :)

>
> Anyway, let me test that two servers can access the same database now.
> If everything works, I'll be able to send a V5 soon!
>
> Cheers
> Michael.
>
> --
> Michael Opdenacker, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com
>

Joshua Watt April 26, 2024, 6:01 p.m. UTC | #9

On Fri, Apr 26, 2024 at 6:56 AM Michael Opdenacker
<michael.opdenacker@bootlin.com> wrote:
>
>
> On 4/26/24 at 14:53, Michael Opdenacker via lists.openembedded.org wrote:
> > By the way, I still had empty databases after leaving the tests, but
> > adding:
> > self.conn.commit()
> > after a group or INSERT or UPDATE commands did the trick.
>
>
> For the record, my database was created with the below settings:
>
> self.connection.execute("PRAGMA synchronous = NORMAL;")

synchronous only affects when sqlite makes the sync() syscall, and
therefore should have no impact on multiple processes using the
database. Setting it NORMAL will cause some sync() calls to occur such
that if power loss occurs, there is no (or very minimal) chance of
database corruption (but perhaps lost data). Since sync() generally
has a higher than normal impact on Yocto builds, the hash server
defaults to OFF, so that there are no sync() calls. The data is
correct and fine, as long as no power loss occurs and can be shared
between multiple processes and doesn't affect build performance.

> self.connection.execute("PRAGMA journal_mode = WAL;")
>
> Cheers
> Michael.
>
> --
> Michael Opdenacker, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com
>

Michael Opdenacker April 29, 2024, 8:27 p.m. UTC | #10

Hi Joshua,

On 4/26/24 at 19:55, Joshua Watt wrote:
> It has commit() calls also, look in sqlite.py. I frequently forget
> that the cursor doesn't implicitly commit when closed (unlike SQL
> Alchemy, which _does_ do that) :)

Correct, I must have checked the wrong file.

Actually, my code should have all the commit() calls it needs. I put one 
every time a change is made to the database.

However, I still have this very weird issue:

  * NOK: in bitbake-selftests, the tests that use the PR server run
    successfully (except test_3b_readonly, because of the issue), but
    the PR table is empty in the database file (if opened through the
    sqlite3 command line). It seems that the table is only updated in
    memory. It must be so, otherwise the tests wouldn't pass.
  * OK: the only bitbake-selftests that use the DB functions directly
    don't have this issue
  * OK: creating an image with BitBake does create a non empty table in
    the PR database file (cache/prserv.sqlite3)

Would anyone have any clue why?

I've just submitted the latest version of my code if you want to have a 
look or test by yourself:
https://lore.kernel.org/bitbake-devel/20240429201418.657042-1-michael.opdenacker@bootlin.com/T/#t

You can run the bitbake selftests as follows:
bitbake-selftest prserv.tests

Thanks in advance
Cheers
Michael.

Joshua Watt April 29, 2024, 9:03 p.m. UTC | #11

On Mon, Apr 29, 2024 at 2:27 PM Michael Opdenacker
<michael.opdenacker@bootlin.com> wrote:
>
> Hi Joshua,
>
> On 4/26/24 at 19:55, Joshua Watt wrote:
> > It has commit() calls also, look in sqlite.py. I frequently forget
> > that the cursor doesn't implicitly commit when closed (unlike SQL
> > Alchemy, which _does_ do that) :)
>
> Correct, I must have checked the wrong file.
>
> Actually, my code should have all the commit() calls it needs. I put one
> every time a change is made to the database.
>
> However, I still have this very weird issue:
>
>   * NOK: in bitbake-selftests, the tests that use the PR server run
>     successfully (except test_3b_readonly, because of the issue), but
>     the PR table is empty in the database file (if opened through the
>     sqlite3 command line). It seems that the table is only updated in
>     memory. It must be so, otherwise the tests wouldn't pass.
>   * OK: the only bitbake-selftests that use the DB functions directly
>     don't have this issue
>   * OK: creating an image with BitBake does create a non empty table in
>     the PR database file (cache/prserv.sqlite3)
>
> Would anyone have any clue why?

I think it might actually be returning the correct value ("0").
Remember that each test case starts with a blank database, and I don't
see any code in that populates anything into the database in that
test.... so "0" seems like it would be the correct PR value?

>
> I've just submitted the latest version of my code if you want to have a
> look or test by yourself:
> https://lore.kernel.org/bitbake-devel/20240429201418.657042-1-michael.opdenacker@bootlin.com/T/#t
>
> You can run the bitbake selftests as follows:
> bitbake-selftest prserv.tests
>
> Thanks in advance
> Cheers
> Michael.
>
> --
> Michael Opdenacker, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com
>

Michael Opdenacker April 30, 2024, 8:57 a.m. UTC | #12

Hi Joshua,

On 4/29/24 at 23:03, Joshua Watt wrote:
> On Mon, Apr 29, 2024 at 2:27 PM Michael Opdenacker
> <michael.opdenacker@bootlin.com> wrote:
>> Hi Joshua,
>>
>> On 4/26/24 at 19:55, Joshua Watt wrote:
>>> It has commit() calls also, look in sqlite.py. I frequently forget
>>> that the cursor doesn't implicitly commit when closed (unlike SQL
>>> Alchemy, which _does_ do that) :)
>> Correct, I must have checked the wrong file.
>>
>> Actually, my code should have all the commit() calls it needs. I put one
>> every time a change is made to the database.
>>
>> However, I still have this very weird issue:
>>
>>    * NOK: in bitbake-selftests, the tests that use the PR server run
>>      successfully (except test_3b_readonly, because of the issue), but
>>      the PR table is empty in the database file (if opened through the
>>      sqlite3 command line). It seems that the table is only updated in
>>      memory. It must be so, otherwise the tests wouldn't pass.
>>    * OK: the only bitbake-selftests that use the DB functions directly
>>      don't have this issue
>>    * OK: creating an image with BitBake does create a non empty table in
>>      the PR database file (cache/prserv.sqlite3)
>>
>> Would anyone have any clue why?
> I think it might actually be returning the correct value ("0").
> Remember that each test case starts with a blank database, and I don't
> see any code in that populates anything into the database in that
> test.... so "0" seems like it would be the correct PR value?


Wow, your remark made me understand that the setUp() function is run 
before *every test*, and not only once for each class of tests!
If I merge my read-only tests with the previous ones, making only one 
test, they now work as expected (except for the very last one, maybe a 
bug in my code).

Many thanks! You saved my day :)
Cheers
Michael.

[bitbake-devel,RFC,v2] prserv: Enable database sharing

Commit Message

Comments

Patch