You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@spamassassin.apache.org by Jonas Eckerman <jo...@frukt.org> on 2006/05/01 00:40:20 UTC

Re: SQLite

Michael Parker wrote:

>> On a stable system with working backup routines running SQLite with
>> 'PRAGMA SYNCHRONOUS=OFF' for bayes makes a lot of sense.

> It has been awhile, but I believe you just need to do this at create
> time, so you'd only need a proper .sql file that did it.

I think that might have been true for the pragma "default_sunchronous" (or something like that) in SQLite 2.*.

In SQLite 3.' there is no persistant setting for this so the command must be given for each "connection" to the database.

> That said, that doesn't mean that I wouldn't welcome a contribution from
> someone who went off and did the work, so feel free to create the module
> and do the testing.

I created a small module for SQLlite that simply inherits almost everything from Mail::SpamAssassin::SQL. It seems to work and I've done some benchmarks with it.

I did notice some two things though:

1: In phase 2 and 5 there's an enormous amount of calls to _db_connect.

IIRC "connecting" to a SQLite databse can be potentially time consuming, so using more persistant database connections *might* give a SQLite bayes-store better performance. Actually, a more persistant connection makes sense for other SQL modules as well.

I might try the benchmarjs again with an override so that the untie_db method doesn't really disconnect from the databse in the SQLite module.

2: In phase 5 I see a number of "warn: closing dbh with active statement handles" in "output.txt". While such warning *can* indicate a potential memory leak, I have no idea wether it is any problem or not in this case.

> Submit a bug with the code and results attached and

I'll do some more testing, before doing that. Here's the benchmarks so far:

Total times:
SDBM:      44:07
DB_File:   49:43
SQLite:  2:26_03

Detailed Times:
Phase	SDBM	DB_File	SQLite
1.a	305,95	375,05	1719,04
1.b	234,67	308,42	759,78
2.-	934,95	939,6	923,21
?	1,19	1,18	1,23
3.-	11,41	24,38	28,84
4.a	213,47	235,41	2982,64
4.b	110,09	122,56	737,73
5.a	484,73	578,32	1139,67
5.b	349,21	396,2	470,01
?	1,44	1,58	1,24
Total	2647,13	2982,7	8763,4

Obviously SQLite is nowehere near SDBM or DB_File when just using the standard SQL module this way.

Notable though is that SQLite actually performed best in phase 2 though, and that's one of the phases where I saw a big number of calls to _connect_db.

> discovered it just was not worth it, you were better off sticking with
> Berkeley DBD or the MUCH faster SDBM.

I believe you're right.

I don't have any normal SQL server on the machine I'm testing this on. Otherwise it would make sense comparing the bechmarks for SQLite with MySQL and PostGreSQL.

I was a bit surprised that the differenmce between SDBM and DB_File wasn't bigger though. I had the impression that the difference would be bigger. It is possible that some parts of the becnhmark isn't working right for me, I guess. Or Berkley DB has become faster than it was. Or it just fits well together with FreeBSDs file system.

> Improvements to the
> benchmark are also more than welcome.

I added helper and tests files for the SQLite module, and will send them later.

Regards
/Jonas

-- 
Jonas Eckerman, FSDB & Fruktträdet
http://whatever.frukt.org/
http://www.fsdb.org/
http://www.frukt.org/


Re: SQLite

Posted by Jakob Hirsch <jh...@plonk.de>.
Hi,

Trying to keep this a little alive :)

I ran a SA-independent benchmark, which was simply using a .dump (which 
outputs BEGIN TRANSACTION, CREATE TABLE, INSERT ..., COMMIT) of a single 
table with 5 columns, ca. 17000 rows:

with BEGIN TRANSACTION/COMMIT:

PRAGMA synchronous=OFF;
real    0m1.455s
user    0m1.400s
sys     0m0.052s

PRAGMA synchronous=NORMAL;
real    0m1.523s
user    0m1.372s
sys     0m0.072s

PRAGMA synchronous=FULL;
real    0m1.537s
user    0m1.400s
sys     0m0.052s


without BEGIN TRANSACTION/COMMIT:

PRAGMA synchronous=OFF;
real    0m10.113s
user    0m2.692s
sys     0m7.220s

PRAGMA synchronous=NORMAL;
real    10m38.229s
user    0m4.788s
sys     0m13.353s

PRAGMA synchronous=FULL;
real    14m3.243s
user    0m4.920s
sys     0m14.193s


so, if you run multiple INSERTs (and probably UPDATES), you should do it 
in a single transaction (which should be done for integrity, anyway).
That fits perfectly in what I saw before when I converted my bayes DB 
into SQLite.

> 1: In phase 2 and 5 there's an enormous amount of calls to _db_connect.
> 
> IIRC "connecting" to a SQLite databse can be potentially time consuming, 
> so using more persistant database connections *might* give a SQLite 
> bayes-store better performance. Actually, a more persistant connection 
> makes sense for other SQL modules as well.

Sure, but for SQLite the effect is probably not that big. And I think 
this is just what http://wiki.apache.org/spamassassin/DBIPlugin does.

If I get the time, I'll try to make my own SQLite module (and probably 
ask the SQLite people before, the mailing list is usually helpful). I 
doubt that it outperforms SDBM, but you never know.