Posted to users@spamassassin.apache.org by Kasper Sacharias Eenberg <ks...@hovmark.dk> on 2009/06/22 12:06:36 UTC

Bayes and SQL.

Good day.

I'm installing a new spam filter for my company, and I figured I'd try
Bayes in SQL.
However, I have some problems with maintenance and a few general
questions.

I'm not completely sure that force-expire does anything. I ran it
several times last week, and nothing showed up in the 'last expiry
atime' column. So I figured it wasn't working.
But lo and behold, when I ran it to get output for this e-mail, something
did happen.

1) Now, expiry gives me some 'strange' output. Can anyone take a look at
this and tell me if it's normal?

2) The 'sync' apparently will not work. No sync atime is reported.

I attached the sync, expire and dump magic output for debugging purposes.
The dump magic was run after sync and expire.

3) As I understand it, I need to run 'optimize table' on the 'tokens_seen'
regularly to keep it clean, right?

4) Are there any other things I need to run regularly, assuming I can
make the sync/expire work?

5) Another 'problem' I'm having is that restoring from backup -> sql is
horribly slow. Is this normal, or might my mysql/network not be running
optimally? I don't really know how to test bayes queries, but normal
queries to the SQL go fast.

sa-learn --force-expire -D
http://pastebin.com/m5f32ae32

sa-learn --dump magic -D
http://pastebin.com/m42066d18

sa-learn --sync -D
http://pastebin.com/m3d4a8931

Versions:
CentOS 5.3
Spamassassin 3.2.5
Perl: 5.8.8
MySQL: 5.0.45-7.el5  (MySQL runs on another server, across the WAN)
perl-DBD-mysql: 4.011-1.el5.rf

---
With regards,
Kasper


Re: Bayes and SQL.

Posted by Theo Van Dinter <fe...@apache.org>.
On Mon, Jun 22, 2009 at 6:06 AM, Kasper Sacharias Eenberg<ks...@hovmark.dk> wrote:
> I'm not completely sure that force-expire does anything. I ran it
> several times last week, and nothing showed up in the 'last expiry
> atime' column. So I figured it wasn't working.

Please keep in mind that "--force-expire" means "force an expire run
to occur" which isn't the same as "force tokens to be expired".
Reading the verbose expiration docs in "man sa-learn" may be useful.

fwiw, that's actually what the man page says for "--force-expire" as well. ;)
       --force-expire
           Forces an expiry attempt, regardless of whether it may be necessary
           or not.  Note: This doesn't mean any tokens will actually expire.
           Please see the EXPIRATION section below.

> 1) Now, expiry gives me some 'strange' output. Can anyone take a look at
> this and tell me if it's normal?

What's strange?  It says "couldn't find a good delta atime, need more
token difference, skipping expire".
That's explained in the man page as mentioned above (see "ESTIMATION
PASS LOGIC").

In short:
[3753] dbg: bayes: expiry check keep size, 0.75 * max: 112500
[3753] dbg: bayes: token count: 171682, final goal reduction size: 59182

SA wants to expire down to 112500, by removing 59182 tokens.

[3753] dbg: bayes: 1382400 89566
[3753] dbg: bayes: 2764800 0

Your DB is pretty new, and so when looking at the atime deltas, there
is no delta which will expire <= 59182 (it can't take 2764800 because
there's nothing to do there, and 1382400 expires too many tokens).

Therefore it can't do anything, and needs more atime differences which
should let it find an appropriate delta to use.
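
The arithmetic and the delta choice described above can be sketched roughly
as follows. This is only an illustration of the explanation in this message,
not SpamAssassin's actual code: the function names are made up, the maximum
size of 150000 is inferred from the debug line (112500 / 0.75), and the
selection rule (take the delta that expires the most tokens without going
over the goal, and give up if none qualifies) is a simplified reading of the
"ESTIMATION PASS LOGIC" section.

```python
# Illustrative sketch only; names and structure are hypothetical and do not
# mirror SpamAssassin's real implementation.

def expiry_goal(token_count, max_db_size, keep_fraction=0.75):
    """Return (keep_size, goal_reduction) as shown in the debug output."""
    keep_size = int(max_db_size * keep_fraction)
    return keep_size, token_count - keep_size


def pick_delta(estimates, goal):
    """Pick an atime delta from {delta_seconds: tokens_that_would_expire}.

    A delta is usable only if it expires at least one token and no more
    than the goal. Among usable deltas, take the one expiring the most.
    Returns None when nothing is usable, i.e. the expire is skipped.
    """
    usable = {delta: n for delta, n in estimates.items() if 0 < n <= goal}
    if not usable:
        return None
    return max(usable, key=usable.get)


# Numbers taken from the debug output quoted above.
keep_size, goal = expiry_goal(token_count=171682, max_db_size=150000)
estimates = {1382400: 89566, 2764800: 0}
chosen = pick_delta(estimates, goal)

print(keep_size, goal, chosen)
```

With the numbers from the debug output, neither delta is usable: 1382400
would expire 89566 tokens, which is more than the goal of 59182, and 2764800
would expire nothing. So no delta is chosen and the expire is skipped.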

> 2) The 'sync' apparently will not work. No sync atime is reported.

If you're using SQL, there is no sync time because there is no journal.

Re: Bayes and SQL.

Posted by Paweł Tęcza <pt...@uw.edu.pl>.
Kasper Sacharias Eenberg wrote:
> Good day.
> 
> I'm installing a new spam filter for my company, and I figured I'd try
> Bayes in SQL.
> However, I have some problems with maintenance and a few general
> questions.

Hi Kasper,

We used Bayes in SQL in the past. I don't know or remember the answers
to all of your questions, but I can try to help a little.

> 5) Another 'problem' I'm having is that restoring from backup -> sql is
> horribly slow. Is this normal, or might my mysql/network not be running
> optimally? I don't really know how to test bayes queries, but normal
> queries to the SQL go fast.

Did you try MySQL dumps? That should be a faster way to restore.

mysqldump your_bayes_db > bayes_dump.sql
echo "drop database your_bayes_db" | mysql
echo "create database your_bayes_db" | mysql
mysql your_bayes_db < bayes_dump.sql

> Versions:
> CentOS 5.3
> Spamassassin 3.2.5
> Perl: 5.8.8
> MySQL: 5.0.45-7.el5  (MySQL runs on another server, across the WAN)

Which storage engine do you use for your Bayes database? I remember that
we had to switch from MyISAM to InnoDB because of stability and
performance issues.

Have a nice summer day :)

P.