Posted to users@spamassassin.apache.org by Matt Kettler <mk...@evi-inc.com> on 2005/05/24 04:27:41 UTC

Re: {SPAM} Bayes expiring during message test

Ben Wylie wrote:
> I am running SA 3.02 on a Windows 2003 server.
> As previously posted to this list I have had a problem where SA seems unable
> to remove a bayes lock file or something like that.
> 
> I include complete logs below to show what it is like - I apologise for the
> size of it.
> 
> First of all, I was wondering if anyone knows what the error message that is
> being displayed means and what might be causing it?

First, that's NOT an error message.  You are running SA in debug mode, and you
are seeing a debug message. It means just what it says: part of the SA
code is refreshing its hold on the database lock. It's not failing anything,
it's normal.

The expiry process on a non-SQL based bayes DB refreshes the lock often so that
another SA process does not assume the lock is stale and delete it. (See sub
set_running_expire_tok in BayesStore/DBM.pm.)
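
The general idea of refreshing a lock so it never looks stale can be sketched
roughly as below. This is not SpamAssassin's code: the function names, the
mtime-based staleness check, and the 300-second threshold are all assumptions
made purely for illustration.

```python
import os
import time

STALE_AFTER = 300  # seconds; hypothetical threshold, not SA's actual value


def refresh_lock(lock_path, now=None):
    """Bump the lock file's timestamps so other processes see it as fresh."""
    now = time.time() if now is None else now
    os.utime(lock_path, (now, now))


def lock_is_stale(lock_path, now=None, stale_after=STALE_AFTER):
    """Return True if the lock was not refreshed within stale_after seconds."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(lock_path) > stale_after
```

A long-running job would call refresh_lock() periodically, while other
processes would only remove a lock for which lock_is_stale() returns True.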


If the message bothers you, don't run SA with -D.


> Secondly, in my local.cf file I have:
> bayes_expiry_max_db_size 500000
> Why is it expiring the database when it is only 11mb big?

That sounds about right. The comments about 150k tokens being 8 MB are outdated
and belong to SA 2.6x. In 2.6x, tokens were text strings, and thus rather large.

SA 3.0 tokens are SHA1 hashes (16 bytes), plus a few extra bytes for atime,
nspam, nham. I'm not sure of the exact size of the tokens, but 11 MB does sound
feasible. My own ballpark guess at the format runs 13 MB for 500k tokens.

Unfortunately, I don't run SA 3.x at this time, so I can't verify that.
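
The ballpark works out as simple arithmetic. The sketch below takes the 16-byte
hash figure above at face value and assumes 10 more bytes per token for atime,
nspam and nham. That 10-byte figure is a guess chosen to match the estimate, not
a measured value, and DBM overhead is ignored.

```python
def estimate_bayes_db_bytes(num_tokens, hash_bytes=16, extra_bytes=10):
    """Rough size: number of tokens times (hash size + per-token counters)."""
    return num_tokens * (hash_bytes + extra_bytes)


size = estimate_bayes_db_bytes(500_000)
print(f"{size / 1_000_000:.1f} MB")  # prints "13.0 MB"
```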

> Why is it expiring the database during a message scan?

Because SA does that by default. In some SA environments SA only runs when
messages are being scanned. It's got to expire at some point, so it does it once
in a while during a message scan. This is on by default, otherwise users that
just call "spamassassin" instead of using spamd would have their bayes files
grow without bound.

> Is there a command line option to prevent it from expiring during a scan of
> a message?

No, but there's a config option you can add to local.cf:
bayes_auto_expire 0

> I presume if you use the --no-sync option when learning messages, it just
> creates a journal file which can then be synchronised later with the main
> bayes db, is this correct?

Yes, or you can run sa-learn --sync to cause a sync check to occur.

Or you can use sa-learn --force-expire which will force a sync and expire to
run, regardless of perceived need.
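
For an overnight batch, those commands could be strung together roughly like
this. It is only a sketch: the corpus directories are hypothetical, sa-learn is
assumed to be directly runnable from the PATH (on Windows you may need to invoke
it through perl), and the final step assumes you have set bayes_auto_expire 0
and want to expire on your own schedule.

```python
import subprocess


def nightly_commands(spam_dir, ham_dir):
    """Build the sa-learn command lines for a batch learn followed by expiry."""
    return [
        # Learn without syncing, so updates only go to the journal.
        ["sa-learn", "--no-sync", "--spam", spam_dir],
        ["sa-learn", "--no-sync", "--ham", ham_dir],
        # Force a sync and an expire run in one step.
        ["sa-learn", "--force-expire"],
    ]


def run_nightly(spam_dir, ham_dir):
    """Run each step in order, stopping at the first failure."""
    for cmd in nightly_commands(spam_dir, ham_dir):
        subprocess.run(cmd, check=True)
```

Calling run_nightly() with your own spam and ham directories from a scheduled
task keeps the expiry out of the message-scanning path.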



RE: Bayes expiring during message test

Posted by Ben Wylie <sa...@benwylie.co.uk>.
>> I am running SA 3.02 on a Windows 2003 server.
>> As previously posted to this list I have had a problem where SA seems 
>> unable to remove a bayes lock file or something like that.
>> 
>> First of all, I was wondering if anyone knows what the error message that
>> is being displayed means and what might be causing it?
>
> First, that's NOT an error message.  You are running SA in debug mode, and
> you are seeing a debug message. All it means is just what it says, part of
> the SA code is refreshing its hold on the database lock. It's not failing
> anything, it's normal.
> 
> The expiry process on a non-SQL based bayes DB refreshes often to avoid
> having another SA process assume the lock is stale and delete it. (see
> sub set_running_expire_tok in BayesStore/DBM.pm)
> 
> If the message bothers you, don't run SA with -D.

It's not that it bothers me. It takes an age to get to that stage, and I
thought that the expiry had already taken place.
It spends a long time on the line:
debug: bayes: expiry max exponent: 9
I assumed that all this time meant the expiry had already been processed,
and that the only thing preventing it from completing was this message
about bayes.lock. I now understand that the expiry hasn't taken place yet,
and that the repeated message doesn't indicate anything going wrong; it is
just SA making sure its lock isn't overridden.

>> Secondly, in my local.cf file I have:
>> bayes_expiry_max_db_size 500000
>> Why is it expiring the database when it is only 11mb big?

> That sounds about right.. the comments about 150k tokens being 8mb are 
> outdated and belong to SA 2.6x. In 2.6x tokens were text strings, and thus
> rather large.
>
> SA 3.0 tokens are SHA1 hashes (16 bytes), plus a few extra bytes for 
> atime, nspam, nham. I'm not sure the exact size of the tokens, but 11mb 
> does sound feasible. My own ballpark guess at the format runs 13mb for 
> 500k tokens.
>
> Unfortunately, I don't run SA 3.x at this time, so I can't verify that.

OK. I thought that a 500k-token database would be much larger. Since I specified
the larger size (I can't remember what the default is), the bayes
db file doesn't seem to be any bigger, so I was expecting it to grow more
before expiry. Is it not recommended to have a db of more than 500k tokens?
Presumably the larger it is, the slower it is. Is that the only reason to
keep the db size down?

>> Why is it expiring the database during a message scan?

> Because SA does that by default. In some SA environments SA only runs when
> messages are being scanned. It's got to expire at some point, so it does 
> it once in a while during a message scan. This is on by default, otherwise
> users that just call "spamassassin" instead of using spamd would have 
> their bayes files grow without bound.

How can this be? ... Actually I guess if you use autolearning, then you
don't need to run sa-learn separately. Before I noticed this happening, I
thought that it would only auto-expire during the learning process. I do a
batch learn overnight. If it starts expiring during the scan of a message,
the scan overruns my mail server's timeout.

>> Is there a command line option to prevent it from expiring during a scan 
>> of a message?
>
> No, but there's a config option you can add to local.cf:
> bayes_auto_expire 0

This will prevent it auto-expiring during an sa-learn batch as well, I
presume, so I will have to schedule an expiry specifically.

>> I presume if you use the --no-sync option when learning messages, it just
>> creates a journal file which can then be synchronised later with the main
>> bayes db, is this correct?
>
> Yes, or you can run sa-learn --sync to cause a sync check to occur.
>
> Or you can use sa-learn --force-expire which will force a sync and expire
> to run, regardless of perceived need.

Thanks Matt for setting me straight.

Ben