Posted to users@spamassassin.apache.org by Matus UHLAR - fantomas <uh...@fantomas.sk> on 2009/01/20 16:49:12 UTC

bayes autolearn off but journal updated

Hello,

On my systems I turned the Bayes filter off by default:

cd /etc/mail/spamassassin/
grep bayes *

local.cf:use_bayes 0
local.cf:bayes_auto_learn 0
local.cf:bayes_auto_expire 0
local.cf:bayes_learn_to_journal 1

...I keep journalling enabled by default, so any user who turns on Bayes will use
the journal even for manual learning.

One of the users has Bayes turned on, without changing the value of auto_learn or
anything else:

# we are going to fill the bayes database...
use_bayes 1
bayes_auto_learn 0
bayes_auto_expire 0

However, this user's bayes_journal keeps being changed, even without manual
intervention. I also occasionally get this error in the logs:

Jan 20 16:33:22 t02 spamd[5073]: bayes: cannot open bayes databases /<...>/.spamassassin/bayes_* R/W: lock failed: File exists

Why does it update the journal? Why does it try to open the journal in R/W mode?

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Warning (in Slovak): I do NOT want to receive any advertising mail at this address.
"Where do you want to go to die?" [Microsoft]

Re: bayes autolearn off but journal updated

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
> >> On 20.01.09 19:45, Matt Kettler wrote:
> >>> Yes, more specifically, it's mostly going to be updating the "atime", or
> >>> time of last access, records for tokens. This time is used by the expiry
> >>> process to drop the least recently used tokens.

> > Matus UHLAR - fantomas wrote:
> >> What does SA do if it can't open the bayes database R/W? Will it skip
> >> BAYES checks or just tie it R/O?
> >>
> >> (I occasionally notice BAYES missing from the X-Spam headers.)

> On Thu, Jan 22, 2009 at 02:48, Matt Kettler <mk...@verizon.net> wrote:
> > Well, first let's be clear.. it's R/W opening the journal, not the
> > database itself.

Well, sorry, OK.

> > As for write locks to the journal, if for some reason there's a
> > conflict, the update is just dropped with a warning. This isn't
> > incredibly likely unless your bayes is really busy, as journal updates
> > are pretty short in nature.

Yes, this is what I wanted to know...

On 22.01.09 09:47, Justin Mason wrote:
> on POSIX filesystems, this should be virtually impossible, since the
> file is opened for append with atomic writes.

We have mailboxes on NFS, accessed from multiple machines; I guess that may be
the reason.
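To illustrate the append-only journal write Justin describes, here is a minimal Perl sketch. It is not SpamAssassin's code: the helper name `append_record` and the `t <atime> <token>` record layout are assumptions. The file is opened with `O_APPEND` and each record is emitted with a single `syswrite`. POSIX requires that with `O_APPEND` the file offset is moved to the end of the file before each write, with no intervening modification, so writers on a local filesystem do not overwrite each other's records. The Linux open(2) manual page warns that `O_APPEND` may lead to corrupted files on NFS when more than one process appends at once, because NFS does not support appending and the client kernel has to simulate it, which would fit the NFS setup described here.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_APPEND O_CREAT);
use File::Temp qw(tempdir);

# Hypothetical helper: append one newline-terminated record to a journal.
# With O_APPEND the kernel positions at end-of-file before every write, so
# concurrent writers on a local filesystem do not overwrite each other.
sub append_record {
    my ($path, $record) = @_;
    sysopen(my $fh, $path, O_WRONLY | O_APPEND | O_CREAT, 0600)
        or return 0;                       # let the caller decide how to report it
    my $line    = "$record\n";
    my $written = syswrite($fh, $line);    # whole record in a single write(2)
    close($fh) or return 0;
    return (defined $written && $written == length $line) ? 1 : 0;
}

# Usage: two records end up one after the other at the end of the file.
my $dir     = tempdir(CLEANUP => 1);
my $journal = "$dir/bayes_journal";
append_record($journal, "t 1232467200 tokenA") or die "append failed: $!";
append_record($journal, "t 1232467260 tokenB") or die "append failed: $!";
```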

-- 
I feel like I'm diagonally parked in a parallel universe. 

Re: bayes autolearn off but journal updated

Posted by Justin Mason <jm...@jmason.org>.
On Thu, Jan 22, 2009 at 10:05, Paweł Sasin <ha...@wp-sa.pl> wrote:
>> >>> Yes, more specifically, it's mostly going to be updating the
>> >>> "atime", or time of last access, records for tokens. This time is
>> >>> used by the expiry process to drop the least recently used tokens.
>> >>>
>> >>
>> >> What does SA do if it can't open the bayes database R/W? Will it skip
>> >> BAYES checks or just tie it R/O?
>> >>
>> >> (I occasionally notice BAYES missing from the X-Spam headers.)
>> >>
>> > Well, first let's be clear.. it's R/W opening the journal, not the
>> > database itself.
>> >
>> > The main _toks and _seen files are only locked R/W if there's one
>> > of the following going on:
>> >     learning without bayes_learn_to_journal set
>> >     a journal sync
>> >     token expiry is running
>> >
>> > As for write locks to the journal, if for some reason there's a
>> > conflict, the update is just dropped with a warning. This isn't
>> > incredibly likely unless your bayes is really busy, as journal
>> > updates are pretty short in nature.
>>
>> on POSIX filesystems, this should be virtually impossible, since the
>> file is opened for append with atomic writes.
>
> It is quite common on Solaris with 40+ working spamds and really high
> traffic volume. Some time ago we had such a situation. The server was 50%
> idle while the spamds were struggling to lock the journal (auto_learn and
> auto_expire disabled) rather than moving on to handle the next message. I.e.
> the machine was 50% idle but unable to handle more messages, and the
> bottleneck was in journal updates.

You definitely mean the journal, right? Not the bayes DBs?
Interesting to hear this; I haven't encountered it before...

--j.

Re: bayes autolearn off but journal updated

Posted by Paweł Sasin <ha...@wp-sa.pl>.
> >>> Yes, more specifically, it's mostly going to be updating the
> >>> "atime", or time of last access, records for tokens. This time is
> >>> used by the expiry process to drop the least recently used tokens.
> >>>
> >>
> >> What does SA do if it can't open the bayes database R/W? Will it skip
> >> BAYES checks or just tie it R/O?
> >>
> >> (I occasionally notice BAYES missing from the X-Spam headers.)
> >>
> > Well, first let's be clear.. it's R/W opening the journal, not the
> > database itself.
> >
> > The main _toks and _seen files are only locked R/W if there's one
> > of the following going on:
> >     learning without bayes_learn_to_journal set
> >     a journal sync
> >     token expiry is running
> >
> > As for write locks to the journal, if for some reason there's a
> > conflict, the update is just dropped with a warning. This isn't
> > incredibly likely unless your bayes is really busy, as journal
> > updates are pretty short in nature.
> 
> on POSIX filesystems, this should be virtually impossible, since the
> file is opened for append with atomic writes.

It is quite common on Solaris with 40+ working spamds and really high
traffic volume. Some time ago we had such a situation. The server was 50%
idle while the spamds were struggling to lock the journal (auto_learn and
auto_expire disabled) rather than moving on to handle the next message. I.e.
the machine was 50% idle but unable to handle more messages, and the
bottleneck was in journal updates.

-- 
Paweł Sasin

"WIRTUALNA POLSKA" Spolka Akcyjna z siedziba w Gdansku przy ul.
Traugutta 115 C, wpisana do Krajowego Rejestru Sadowego - Rejestru
Przedsiebiorcow prowadzonego przez Sad Rejonowy Gdansk - Polnoc w
Gdansku pod numerem KRS 0000068548, o kapitale zakladowym
67.980.024,00  zlotych oplaconym w calosci oraz Numerze Identyfikacji
Podatkowej 957-07-51-216.

Re: bayes autolearn off but journal updated

Posted by Justin Mason <jm...@jmason.org>.
On Thu, Jan 22, 2009 at 02:48, Matt Kettler <mk...@verizon.net> wrote:
> Matus UHLAR - fantomas wrote:
>>
>> On 20.01.09 19:45, Matt Kettler wrote:
>>
>>> Yes, more specifically, it's mostly going to be updating the "atime", or
>>> time of last access, records for tokens. This time is used by the expiry
>>> process to drop the least recently used tokens.
>>>
>>
>> What does SA do if it can't open the bayes database R/W? Will it skip BAYES
>> checks or just tie it R/O?
>>
>> (I occasionally notice BAYES missing from the X-Spam headers.)
>>
> Well, first let's be clear.. it's R/W opening the journal, not the
> database itself.
>
> The main _toks and _seen files are only locked R/W if there's one of the
> following going on:
>     learning without bayes_learn_to_journal set
>     a journal sync
>     token expiry is running
>
> As for write locks to the journal, if for some reason there's a
> conflict, the update is just dropped with a warning. This isn't
> incredibly likely unless your bayes is really busy, as journal updates
> are pretty short in nature.

on POSIX filesystems, this should be virtually impossible, since the
file is opened for append with atomic writes.

--j.

> Look at /lib/Mail/SpamAssassin/BayesStore/DBM.pm and find "sub cleanup"
> in it.
>
> Snippets of that code:
>
>  my $path = $self->_get_journal_filename();
>  ...
>
>  if (!open (OUT, ">>".$path)) {
>    warn "bayes: cannot write to $path, bayes db update ignored: $!\n";
>    umask $umask; # reset umask
>    return;
>   }

Re: bayes autolearn off but journal updated

Posted by Matt Kettler <mk...@verizon.net>.
Matus UHLAR - fantomas wrote:
>
> On 20.01.09 19:45, Matt Kettler wrote:
>   
>> Yes, more specifically, it's mostly going to be updating the "atime", or
>> time of last access, records for tokens. This time is used by the expiry
>> process to drop the least recently used tokens.
>>     
>
> What does SA do if it can't open the bayes database R/W? Will it skip BAYES
> checks or just tie it R/O?
>
> (I occasionally notice BAYES missing from the X-Spam headers.)
>   
Well, first let's be clear.. it's R/W opening the journal, not the
database itself.

The main _toks and _seen files are only locked R/W if there's one of the
following going on:
     learning without bayes_learn_to_journal set
     a journal sync
     token expiry is running

As for write locks to the journal, if for some reason there's a
conflict, the update is just dropped with a warning. This isn't
incredibly likely unless your bayes is really busy, as journal updates
are pretty short in nature.

Look at /lib/Mail/SpamAssassin/BayesStore/DBM.pm and find "sub cleanup"
in it.

Snippets of that code:

  my $path = $self->_get_journal_filename();
  ...

  if (!open (OUT, ">>".$path)) {
    warn "bayes: cannot write to $path, bayes db update ignored: $!\n";
    umask $umask; # reset umask
    return;
   }
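The excerpt above can be turned into a small, self-contained sketch of the "update is just dropped with a warning" behaviour Matt describes. The name `flush_journal` and its queued-lines interface are hypothetical; only the warning text comes from the excerpt, and the umask handling is left out. A failed open produces a warning and a false return value, and the caller simply carries on with the scan.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper mirroring the excerpt: queued journal lines are
# appended in one go. If the journal cannot be opened, the update is
# dropped with a warning instead of aborting the scan.
sub flush_journal {
    my ($path, $pending) = @_;    # $pending: array ref of journal lines
    my $out;
    if (!open($out, '>>', $path)) {
        warn "bayes: cannot write to $path, bayes db update ignored: $!\n";
        @$pending = ();           # drop the update rather than retry
        return 0;
    }
    print {$out} map { "$_\n" } @$pending;
    close($out) or warn "bayes: error closing $path: $!\n";
    @$pending = ();               # queued lines are now in the journal
    return 1;
}

# Usage: a writable journal succeeds; an unreachable path only warns.
my $dir = tempdir(CLEANUP => 1);
my @queue = ("t 1232467200 tokenA", "t 1232467260 tokenB");
my $ok_good = flush_journal("$dir/bayes_journal", \@queue);
@queue = ("t 1232467300 tokenC");
my $ok_bad = flush_journal("$dir/no-such-dir/bayes_journal", \@queue);
```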




Re: bayes autolearn off but journal updated

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
> > On Tue, Jan 20, 2009 at 04:49:12PM +0100, Matus UHLAR - fantomas wrote:
> >   
> >> Why does it update the journal? Why does it try to open the journal in R/W mode?

> Theo Van Dinter wrote:
> > $ man sa-learn

Oh, sorry for missing that in the docs :(

> > In other words, the journal isn't just for learning.

On 20.01.09 19:45, Matt Kettler wrote:
> Yes, more specifically, it's mostly going to be updating the "atime", or
> time of last access, records for tokens. This time is used by the expiry
> process to drop the least recently used tokens.

What does SA do if it can't open the bayes database R/W? Will it skip BAYES
checks or just tie it R/O?

(I occasionally notice BAYES missing from the X-Spam headers.)
-- 
Save the whales. Collect the whole set.

Re: bayes autolearn off but journal updated

Posted by Matt Kettler <mk...@verizon.net>.
Theo Van Dinter wrote:
> On Tue, Jan 20, 2009 at 04:49:12PM +0100, Matus UHLAR - fantomas wrote:
>   
>> Why does it update the journal? Why does it try to open the journal in R/W mode?
>>     
>
> $ man sa-learn
> [...]
>        bayes_journal
>            While SpamAssassin is scanning mails, it needs to track which tokens it uses in its
>            calculations.  To avoid the contention of having each SpamAssassin process attempting to
>            gain write access to the Bayes DB, the token timestamps are written to a ’journal’ file
>            which will later (either automatically or via "sa-learn --sync") be used to synchronize
>            the Bayes DB.
>
> In other words, the journal isn't just for learning.
>
>   
Yes, more specifically, it's mostly going to be updating the "atime", or
time of last access, records for tokens. This time is used by the expiry
process to drop the least recently used tokens.
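A minimal sketch of the least-recently-used idea behind expiry, assuming the token database is a plain Perl hash mapping token to last-access time. SpamAssassin's real expiry logic and on-disk token format are more involved; the `expire_lru` helper, the keep-N policy and the readable token names are illustrative assumptions only.

```perl
use strict;
use warnings;

# Simplified LRU expiry (illustrative, not SpamAssassin's algorithm):
# keep the $keep most recently used tokens and delete the rest.
# %$atime maps token => last access time (epoch seconds).
# Returns the expired tokens, oldest first.
sub expire_lru {
    my ($atime, $keep) = @_;
    # Oldest first; ties broken by token name so the result is deterministic.
    my @by_age = sort { $atime->{$a} <=> $atime->{$b} or $a cmp $b } keys %$atime;
    my $excess = @by_age - $keep;
    return () if $excess <= 0;
    my @expired = @by_age[0 .. $excess - 1];
    delete @{$atime}{@expired};
    return @expired;
}

# Usage: with room for two tokens, the least recently used one goes.
my %db = (tokA => 1232400000, tokB => 1232467200, tokC => 1232300000);
my @gone = expire_lru(\%db, 2);
```

Here `tokC` has the oldest atime, so it is the one dropped; `tokA` and `tokB` remain.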


Re: bayes autolearn off but journal updated

Posted by Theo Van Dinter <fe...@apache.org>.
On Tue, Jan 20, 2009 at 04:49:12PM +0100, Matus UHLAR - fantomas wrote:
> Why does it update the journal? Why does it try to open the journal in R/W mode?

$ man sa-learn
[...]
       bayes_journal
           While SpamAssassin is scanning mails, it needs to track which tokens it uses in its
           calculations.  To avoid the contention of having each SpamAssassin process attempting to
           gain write access to the Bayes DB, the token timestamps are written to a ’journal’ file
           which will later (either automatically or via "sa-learn --sync") be used to synchronize
           the Bayes DB.

In other words, the journal isn't just for learning.
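The journal/sync split described in the man page can be sketched in Perl as below. This is a simplified illustration, not the BayesStore::DBM implementation: the `t <atime> <token>` line format, the helper names `journal_touch` and `sync_journal`, and the hash standing in for the Bayes DB are all assumptions. Scanning processes only append to the journal; a later sync (conceptually what `sa-learn --sync` triggers) replays it into the database, keeping the newest atime per token, and then empties the journal.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scanner side: instead of locking the Bayes DB for writing, record
# "token was used at time T" as one journal line per token.
# The "t <atime> <token>" format is an assumption for this sketch.
sub journal_touch {
    my ($journal, $atime, @tokens) = @_;
    open(my $fh, '>>', $journal) or return 0;
    print {$fh} map { "t $atime $_\n" } @tokens;
    close($fh) or return 0;
    return 1;
}

# Sync side: replay the journal into the DB, keeping the newest atime per
# token, then truncate the journal.
sub sync_journal {
    my ($journal, $db) = @_;    # $db: hash ref, token => atime
    open(my $fh, '<', $journal) or return 0;
    while (my $line = <$fh>) {
        chomp $line;
        my ($type, $atime, $token) = split / /, $line, 3;
        next unless defined $token && $type eq 't';
        next unless exists $db->{$token};    # assumption: touches only refresh known tokens
        $db->{$token} = $atime if $atime > $db->{$token};
    }
    close($fh);
    truncate($journal, 0) or return 0;
    return 1;
}

# Usage: two scans touch tokens, then one sync applies the newest atimes.
my $dir     = tempdir(CLEANUP => 1);
my $journal = "$dir/bayes_journal";
my %db      = (tokA => 1232400000, tokB => 1232400000);
journal_touch($journal, 1232467200, 'tokA') or die "touch failed: $!";
journal_touch($journal, 1232467300, 'tokA', 'tokC') or die "touch failed: $!";
sync_journal($journal, \%db) or die "sync failed: $!";
```

After the sync, `tokA` carries the newer atime, `tokB` is untouched, the unknown `tokC` is ignored, and the journal is empty again.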

-- 
Randomly Selected Tagline:
Cats are smarter than dogs.  You can't make eight cats pull a sled through
 the snow.