Posted to users@spamassassin.apache.org by Matt <lm...@gmail.com> on 2010/05/27 17:53:52 UTC

bayes_learn_to_journal 1

Does 'bayes_learn_to_journal 1' in local.cf do anything yet?  I
thought in the past it helped save on disk I/O.

Matt

Re: bayes_learn_to_journal 1

Posted by RW <rw...@googlemail.com>.
On Thu, 27 May 2010 10:53:52 -0500
Matt <lm...@gmail.com> wrote:

> Does 'bayes_learn_to_journal 1' in local.cf do anything yet?  I
> thought in the past it helped save on disk I/O.

Do you mean "does it do anything *still*", then? Have you any reason to
think it doesn't?

AFAIK the journal is only used for the db backend, and the point of it
is to avoid frequent write-locking. The default is to only journal token
timestamps, which are only needed during expiry. bayes_learn_to_journal
journals all updates, but then learning doesn't take effect until the
sync is done.
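
For reference, turning it on is a one-line local.cf change; something like
this (bayes_journal_max_size is the documented knob for when the
opportunistic sync kicks in, and 102400 bytes is its default):

  bayes_learn_to_journal 1        # write learned tokens to the journal too
  bayes_journal_max_size 102400   # sync once the journal passes this size (bytes)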

Re: bayes_learn_to_journal 1 (should we eliminate journaling) - subject oops

Posted by Matt Kettler <mk...@verizon.net>.
On 5/27/2010 11:50 PM, Matt Kettler wrote:
> On 5/27/2010 11:53 AM, Matt wrote:
>   
>> Does 'bayes_learn_to_journal 1' in local.cf do anything yet?  I
>> thought in the past it helped save on disk I/O.
>>
>> Matt
>>   
>>     
>
FYI, please ignore the "should we eliminate journaling" in the subject. My
original version of the email questioned whether any real benefit existed,
but it was based on a flawed understanding of BDB's locking. I revised the
email, but forgot to touch the subject. Oops.

Re: bayes_learn_to_journal 1 (should we eliminate journaling)

Posted by Matt Kettler <mk...@verizon.net>.
On 5/27/2010 11:53 AM, Matt wrote:
> Does 'bayes_learn_to_journal 1' in local.cf do anything yet?  I
> thought in the past it helped save on disk I/O.
>
> Matt
>   

Actually it will *increase* disk I/O during journal syncs by making the
synced dataset larger. At all other times, it has no effect on disk I/O
at all.

However, it will reduce write-lock contention on the main database by
moving more of those locks over to the separate journal file. Ultimately,
this increases message-processing speed on busy systems where
auto-learning frequently happens while other mail is waiting to be
scanned.

At least in Berkeley DB, you are operating in a "one writer, many
reader" model. If one SA instance is writing to the database, it has
exclusive access, blocking all other SA instances which are merely
seeking to read. If no instances are writing, then multiple instances
can share read access safely.

By default:

 - message scanning reads the main tokens file, and writes atime updates
   to the journal (they're only relevant to expiry anyway).
 - message learning writes the main tokens file.

This setup causes a write lock on the main token file during learning,
which briefly blocks all other SA instances seeking to scan mail,
slowing them down. However, it also makes all learning immediately hit
the live dataset used to scan messages.
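
To make that concrete: with the default per-user bayes_path of
~/.spamassassin/bayes, the files involved look roughly like this:

  ~/.spamassassin/bayes_toks     # main token db; write-locked during learning
  ~/.spamassassin/bayes_seen     # message-id tracking, so nothing is learned twice
  ~/.spamassassin/bayes_journal  # deferred updates, merged in at sync time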

When you activate journaled learning, the learned tokens are also written
to the journal. This causes no delays for scanners reading the main tokens
file. The drawback is that the updated tokens won't "go live" until the
next sync. It also means the tokens are ultimately written to disk twice:
once to the journal at learning time, then again at sync time, when they
are read out of the journal and written to the main tokens file. (The fact
that the sync updates many tokens in a single batch does mean the main
token file spends less total time write-locked overall.)
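
If you want control over when that batch lands, note the sync happens
automatically once the journal grows past bayes_journal_max_size, and you
can also force it yourself during a quiet period, e.g. from cron:

  # merge pending journal entries into the main token db
  5 4 * * *  sa-learn --sync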