Posted to users@spamassassin.apache.org by Srilatha <sr...@intoto.com> on 2007/08/30 09:55:12 UTC

Usage of journal in Bayesian Filtering.

Hi,

I am trying to understand the usage of the journal in Bayesian filtering.

If bayes_learn_to_journal is set to 1, SA stores newly learnt tokens 
in the journal.

When the Bayesian filter is active, SA reads tokens from BOTH the
'bayes_tokens' database and 'bayes_journal' while scanning a message.

While scanning a message, tokens found in the bayes_tokens database are 
written to bayes_journal with a modified timestamp.


Is my understanding correct ?
Please correct me if my understanding is wrong

regards,
Srilatha






Re: Usage of journal in Bayesian Filtering.

Posted by Matt Kettler <mk...@verizon.net>.
Srilatha wrote:
> Hi,
>
> I am trying to understand the usage of the journal in Bayesian filtering.
>
> If bayes_learn_to_journal is set to 1, SA stores newly learnt tokens
> in the journal.
Correct.
>
>
> When the Bayesian filter is active, SA reads tokens from BOTH the
> 'bayes_tokens' database and 'bayes_journal' while scanning a message.
No, it only reads bayes_tokens.

If it read bayes_journal while scanning, it would defeat the purpose of
the journal.

The journal exists to be more readily writable. This is possible only
because it is rarely read. If scans read from the journal, its write
lock would be no more available than the write lock for the main tokens
database, so you might as well send all your writes there.

Data is merged from the journal into the tokens database as part of
SA's automatic sync process (once a day), or when you run
sa-learn --sync or sa-learn --force-expire.

In general, this means data in the journal doesn't "go live" until a
sync kicks off. This is why bayes_learn_to_journal defaults to 0.
Setting it to 1 improves learning performance, but also introduces a
"lag" where the results don't take effect until there's a sync.
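
To make the lag concrete, here is a minimal toy model in Python. It is
not SpamAssassin's code or its storage format; the class ToyBayesStore
and its methods are hypothetical names invented for illustration. It
assumes only what is described above: learning appends to the journal,
scanning reads only the tokens store, and a sync merges the journal
into the tokens.

```python
from collections import Counter


class ToyBayesStore:
    """Hypothetical toy model of a token store with a write journal."""

    def __init__(self, learn_to_journal=False):
        self.learn_to_journal = learn_to_journal
        self.tokens = Counter()  # stands in for the bayes_tokens database
        self.journal = []        # stands in for bayes_journal

    def learn(self, words):
        # With journaling on, learning only appends to the journal.
        if self.learn_to_journal:
            self.journal.extend(words)
        else:
            self.tokens.update(words)

    def scan(self, words):
        # Scans consult the tokens store only, never the journal.
        return {w: self.tokens[w] for w in words}

    def sync(self):
        # Merge pending journal entries into the tokens store.
        self.tokens.update(self.journal)
        self.journal.clear()


store = ToyBayesStore(learn_to_journal=True)
store.learn(["viagra", "viagra", "meeting"])
before = store.scan(["viagra"])  # learned data is not visible yet
store.sync()
after = store.scan(["viagra"])   # visible once the sync has run
```

In this sketch `before` reports a count of 0 for the token and `after`
reports 2, which is the "lag" in miniature.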
>
> While scanning a message, tokens found in the bayes_tokens database are
> written to bayes_journal with a modified timestamp.
Correct. Timestamp updates are always written to the journal, largely
because they're only relevant during expiry scans, and SA always does a
sync before it scans for expiry. There's no sense holding up scanners in
order to update timestamps, as they have no effect at all on the scan
results, so dumping them into the journal is ideal.
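
The timestamp handling can be sketched the same way. This is a
hypothetical toy in Python, not SpamAssassin's implementation:
ToyAtimeStore, its methods, and the integer timestamps are all made up
for illustration. It assumes only the behaviour described above: scans
queue timestamp updates in the journal, and expiry always syncs first.

```python
class ToyAtimeStore:
    """Hypothetical toy model of deferred access-time updates."""

    def __init__(self, atimes):
        self.atimes = dict(atimes)  # token -> last access time
        self.journal = []           # pending (token, time) updates

    def scan(self, words, now):
        # A scan never writes to the token store; it only queues updates.
        hits = [w for w in words if w in self.atimes]
        self.journal.extend((w, now) for w in hits)
        return hits

    def sync(self):
        # Apply queued timestamp updates to the token store.
        for word, when in self.journal:
            if word in self.atimes:
                self.atimes[word] = max(self.atimes[word], when)
        self.journal.clear()

    def expire(self, cutoff):
        # Sync first so recently seen tokens are not expired by mistake.
        self.sync()
        self.atimes = {w: t for w, t in self.atimes.items() if t >= cutoff}


store = ToyAtimeStore({"free": 100, "invoice": 100})
store.scan(["free", "hello"], now=500)
store.expire(cutoff=300)
remaining = sorted(store.atimes)
```

Here "free" survives expiry because its queued update is applied by the
sync that runs first, while "invoice" is dropped.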
>
>
> Is my understanding correct ?
> Please correct me if my understanding is wrong 
Corrected where appropriate.