Posted to users@spamassassin.apache.org by dalchri <ch...@apexhomes.net> on 2007/07/23 22:15:37 UTC

Force autolearn=ham for manual whitelist

Hello,

I have finished configuring all my network tests, and the Bayes database has
passed the 200-ham-message threshold and is now in use. So far the Bayes
database has been accumulating knowledge through autolearn.

I am concerned about how one-sided the autolearning has been, since over 90%
of our email is spam. To avoid false positives (FPs), I put our customer
database of email addresses into a manual whitelist.

Although messages from these addresses are getting through fine, only a few
are reported as autolearn=ham in the X-Spam-Status header; most are
reported as autolearn=no.

Is there any way to force these messages through the autolearn process?
-- 
View this message in context: http://www.nabble.com/Force-autolearn%3Dham-for-manual-whitelist-tf4132168.html#a11751873
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: Force autolearn=ham for manual whitelist

Posted by Matt Kettler <mk...@verizon.net>.
dalchri wrote:
> Is there any way to force these messages through the autolearn process?
>
No. In fact, the autolearner currently ignores manual whitelists on purpose
when deciding whether to autolearn.

This is largely done to prevent whitelisting mistakes from creating a
"bayes hangover", where the autolearning causes a lot of mistakenly
whitelisted spam to get learned as nonspam.

This risk is quite realistic if you're using whitelist_from, particularly if
you whitelist whole domains, and inevitable if you use "whitelist_from
*@mydomain.com". This is because whitelist_from offers no protection at
all against forgery. Fundamentally, whitelist_from is a tool of last
resort, and it exists only for a few rare situations where no other option
exists. (Were it not for those situations, there are strong arguments
that would likely result in whitelist_from being removed from SA.)
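To make the forgery risk concrete, here is a sketch of the kind of local.cf
entries being warned about. Apart from the *@mydomain.com pattern quoted
above, the addresses are hypothetical placeholders. Because whitelist_from
only compares sender addresses taken from the message's own headers, anyone
who sends mail claiming one of these addresses gets the whitelist bonus.

```
# local.cf (hypothetical entries, shown as examples of what to avoid)
# whitelist_from trusts sender addresses in the message headers,
# which the sender controls, so every one of these can be forged.
whitelist_from  customer@example.com
whitelist_from  *@example.com
# Worst case: any spam forged to look like it came from your own
# domain gets whitelisted.
whitelist_from  *@mydomain.com
```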

OK, I suppose I lied a bit: you could modify the tflags for the
USER_IN_WHITELIST rule so it no longer has userconf or noautolearn. That
should cause the autolearner to start considering the score of the
whitelist, which will almost certainly result in most of the messages
being learned as nonspam. (However, if they score really high in the
BAYES_* rules, it will still refuse to autolearn something that strongly
contradicts the existing training.)
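A minimal local.cf sketch of that change. It assumes the stock SpamAssassin
3.2 definition is "tflags USER_IN_WHITELIST userconf nice noautolearn"
(check 60_whitelist.cf in your rules directory before copying this), and
that a tflags line in local.cf replaces the earlier flag list rather than
adding to it, so "nice" is repeated because the rule carries a negative
score.

```
# local.cf: let the autolearner count USER_IN_WHITELIST.
# Assumed stock definition (SA 3.2, 60_whitelist.cf):
#   tflags USER_IN_WHITELIST userconf nice noautolearn
# This line replaces that flag list: "nice" stays because the
# rule has a negative score; userconf and noautolearn are dropped.
tflags USER_IN_WHITELIST nice

# A message is autolearned as ham only if its autolearn score is
# below this threshold (0.1 is the default).
bayes_auto_learn_threshold_nonspam 0.1
```

Afterwards, run "spamassassin --lint" to confirm the configuration still
parses, and restart spamd if you use it so the change is picked up.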

However, proceed with due caution, and only if you're using
whitelist_from_rcvd or whitelist_from_spf. Don't do this with
whitelist_from.
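For comparison, here is a sketch of the two safer forms, again with
hypothetical addresses. whitelist_from_rcvd additionally requires that the
relay which handed the message to your network has a reverse DNS name in the
given domain, so it depends on trusted_networks being configured correctly.
whitelist_from_spf requires the message to pass an SPF check, so the SPF
plugin must be loaded.

```
# local.cf (hypothetical entries)
# Matches only if the sender address matches AND the relay that handed
# the message to your network has rDNS ending in example.com.
whitelist_from_rcvd  customer@example.com  example.com
# Matches only if the sender address matches AND the message passes
# an SPF check for that domain.
whitelist_from_spf   customer@example.com
```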