Posted to users@spamassassin.apache.org by Roman Gelfand <rg...@gmail.com> on 2015/07/22 02:55:05 UTC

DKIM, SPF and Bayesian Learning

It seems that if DKIM or SPF is verified, the bayesian learning doesn't
matter.

X-Spam-Status: No, score=3.6 required=5.0 tests=BAYES_99,BAYES_999,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,HTML_MESSAGE,SPF_PASS autolearn=no version=3.3.2

Re: DKIM, SPF and Bayesian Learning

Posted by Bill Cole <sa...@billmail.scconsult.com>.

On 21 Jul 2015, at 20:55, Roman Gelfand wrote:

> It seems that if DKIM or SPF is verified, the bayesian learning doesn't
> matter.

Not so. Perhaps you need to refresh your understanding of what 
SpamAssassin is. It is not a collection of binary switches, but rather a 
scoring system consisting of rules which have various scores.

How much each rule matters is a local decision, subject to default values.

> X-Spam-Status: No, score=3.6 required=5.0 tests=BAYES_99,BAYES_999,DKIM_SIGNED,
> 	DKIM_VALID,DKIM_VALID_AU,HTML_MESSAGE,SPF_PASS autolearn=no version=3.3.2

3.3.2 is rather obsolete, but I still have the default rules lying 
about...

/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf:score BAYES_99  0  0  3.8    3.5
/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf:score BAYES_999 0  0  0.2    0.2
/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf:score DKIM_SIGNED 0.1
/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf:score DKIM_VALID -0.1
/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf:score DKIM_VALID_AU -0.1
/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf:score HTML_MESSAGE 0.001
/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf:score SPF_PASS -0.001

The arithmetic, assuming you allow network tests: The 2 Bayes rules (de 
facto Bayes certitude of spamminess) only add up to 3.7 (3.5 + 0.2). All 
of the DKIM and SPF crap nets out to -0.101, which vastly overstates its 
value in making spam/ham decisions; as independent rules, that value is 
in fact indistinguishable from zero. However, that remains a small 
mitigation relative to the Bayes rules, which are much more reliable but 
still subject to error by their nature as statistically-derived values. 
With HTML_MESSAGE's 0.001 added, the total is 3.6, which is consistent 
with your shown score and a reasonable understanding of spam.
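
To make that arithmetic easy to check, here is a minimal Python sketch 
that sums the scores quoted above. It is only an illustration, not how 
SpamAssassin itself computes anything. The names SCORES, rule_score and 
total are made up for this example, and it assumes the fourth value of a 
four-value score line is the one used when both Bayes and network tests 
are enabled.

```python
from decimal import Decimal

# Scores copied from the 50_scores.cf lines quoted above. Four-value
# entries hold one value per score set; this sketch assumes index 3 is
# "Bayes enabled, network tests enabled".
SCORES = {
    "BAYES_99": ["0", "0", "3.8", "3.5"],
    "BAYES_999": ["0", "0", "0.2", "0.2"],
    "DKIM_SIGNED": ["0.1"],
    "DKIM_VALID": ["-0.1"],
    "DKIM_VALID_AU": ["-0.1"],
    "HTML_MESSAGE": ["0.001"],
    "SPF_PASS": ["-0.001"],
}

# Rules listed in the X-Spam-Status header of the original message.
HITS = ["BAYES_99", "BAYES_999", "DKIM_SIGNED", "DKIM_VALID",
        "DKIM_VALID_AU", "HTML_MESSAGE", "SPF_PASS"]


def rule_score(name, score_set=3):
    """Return the score of one rule for the given score set."""
    values = SCORES[name]
    # A single value applies to every score set.
    raw = values[0] if len(values) == 1 else values[score_set]
    return Decimal(raw)


def total(rules, score_set=3):
    """Sum the scores of the given rules exactly (no float rounding)."""
    return sum((rule_score(r, score_set) for r in rules), Decimal("0"))


bayes = total(["BAYES_99", "BAYES_999"])
auth = total(["DKIM_SIGNED", "DKIM_VALID", "DKIM_VALID_AU", "SPF_PASS"])
overall = total(HITS)

print("Bayes rules:  ", bayes)
print("DKIM and SPF: ", auth)
print("All hits:     ", overall)
```

Decimal is used instead of float so that values like 0.001 add up 
exactly; the header itself reports the total rounded to one decimal 
place.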

On the other hand, if you really trust your Bayes DB and have a 
particular widespread flavor of spam hitting you, that precise set of 
rules (including HTML_MESSAGE) makes an excellent 'meta' rule worth a 
solid half point, and if you don't have a lot of non-spam marketing mail 
that you get voluntarily, you can probably lower your threshold to 4.5 
or maybe even 4. Try this first on a personal mail server, NOT on one 
handling mail for a broad audience including people who can fire you 
(at least until you've analyzed the mail stream very carefully).
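
A rough way to see what those two knobs would do to this particular 
message is sketched below in Python. It only models the arithmetic; in 
SpamAssassin itself this would be done with a meta rule and a threshold 
setting in the local configuration. The names COMBO, META_BONUS, 
adjusted_score and is_spam are hypothetical, and the sketch assumes a 
message counts as spam once its score reaches the required value.

```python
from decimal import Decimal

BASE_SCORE = Decimal("3.6")  # score from the X-Spam-Status header
META_BONUS = Decimal("0.5")  # the "solid half point" for the combination

# The precise set of rules from the header, including HTML_MESSAGE.
COMBO = {"BAYES_99", "BAYES_999", "DKIM_SIGNED", "DKIM_VALID",
         "DKIM_VALID_AU", "HTML_MESSAGE", "SPF_PASS"}


def adjusted_score(hits, base):
    """Add the bonus only when every rule in COMBO is among the hits."""
    return base + META_BONUS if COMBO <= set(hits) else base


def is_spam(score, required):
    """Assumption: spam means the score reaches the required value."""
    return score >= required


score = adjusted_score(COMBO, BASE_SCORE)
for required in (Decimal("5.0"), Decimal("4.5"), Decimal("4.0")):
    print(f"required={required}: score={score} spam={is_spam(score, required)}")
```

With these numbers the adjusted score is 4.1, so for this one message the 
half point only changes the verdict if the threshold also drops to 4.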

Re: DKIM, SPF and Bayesian Learning

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

On 7/21/2015 8:55 PM, Roman Gelfand wrote:
> It seems that if DKIM or SPF is verified, the bayesian learning 
> doesn't matter.
>
> X-Spam-Status: No, score=3.6 required=5.0 tests=BAYES_99,BAYES_999,DKIM_SIGNED,
> 	DKIM_VALID,DKIM_VALID_AU,HTML_MESSAGE,SPF_PASS autolearn=no version=3.3.2
If you mean autolearn, it requires a mixture of body and header rules.  
Almost all of the rules hit appear to be header rules.

"Normally, SpamAssassin will require 3 points from the header and 3 
points from the body to be auto-learned as spam."


See perldoc for Mail::SpamAssassin::Plugin::AutoLearnThreshold and 
Mail::SpamAssassin::Conf
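
As a rough illustration of the rule quoted above, here is a simplified 
Python sketch. It is not the plugin's actual logic, which also decides 
which rules count toward autolearning at all. The function name and 
parameters are hypothetical, the 12.0 default is assumed to match the 
documented default spam autolearn threshold, and the input numbers in the 
usage lines are made up.

```python
def would_autolearn_spam(header_points, body_points, autolearn_score,
                         spam_threshold=12.0, min_each=3.0):
    """Simplified check: the autolearn score must reach the spam
    threshold, and header and body must each contribute enough points."""
    return (autolearn_score >= spam_threshold
            and header_points >= min_each
            and body_points >= min_each)


# Made-up inputs: plenty of header points, almost nothing from the body.
print(would_autolearn_spam(header_points=6.0, body_points=0.5,
                           autolearn_score=12.5))

# Made-up inputs: enough points from both header and body.
print(would_autolearn_spam(header_points=4.0, body_points=9.0,
                           autolearn_score=13.0))
```

The first call shows the case described here: a message whose points come 
almost entirely from header rules is not auto-learned as spam, however 
high its score.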

Regards,
KAM