
Posted to users@spamassassin.apache.org by Sam <sa...@cryptolab.net> on 2019/04/18 17:56:25 UTC

Question about scoring and autolearning

Dear fellow SpamAssassin users,

I’ve read everything I could find on scoring and autolearning before
posting here, and yet cannot figure out why autolearn triggers properly in
the presence of ham but never triggers when SpamAssassin is fed spam.

My global settings for SpamAssassin 3.4.2, launched from procmail on
Gentoo Linux (x86_64), are (in /etc/spamassassin/local.cf):

bayes_auto_learn_on_error 1

which should be irrelevant here. In addition to the default config, the
RelayCountry, URIDNSBL, SPF and TxRep plugins are enabled, all with
default parameters (the [mis-?]behaviour was identical before I
activated them).

My user settings (in ~/.spamassassin/user_prefs):

bayes_auto_learn_threshold_spam 8.0
use_txrep 1
txrep_autolearn 2

However, even with heavy spam, autolearn does not seem to engage in spam
mode. The rule of “a minimum of 3 points from the headers and 3 from
the body” seems to be satisfied by this one, even excluding the Bayes
rules (the body should total 2.6 for DEAR_FRIEND + 1.4 for
MONEY_FORM_SHORT + 1.0 for FORM_FRAUD + 2.0 for ADVANCE_FEE_2_NEW_MONEY
= 7.0 points):

X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on […]
X-Spam-Flag: YES
X-Spam-Level: ********************************
X-Spam-Status: Yes, score=32.3 required=5.0
	tests=ADVANCE_FEE_2_NEW_MONEY,
	BAYES_99,BAYES_999,DATE_IN_FUTURE_03_06,DEAR_FRIEND,FORGED_MUA_OUTLOOK,
	FORM_FRAUD,FREEMAIL_FORGED_REPLYTO,FREEMAIL_REPLYTO_END_DIGIT,
	FROM_MISSPACED,FROM_MISSP_EH_MATCH,FROM_MISSP_MSFT,FROM_MISSP_REPLYTO,
	FROM_MISSP_USER,FROM_MISSP_XPRIO,FSL_NEW_HELO_USER,KHOP_DYNAMIC,
	LOTS_OF_MONEY,MISSING_HEADERS,MISSING_MID,MONEY_FORM_SHORT,
	MONEY_FROM_MISSP,NSL_RCVD_FROM_USER,RCVD_IN_BL_SPAMCOP_NET,
	RCVD_IN_RP_RNBL,RCVD_IN_SBL_CSS,REPLYTO_WITHOUT_TO_CC,STATIC_XPRIO_OLE,
	TO_NO_BRKTS_FROM_MSSP,TO_NO_BRKTS_MSFT,TXREP,T_FILL_THIS_FORM_SHORT
	shortcircuit=no autolearn=no autolearn_force=no version=3.4.2
[…]
Content analysis details:   (32.3 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.2 BAYES_999              BODY: Bayes spam probability is 99.9 to 100%
                            [score: 1.0000]
 3.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                            [score: 1.0000]
 0.0 NSL_RCVD_FROM_USER     Received from User
 3.3 RCVD_IN_SBL_CSS        RBL: Received via a relay in Spamhaus SBL-CSS
                            [65.29.9.30 listed in zen.spamhaus.org]
 1.3 RCVD_IN_RP_RNBL        RBL: Relay in RNBL,
                            https://senderscore.org/blacklistlookup/
                            [81.83.3.92 listed in bl.score.senderscore.com]
 1.3 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
                            [Blocked - see <https://www.spamcop.net/bl.shtml?65.29.9.30>]
 0.2 FREEMAIL_REPLYTO_END_DIGIT Reply-To freemail username ends in digit
                            (mariaroberts427[at]gmail.com)
 3.0 DATE_IN_FUTURE_03_06   Date: is 3 to 6 hours after Received: date
 1.0 MISSING_HEADERS        Missing To: header
 2.6 DEAR_FRIEND            BODY: Dear Friend? That's not very dear!
 0.0 FROM_MISSP_MSFT        From misspaced + supposed Microsoft tool
 0.5 MISSING_MID            Missing Message-Id: header
 1.3 KHOP_DYNAMIC           Relay looks like a dynamic address
 0.0 LOTS_OF_MONEY          Huge... sums of money
 0.0 FROM_MISSP_XPRIO       Misspaced FROM + X-Priority
 0.0 FSL_NEW_HELO_USER      Spam's using Helo and User
 1.6 REPLYTO_WITHOUT_TO_CC  No description available.
 0.0 FROM_MISSP_USER        From misspaced, from "User"
 2.1 FREEMAIL_FORGED_REPLYTO Freemail in Reply-To, but not From
 0.0 MONEY_FROM_MISSP       Lots of money and misspaced From
 2.0 STATIC_XPRIO_OLE       Static RDNS + X-Priority + MIMEOLE
 0.0 FROM_MISSP_REPLYTO     From misspaced, has Reply-To
 0.0 FROM_MISSPACED         From: missing whitespace
 2.1 TO_NO_BRKTS_FROM_MSSP  Multiple header formatting problems
 1.9 FORGED_MUA_OUTLOOK     Forged mail pretending to be from MS Outlook
 0.0 FROM_MISSP_EH_MATCH    From misspaced, matches envelope
 0.0 TO_NO_BRKTS_MSFT       To: lacks brackets and supposed Microsoft tool
 0.0 T_FILL_THIS_FORM_SHORT Fill in a short form with personal information
 1.4 MONEY_FORM_SHORT       Lots of money if you fill out a short form
 1.0 FORM_FRAUD             Fill a form and a fraud phrase
 2.0 ADVANCE_FEE_2_NEW_MONEY Advance Fee fraud and lots of money
-0.1 TXREP                  TXREP: Score normalizing based on sender's reputation

What did I do wrong? What (probably trivial) bit did I miss?

Thanks in advance.

   Sam

Re: Question about scoring and autolearning

Posted by Sam <sa...@cryptolab.net>.

On 4/18/19 9:05 PM, @lbutlr wrote:
> On 18 Apr 2019, at 12:32, Sam <sa...@cryptolab.net> wrote:
>> I guess I’ll have to raise some scores to make it learn.
> Reconsider. The message was clearly marked as spam with a score more than 6 times the threshold. There is nothing here to train, SA did its job.
You have a good point there.

Re: Question about scoring and autolearning

Posted by "@lbutlr" <kr...@kreme.com>.

On 18 Apr 2019, at 12:32, Sam <sa...@cryptolab.net> wrote:
> I guess I’ll have to raise some scores to make it learn.

Reconsider. The message was clearly marked as spam with a score more than 6 times the threshold. There is nothing here to train, SA did its job.


-- 
Light thinks it travels faster than anything but it's wrong. No matter
how fast light travels it finds the darkness has always got there first,
and is waiting for it. --Reaper Man

Re: Question about scoring and autolearning

Posted by Sam <sa...@cryptolab.net>.

On 4/18/19 8:19 PM, @lbutlr wrote:

> On 18 Apr 2019, at 11:56, Sam <sa...@cryptolab.net> wrote:
>> However, even with heavy spam, autolearn does not seem to engage in spam mode. 
> Why do you think it should?
>
> Did you check the message with spamassassin -D?
>
> You might want to read this:
>
> <https://wiki.apache.org/spamassassin/AutolearningNotWorking>

Somehow, I managed to miss the -D mention on this page. And indeed:

Apr 18 20:21:39.229 [24804] dbg: learn: auto-learn? ham=0.1, spam=8, body-points=2.604, head-points=10.041, learned-points=1.5
Apr 18 20:21:39.229 [24804] dbg: learn: auto-learn: autolearn_force not flagged for a rule. Body Only Points: 2.604 (3 req'd) / Head Only Points: 10.041 (3 req'd)
Apr 18 20:21:39.229 [24804] dbg: learn: auto-learn? no: scored as spam but too few body points (2.604 < 3)
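
The sequence of checks in this log can be modelled as a small decision function. This is a hypothetical Python sketch, not SpamAssassin's own (Perl) code: the names AutolearnInput and autolearn_decision are made up, the thresholds and the 3-point body/header minimums are taken from the log above, and the idea that autolearn_force lifts those minimums is an assumption based on the log's wording. The log does not print the score that is compared against the thresholds, so the example uses a hypothetical value of 9.0, and only the spam-side checks visible here are modelled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical model of the autolearn checks shown by "spamassassin -D".
# Not SpamAssassin's implementation; all names here are illustrative.

REQUIRED_BODY_POINTS = 3.0  # "(3 req'd)" in the debug log
REQUIRED_HEAD_POINTS = 3.0  # "(3 req'd)" in the debug log


@dataclass
class AutolearnInput:
    score: float                   # autolearn score (not printed in the log)
    body_points: float             # "body-points=" in the log
    head_points: float             # "head-points=" in the log
    autolearn_force: bool = False  # "autolearn_force not flagged" in the log


def autolearn_decision(msg: AutolearnInput,
                       ham_threshold: float = 0.1,
                       spam_threshold: float = 8.0) -> Tuple[Optional[str], str]:
    """Return (verdict, reason); a verdict of None means "do not learn"."""
    if msg.score < ham_threshold:
        # Any further ham-side checks are not modelled here.
        return "ham", "below ham threshold"
    if msg.score < spam_threshold:
        return None, "inside auto-learn thresholds"
    # Assumption: a rule flagged autolearn_force lifts the per-section minimums.
    if not msg.autolearn_force:
        if msg.body_points < REQUIRED_BODY_POINTS:
            return None, ("scored as spam but too few body points "
                          f"({msg.body_points} < {REQUIRED_BODY_POINTS:g})")
        if msg.head_points < REQUIRED_HEAD_POINTS:
            return None, ("scored as spam but too few head points "
                          f"({msg.head_points} < {REQUIRED_HEAD_POINTS:g})")
    return "spam", "scored as spam"


# Body/head points from Sam's log; score=9.0 is a hypothetical value >= 8.
verdict, reason = autolearn_decision(
    AutolearnInput(score=9.0, body_points=2.604, head_points=10.041))
print(verdict, reason)
# prints: None scored as spam but too few body points (2.604 < 3)
```

With these inputs the header minimum is easily met; it is the body minimum alone that blocks learning, which matches the last log line.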

I guess I’ll have to raise some scores to make it learn.

Thanks.

  Sam

Re: Question about scoring and autolearning

Posted by "@lbutlr" <kr...@kreme.com>.

On 18 Apr 2019, at 11:56, Sam <sa...@cryptolab.net> wrote:
> However, even with heavy spam, autolearn does not seem to engage in spam mode. 

Why do you think it should?

Did you check the message with spamassassin -D?

You might want to read this:

<https://wiki.apache.org/spamassassin/AutolearningNotWorking>


-- 
You may be anti anti-spam-kook if: Despite having invented the FUSSP you
not only don't know the difference between the SMTP envelope and SMTP
headers; you doubt there is such a thing as the SMTP envelope because
email doesn't involve paper.

Re: Question about scoring and autolearning

Posted by RW <rw...@googlemail.com>.

On Thu, 18 Apr 2019 19:56:25 +0200
Sam wrote:

> Dear fellow SpamAssassin users,
> 
> I’ve read everything I could find on scoring and autolearning before
> posting here, and yet cannot figure out why autolearn triggers properly in
> the presence of ham but never triggers when SpamAssassin is fed spam.
> 
> My global settings for SpamAssassin 3.4.2 launched from procmail on
> Gentoo Linux (x86_64) are (in /etc/spamassassin/local.cf)
> 
> bayes_auto_learn_on_error 1
> 
> which should be irrelevant here,

It's not irrelevant: it means that spam won't be learned if it's
already hitting BAYES_99, which it is in the example you gave.
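
The effect of bayes_auto_learn_on_error can be sketched as one extra gate in front of learning. This is a minimal, hypothetical Python model (the function name is made up), assuming, as described above and in the option's documentation, that the Bayes classifier counts as already agreeing when it yielded BAYES_99 for a message about to be learned as spam, or BAYES_00 for one about to be learned as ham:

```python
from typing import Iterable


def should_learn_on_error(verdict: str, rule_hits: Iterable[str],
                          learn_on_error: bool = True) -> bool:
    """Hypothetical gate: with learn-on-error enabled, skip learning when
    the Bayes result already agrees with the autolearn verdict."""
    if not learn_on_error:
        return True  # option off: learn regardless of the Bayes result
    hits = set(rule_hits)
    if verdict == "spam":
        return "BAYES_99" not in hits  # Bayes already said "spam"
    if verdict == "ham":
        return "BAYES_00" not in hits  # Bayes already said "ham"
    raise ValueError(f"unknown verdict: {verdict!r}")


# Sam's message hit BAYES_99 (and BAYES_999), so nothing would be learned.
print(should_learn_on_error("spam", ["BAYES_99", "BAYES_999", "DEAR_FRIEND"]))
# prints: False
```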

Unless you have no choice, don't use autotraining; train manually instead.
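
Manual training means feeding hand-sorted messages to the sa-learn tool with its --spam or --ham option. Below is a minimal Python sketch that builds and optionally runs that command. The helper names and the example path are hypothetical. Depending on how mail is stored, other sa-learn options (such as --mbox for mbox files) may be needed, so check sa-learn's documentation.

```python
import shutil
import subprocess
from typing import List, Sequence


def sa_learn_command(kind: str, paths: Sequence[str]) -> List[str]:
    """Build an sa-learn command line that trains `paths` as spam or ham."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    if not paths:
        raise ValueError("at least one message file or directory is required")
    return ["sa-learn", f"--{kind}", *paths]


def train(kind: str, paths: Sequence[str]) -> int:
    """Run sa-learn if it is installed; return its exit status."""
    if shutil.which("sa-learn") is None:
        raise FileNotFoundError("sa-learn not found on PATH")
    return subprocess.run(sa_learn_command(kind, paths), check=False).returncode


# "sorted/spam" is a hypothetical directory of hand-sorted spam messages.
print(sa_learn_command("spam", ["sorted/spam"]))
# prints: ['sa-learn', '--spam', 'sorted/spam']
```

Keeping command construction separate from execution makes it easy to inspect what would be run before actually touching the Bayes database.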