
Posted to users@spamassassin.apache.org by Ingo Reinhart <i....@dung.de> on 2005/05/19 11:43:39 UTC

bayes_auto_learn_threshold_spam not working?

Hello!

I am a little confused by the Bayes autolearning. I have set 
bayes_auto_learn_threshold_spam =  6.

Why was the following mail not autolearned as spam? No user_prefs or 
anything else is set.

May 19 10:25:16 mail spamd[6531]: result: Y 14 - 
BIZ_TLD,DATE_IN_PAST_06_12,FROM_ENDS_IN_NUMS,LOCAL_OBFU_DUNG_BIZ,LOCAL_OBFU_DUNG_GAMES,LOCAL_OBFU_DUNG_SOFTWARE,TDE_FM_BU_EXCUSE,TDE_RO_BV_GRATIS,TDE_WS_BV_GARANTIEN,TDE_WS_BV_KEINRISIKO,TDE_WS_BV_PREIS1,TDE_WS_BV_RABATT,TDE_WS_BV_SPARPREIS,TDE_WS_BV_WORD_ALLES,TDE_WS_BV_WORD_GARANTIE,TDE_WS_BV_WORD_JETZT 
scantime=2.1,size=6817,mid=<17...@amuro.net>,autolearn=no

Ingo

Re: bayes_auto_learn_threshold_spam not working?

Posted by Theo Van Dinter <fe...@kluge.net>.

On Thu, May 19, 2005 at 11:43:39AM +0200, Ingo Reinhart wrote:
> Why is the following mail not autolearn as spam? No user.prefs or else is 
> set.

As usual, run with -D.  It will tell you.


Re: bayes_auto_learn_threshold_spam not working?

Posted by Pete <pe...@supermail.dyndns.org>.

On Thursday 19 May 2005 10:43, Ingo Reinhart wrote:
> Hello!
>
> I am a little confused with the bayes. I have set
> bayes_auto_learn_threshold_spam =  6.
>
> Why is the following mail not autolearn as spam? No user.prefs or else is
> set.

Sorry to snip your data Ingo, but I am in the process of working out a 
problem of my own with spamc/spamd, and in doing so, might be able to help 
you.

I found this link helpful to me (although it didn't quite solve my 
particular problem) :

http://wiki.apache.org/spamassassin/AutolearningNotWorking

"Lots of people seem to be confused by the "autolearn=no" statement in the 
default X-Spam-Status header. There are usually questions regarding whether 
or not "no" means SpamAssassin is not autolearning at all. What it actually 
means is that the specific message which includes the "autolearn=no" part 
was not autolearned, not that autolearning is disabled or somehow broken."

(snip)

"If a message has already been learned by SpamAssassin, then that message 
will not be learned again. Therefore, if you run a message through 
SpamAssassin to see why it was classified as spam or ham, and it has 
already been learned, you will always get the result "autolearn=no". (To 
see this more clearly, use the "-D" flag, and you will see debug output 
explaining that the message has already been learned.)"

I'm just going to 'piggyback' my query on top of yours if that's ok.

My mail server runs SpamAssassin 3.0.3 on Fedora Core 3 (2.6.11-1.14_FC3).

SA was installed via Perl. I had previously installed SA using up2date but 
due to the problem I will mention in a minute, I decided to try the Perl 
install to see if anything changed. It didn't.

When I had the dynamic duo of spamc/spamd running, I noticed I was getting 
similar output to Ingo's as far as 'autolearn' was concerned, but with 
'autolearn=failed' coming up more often. I never got 'autolearn=ham' or 
'autolearn=spam'.

Switching to the spamassassin binary on its own, instead of 
spamc/spamd, produced better results, with 'autolearn=ham' coming up for 
every ham mail I got. I am very pleased with this, but can't help wondering 
why the spamc/spamd combo produces 'autolearn=failed' when the spamassassin 
binary doesn't. To clarify what I mean, here is the section of my 
procmailrc on my mail server, that is relevant to SA :

:0fw: spamassassin.lock
| /usr/bin/spamassassin

(this always seems to make SA do an 'autolearn=ham')

Before, I had this in my procmailrc :

:0fw:
| /usr/bin/spamc

(This produces 'autolearn=failed', with the occasional 'autolearn=no')

From the aforementioned link, I found this explanation of the failure :

"failed: means that autolearning was attempted, but couldn't complete. This 
happens if SpamAssassin can't gain a lock on the Bayes database files, 
etc."
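
As a rough illustration of the statuses quoted above, this Python sketch 
pulls the autolearn= field out of a spamd log line and prints what it means. 
The helper autolearn_status and the AUTOLEARN_MEANINGS table are made up for 
this example and are not part of SpamAssassin. The sample line is shortened 
from Ingo's post, and the table covers only the values mentioned in this 
thread.

```python
import re

# Meanings paraphrased from the wiki text quoted in this thread.
# Only the values that appear in the thread are listed here.
AUTOLEARN_MEANINGS = {
    "no": "this message was not autolearned (autolearning may still be working)",
    "failed": "autolearning was attempted but could not complete, "
              "for example no lock on the Bayes database",
    "ham": "the message was autolearned as ham",
    "spam": "the message was autolearned as spam",
}


def autolearn_status(log_line):
    """Return the value of the autolearn= field in a log line, or None."""
    match = re.search(r"autolearn=(\w+)", log_line)
    return match.group(1) if match else None


# Rule list shortened for readability; the rest is from the original post.
line = ("May 19 10:25:16 mail spamd[6531]: result: Y 14 - "
        "BIZ_TLD,DATE_IN_PAST_06_12,FROM_ENDS_IN_NUMS "
        "scantime=2.1,size=6817,mid=<17...@amuro.net>,autolearn=no")

status = autolearn_status(line)
print(status, "->", AUTOLEARN_MEANINGS.get(status, "status not covered here"))
```

Run over a whole mail log, the same helper gives a quick count of how often 
each status occurs, which is a cheap way to confirm that 'failed' really is 
the dominant result before digging into locking.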

Basically, what do I do now to get proper locking with the spamc/spamd 
combo?

By the way, I did try adding the 'spamassassin.lock' entry to the second 
procmailrc excerpt above, but nothing changed.

If it helps, this is my local.cf, in /etc/mail/spamassassin :

rewrite_header Subject [SPAM]
required_hits 4.8
# report_safe 1
# trusted_networks 212.17.35.
lock_method flock

# These addresses should never be marked as [SPAM].
whitelist_from ********@***********.***
whitelist_from *********@******.***
whitelist_from ****@********.***
whitelist_from *********@*******.***

The 'lock_method flock' was initially commented out. I enabled it to 
experiment, as I don't use NFS.

Sorry for the long post.

Pete.

Re: bayes_auto_learn_threshold_spam not working?

Posted by Jim Maul <jm...@elih.org>.

Ingo Reinhart wrote:
> Hello!
> 
> I am a little confused with the bayes. I have set 
> bayes_auto_learn_threshold_spam =  6.
> 
> Why is the following mail not autolearn as spam? No user.prefs or else 
> is set.
> 
> May 19 10:25:16 mail spamd[6531]: result: Y 14 - 
> BIZ_TLD,DATE_IN_PAST_06_12,FROM_ENDS_IN_NUMS,LOCAL_OBFU_DUNG_BIZ,LOCAL_OBFU_DUNG_GAMES,LOCAL_OBFU_DUNG_SOFTWARE,TDE_FM_BU_EXCUSE,TDE_RO_BV_GRATIS,TDE_WS_BV_GARANTIEN,TDE_WS_BV_KEINRISIKO,TDE_WS_BV_PREIS1,TDE_WS_BV_RABATT,TDE_WS_BV_SPARPREIS,TDE_WS_BV_WORD_ALLES,TDE_WS_BV_WORD_GARANTIE,TDE_WS_BV_WORD_JETZT 
> scantime=2.1,size=6817,mid=<17...@amuro.net>,autolearn=no 
> 
> 
> 
Because there is a lot more involved in autolearning than simply the 
score of the message. A certain number of body points as well as header 
points are needed. There are actually a bunch of other criteria as well, 
but I think this is why it did not autolearn. Check the SA docs; all of 
the tests are in there somewhere.

-Jim

Re: bayes_auto_learn_threshold_spam not working?

Posted by Matt Kettler <mk...@comcast.net>.

At 05:43 AM 5/19/2005, Ingo Reinhart wrote:
>I am a little confused with the bayes. I have set 
>bayes_auto_learn_threshold_spam =  6.
>
>Why is the following mail not autolearn as spam? No user.prefs or else is 
>set.

>May 19 10:25:16 mail spamd[6531]: result: Y 14 - 
>BIZ_TLD,DATE_IN_PAST_06_12,FROM_ENDS_IN_NUMS,LOCAL_OBFU_DUNG_BIZ,LOCAL_OBFU_DUNG_GAMES,LOCAL_OBFU_DUNG_SOFTWARE,TDE_FM_BU_EXCUSE,TDE_RO_BV_GRATIS,TDE_WS_BV_GARANTIEN,TDE_WS_BV_KEINRISIKO,TDE_WS_BV_PREIS1,TDE_WS_BV_RABATT,TDE_WS_BV_SPARPREIS,TDE_WS_BV_WORD_ALLES,TDE_WS_BV_WORD_GARANTIE,TDE_WS_BV_WORD_JETZT 
>scantime=2.1,size=6817,mid=<17...@amuro.net>,autolearn=no

Since I don't have the full score statements for all your rules, I can't 
answer that.  However, I can make some general statements.

Autolearning scores are very much NOT the same thing as message scores. And 
the threshold alone is NOT the only criterion for learning as spam.

For autolearning purposes the score is recalculated with bayes disabled. 
This includes shifting the scoreset used, which can cause a dramatic 
decrease or increase in score.

Also, regardless of threshold, there MUST be 3.0 worth of header rules, and 
3.0 worth of body rules hit.
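
The conditions described so far can be sketched in Python. This is only a 
simplified model of what is stated in this message, not SpamAssassin's actual 
code: the function name, its parameters, and the use of >= for each 
comparison are assumptions, and the other criteria (which rules count at all, 
for instance) are left out. The sample values are invented.

```python
def should_autolearn_as_spam(score_without_bayes, header_points, body_points,
                             threshold_spam=6.0,
                             min_header_points=3.0, min_body_points=3.0):
    """Hypothetical check mirroring the three conditions described above.

    score_without_bayes is the score recalculated with Bayes disabled,
    which is generally NOT the score shown in the log or the headers.
    """
    return (score_without_bayes >= threshold_spam
            and header_points >= min_header_points
            and body_points >= min_body_points)


# Invented numbers: a high recalculated score and plenty of body points,
# but under 3.0 points of header rules, so the message is not learned.
print(should_autolearn_as_spam(12.0, header_points=0.7, body_points=11.3))

# The same message with enough header points would qualify.
print(should_autolearn_as_spam(12.0, header_points=3.2, body_points=8.8))
```

The point of the sketch is that a message can sit far above 
bayes_auto_learn_threshold_spam and still come back with autolearn=no.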

From scanning the rules, it looks like you might not have made 3.0 worth 
of header hits. DATE_IN_PAST_06_12 and FROM_ENDS_IN_NUMS are the only ones 
that I know to be header rules, although your custom rules might have some 
more header rules. I can't tell.

score DATE_IN_PAST_06_12 0.301 0.211 0.918 0
score FROM_ENDS_IN_NUMS 0.177 0.516 0.517 0.000

Those two definitely don't add to 3.0. You might want to see if any of your 
other hits are header rules.