Posted to users@spamassassin.apache.org by Nick Rout <ni...@rout.co.nz> on 2006/08/23 23:46:38 UTC

analysing the logs

Using spamd started from .procmailrc, it logs to syslog and ends up in
/var/log/mail.log, along with postfix's log and courier-imap's log.

How can I get some analysis of this, e.g. positives per day?

I have googled a bit and looked in the archives. A lot of people talk
about their stats, but not many messages show the commands they use to
get them!
-- 
Nick Rout <ni...@rout.co.nz>


Re: bayes autolearn acting up

Posted by li...@zeta.net.
>> Here's another example:
>> X-Spam-Flag: YES
>> X-Spam-Checker-Version: SpamAssassin 3.1.4 (2006-07-25) on localhost
>> X-Spam-Level: *****
>> X-Spam-Status: Yes, score=6.0 required=5.0 tests=ADVANCE_FEE_1,BAYES_95,
>>         DNS_FROM_RFC_ABUSE,DNS_FROM_RFC_POST,SPF_HELO_PASS autolearn=spam
>>         version=3.1.4
>> I just can't see why it is autolearning everything that is tagged as
>> spam.
>> If anyone has any ideas, I'd appreciate it!
>
> grep bayes_auto_learn_threshold /etc/mail/spamassassin/*
> grep bayes_auto_learn_threshold /usr/share/spamassassin/*.cf
> grep bayes_auto_learn_threshold /var/lib/spamassassin/*.*/*.cf
>
> See if somewhere your setting is getting overridden. You might also
> perform some simple checks to see if the file you are changing is
> actually one that SpamAssassin is using. Some distros move the
> directories around. /etc/mail/spamassassin is often /etc/spamassassin,
> for example.

No other conf files anywhere - the conf files I am modifying are
definitely the ones SA is using.
Could my Bayes DB be corrupt?  I am using a MySQL DB for Bayes.
Besides this autolearn glitch, Bayes is performing well.

Thanks,
Devin


Re: bayes autolearn acting up

Posted by jdow <jd...@earthlink.net>.
From: <li...@zeta.net>

> On Aug 24, 2006, at 10:11 AM, lists@zeta.net wrote:
> 
>>
>>>> Since upgrading to 3.1.4, when I turn on bayes auto-learn with:
>>>>
>>>> bayes_auto_learn 1
>>>>
>>>> and I set the learn boundaries with:
>>>>
>>>> bayes_auto_learn_threshold_nonspam    -3.5
>>>> bayes_auto_learn_threshold_spam       15.5
>>>>
>>>> I get unexpected auto-learning.  Example:  I just saw a spam come
>>>> through that scored 9.9, which is enough for it to be tagged as spam,
>>>> but it should not be auto-learned as spam.  But, in the header it
>>>> clearly reads:
>>>>
>>>> X-Spam-Status:
>>>> Yes, score=9.9 required=5.0 tests=AWL,BAYES_99,
>>>> DATE_IN_PAST_03_06,DCC_CHECK,DIGEST_MULTIPLE,HTML_40_50,HTML_MESSAGE,
>>>> MIME_HTML_ONLY,RAZOR2_CHECK,RCVD_IN_WHOIS_INVALID autolearn=spam
>>>> version=3.1.4
>>>>
>>>>
>>>> Any ideas?
>>> SA does not autolearn based on the final message score. So, toss the 9.9
>>> out the window. That's not the number SA compares to the 15.5.
>>>
>>> For learning SA uses what the message score would have been if: 1) the
>>> AWL is off. 2) Bayes was disabled, including shifting what scoreset is
>>> used for all the other rules. 3) all white/blacklists are disabled. This
>>> is often *quite* different from the final score.
>>>
>>> However, in this case I don't entirely understand... The default SA 3.1
>>> scores are:
>>>
>>> score DATE_IN_PAST_03_06 0.736 0 1.122 0.478
>>> score DCC_CHECK 0 1.37 0 2.17
>>> score DIGEST_MULTIPLE 0 0.233 0 0.765
>>> score HTML_40_50 0.611 0 0.497 0.496
>>> score HTML_MESSAGE 0.001
>>> score MIME_HTML_ONLY 0.414 0.001 0.389 0.001
>>> score RAZOR2_CHECK 0 0.5 0 0.5
>>> score RCVD_IN_WHOIS_INVALID 0 2.151 0 2.234
>>>
>>> Adding the set1 scores up, the learning score should have been 4.753.
>>>
>>> Have you modified any rule scores?
> 
> Here's another example:
> 
> X-Spam-Flag: YES
> X-Spam-Checker-Version: SpamAssassin 3.1.4 (2006-07-25) on localhost
> X-Spam-Level: *****
> X-Spam-Status: Yes, score=6.0 required=5.0 tests=ADVANCE_FEE_1,BAYES_95,
>         DNS_FROM_RFC_ABUSE,DNS_FROM_RFC_POST,SPF_HELO_PASS autolearn=spam
>         version=3.1.4
> 
> 
> I just can't see why it is autolearning everything that is tagged as
> spam.
> 
> If anyone has any ideas, I'd appreciate it!

grep bayes_auto_learn_threshold /etc/mail/spamassassin/*
grep bayes_auto_learn_threshold /usr/share/spamassassin/*.cf
grep bayes_auto_learn_threshold /var/lib/spamassassin/*.*/*.cf

See if somewhere your setting is getting overridden. You might also
perform some simple checks to see if the file you are changing is
actually one that SpamAssassin is using. Some distros move the
directories around. /etc/mail/spamassassin is often /etc/spamassassin,
for example.

{^_^}
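
The grep checks above can also be sketched in Python, which reports the file and line number of each hit and ignores commented-out settings. This is only a sketch: the glob patterns are the paths mentioned in this thread, and `find_thresholds` and `scan` are hypothetical helper names, not part of SpamAssassin.

```python
import glob
import re

# Paths taken from the thread; your distribution may use different ones.
PATTERNS = [
    "/etc/mail/spamassassin/*",
    "/etc/spamassassin/*",
    "/usr/share/spamassassin/*.cf",
    "/var/lib/spamassassin/*.*/*.cf",
]

# Matches an active (not commented-out) threshold directive and its value.
DIRECTIVE = re.compile(r"^\s*(bayes_auto_learn_threshold_(?:non)?spam)\s+(\S+)")


def find_thresholds(lines):
    """Return (line_number, directive, value) for each active setting."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = DIRECTIVE.match(line)
        if match:
            hits.append((number, match.group(1), match.group(2)))
    return hits


def scan(patterns=PATTERNS):
    """Print every threshold setting found in files matching the patterns."""
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            try:
                with open(path, errors="replace") as handle:
                    for number, name, value in find_thresholds(handle):
                        print(f"{path}:{number}: {name} {value}")
            except OSError:
                continue  # skip subdirectories and unreadable files


if __name__ == "__main__":
    scan()
```

If the same directive shows up in more than one file, that is a hint that one setting may be overriding another.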

Re: bayes autolearn acting up

Posted by li...@zeta.net.
On Aug 24, 2006, at 10:11 AM, lists@zeta.net wrote:

>
>>> Since upgrading to 3.1.4, when I turn on bayes auto-learn with:
>>>
>>> bayes_auto_learn 1
>>>
>>> and I set the learn boundaries with:
>>>
>>> bayes_auto_learn_threshold_nonspam    -3.5
>>> bayes_auto_learn_threshold_spam       15.5
>>>
>>> I get unexpected auto-learning.  Example:  I just saw a spam come
>>> through that scored 9.9, which is enough for it to be tagged as spam,
>>> but it should not be auto-learned as spam.  But, in the header it
>>> clearly reads:
>>>
>>> X-Spam-Status:
>>> Yes, score=9.9 required=5.0 tests=AWL,BAYES_99,
>>> DATE_IN_PAST_03_06,DCC_CHECK,DIGEST_MULTIPLE,HTML_40_50,HTML_MESSAGE,
>>> MIME_HTML_ONLY,RAZOR2_CHECK,RCVD_IN_WHOIS_INVALID autolearn=spam
>>> version=3.1.4
>>>
>>>
>>> Any ideas?
>> SA does not autolearn based on the final message score. So, toss the 9.9
>> out the window. That's not the number SA compares to the 15.5.
>>
>> For learning SA uses what the message score would have been if: 1) the
>> AWL is off. 2) Bayes was disabled, including shifting what scoreset is
>> used for all the other rules. 3) all white/blacklists are disabled. This
>> is often *quite* different from the final score.
>>
>> However, in this case I don't entirely understand... The default SA 3.1
>> scores are:
>>
>> score DATE_IN_PAST_03_06 0.736 0 1.122 0.478
>> score DCC_CHECK 0 1.37 0 2.17
>> score DIGEST_MULTIPLE 0 0.233 0 0.765
>> score HTML_40_50 0.611 0 0.497 0.496
>> score HTML_MESSAGE 0.001
>> score MIME_HTML_ONLY 0.414 0.001 0.389 0.001
>> score RAZOR2_CHECK 0 0.5 0 0.5
>> score RCVD_IN_WHOIS_INVALID 0 2.151 0 2.234
>>
>> Adding the set1 scores up, the learning score should have been 4.753.
>>
>> Have you modified any rule scores?

Here's another example:

X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.1.4 (2006-07-25) on localhost
X-Spam-Level: *****
X-Spam-Status: Yes, score=6.0 required=5.0 tests=ADVANCE_FEE_1,BAYES_95,
         DNS_FROM_RFC_ABUSE,DNS_FROM_RFC_POST,SPF_HELO_PASS autolearn=spam
         version=3.1.4


I just can't see why it is autolearning everything that is tagged as
spam.

If anyone has any ideas, I'd appreciate it!

Regards,
Devin

Re: bayes autolearn acting up

Posted by li...@zeta.net.
>> Since upgrading to 3.1.4, when I turn on bayes auto-learn with:
>>
>> bayes_auto_learn 1
>>
>> and I set the learn boundaries with:
>>
>> bayes_auto_learn_threshold_nonspam    -3.5
>> bayes_auto_learn_threshold_spam       15.5
>>
>> I get unexpected auto-learning.  Example:  I just saw a spam come
>> through that scored 9.9, which is enough for it to be tagged as spam,
>> but it should not be auto-learned as spam.  But, in the header it
>> clearly reads:
>>
>> X-Spam-Status:
>> Yes, score=9.9 required=5.0 tests=AWL,BAYES_99,
>> DATE_IN_PAST_03_06,DCC_CHECK,DIGEST_MULTIPLE,HTML_40_50,HTML_MESSAGE,
>> MIME_HTML_ONLY,RAZOR2_CHECK,RCVD_IN_WHOIS_INVALID autolearn=spam
>> version=3.1.4
>>
>>
>> Any ideas?
> SA does not autolearn based on the final message score. So, toss the 9.9
> out the window. That's not the number SA compares to the 15.5.
>
> For learning SA uses what the message score would have been if: 1) the
> AWL is off. 2) Bayes was disabled, including shifting what scoreset is
> used for all the other rules. 3) all white/blacklists are disabled. This
> is often *quite* different from the final score.
>
> However, in this case I don't entirely understand... The default SA 3.1
> scores are:
>
> score DATE_IN_PAST_03_06 0.736 0 1.122 0.478
> score DCC_CHECK 0 1.37 0 2.17
> score DIGEST_MULTIPLE 0 0.233 0 0.765
> score HTML_40_50 0.611 0 0.497 0.496
> score HTML_MESSAGE 0.001
> score MIME_HTML_ONLY 0.414 0.001 0.389 0.001
> score RAZOR2_CHECK 0 0.5 0 0.5
> score RCVD_IN_WHOIS_INVALID 0 2.151 0 2.234
>
> Adding the set1 scores up, the learning score should have been 4.753.
>
> Have you modified any rule scores?


Thanks for trying to help, Matt.  No, I don't think I have changed any
of those scores.  I understand the basics of how the autolearn
works.  For a long time, with the settings above, it would usually
only autolearn spams with extremely high scores (well over 15).  Now,
basically EVERY mail tagged as spam is being autolearned as spam,
whether it has scored 30 or 5.2.  The other weird issue is that
anything that is not being tagged as spam is also being autolearned
as ham (e.g. mails with scores of 3.5), which is absolutely not
what I want.

Thanks,
Devin

Re: bayes autolearn acting up

Posted by Matt Kettler <mk...@comcast.net>.
lists@zeta.net wrote:
> Hello,
>
> Since upgrading to 3.1.4, when I turn on bayes auto-learn with:
>
> bayes_auto_learn 1
>
> and I set the learn boundaries with:
>
> bayes_auto_learn_threshold_nonspam    -3.5
> bayes_auto_learn_threshold_spam       15.5
>
> I get unexpected auto-learning.  Example:  I just saw a spam come
> through that scored 9.9, which is enough for it to be tagged as spam,
> but it should not be auto-learned as spam.  But, in the header it
> clearly reads:
>
> X-Spam-Status: 
> Yes, score=9.9 required=5.0 tests=AWL,BAYES_99,
> DATE_IN_PAST_03_06,DCC_CHECK,DIGEST_MULTIPLE,HTML_40_50,HTML_MESSAGE,
> MIME_HTML_ONLY,RAZOR2_CHECK,RCVD_IN_WHOIS_INVALID autolearn=spam
> version=3.1.4
>
>
> Any ideas?
SA does not autolearn based on the final message score. So, toss the 9.9
out the window. That's not the number SA compares to the 15.5.

For learning SA uses what the message score would have been if: 1) the
AWL is off. 2) Bayes was disabled, including shifting what scoreset is
used for all the other rules. 3) all white/blacklists are disabled. This
is often *quite* different from the final score.

However, in this case I don't entirely understand... The default SA 3.1
scores are:

score DATE_IN_PAST_03_06 0.736 0 1.122 0.478
score DCC_CHECK 0 1.37 0 2.17
score DIGEST_MULTIPLE 0 0.233 0 0.765
score HTML_40_50 0.611 0 0.497 0.496
score HTML_MESSAGE 0.001
score MIME_HTML_ONLY 0.414 0.001 0.389 0.001
score RAZOR2_CHECK 0 0.5 0 0.5
score RCVD_IN_WHOIS_INVALID 0 2.151 0 2.234

Adding the set1 scores up, the learning score should have been 4.753.

Have you modified any rule scores?
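
The learning-score arithmetic described above can be sketched as follows. This is a simplified model under stated assumptions, not SpamAssassin's own code: the score table is copied from the values quoted in this message, `learning_score` and `autolearn_decision` are hypothetical names, the boundary comparisons (`>=` and `<`) are an assumption, and SpamAssassin applies further conditions that are left out here. With the values exactly as quoted, the set1 column comes to 4.256 rather than 4.753; the difference, 0.497, matches the set2 value quoted for HTML_40_50, so one of the two figures may contain a slip. Either way the result is far below 15.5.

```python
# Scores as quoted above. Four values map to scoresets 0-3
# (0: no Bayes, no network; 1: no Bayes, network;
#  2: Bayes, no network;    3: Bayes and network).
# A single value applies to every scoreset.
SCORES = {
    "DATE_IN_PAST_03_06": (0.736, 0, 1.122, 0.478),
    "DCC_CHECK": (0, 1.37, 0, 2.17),
    "DIGEST_MULTIPLE": (0, 0.233, 0, 0.765),
    "HTML_40_50": (0.611, 0, 0.497, 0.496),
    "HTML_MESSAGE": (0.001,),
    "MIME_HTML_ONLY": (0.414, 0.001, 0.389, 0.001),
    "RAZOR2_CHECK": (0, 0.5, 0, 0.5),
    "RCVD_IN_WHOIS_INVALID": (0, 2.151, 0, 2.234),
}


def learning_score(tests, scoreset=1):
    """Sum the non-Bayes scoreset, skipping AWL and BAYES_* hits."""
    total = 0.0
    for name in tests:
        if name == "AWL" or name.startswith("BAYES_"):
            continue  # excluded from the learning score
        values = SCORES[name]
        total += values[0] if len(values) == 1 else values[scoreset]
    return round(total, 3)


def autolearn_decision(score, nonspam_threshold=-3.5, spam_threshold=15.5):
    """Compare a learning score with the thresholds from the thread."""
    if score >= spam_threshold:  # boundary handling is an assumption
        return "spam"
    if score < nonspam_threshold:
        return "ham"
    return "no"


hits = [
    "AWL", "BAYES_99", "DATE_IN_PAST_03_06", "DCC_CHECK", "DIGEST_MULTIPLE",
    "HTML_40_50", "HTML_MESSAGE", "MIME_HTML_ONLY", "RAZOR2_CHECK",
    "RCVD_IN_WHOIS_INVALID",
]
print(learning_score(hits), autolearn_decision(learning_score(hits)))
```

Under this model the message would not be autolearned at all, which is why autolearn=spam in the header is surprising.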





Re: bayes autolearn acting up

Posted by jdow <jd...@earthlink.net>.
From: <li...@zeta.net>

> Hello,
> 
> Since upgrading to 3.1.4, when I turn on bayes auto-learn with:
> 
> bayes_auto_learn 1
> 
> and I set the learn boundaries with:
> 
> bayes_auto_learn_threshold_nonspam    -3.5

This doesn't answer your question. But, I suspect a -3.5 here will
all but turn off learning on ham.

{o.o}

bayes autolearn acting up

Posted by li...@zeta.net.
Hello,

Since upgrading to 3.1.4, when I turn on bayes auto-learn with:

bayes_auto_learn	1

and I set the learn boundaries with:

bayes_auto_learn_threshold_nonspam    -3.5
bayes_auto_learn_threshold_spam       15.5

I get unexpected auto-learning.  Example:  I just saw a spam come  
through that scored 9.9, which is enough for it to be tagged as spam,  
but it should not be auto-learned as spam.  But, in the header it  
clearly reads:

X-Spam-Status:
Yes, score=9.9 required=5.0 tests=AWL,BAYES_99,  
DATE_IN_PAST_03_06,DCC_CHECK,DIGEST_MULTIPLE,HTML_40_50,HTML_MESSAGE,  
MIME_HTML_ONLY,RAZOR2_CHECK,RCVD_IN_WHOIS_INVALID autolearn=spam  
version=3.1.4


Any ideas?

Thanks,
Devin

Re: analysing the logs

Posted by jdow <jd...@earthlink.net>.
From: "Nick Rout" <ni...@rout.co.nz>

> Using spamd started from .procmailrc, it logs to syslog and ends up in
> /var/log/mail.log, along with postfix's log and courier-imap's log.
> 
> How can I get some analysis of this, e.g. positives per day?
> 
> I have googled a bit and looked in the archives. A lot of people talk
> about their stats, but not many messages show the commands they use to
> get them!

If you have a normal distro, the nice set of tools that comes with
SpamAssassin is likely not there. With a CPAN install there are
some interesting tools in /usr/share/doc/spamassassin/tools/.

A different tool that I like better, sadly named the same as one of
the official tools, was written by Dallas Engelken. It is carefully
hidden where nobody can find it at:
http://www.rulesemporium.com/programs/sa-stats.txt

Rename it to goodsa-stats.pl or something. It is quite informative
about which rules are hitting on ham or spam.

{^_^}
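
For a quick count of positives per day without any extra tools, a rough sketch along these lines may be enough. It assumes spamd writes one syslog line per message containing either "identified spam (score/required)" or "clean message (score/required)"; that line shape is an assumption, not something shown in this thread, so check it against your own /var/log/mail.log first. `daily_counts` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Assumed spamd syslog message shapes (verify against your own log):
#   "Aug 23 23:46:38 host spamd[123]: spamd: identified spam (12.3/5.0) for ..."
#   "Aug 23 23:51:10 host spamd[123]: spamd: clean message (-1.2/5.0) for ..."
LINE = re.compile(
    r"^(?P<day>\w{3}\s+\d{1,2})\s.*spamd\[\d+\]:.*"
    r"(?P<verdict>identified spam|clean message) \((?P<score>-?[\d.]+)/"
)


def daily_counts(lines):
    """Return {day: {"spam": n, "ham": m}} from syslog-style lines."""
    counts = defaultdict(lambda: {"spam": 0, "ham": 0})
    for line in lines:
        match = LINE.search(line)
        if not match:
            continue  # postfix, courier-imap and other spamd lines
        day = " ".join(match.group("day").split())  # "Aug  3" -> "Aug 3"
        key = "spam" if match.group("verdict") == "identified spam" else "ham"
        counts[day][key] += 1
    return dict(counts)


# Usage, assuming the log path mentioned in this thread:
#   with open("/var/log/mail.log", errors="replace") as log:
#       for day, c in daily_counts(log).items():
#           print(day, "spam:", c["spam"], "ham:", c["ham"])
```

Lines from postfix and courier-imap in the same file are simply skipped because they do not match the pattern.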