Posted to users@spamassassin.apache.org by Carlos Mennens <ca...@gmail.com> on 2010/04/28 17:53:55 UTC

Auto Learn Spam

I noticed when reviewing headers today that there was a section for
'autolearn=no'. What exactly does this mean, and wouldn't autolearn be
a good thing? I use Amavisd-new, which calls the SpamAssassin modules
directly, so I don't run the spamd daemon. The Amavisd-new daemon
loads the SpamAssassin modules and does the scoring itself, which
saves my mail server from running more daemons and using more system
resources than it needs to. The headers are below:

X-Spam-Status: No, score=2.808 tagged_above=-999 required=5
    tests=[BAYES_50=0.8, HTML_IMAGE_ONLY_24=1.618, HTML_MESSAGE=0.001,
    HTML_MIME_NO_HTML_TAG=0.377, MIME_HTML_ONLY=0.723,
    RCVD_IN_DNSWL_LOW=-0.7, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01]
    autolearn=no

The last line is what I am confused about.

-Carlos

Re: Auto Learn Spam

Posted by Bowie Bailey <Bo...@BUC.com>.

Carlos Mennens wrote:
> On Wed, Apr 28, 2010 at 12:10 PM, Dennis B. Hopp <dh...@coreps.com> wrote:
>   
>> Autolearn kicks in at certain scores.  I believe the default is 12.0 for
>> spam and 0.1 for ham.  You can customize those settings in your local.cf
>> file.
>>
>> bayes_auto_learn 1
>> bayes_auto_learn_threshold_nonspam -3.0
>> bayes_auto_learn_threshold_spam 12.0
>>     
>
> I checked /etc/mail/spamassassin/local.cf just now and found only the following:
>
> required_hits 5
> report_safe 0
> rewrite_header Subject [SPAM]
>
> However I don't know if Amavisd-new is looking at local.cf because I
> show parameters in my amavisd.conf file for SpamAssassin:
>
> $sa_tag_level_deflt  = -999.0;  # add spam info headers if at, or above that level
> $sa_tag2_level_deflt = 5.0;     # add 'spam detected' headers at that level
> $sa_kill_level_deflt = 8.0;     # triggers spam evasive actions (e.g. blocks mail)
> $sa_dsn_cutoff_level = 10;      # spam level beyond which a DSN is not sent
> $sa_quarantine_cutoff_level = 12; # spam level beyond which quarantine is off
> $penpals_bonus_score = 8;    # (no effect without a @storage_sql_dsn database)
> $penpals_threshold_high = $sa_kill_level_deflt;  # don't waste time on hi spam
>
> $sa_mail_body_size_limit = 400*1024; # don't waste time on SA if mail is larger
> $sa_local_tests_only = 0;    # only tests which do not require internet access?
> [...]
> $sa_spam_subject_tag = '***SPAM*** ';
> $defang_virus  = 1;  # MIME-wrap passed infected mail
> $defang_banned = 1;  # MIME-wrap passed mail containing banned name
> # for defanging bad headers only turn on certain minor contents categories:
> $defang_by_ccat{+CC_BADH.",3"} = 1;  # NUL or CR character in header
> $defang_by_ccat{+CC_BADH.",5"} = 1;  # header line longer than 998 characters
>
> When I get a spam message that was scored by SA, it says ***SPAM***
> and not [SPAM], which leads me to believe that the SA parameters are
> being fed from the amavisd.conf file. Does this make sense to you guys?

There are a few differences when you run SA through Amavis:

1) Required scores for tagging or rejecting messages are set in the
Amavis config (SA settings are ignored)
2) Settings for adding headers/markup to the email are set via Amavis
3) amavisd loads the SA libraries internally, so it is not necessary to
run spamd.

So your required_hits, report_safe, and rewrite_header options will not
be used by amavis.
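
To make the level settings concrete, here is a rough Python sketch of how the three amavisd levels quoted above would apply to a score. The numbers come from the quoted amavisd.conf; the function name and the "at or above" comparisons are assumptions for illustration, not amavisd's actual code.

```python
# Levels copied from the amavisd.conf excerpt quoted above.
SA_TAG_LEVEL = -999.0   # $sa_tag_level_deflt
SA_TAG2_LEVEL = 5.0     # $sa_tag2_level_deflt
SA_KILL_LEVEL = 8.0     # $sa_kill_level_deflt


def amavis_actions(score):
    """Hypothetical helper: list the actions a score would trigger.

    Assumes each level applies when the score is at or above it.
    """
    actions = []
    if score >= SA_TAG_LEVEL:
        actions.append("add spam info headers")
    if score >= SA_TAG2_LEVEL:
        actions.append("add 'spam detected' headers")
    if score >= SA_KILL_LEVEL:
        actions.append("spam evasive action")
    return actions


if __name__ == "__main__":
    # 2.808 is the score from the header in the original post.
    for score in (2.808, 6.0, 9.5):
        print(score, amavis_actions(score))
```

With these levels, the 2.808 message only gets the informational headers, which matches the "tagged_above=-999 required=5" seen in its X-Spam-Status line.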

However, the bayes settings along with rules, scores, etc, ARE read from
the normal SA configs, so if you want to change the Bayes learning
behavior, you can add the settings given above to your local.cf file and
then restart amavisd.  Keep in mind that the nonspam threshold shown
above (-3.0) is more conservative than the default (0.1), so fewer
messages will be learned automatically as ham, but a low-scoring spam
is less likely to be learned as ham by mistake.  The spam threshold of
12.0 is the same as the default.
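
As a sketch of what this means for a local.cf that contains no bayes lines at all, the Python below reads local.cf-style text and falls back to the defaults mentioned in this thread (0.1 for ham, 12.0 for spam). The helper name and the simple "key value" parsing are assumptions for illustration, and bayes_auto_learn is assumed to default to on; this is not how SpamAssassin itself parses its configuration.

```python
# Defaults as stated in this thread; bayes_auto_learn is assumed to default to 1.
DEFAULTS = {
    "bayes_auto_learn": 1.0,
    "bayes_auto_learn_threshold_nonspam": 0.1,
    "bayes_auto_learn_threshold_spam": 12.0,
}


def effective_bayes_settings(local_cf_text):
    """Hypothetical helper: overlay bayes lines from local.cf on the defaults."""
    settings = dict(DEFAULTS)
    for line in local_cf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split(None, 1)
        # Only the three bayes autolearn keys are of interest here.
        if len(parts) == 2 and parts[0] in settings:
            settings[parts[0]] = float(parts[1])
    return settings


# The local.cf contents posted in this thread.
CURRENT_CF = """required_hits 5
report_safe 0
rewrite_header Subject [SPAM]
"""

# The same file with the suggested bayes lines appended.
TUNED_CF = CURRENT_CF + """bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam -3.0
bayes_auto_learn_threshold_spam 12.0
"""

if __name__ == "__main__":
    print(effective_bayes_settings(CURRENT_CF))
    print(effective_bayes_settings(TUNED_CF))
```

The first call shows the defaults still in effect, since none of the three existing lines touch Bayes; the second shows only the nonspam threshold changing.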

-- 
Bowie

Re: Auto Learn Spam

Posted by "Dennis B. Hopp" <dh...@coreps.com>.

On Wed, 2010-04-28 at 12:38 -0400, Carlos Mennens wrote:

> I checked /etc/mail/spamassassin/local.cf just now and found only the following:
> 
> required_hits 5
> report_safe 0
> rewrite_header Subject [SPAM]
> 
> However I don't know if Amavisd-new is looking at local.cf because I
> show parameters in my amavisd.conf file for SpamAssassin:
> 
> $sa_tag_level_deflt  = -999.0;  # add spam info headers if at, or above that level
> $sa_tag2_level_deflt = 5.0;     # add 'spam detected' headers at that level
> $sa_kill_level_deflt = 8.0;     # triggers spam evasive actions (e.g. blocks mail)
> $sa_dsn_cutoff_level = 10;      # spam level beyond which a DSN is not sent
> $sa_quarantine_cutoff_level = 12; # spam level beyond which quarantine is off
> $penpals_bonus_score = 8;    # (no effect without a @storage_sql_dsn database)
> $penpals_threshold_high = $sa_kill_level_deflt;  # don't waste time on hi spam
> 

These settings are for amavisd-new and not spamassassin.  Amavisd-new is
the glue between your MTA and spamassassin (and virus scanners).  Most
of the behavior of spamassassin is still controlled through the local.cf
(although some settings can be defined in both places and the
amavisd.conf file will take precedence).

> $sa_mail_body_size_limit = 400*1024; # don't waste time on SA if mail is larger
> $sa_local_tests_only = 0;    # only tests which do not require internet access?
> [...]
> $sa_spam_subject_tag = '***SPAM*** ';
> $defang_virus  = 1;  # MIME-wrap passed infected mail
> $defang_banned = 1;  # MIME-wrap passed mail containing banned name
> # for defanging bad headers only turn on certain minor contents categories:
> $defang_by_ccat{+CC_BADH.",3"} = 1;  # NUL or CR character in header
> $defang_by_ccat{+CC_BADH.",5"} = 1;  # header line longer than 998 characters
> 
> When I get a spam message that was scored by SA, it says ***SPAM***
> and not [SPAM], which leads me to believe that the SA parameters are
> being fed from the amavisd.conf file. Does this make sense to you guys?

This is just the setting in amavisd.conf taking precedence.  If you were
to comment out $sa_spam_subject_tag I *believe* the value in your
local.cf would then be used.

Re: Auto Learn Spam

Posted by Carlos Mennens <ca...@gmail.com>.

On Wed, Apr 28, 2010 at 12:10 PM, Dennis B. Hopp <dh...@coreps.com> wrote:
> Autolearn kicks in at certain scores.  I believe the default is 12.0 for
> spam and 0.1 for ham.  You can customize those settings in your local.cf
> file.
>
> bayes_auto_learn 1
> bayes_auto_learn_threshold_nonspam -3.0
> bayes_auto_learn_threshold_spam 12.0

I checked /etc/mail/spamassassin/local.cf just now and found only the following:

required_hits 5
report_safe 0
rewrite_header Subject [SPAM]

However I don't know if Amavisd-new is looking at local.cf because I
show parameters in my amavisd.conf file for SpamAssassin:

$sa_tag_level_deflt  = -999.0;  # add spam info headers if at, or above that level
$sa_tag2_level_deflt = 5.0;     # add 'spam detected' headers at that level
$sa_kill_level_deflt = 8.0;     # triggers spam evasive actions (e.g. blocks mail)
$sa_dsn_cutoff_level = 10;      # spam level beyond which a DSN is not sent
$sa_quarantine_cutoff_level = 12; # spam level beyond which quarantine is off
$penpals_bonus_score = 8;    # (no effect without a @storage_sql_dsn database)
$penpals_threshold_high = $sa_kill_level_deflt;  # don't waste time on hi spam

$sa_mail_body_size_limit = 400*1024; # don't waste time on SA if mail is larger
$sa_local_tests_only = 0;    # only tests which do not require internet access?
[...]
$sa_spam_subject_tag = '***SPAM*** ';
$defang_virus  = 1;  # MIME-wrap passed infected mail
$defang_banned = 1;  # MIME-wrap passed mail containing banned name
# for defanging bad headers only turn on certain minor contents categories:
$defang_by_ccat{+CC_BADH.",3"} = 1;  # NUL or CR character in header
$defang_by_ccat{+CC_BADH.",5"} = 1;  # header line longer than 998 characters

When I get a spam message that was scored by SA, it says ***SPAM***
and not [SPAM], which leads me to believe that the SA parameters are
being fed from the amavisd.conf file. Does this make sense to you guys?

>
> I changed the default value for nonspam because the majority of my users
> don't train bayes and so the default value could cause bayes to learn
> incorrectly if a spam message scored low (maybe no network rules or URI
> rules triggered the first few times).
>
>> X-Spam-Status: No, score=2.808 tagged_above=-999 required=5
>>     tests=[BAYES_50=0.8, HTML_IMAGE_ONLY_24=1.618, HTML_MESSAGE=0.001,
>>     HTML_MIME_NO_HTML_TAG=0.377, MIME_HTML_ONLY=0.723,
>>     RCVD_IN_DNSWL_LOW=-0.7, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01]
>>     autolearn=no
>>
>
> This particular message scored a 2.808 so it's not high or low enough
> for bayes to know which way it should learn the message.
>
> --Dennis
>
>

Re: Auto Learn Spam

Posted by "Dennis B. Hopp" <dh...@coreps.com>.

On Wed, 2010-04-28 at 11:53 -0400, Carlos Mennens wrote:
> I noticed when reviewing headers today that there was a section for
> 'autolearn=no'. What exactly does this mean, and wouldn't autolearn be
> a good thing? I use Amavisd-new, which calls the SpamAssassin modules
> directly, so I don't run the spamd daemon. The Amavisd-new daemon
> loads the SpamAssassin modules and does the scoring itself, which
> saves my mail server from running more daemons and using more system
> resources than it needs to. The headers are below:
> 

Autolearn kicks in at certain scores.  I believe the default is 12.0 for
spam and 0.1 for ham.  You can customize those settings in your local.cf
file.

bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam -3.0
bayes_auto_learn_threshold_spam 12.0

I changed the default value for nonspam because the majority of my users
don't train bayes and so the default value could cause bayes to learn
incorrectly if a spam message scored low (maybe no network rules or URI
rules triggered the first few times).

> X-Spam-Status: No, score=2.808 tagged_above=-999 required=5
>     tests=[BAYES_50=0.8, HTML_IMAGE_ONLY_24=1.618, HTML_MESSAGE=0.001,
>     HTML_MIME_NO_HTML_TAG=0.377, MIME_HTML_ONLY=0.723,
>     RCVD_IN_DNSWL_LOW=-0.7, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01]
>     autolearn=no
> 

This particular message scored a 2.808 so it's not high or low enough
for bayes to know which way it should learn the message.
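
A simplified Python sketch of that decision, using the default thresholds mentioned above (0.1 and 12.0). The function name and the exact boundary comparisons are assumptions for illustration; the real check in SpamAssassin is more involved than comparing the displayed score, so treat this only as a model of the idea.

```python
def autolearn_decision(score, nonspam_threshold=0.1, spam_threshold=12.0):
    """Hypothetical helper: return 'spam', 'ham', or 'no' for a given score.

    Assumes a message is learned as spam at or above the spam threshold
    and as ham at or below the nonspam threshold.
    """
    if score >= spam_threshold:
        return "spam"
    if score <= nonspam_threshold:
        return "ham"
    return "no"


if __name__ == "__main__":
    # The message from the original post falls between the two thresholds.
    print(autolearn_decision(2.808))
    # With the stricter nonspam threshold, a score of -1.0 is no longer learned as ham.
    print(autolearn_decision(-1.0))
    print(autolearn_decision(-1.0, nonspam_threshold=-3.0))
```

A score of 2.808 lands in the gap between the two thresholds, which is why the header shows autolearn=no.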

--Dennis

Re: Auto Learn Spam

Posted by Michael Scheidell <sc...@secnap.net>.

On 4/28/10 11:53 AM, Carlos Mennens wrote:
> I noticed when reviewing headers today that there was a section for
> 'autolearn=no'
it's a SpamAssassin thing. (google)
it means the score was either not high enough for SA to learn as spam
(bayes, and/or AWL) or was not low enough to learn as ham.

you should set the triggers high and low enough so that you don't 
accidentally learn a sneaky spam as ham, etc.

-- 
Michael Scheidell, CTO
Phone: 561-999-5000, x 1259
SECNAP Network Security Corporation

    * Certified SNORT Integrator
    * 2008-9 Hot Company Award Winner, World Executive Alliance
    * Five-Star Partner Program 2009, VARBusiness
    * Best Anti-Spam Product 2008, Network Products Guide
    * King of Spam Filters, SC Magazine 2008
