Posted to users@spamassassin.apache.org by Greg Troxel <gd...@ir.bbn.com> on 2008/05/23 16:42:19 UTC

How to get clarity on AWL?

A lot of my mail is tagged with AWL, and I am often baffled.  Here are
what I think are the relevant headers from a perplexing example:

  Return-Path: <dr...@gmail.com>
  X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on fnord.ir.bbn.com
  X-Spam-Status: Yes, score=6.8 required=1.0 tests=AWL,BAYES_95,DEAR_WINNER,
          HTML_MESSAGE,SUBJ_ALL_CAPS autolearn=spam version=3.2.4
  X-Spam-Report: 
          *  2.1 SUBJ_ALL_CAPS Subject is all capitals
          *  3.2 DEAR_WINNER BODY: DEAR_WINNER
          *  0.0 HTML_MESSAGE BODY: HTML included in message
          *  3.0 BAYES_95 BODY: Bayesian spam probability is 95 to 99%
          *      [score: 0.9582]
          * -1.5 AWL AWL: From: address is in the auto white-list
  From: "AUSTRALIAN LOTTERY INTL" <dr...@gmail.com>

Reading http://wiki.apache.org/spamassassin/AwlWrongWay, I realize I am
confused - this sender has a positive average, and this message was more
spammy, and thus given credit for somewhat-less-spammy previous mail.

I think I should be able to infer that, because this message scored 8.3
before AWL and AWL was -1.5, the average is 5.3.  But if the message said

          * -1.5 AWL AWL: From: address is in the auto white-list at 5.3 for 12 messages

it would make things easier to follow.  Plus, the AutoWhitelist wiki
entry says that the key also includes the IP address that the mail
"originated at", and it would be nice to print that out, since it's
non-obvious what that means (last hop before trusted relay, or relying
on maybe-forged Received lines?).
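
The inference above can be checked with a short Python sketch.  It
assumes the AWL adjustment is (mean - score) * factor, with factor 0.5,
which is my understanding of the default auto_whitelist_factor; the
function name infer_awl_mean is made up for illustration, and a site
with a different factor would get a different answer.

```python
# Assumption: AWL delta = (historical mean - this message's score) * factor,
# with factor = 0.5 (believed to be the default auto_whitelist_factor).

def infer_awl_mean(pre_awl_score, awl_delta, factor=0.5):
    """Solve awl_delta = (mean - pre_awl_score) * factor for mean."""
    if factor == 0:
        raise ValueError("factor must be non-zero")
    return pre_awl_score + awl_delta / factor

# Rule scores from the X-Spam-Report above, excluding AWL itself:
# SUBJ_ALL_CAPS, DEAR_WINNER, HTML_MESSAGE, BAYES_95
rule_scores = [2.1, 3.2, 0.0, 3.0]
pre_awl = sum(rule_scores)

mean = infer_awl_mean(pre_awl, -1.5)
print(round(pre_awl, 1), round(mean, 1))
```

This only recovers the average; the message count cannot be derived
from the headers, which is part of why printing both would help.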

Somewhat separately, the spamassassin program has options to manipulate
the whitelist and blacklist:

     -W, --add-to-whitelist            Add addresses in mail to persistent address whitelist
     --add-to-blacklist                Add addresses in mail to persistent address blacklist
     -R, --remove-from-whitelist       Remove all addresses found in mail from
                                       persistent address list
     --add-addr-to-whitelist=addr      Add addr to persistent address whitelist
     --add-addr-to-blacklist=addr      Add addr to persistent address blacklist
     --remove-addr-from-whitelist=addr Remove addr from persistent address list

but I don't see any option to print out the lists and scores for
inspection, and I'm unclear on how the AWL differs from the persistent
white/black lists.  I think it would make sense to have

    --print-whitelist
    --print-blacklist
    --print-autowhitelist

or perhaps only one is needed, and also

    --lookup-in-whitelists=addr

to print the white/black/auto status of an address.

Re: How to get clarity on AWL?

Posted by Matt Kettler <mk...@verizon.net>.

Greg Troxel wrote:
> A lot of my mail is tagged with AWL, and I am often baffled.  Here are
> what I think are the relevant headers from a perplexing example:
>
>   Return-Path: <dr...@gmail.com>
>   X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on fnord.ir.bbn.com
>   X-Spam-Status: Yes, score=6.8 required=1.0 tests=AWL,BAYES_95,DEAR_WINNER,
>           HTML_MESSAGE,SUBJ_ALL_CAPS autolearn=spam version=3.2.4
>   X-Spam-Report: 
>           *  2.1 SUBJ_ALL_CAPS Subject is all capitals
>           *  3.2 DEAR_WINNER BODY: DEAR_WINNER
>           *  0.0 HTML_MESSAGE BODY: HTML included in message
>           *  3.0 BAYES_95 BODY: Bayesian spam probability is 95 to 99%
>           *      [score: 0.9582]
>           * -1.5 AWL AWL: From: address is in the auto white-list
>   From: "AUSTRALIAN LOTTERY INTL" <dr...@gmail.com>
>
> Reading http://wiki.apache.org/spamassassin/AwlWrongWay, I realize I am
> confused - this sender has a positive average, and this message was more
> spammy, and thus given credit for somewhat-less-spammy previous mail.
>
> I think that I should be able to infer that because this message was 8.3
> before AWL, and AWL was -1.5, that the average is 5.3.  But if the message said
>
>           * -1.5 AWL AWL: From: address is in the auto white-list at 5.3 for 12 messages
>
> it would make things easier to follow.  Plus, the AutoWhitelist wiki
> entry says that the key is also IP address that the mail "originated
> at", and it would be nice to print that out, since it's non-obvious what
> that means (last hop before trusted relay, or relying on maybe-forged
> received lines?).
>   
Agreed, this would make things clearer. Either that, or have a tag set 
up so you can add it to the report or to an X-Spam-AWL header with 
these details, should you so choose.

> Somewhat separately, the spamassassin program has options to manipulate
> whitelist, blacklist:
>
>      -W, --add-to-whitelist            Add addresses in mail to persistent address whitelist
>      --add-to-blacklist                Add addresses in mail to persistent address blacklist
>      -R, --remove-from-whitelist       Remove all addresses found in mail from
>                                        persistent address list
>      --add-addr-to-whitelist=addr      Add addr to persistent address whitelist
>      --add-addr-to-blacklist=addr      Add addr to persistent address blacklist
>      --remove-addr-from-whitelist=addr Remove addr from persistent address list
>
> but I don't see any to print out the lists and scores for inspection,
> and I'm unclear on the AWL vs persistent white/black lists.  I think it would make sense to have
>   
All of the above pertains to the AWL only. Persistent white/black list 
entries in your local.cf or user_prefs will show up as separate rule 
hits like USER_IN_WHITELIST.

>     --print-whitelist
>     --print-blacklist
>     --print-autowhitelist
>
> or perhaps only one is needed, and also
>
>     --lookup-in-whitelists=addr
>
> to print the white/black/auto status of an address.
>   
There is a tool that does this, but it's not included in the 
distribution. The check_whitelist script is available from SVN:

http://svn.apache.org/repos/asf/spamassassin/branches/3.2/tools/check_whitelist

However, this tool is a bit crude, and it would be much nicer if this 
was all built into a separate sa-learn-like utility that handled AWL 
learning, forgetting and dumping.

Re: How to get clarity on AWL?

Posted by Chris <cp...@embarqmail.com>.

On Friday 23 May 2008 9:42 am, Greg Troxel wrote:
> A lot of my mail is tagged with AWL, and I am often baffled.  Here are
> what I think are the relevant headers from a perplexing example:
>
>   Return-Path: <dr...@gmail.com>
>   X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on fnord.ir.bbn.com
>   X-Spam-Status: Yes, score=6.8 required=1.0 tests=AWL,BAYES_95,DEAR_WINNER,
>           HTML_MESSAGE,SUBJ_ALL_CAPS autolearn=spam version=3.2.4
>   X-Spam-Report:
>           *  2.1 SUBJ_ALL_CAPS Subject is all capitals
>           *  3.2 DEAR_WINNER BODY: DEAR_WINNER
>           *  0.0 HTML_MESSAGE BODY: HTML included in message
>           *  3.0 BAYES_95 BODY: Bayesian spam probability is 95 to 99%
>           *      [score: 0.9582]
>           * -1.5 AWL AWL: From: address is in the auto white-list
>   From: "AUSTRALIAN LOTTERY INTL" <dr...@gmail.com>
>
> Reading http://wiki.apache.org/spamassassin/AwlWrongWay, I realize I am
> confused - this sender has a positive average, and this message was more
> spammy, and thus given credit for somewhat-less-spammy previous mail.
>
> I think that I should be able to infer that because this message was 8.3
> before AWL, and AWL was -1.5, that the average is 5.3.  But if the message said
>
>           * -1.5 AWL AWL: From: address is in the auto white-list at 5.3 for 12 messages
>
I use a little Perl script that I got somewhere in 2004 that takes your AWL 
and makes a hashed and a plain text version. The entries look like this:

7929be75889dbf08c8efc87d226a1974 2 82.058
iuuzn@msn.com|ip=220.81 2 82.058

Here is the explanation from the script itself:

# The keys of this hash are like
# pamela4701@eudoramail.com|ip=213.41|totscore
# and the values are like
# 8.7472
# test with values(%hash); and keys(%hash);
# every mail address has two entries:
# e.g.
# pamela4701@eudoramail.com|ip=213.41|totscore
# pamela4701@eudoramail.com|ip=213.41
# where totscore is the over-all score (value) and the
# value of the second line is the count
# of mails received from this sender
# write this to a file one entry per line and nice it a little bit
# replace | with ' '
# do it with a hash of hashes, keys are mailaddresses, subkeys are
# totalscore and score
# IMPORTANT: Every time the hash is accessed it returns the value
# key triples in a different order
# (the triples not the keys and values itself of course)
# just in case you are wondering
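
For anyone who wants the idea without the script, here is a rough
Python sketch of the key layout those comments describe. It is not the
Perl script itself: it assumes the AWL database has already been read
into a plain dict (reading the actual DB file is left out), and the
function name summarize_awl is made up for illustration.

```python
from collections import defaultdict

SUFFIX = "|totscore"

def summarize_awl(raw):
    """Pair up 'addr|ip=a.b' (count) and 'addr|ip=a.b|totscore' (total)
    entries and return (sender, count, total, mean) rows, sorted by sender."""
    senders = defaultdict(dict)
    for key, value in raw.items():
        if key.endswith(SUFFIX):
            senders[key[:-len(SUFFIX)]]["totscore"] = float(value)
        else:
            senders[key]["count"] = int(value)

    rows = []
    # Sort explicitly, since hash/dict order should not be relied on.
    for sender in sorted(senders):
        entry = senders[sender]
        count = entry.get("count", 0)
        total = entry.get("totscore", 0.0)
        mean = total / count if count else 0.0
        # Replace '|' with ' ' as the script's comments suggest.
        rows.append((sender.replace("|", " "), count, total, mean))
    return rows

# Sample data taken from the plain text entry shown above.
sample = {
    "iuuzn@msn.com|ip=220.81": 2,
    "iuuzn@msn.com|ip=220.81|totscore": 82.058,
}
for sender, count, total, mean in summarize_awl(sample):
    print(sender, count, total, round(mean, 3))
```

The mean column is the per-sender average that the AWL compares each
new message against.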

If this is something like what you're looking for, I could post it at a 
download site.

-- 
Chris
KeyID 0xE372A7DA98E6705C