
Posted to users@spamassassin.apache.org by Joe Casadonte <jc...@northbound-train.com> on 2007/03/25 16:09:54 UTC

Help with per-user sa-learn

I am using per-user Bayes DBs, and I'm not sure what good they're doing
me.  I initialized the DB with good and bad messages, and I run any
false positives and false negatives through sa-learn.  I've also taken
to feeding all spam through sa-learn, because I thought I remembered
reading that this would help reinforce which messages are bad (and
that sa-learn would ignore any messages it had already learned from
via auto-learn, which I think is turned on).

So we've been doing this for about a year, and I still get quite a
number of false negatives (i.e. spam that gets through): over 100 per
day.  Maybe I don't quite understand how it's supposed to work.
Here's an example:

>From pparadise@programmers.com  Mon Jun 12 23:24:55 2006
X-Spam-Status: No, score=1.9 required=5.0 tests=ALL_TRUSTED,BAYES_99,
        DNS_FROM_RFC_ABUSE,HTML_MESSAGE autolearn=no version=3.1.3
Reply-To: "Programmer's Paradise" <pp...@programmers.com>
From: "Programmer's Paradise" <pp...@programmers.com>


Mail from this sender still gets through all the time:

>From bouncepub41@mo155.com  Wed Mar 21 14:23:05 2007
X-Spam-Status: No, score=1.5 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_99,
        FROM_EXCESS_QP,HTML_MESSAGE autolearn=no version=3.1.8
Reply-To: pparadise@programmers.com
From: "=?iso-8859-1?Q?Programmer's_Paradise?=" <pp...@programmers.com>


I keep all of the spam and ham I've sent through sa-learn (I'm not
sure why, but I do have it), and it contains at least 52 of these
emails, yet they still get through:

</home/USER/mail> # grep 'From.*pparadise' sa-spam.done | wc -l
52

</home/USER/mail> # grep 'From.*pparadise' sa-ham.done | wc -l
0
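Note that `grep ... | wc -l` counts matching lines, not messages: a message whose envelope `From ` line and `From:` header both mention the sender (as in the 2006 example above) is counted twice, and a matching body line would be counted too. A minimal sketch of a per-message count, assuming the `.done` files are ordinary mbox files (the nightly job reads `sa-spam` with `--mbox`); the function name `count_messages_from` is hypothetical:

```python
import mailbox


def count_messages_from(mbox_path: str, needle: str) -> int:
    """Count messages in an mbox whose From: header contains `needle`.

    Hypothetical helper: each message is counted at most once, even when
    the envelope "From " line and the From: header both match.
    """
    needle = needle.lower()
    count = 0
    # create=False raises an error instead of silently creating an empty file
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            # Only the From: header is checked; the envelope line is ignored
            sender = str(msg.get("From", ""))
            if needle in sender.lower():
                count += 1
    finally:
        box.close()
    return count
```

For example, `count_messages_from("sa-spam.done", "pparadise")` would report messages rather than lines, so it can come out lower than the `grep` figure.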


I also get plenty of emails with obvious variations of spellings for
viagra and all of the other popular spam drugs, lots of spelling
variations for various body parts and sexual acts, and they still get
through.  I get very few false-positives, probably 1 a month or a
little less, so I'm happy in that regard.


Some details:

OS: Fedora Core 5 (kernel 2.6.17)

SpamAssassin version 3.1.8
  running on Perl version 5.8.8

spamd: run via init.d script


SpamAssassin is invoked from .procmailrc via:
:0fw:
* < 256000
| spamc


sa-learn is run nightly as root via a cron job:

su USER -s /bin/sh -c 'sa-learn --spam --mbox --showdots ~/mail/sa-spam'


<~> # su USER -s /bin/sh -c 'sa-learn --dump magic'
0.000          0          3          0  non-token data: bayes db version
0.000          0      17393          0  non-token data: nspam
0.000          0        565          0  non-token data: nham
0.000          0     145811          0  non-token data: ntokens
0.000          0 1173869033          0  non-token data: oldest atime
0.000          0 1174829674          0  non-token data: newest atime
0.000          0 1174826915          0  non-token data: last journal sync atime
0.000          0 1174559910          0  non-token data: last expiry atime
0.000          0     691200          0  non-token data: last expire atime delta
0.000          0      76338          0  non-token data: last expire reduction count
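In this output each `non-token data` line carries its value in the third column: `nspam` and `nham` are the numbers of spam and ham messages learned, and the `atime` entries are Unix timestamps. A minimal sketch that turns the dump into named values and renders the timestamps as UTC dates, assuming the column layout shown above; `parse_dump_magic` and `utc` are hypothetical helper names:

```python
from datetime import datetime, timezone


def parse_dump_magic(text: str) -> dict:
    """Map each 'non-token data' name to its integer value (hypothetical helper)."""
    values = {}
    for line in text.splitlines():
        numbers, sep, name = line.partition("non-token data:")
        if not sep:
            continue  # skip anything that is not a non-token line
        fields = numbers.split()
        # The value is the third whitespace-separated field, as in the dump above
        values[name.strip()] = int(fields[2])
    return values


def utc(seconds: int) -> str:
    """Render a Unix timestamp from the dump as a UTC date string."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S UTC")


if __name__ == "__main__":
    sample = (
        "0.000          0      17393          0  non-token data: nspam\n"
        "0.000          0        565          0  non-token data: nham\n"
        "0.000          0 1173869033          0  non-token data: oldest atime\n"
    )
    magic = parse_dump_magic(sample)
    print(magic["nspam"], magic["nham"])  # 17393 565
    print(utc(magic["oldest atime"]))     # 2007-03-14 10:43:53 UTC
```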


Any help in my understanding of what SA is supposed to do, as well as
what I may be doing wrong, is much appreciated.  Thanks!

--
Regards,


joe
Joe Casadonte
jcasadonte@northbound-train.com

Re: Help with per-user sa-learn

Posted by Joe Casadonte <jc...@northbound-train.com>.

On Sun, 25 Mar 2007, Theo Van Dinter wrote:

> Fix the ALL_TRUSTED hit.  I also don't see any network-rule hits, so
> if you're not using those you should be.  Also look at using
> sa-update if you aren't.

Thanks for pointing them out -- I'll look into them immediately.  I
have sa-update running via cron nightly or weekly (I forget which).

> As far as Bayes goes, you're hitting BAYES_99, so that part is
> working fine.

Thanks!



Re: Help with per-user sa-learn

Posted by Theo Van Dinter <fe...@apache.org>.

On Sun, Mar 25, 2007 at 10:09:54AM -0400, Joe Casadonte wrote:
> X-Spam-Status: No, score=1.9 required=5.0 tests=ALL_TRUSTED,BAYES_99,
>         DNS_FROM_RFC_ABUSE,HTML_MESSAGE autolearn=no version=3.1.3
> X-Spam-Status: No, score=1.5 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_99,
>         FROM_EXCESS_QP,HTML_MESSAGE autolearn=no version=3.1.8

Fix the ALL_TRUSTED hit.  I also don't see any network-rule hits, so if you're
not using those you should be.  Also look at using sa-update if you aren't.

As far as Bayes goes, you're hitting BAYES_99, so that part is working fine.
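Both observations come straight from the `tests=` list in `X-Spam-Status`: BAYES_99 is present, and so is ALL_TRUSTED, a rule with a negative score that fires when SpamAssassin concludes the message passed only through trusted hosts. A minimal sketch that pulls the flag, score, threshold and rule names out of such a header, assuming the folded `name=value` layout shown in the two examples; `parse_spam_status` is a hypothetical helper, not part of SpamAssassin. The header only lists which rules hit; per-rule scores come from the configuration, so the sketch does not try to attribute the total.

```python
import re


def parse_spam_status(header: str) -> dict:
    """Split an X-Spam-Status header into flag, score, threshold and rules.

    Hypothetical helper; assumes the "No, score=... required=... tests=..."
    layout seen in the headers quoted in this thread.
    """
    text = header.strip()
    if text.lower().startswith("x-spam-status:"):
        text = text[len("x-spam-status:"):]
    # Unfold continuation lines, then drop the whitespace left after commas
    # where a long tests= list was wrapped onto the next line.
    text = re.sub(r"\s+", " ", text).strip()
    text = re.sub(r",\s+", ",", text)
    flag, _, rest = text.partition(",")
    fields = dict(item.split("=", 1) for item in rest.split() if "=" in item)
    tests = fields.get("tests", "")
    return {
        "is_spam": flag.strip() == "Yes",
        "score": float(fields["score"]),
        "required": float(fields["required"]),
        "tests": tests.split(",") if tests and tests != "none" else [],
    }


if __name__ == "__main__":
    sample = (
        "X-Spam-Status: No, score=1.9 required=5.0 tests=ALL_TRUSTED,BAYES_99,\n"
        "        DNS_FROM_RFC_ABUSE,HTML_MESSAGE autolearn=no version=3.1.3"
    )
    status = parse_spam_status(sample)
    print("ALL_TRUSTED" in status["tests"], "BAYES_99" in status["tests"])  # True True
```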
