Posted to users@spamassassin.apache.org by Jeff Ramsey <ra...@tubafor.com> on 2004/12/22 18:56:34 UTC

Bayes is letting too much spam through

When spam gets through to my inbox, it almost always would have been
marked as spam, except that it hits BAYES_00. By my count, without the
BAYES_00 hit, SA would have blocked the message. Should I turn Bayesian
filtering off, or am I doing something wrong? I cannot imagine why anyone
would turn Bayesian filtering on if this is the result, so I must be doing
something wrong.

-- 
Jeff Ramsey
MIS Administrator
Tubafor Mill, Inc.

Re: Bayes is letting too much spam through

Posted by Michael Parker <pa...@pobox.com>.
On Wed, Dec 22, 2004 at 08:26:35PM +0000, Jeff Ramsey wrote:
> > You realize that AWL also serves as a blacklist right?  I guess you
> > could remove it all, but I wouldn't recommend it.  Granted, it is
> > possible that if you're talking about mail that passed as ham, then
> > the address might have a positive AWL score that you don't really
> > want.  If anything here, I would call --add-addr-to-blacklist to give
> > them 100 points in AWL.
> 
> The only problem that I see with doing the '--add-addr-to-blacklist' is
> that it would 'blacklist' my email address as well. spamassassin -R
> --mbox reads the addresses in the header as well as any that are in the
> body. Is there a way that I can add the senders to blacklist, and not
> myself in the process?
> 

Hmmm... first off, obviously I meant --add-to-blacklist, but you knew
that.

I've never had it add my own address to the AWL. I think the docs are a
little imprecise in this case. I just double-checked, and sure enough it
works the way I suspected.

Michael


Re: Bayes is letting too much spam through

Posted by Jeff Ramsey <ra...@tubafor.com>.
On Thu, 2004-12-23 at 04:17, Michael Parker wrote:
> On Wed, Dec 22, 2004 at 07:47:35PM +0000, Jeff Ramsey wrote:
> > ssh $SERVER "
> >         echo ; echo 'Learning ham...' ; echo ;
> >         sa-learn --ham --showdots --mbox $TMPHDIR ;
> 
> Ok
> 
> >         echo 'Unlearning bad ham...' ; echo ;
> >         sa-learn --ham --forget --showdots --mbox $TMPSDIR ;
> 
> Ok, unless you consider the next line
> 
> >         echo 'Learning spam...' ; echo ;
> >         sa-learn --spam --showdots --mbox $TMPSDIR ;
> 
> Ok, but really if you are gonna do this then you don't need the
> --forget, bayes will do the right thing.

I removed the unlearn line. I was not sure whether Bayes would do the
right thing.

> 
> >         echo 'Removing spam senders from AWL...' ; echo ;
> >         spamassassin -R --mbox $TMPSDIR"
> 
> You realize that AWL also serves as a blacklist right?  I guess you
> could remove it all, but I wouldn't recommend it.  Granted, it is
> possible that if you're talking about mail that passed as ham, then
> the address might have a positive AWL score that you don't really
> want.  If anything here, I would call --add-addr-to-blacklist to give
> them 100 points in AWL.

The only problem that I see with doing the '--add-addr-to-blacklist' is
that it would 'blacklist' my email address as well. spamassassin -R
--mbox reads the addresses in the header as well as any that are in the
body. Is there a way that I can add the senders to blacklist, and not
myself in the process?


-- 
Jeff Ramsey
MIS Administrator
Tubafor Mill, Inc.
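A sketch for the question above, assuming you want to blacklist only the From: header addresses of the messages in the spam mbox and never one specific address of your own. MYADDR, the function names, and the header parsing are illustrative assumptions, not part of SpamAssassin; folded or MIME-encoded From: headers are not handled. Only the final call uses a real spamassassin option, the single-address --add-addr-to-blacklist form discussed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: list unique From: header addresses in an mbox,
# skipping MYADDR (a placeholder; set it to your own address).
MYADDR=${MYADDR:-me@example.com}

mbox_senders() {
    awk '
        /^From / { inhdr = 1; next }        # mbox separator: a new header block starts
        inhdr && /^$/ { inhdr = 0; next }   # blank line ends the header block
        inhdr && tolower($0) ~ /^from:/ {
            if (match($0, /<[^>]+>/))       # "Name <addr>" form
                print tolower(substr($0, RSTART + 1, RLENGTH - 2))
            else                            # bare "From: addr" form
                print tolower($2)
        }' "$1" | sort -u | grep -vixF -- "$MYADDR"
}

# Blacklist each sender individually, so nothing else in the message
# (To:, Cc:, addresses in the body) is touched.
blacklist_senders() {
    mbox_senders "$1" | while read -r addr; do
        spamassassin --add-addr-to-blacklist="$addr"
    done
}
```

Run on the server as, for example, blacklist_senders ~/tmp/spam. Because From: lines in message bodies are skipped, quoted or forwarded mail inside a spam does not pull extra addresses in.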

Re: Bayes is letting too much spam through

Posted by Michael Parker <pa...@pobox.com>.
On Wed, Dec 22, 2004 at 07:47:35PM +0000, Jeff Ramsey wrote:
> ssh $SERVER "
>         echo ; echo 'Learning ham...' ; echo ;
>         sa-learn --ham --showdots --mbox $TMPHDIR ;

Ok

>         echo 'Unlearning bad ham...' ; echo ;
>         sa-learn --ham --forget --showdots --mbox $TMPSDIR ;

Ok, unless you consider the next line

>         echo 'Learning spam...' ; echo ;
>         sa-learn --spam --showdots --mbox $TMPSDIR ;

Ok, but really, if you are gonna do this, then you don't need the
--forget: relearning a message as spam makes sa-learn forget its earlier
ham learning first, so Bayes will do the right thing.

>         echo 'Removing spam senders from AWL...' ; echo ;
>         spamassassin -R --mbox $TMPSDIR"

You realize that AWL also serves as a blacklist, right?  I guess you
could remove it all, but I wouldn't recommend it.  Granted, it is
possible that if you're talking about mail that passed as ham, then
the address might have a positive AWL score that you don't really
want.  If anything here, I would call --add-addr-to-blacklist to give
them 100 points in AWL.

Michael

Re: Bayes is letting too much spam through

Posted by Robert Brooks <ro...@hyperlink-interactive.co.uk>.
Jeff Ramsey wrote:
> Per the advice of Loren, I have started my bayes db over. And so far so
> good. SA is working like I wanted it to. I have another question about
> my learnspam script. Here is the script:
> 
> ---------------- START SCRIPT --------------------
> 
> #!/bin/sh
> # learnspam v0.34
> 
> HAMBOX=~/evolution/local/HamLearn/mbox
> SPAMBOX=~/evolution/local/SpamLearn/mbox
> 
> SERVER=ramsejc@baaaaa.tubafor.com
> TMPHDIR=~/tmp/ham
> TMPSDIR=~/tmp/spam
> VERBOSE=1
> 
> 
> echo Synchronizing $HAMBOX and $SPAMBOX to $SERVER
> rsync --partial --progress -z -e ssh $HAMBOX $SERVER:$TMPHDIR
> rsync --partial --progress -z -e ssh $SPAMBOX $SERVER:$TMPSDIR
> 
> ssh $SERVER "
>         echo ; echo 'Learning ham...' ; echo ;
>         sa-learn --ham --showdots --mbox $TMPHDIR ;
>         echo 'Unlearning bad ham...' ; echo ;
>         sa-learn --ham --forget --showdots --mbox $TMPSDIR ;

This will be your problem: you've got --ham and --forget pointing at
your temp spam directory. I'm betting the --forget argument is ignored
and all your spam gets learnt as ham.

>         echo 'Learning spam...' ; echo ;
>         sa-learn --spam --showdots --mbox $TMPSDIR ;
>         echo 'Removing spam senders from AWL...' ; echo ;
>         spamassassin -R --mbox $TMPSDIR"
> ----------------------------- END SCRIPT --------------------------
> 
> I run this script via a cron event a couple of times per day, and I move
> ham to the ham mbox and spam to the spam mbox via Novell Evolution.
> 
> Do I have the sa-learn --forget line correct? Do I need it there at all?
> I placed it there because I wanted to make sure that all the junkmail
> not getting marked spam was not only being learned as spam, but
> unlearned as ham, just in case it was auto-learned as ham.
> 


-- 
Robert Brooks,           Network Manager,          Cable & Wireless UK
<ro...@hyperlink-interactive.co.uk> http://hyperlink-interactive.co.uk/
Tel: +44 (0)20 7339 8600                      Fax: +44 (0)20 7339 8601
-  Help Microsoft stamp out piracy.  Give Linux to a friend today!   -

Re: Bayes is letting too much spam through

Posted by Thomas Arend <ml...@arend-whv.info>.
On Wednesday, 22 December 2004, 20:47, Jeff Ramsey wrote:
> Per the advice of Loren, I have started my bayes db over. And so far so
> good. SA is working like I wanted it to. I have another question about
> my learnspam script. Here is the script:
>
[..]
> echo Synchronizing $HAMBOX and $SPAMBOX to $SERVER
> rsync --partial --progress -z -e ssh $HAMBOX $SERVER:$TMPHDIR
> rsync --partial --progress -z -e ssh $SPAMBOX $SERVER:$TMPSDIR
>
> ssh $SERVER "
>         echo ; echo 'Learning ham...' ; echo ;
>         sa-learn --ham --showdots --mbox $TMPHDIR ;
>         echo 'Unlearning bad ham...' ; echo ;
>         sa-learn --ham --forget --showdots --mbox $TMPSDIR ;
>         echo 'Learning spam...' ; echo ;

I have tried to look into the sa-learn script, but at the moment I have no
clue which option takes precedence, --ham or --forget. Maybe you are
learning all spam as ham? "sa-learn --forget --mbox" should do.

>         sa-learn --spam --showdots --mbox $TMPSDIR ;

[..]

> Do I have the sa-learn --forget line correct? Do I need it there at all?
> I placed it there because I wanted to make sure that all the junkmail
> not getting marked spam was not only being learned as spam, but
> unlearned as ham, just in case it was auto-learned as ham.

SA keeps track of which messages have been learned as what.

Thomas
-- 
icq:133073900
aim:tawhv

Re: Bayes is letting too much spam through

Posted by Jeff Ramsey <ra...@tubafor.com>.
Per the advice of Loren, I have started my bayes db over. And so far so
good. SA is working like I wanted it to. I have another question about
my learnspam script. Here is the script:

---------------- START SCRIPT --------------------

#!/bin/sh
# learnspam v0.34

HAMBOX=~/evolution/local/HamLearn/mbox
SPAMBOX=~/evolution/local/SpamLearn/mbox

SERVER=ramsejc@baaaaa.tubafor.com
TMPHDIR=~/tmp/ham
TMPSDIR=~/tmp/spam
VERBOSE=1


echo Synchronizing $HAMBOX and $SPAMBOX to $SERVER
rsync --partial --progress -z -e ssh $HAMBOX $SERVER:$TMPHDIR
rsync --partial --progress -z -e ssh $SPAMBOX $SERVER:$TMPSDIR

ssh $SERVER "
        echo ; echo 'Learning ham...' ; echo ;
        sa-learn --ham --showdots --mbox $TMPHDIR ;
        echo 'Unlearning bad ham...' ; echo ;
        sa-learn --ham --forget --showdots --mbox $TMPSDIR ;
        echo 'Learning spam...' ; echo ;
        sa-learn --spam --showdots --mbox $TMPSDIR ;
        echo 'Removing spam senders from AWL...' ; echo ;
        spamassassin -R --mbox $TMPSDIR"
----------------------------- END SCRIPT --------------------------

I run this script from a cron job a couple of times per day, and I move
ham to the ham mbox and spam to the spam mbox in Novell Evolution.

Do I have the sa-learn --forget line correct? Do I need it there at all?
I placed it there because I wanted to make sure that all the junkmail
not getting marked spam was not only being learned as spam, but
unlearned as ham, just in case it was auto-learned as ham.

-- 
Jeff Ramsey
MIS Administrator
Tubafor Mill, Inc.
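Putting the replies in this thread together, a minimal sketch of learnspam with the "sa-learn --ham --forget" line dropped might look like this. The host and paths are copied from the script above. Generating the remote commands with a function and piping them to "ssh ... sh", and the guard at the bottom that runs main only when the file is executed as learnspam, are assumed restructurings (not from the original) so the functions can be checked without touching the server.

```shell
#!/bin/sh
# learnspam: sketch of the script above without the "--ham --forget" step.
HAMBOX=~/evolution/local/HamLearn/mbox
SPAMBOX=~/evolution/local/SpamLearn/mbox
SERVER=ramsejc@baaaaa.tubafor.com
TMPHDIR=~/tmp/ham
TMPSDIR=~/tmp/spam

# Print the commands to run on $SERVER. Spam is learned with --spam only;
# sa-learn forgets any earlier ham learning of the same message by itself.
remote_commands() {
    cat <<EOF
echo 'Learning ham...'
sa-learn --ham --showdots --mbox '$1'
echo 'Learning spam...'
sa-learn --spam --showdots --mbox '$2'
echo 'Removing spam senders from AWL...'
spamassassin -R --mbox '$2'
EOF
}

main() {
    echo "Synchronizing $HAMBOX and $SPAMBOX to $SERVER"
    rsync --partial --progress -z -e ssh "$HAMBOX" "$SERVER:$TMPHDIR" || exit 1
    rsync --partial --progress -z -e ssh "$SPAMBOX" "$SERVER:$TMPSDIR" || exit 1
    remote_commands "$TMPHDIR" "$TMPSDIR" | ssh "$SERVER" sh
}

# Run only when executed under the name learnspam (e.g. from cron).
case $(basename -- "$0") in
    learnspam | learnspam.sh) main ;;
esac
```

The AWL step is kept as posted; Michael's replies suggest --add-to-blacklist instead of -R if you want those senders penalized rather than merely reset.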

Re: Bayes is letting too much spam through

Posted by Loren Wilton <lw...@earthlink.net>.
It isn't clear that you personally are doing something wrong, but it is
clear that something in your setup is wrong.  BAYES_00 says "this is ham,
absolutely guaranteed".  Thus it adds a negative score to offset any minor
infractions that may have been hit by other rules.

Since you say the message is spam, it should hit something more like
BAYES_99 than BAYES_00.  Or, if it is a random message Bayes doesn't
recognize, it should be around BAYES_50, which probably wouldn't even show
up as a rule hit.
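To make the BAYES_00 / BAYES_50 / BAYES_99 distinction concrete, here is a small sketch that maps a Bayes spam probability to the rule it would hit. The ranges are an assumption based on the default rule set shipped with SpamAssassin 3.x (BAYES_00 for 0 to 1% up to BAYES_99 for 99 to 100%); check the 23_bayes.cf of your own version. The function name is hypothetical, and how a value exactly on a boundary is treated is simplified here.

```shell
#!/bin/sh
# bayes_rule PROB: print the BAYES_* rule a spam probability falls into.
# Cut-offs assumed from the SpamAssassin 3.x default rules; each range is
# treated as lower-inclusive, upper-exclusive.
bayes_rule() {
    awk -v p="$1" 'BEGIN {
        n = split("0.01 0.05 0.20 0.40 0.60 0.80 0.95 0.99", cut, " ")
        split("00 05 20 40 50 60 80 95", name, " ")
        for (i = 1; i <= n; i++)
            if (p + 0 < cut[i] + 0) { print "BAYES_" name[i]; exit }
        print "BAYES_99"
    }'
}
```

A spam that a mis-trained database scores at, say, 0.003 lands in BAYES_00, which is the symptom described at the top of the thread.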

So we can conclude that your Bayes database is broken.  If getting
BAYES_00 on almost all spam is common, it is probably so broken that it
would be best to throw it away and start over clean.

However, before doing that, you need to figure out *why* bayes thinks your
spam is ham.  This would typically be the result of serious mis-training,
such as feeding all spam to Bayes and telling it that it is ham.  I would
look at how you are training Bayes and figure out what is going wrong, fix
it, *then* blow away your current Bayes database.

        Loren
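Before blowing the database away, one quick check is how many messages it believes it has learned as each type; if spam has been fed in as ham, the ham count is likely to look inflated compared with the ham you actually trained. The sketch below parses the nspam and nham lines of "sa-learn --dump magic" (SpamAssassin 3.x output format assumed); the bayes_counts name is hypothetical. Note that SpamAssassin does not use Bayes at all until both counts reach bayes_min_spam_num and bayes_min_ham_num (200 each by default).

```shell
#!/bin/sh
# bayes_counts: read "sa-learn --dump magic" output on stdin and print the
# number of messages learned as spam and as ham. Lines of interest look like:
#   0.000          0        150          0  non-token data: nspam
#   0.000          0         42          0  non-token data: nham
bayes_counts() {
    awk '/non-token data: nspam$/ { s = $3 }
         /non-token data: nham$/  { h = $3 }
         END { printf "spam=%d ham=%d\n", s, h }'
}
```

Usage: sa-learn --dump magic | bayes_counts. Once the training is fixed, sa-learn --clear (SpamAssassin 3.x) wipes the database so you can start over clean.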