Posted to users@spamassassin.apache.org by Matthew Lenz <ma...@nocturnal.org> on 2005/03/30 23:27:34 UTC

my girlfriend is getting ticked :)

my girlfriend has been bitching at me for quite some time now to figure
out why spamassassin isn't catching the spam like it used to.  I'm using
3.0.2 on a debian woody box.  It's from www.backports.org (great site).
Here is an example of the X-Virus/Spam headers from a spam that was
caught:

..............


X-Virus-Status: No
X-Virus-Checker-Version: clamassassin 1.2.2 with clamscan / ClamAV
        0.83/795/Wed Mar 30 03:58:09 2005
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on
        server.nocturnal.org
X-Spam-Report:
        *  2.9 UNRESOLVED_TEMPLATE Headers contain an unresolved template
        *  4.1 MIME_BOUND_DD_DIGITS Spam tool pattern in MIME boundary
        *  4.2 X_MESSAGE_INFO Bulk email fingerprint (X-Message-Info) found
        *  0.1 MPART_ALT_DIFF BODY: HTML and text parts are different
        *  3.2 DOMAIN_RATIO BODY: Message body mentions many internet domains
        *  1.9 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
        *      [score: 1.0000]
        *  0.2 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
        *  0.0 HTML_MESSAGE BODY: HTML included in message
        *  0.0 HTML_90_100 BODY: Message is 90% to 100% HTML
        *  3.3 HTML_IMAGE_ONLY_04 BODY: HTML: images with 0-400 bytes of words
        *  2.4 MIME_HTML_ONLY_MULTI Multipart message only has text/html MIME parts
X-Spam-Status: Yes, score=22.3 required=5.0 tests=BAYES_99,DOMAIN_RATIO,
        HTML_90_100,HTML_IMAGE_ONLY_04,HTML_MESSAGE,MIME_BOUND_DD_DIGITS,
        MIME_HTML_ONLY,MIME_HTML_ONLY_MULTI,MPART_ALT_DIFF,
        UNRESOLVED_TEMPLATE,X_MESSAGE_INFO autolearn=unavailable
        version=3.0.2
X-Spam-Level: **********************

............

here is an example of the headers from a spam that wasn't caught:

............

X-Virus-Status: No
X-Virus-Checker-Version: clamassassin 1.2.2 with clamscan / ClamAV
        0.83/795/Wed Mar 30 03:58:09 2005
X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on
        server.nocturnal.org
X-Spam-Status: No, score=4.1 required=5.0 tests=BAYES_99,HTML_80_90,
        HTML_FONT_BIG,HTML_MESSAGE,HTML_TITLE_EMPTY,MIME_HTML_ONLY,
        MSGID_FROM_MTA_ID autolearn=no version=3.0.2
X-Spam-Level: ****

............

I'm only posting these in case the cause is obvious to people who know
spamassassin better than I do. :)

.spamassassin/user_prefs is effectively empty (every line is commented out).
There is also auto-whitelist (1.2M), bayes_journal (16k), bayes_seen
(332k), bayes_toks (5.0M).

her .forward contains:

"|IFS=' ' && exec /usr/bin/procmail -f- || exit 75 #HERUSERNAMEHERE"

her .procmailrc contains:

SHELL=/bin/sh
PATH=/bin:/usr/bin
PMDIR=$HOME/.procmail
LOGABSTRACT=all
MAILDIR=$HOME/mail
LOGFILE=$PMDIR/proclog
VERBOSE=off

:0fw
| /usr/local/bin/clamassassin

:0:
* ^X-Virus-Status: Yes
/dev/null

#Spamassassin start
:0fw: spamassassin.lock
| /usr/bin/spamc

:0:
* ^X-Spam-Status: Yes
Spam/Inbox
#Spamassassin end

here is the spread on how spamassassin is seeing most of the missed
spams (their scores) out of ~5000 missed spams since early February.

nil = 36
* = 331
** = 2008
*** = 1306
**** = 1227

spamassassin has caught ~4000, so it's catching less than half.
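
As a rough check on the figures above, here is a minimal Python sketch that tallies the buckets. It assumes the number of stars in X-Spam-Level is the integer part of the score, and it treats "~4000" as exactly 4000, so the percentages are only approximate.

```python
# Missed-spam counts per X-Spam-Level bucket, copied from the post
# (key = number of stars, 0 = "nil").
missed_by_level = {0: 36, 1: 331, 2: 2008, 3: 1306, 4: 1227}

# "~4000" caught in the post; treated as an exact value here (assumption).
caught_estimate = 4000

missed_total = sum(missed_by_level.values())
catch_rate = caught_estimate / (caught_estimate + missed_total)

# Misses with two or more stars, i.e. a score of at least 2.0 out of 5.0.
near_misses = sum(n for level, n in missed_by_level.items() if level >= 2)

print(missed_total)                           # 4908
print(f"{catch_rate:.1%}")                    # 44.9%
print(f"{near_misses / missed_total:.1%}")    # 92.5%
```

Most of the misses sit within a few points of the threshold, so a modest score change on one strong rule could plausibly move many of them over it.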

Ideas where to start (other than having her change her email address
hehe)

Thanks in advance.


Re: my girlfriend is getting ticked :)

Posted by Steven Dickenson <st...@mrchuckles.net>.
Matthew Lenz wrote:

> X-Spam-Status: No, score=4.1 required=5.0 tests=BAYES_99,HTML_80_90,
>         HTML_FONT_BIG,HTML_MESSAGE,HTML_TITLE_EMPTY,MIME_HTML_ONLY,
>         MSGID_FROM_MTA_ID autolearn=no version=3.0.2

I see your false negative scored 99% on bayes.  The BAYES_99 rule has a 
much lower score in v3 than it did in v2.  My users started bitching 
after the upgrade to 3 because all of a sudden spam was starting to get 
through.  Tweaking up the bayes scores a bit helped significantly.

Steven

Re: my girlfriend is getting ticked :)

Posted by Kenneth Porter <sh...@sewingwitch.com>.
--On Wednesday, March 30, 2005 3:27 PM -0600 Matthew Lenz 
<ma...@nocturnal.org> wrote:

> here is an example of the headers from a spam that wasn't caught

Attach the whole message with headers to a list post.



Re: my girlfriend is getting ticked :)

Posted by Morris Jones <mo...@whiteoaks.com>.
Run an email through spamassassin with the -D debug flag and it will 
tell you evvvverything.

Mojo

Matthew Lenz wrote:
> I just installed backports perl-libnet-dns (.48, hope that is new
> enough .49 is the newest).  Is there anywhere I can check to see if
> 'network tests' (what the SURBL says needs to be enabled) are enabled?
> 
> On Wed, 2005-03-30 at 14:15 -0800, Morris Jones wrote:
> 
>>Matthew Lenz wrote:
>>
>>>my girlfriend has been bitching at me for quite some time now to figure
>>>out why spamassassin isn't catching the spam like it used to.  I'm using
>>>3.0.2 on a debian woody box.  It's from www.backports.org (great site).
>>>Here is an example of the X-Virus/Spam headers from a spam that was
>>>caught:
>>
>>Your bayes database looked to be reasonably trained.  The false-negative 
>>was labeled 99% spam by Bayes.
>>
>>I don't see any RBL checks, which might have made the difference on this 
>>one, if it's already been seen and flagged.  Do you have Net::DNS 
>>installed and the RBL tests enabled?  What happens if you feed it 
>>through spamassassin with the -D flag?
>>
>>Cheers,
>>Mojo


-- 
Morris Jones
Monrovia, CA
http://www.whiteoaks.com
Old Town Astronomers: http://www.otastro.org

Re: my girlfriend is getting ticked :)

Posted by Jeff Chan <je...@surbl.org>.
On Wednesday, March 30, 2005, 2:21:01 PM, Matthew Lenz wrote:
> I just installed backports perl-libnet-dns (.48, hope that is new
> enough .49 is the newest).  Is there anywhere I can check to see if
> 'network tests' (what the SURBL says needs to be enabled) are enabled?

Set your trust path correctly:

(quoting Matt Kettler:)
> Please see the Wiki:
> http://wiki.apache.org/spamassassin/TrustPath/
> 
> and look up trusted_networks in man Mail::SpamAssassin::Conf

And enable network tests:

  http://www.surbl.org/faq.html#nettest

And things should work much better.

Jeff C.
-- 
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/


Re: my girlfriend is getting ticked :)

Posted by Matthew Lenz <ma...@nocturnal.org>.
I just installed backports perl-libnet-dns (.48; I hope that is new
enough, .49 is the newest).  Is there anywhere I can check to see if
'network tests' (which the SURBL site says need to be enabled) are enabled?

On Wed, 2005-03-30 at 14:15 -0800, Morris Jones wrote:
> Matthew Lenz wrote:
> > my girlfriend has been bitching at me for quite some time now to figure
> > out why spamassassin isn't catching the spam like it used to.  I'm using
> > 3.0.2 on a debian woody box.  It's from www.backports.org (great site).
> > Here is an example of the X-Virus/Spam headers from a spam that was
> > caught:
> 
> Your bayes database looked to be reasonably trained.  The false-negative 
> was labeled 99% spam by Bayes.
> 
> I don't see any RBL checks, which might have made the difference on this 
> one, if it's already been seen and flagged.  Do you have Net::DNS 
> installed and the RBL tests enabled?  What happens if you feed it 
> through spamassassin with the -D flag?
> 
> Cheers,
> Mojo


Re: my girlfriend is getting ticked :)

Posted by Matthew Lenz <ma...@nocturnal.org>.
On Wed, 2005-03-30 at 14:28 -0800, Morris Jones wrote:
> Mike Jackson wrote:
> > In my experience, it's more efficient to let the MTA handle the RBL 
> > checks instead of Spamassassin. I can't remember what MTA the OP was 
> > using, but it's trivial to set them up in Sendmail. On my employer's 
> > boxes, I use the spamhaus.org lists, but on my personal box (where I can 
> > be much more aggressive) I use a few of the rfc-ignorant.org lists and 
> > ws.surbl.org. The spamhaus lists are checked first, and they're highly 
> > effective.
> 
> Well of course this is true, but this opens the whole debate about
> scoring RBL checks.
> 
> Doing it in the MTA gives you a true or false result.
> 
> Doing it in Spamassassin applies the principles of fuzzy logic by
> assigning a score to the different black lists that should more
> precisely reflect the accuracy of the list.
> 
> If your black lists never have any false positives, then you're good to
> go.  :)
> 
> Mojo

I can vouch for that not being the case with spamhaus.  They blocked our
entire subnet (company I work for) with our provider just because one
customer (on another subnet) got hacked and was being used to send spam.
It took a LONG time to get unlisted even though the problem was fixed as
soon as it was first noticed.


Re: my girlfriend is getting ticked :)

Posted by Morris Jones <mo...@whiteoaks.com>.
Mike Jackson wrote:
> In my experience, it's more efficient to let the MTA handle the RBL 
> checks instead of Spamassassin. I can't remember what MTA the OP was 
> using, but it's trivial to set them up in Sendmail. On my employer's 
> boxes, I use the spamhaus.org lists, but on my personal box (where I can 
> be much more aggressive) I use a few of the rfc-ignorant.org lists and 
> ws.surbl.org. The spamhaus lists are checked first, and they're highly 
> effective.

Well of course this is true, but this opens the whole debate about
scoring RBL checks.

Doing it in the MTA gives you a true or false result.

Doing it in Spamassassin applies the principles of fuzzy logic by
assigning a score to the different black lists that should more
precisely reflect the accuracy of the list.

If your black lists never have any false positives, then you're good to
go.  :)
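
To make the contrast concrete, here is a small Python sketch of the two approaches. The list names, weights and threshold handling are hypothetical and chosen only for illustration; they are not real SpamAssassin rule scores or MTA settings.

```python
# Hypothetical blacklists and per-list weights (illustration only).
LIST_SCORES = {"list_a": 3.0, "list_b": 1.5, "list_c": 0.5}
REQUIRED = 5.0  # same threshold as the required=5.0 seen in this thread


def mta_rejects(hits):
    """MTA-style check: a listing on any configured list rejects the mail."""
    return len(hits) > 0


def sa_style_score(hits, other_rules_score=0.0):
    """Scoring-style check: each listing adds its own weight to the total."""
    return other_rules_score + sum(LIST_SCORES[name] for name in hits)


hits = {"list_c"}  # sender appears only on the least trusted list
print(mta_rejects(hits))                    # True: rejected outright
print(sa_style_score(hits) >= REQUIRED)     # False: 0.5 alone is not enough
```

With scoring, a listing on a less accurate list only nudges the total, so a false positive on that list does not by itself cost you the message.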

Mojo
-- 
Morris Jones
Monrovia, CA
http://www.whiteoaks.com
Old Town Astronomers: http://www.otastro.org


Re: my girlfriend is getting ticked :)

Posted by Jeff Chan <je...@surbl.org>.
On Wednesday, March 30, 2005, 2:20:17 PM, Mike Jackson wrote:
>> Your bayes database looked to be reasonably trained.  The false-negative 
>> was labeled 99% spam by Bayes.
>>
>> I don't see any RBL checks, which might have made the difference on this 
>> one, if it's already been seen and flagged.  Do you have Net::DNS 
>> installed and the RBL tests enabled?  What happens if you feed it through 
>> spamassassin with the -D flag?

> In my experience, it's more efficient to let the MTA handle the RBL checks 
> instead of Spamassassin. I can't remember what MTA the OP was using, but 
> it's trivial to set them up in Sendmail. On my employer's boxes, I use the 
> spamhaus.org lists, but on my personal box (where I can be much more 
> aggressive)

I use sbl.spamhaus.org and list.dsbl.org on most of the MTAs I
have visibility on.

> I use a few of the rfc-ignorant.org lists and ws.surbl.org. The 
> spamhaus lists are checked first, and they're highly effective. 

Hmmmm, ws.surbl.org shouldn't be used as a regular RBL.  It has
very few IP addresses, and most of those are probably web
servers.  So it won't match most of the IP address RBL checks a
plain old MTA would do.  SURBLs are meant to match message body
URIs, not mail senders.

Jeff C.
-- 
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/


Re: my girlfriend is getting ticked :)

Posted by AltGrendel <al...@exit0.us>.
Matthew Lenz wrote:

> ----- Original Message ----- From: "AltGrendel"
> To: <us...@spamassassin.apache.org>
> Sent: Wednesday, March 30, 2005 8:50 PM
> Subject: Re: my girlfriend is getting ticked :)
>
>
>> Mike Jackson wrote:
>>
>>>> Your bayes database looked to be reasonably trained.  The 
>>>> false-negative was labeled 99% spam by Bayes.
>>>>
>>>> I don't see any RBL checks, which might have made the difference on 
>>>> this one, if it's already been seen and flagged.  Do you have 
>>>> Net::DNS installed and the RBL tests enabled?  What happens if you 
>>>> feed it through spamassassin with the -D flag?
>>>
>>>
>>>
>>> In my experience, it's more efficient to let the MTA handle the RBL 
>>> checks instead of Spamassassin. I can't remember what MTA the OP was 
>>> using, but it's trivial to set them up in Sendmail. On my employer's 
>>> boxes, I use the spamhaus.org lists, but on my personal box (where I 
>>> can be much more aggressive) I use a few of the rfc-ignorant.org 
>>> lists and ws.surbl.org. The spamhaus lists are checked first, and 
>>> they're highly effective.
>>
>>
>> Agreed, I set up my postfix to do the checks and it's made a world of 
>> difference. The OP never said what OS/MTA is being used.
>>
>
> actually i did in my first post
>
> "I'm using 3.0.2 on a debian woody box.  It's from www.backports.org 
> (great site)"
>
Ok, so you're using Spamassassin 3.0.2 on Debian. Are you using 
Sendmail, qmail, courier, or postfix? I honestly don't know what Debian 
uses as a default mailserver.

Re: my girlfriend is getting ticked :)

Posted by Matthew Lenz <ma...@nocturnal.org>.
----- Original Message ----- 
From: "AltGrendel"
To: <us...@spamassassin.apache.org>
Sent: Wednesday, March 30, 2005 8:50 PM
Subject: Re: my girlfriend is getting ticked :)


> Mike Jackson wrote:
>
>>> Your bayes database looked to be reasonably trained.  The false-negative 
>>> was labeled 99% spam by Bayes.
>>>
>>> I don't see any RBL checks, which might have made the difference on this 
>>> one, if it's already been seen and flagged.  Do you have Net::DNS 
>>> installed and the RBL tests enabled?  What happens if you feed it 
>>> through spamassassin with the -D flag?
>>
>>
>> In my experience, it's more efficient to let the MTA handle the RBL 
>> checks instead of Spamassassin. I can't remember what MTA the OP was 
>> using, but it's trivial to set them up in Sendmail. On my employer's 
>> boxes, I use the spamhaus.org lists, but on my personal box (where I can 
>> be much more aggressive) I use a few of the rfc-ignorant.org lists and 
>> ws.surbl.org. The spamhaus lists are checked first, and they're highly 
>> effective.
>
> Agreed, I set up my postfix to do the checks and it's made a world of 
> difference. The OP never said what OS/MTA is being used.
>

actually i did in my first post

"I'm using 3.0.2 on a debian woody box.  It's from www.backports.org (great 
site)"


Re: my girlfriend is getting ticked :)

Posted by AltGrendel <al...@exit0.us>.
Mike Jackson wrote:

>> Your bayes database looked to be reasonably trained.  The 
>> false-negative was labeled 99% spam by Bayes.
>>
>> I don't see any RBL checks, which might have made the difference on 
>> this one, if it's already been seen and flagged.  Do you have 
>> Net::DNS installed and the RBL tests enabled?  What happens if you 
>> feed it through spamassassin with the -D flag?
>
>
> In my experience, it's more efficient to let the MTA handle the RBL 
> checks instead of Spamassassin. I can't remember what MTA the OP was 
> using, but it's trivial to set them up in Sendmail. On my employer's 
> boxes, I use the spamhaus.org lists, but on my personal box (where I 
> can be much more aggressive) I use a few of the rfc-ignorant.org lists 
> and ws.surbl.org. The spamhaus lists are checked first, and they're 
> highly effective.

Agreed, I set up my postfix to do the checks and it's made a world of 
difference. The OP never said what OS/MTA is being used.

Re: my girlfriend is getting ticked :)

Posted by Mike Jackson <mj...@barking-dog.net>.
> Your bayes database looked to be reasonably trained.  The false-negative 
> was labeled 99% spam by Bayes.
>
> I don't see any RBL checks, which might have made the difference on this 
> one, if it's already been seen and flagged.  Do you have Net::DNS 
> installed and the RBL tests enabled?  What happens if you feed it through 
> spamassassin with the -D flag?

In my experience, it's more efficient to let the MTA handle the RBL checks 
instead of Spamassassin. I can't remember what MTA the OP was using, but 
it's trivial to set them up in Sendmail. On my employer's boxes, I use the 
spamhaus.org lists, but on my personal box (where I can be much more 
aggressive) I use a few of the rfc-ignorant.org lists and ws.surbl.org. The 
spamhaus lists are checked first, and they're highly effective. 


Re: my girlfriend is getting ticked :)

Posted by Morris Jones <mo...@whiteoaks.com>.
Matthew Lenz wrote:
> my girlfriend has been bitching at me for quite some time now to figure
> out why spamassassin isn't catching the spam like it used to.  I'm using
> 3.0.2 on a debian woody box.  It's from www.backports.org (great site).
> Here is an example of the X-Virus/Spam headers from a spam that was
> caught:

Your bayes database looked to be reasonably trained.  The false-negative 
was labeled 99% spam by Bayes.

I don't see any RBL checks, which might have made the difference on this 
one, if it's already been seen and flagged.  Do you have Net::DNS 
installed and the RBL tests enabled?  What happens if you feed it 
through spamassassin with the -D flag?

Cheers,
Mojo
-- 
Morris Jones
Monrovia, CA
http://www.whiteoaks.com
Old Town Astronomers: http://www.otastro.org

Re: my girlfriend is getting ticked :)

Posted by Matthew Lenz <ma...@nocturnal.org>.
On Wed, 2005-03-30 at 16:45 -0500, Tim Donahue wrote:
> On Wed, 2005-03-30 at 15:27 -0600, Matthew Lenz wrote:
> [snip spam info]
> > Ideas where to start (other than having her change her email address
> > hehe)
> 
> It doesn't look like you are using any of the SARE rulesets.  There are
> 3 things I would do to start off... First, assuming that the 5000
> messages that you classified as spam have been verified to actually be
> spam, I would run them through sa-learn so that bayes can learn from its
> mistakes.

She gets this volume of spam regularly, and I have run all her good mail
(--ham), caught spam (--spam, to reinforce what it's doing correctly) and
missed spam (--spam) through it at least a dozen times on new data.  It
doesn't help; in fact it's gotten worse.

> Second, if you haven't done so already, I would decrease the score that
> is assigned to the rule ALL_TRUSTED.  My false negatives were helped
> greatly from this.

I've never messed with any of the settings before.  Is this something I
have to do in /etc/spamassassin/local.cf?

> Finally, I would look at using some of the SARE rulesets and SURBL.  You
> may want to take a look at the Other Rules page as well, several
> rulesets on that page (especially backhair.cf, chickenpox.cf, weeds.cf,
> and 99_FVGT_Tripwire.cf) do a wonderful job at pushing those spams that
> are on the border over to the tagged/deleted side.

This is my current /etc/spamassassin/init.pre:

loadplugin Mail::SpamAssassin::Plugin::URIDNSBL
loadplugin Mail::SpamAssassin::Plugin::Hashcash
loadplugin Mail::SpamAssassin::Plugin::SPF

If they help, why aren't they included with spamassassin by default?  SA
used to work awesome for me until I upgraded to the 3.x series.  Have
spammers just figured out how to get around everything since then? :)

I only used my GF's email as an example.  I'll get spam from the same
sender with similar subject lines quite often.  10 of them will match as
spam and one seemingly very similar spam will not get caught.

> Tim Donahue


Re: my girlfriend is getting ticked :)

Posted by Tim Donahue <td...@haynes-group.com>.
On Wed, 2005-03-30 at 15:27 -0600, Matthew Lenz wrote:
[snip spam info]
> Ideas where to start (other than having her change her email address
> hehe)

It doesn't look like you are using any of the SARE rulesets.  There are
3 things I would do to start off... First, assuming that the 5000
messages that you classified as spam have been verified to actually be
spam, I would run them through sa-learn so that bayes can learn from its
mistakes.

Second, if you haven't done so already, I would decrease the score that
is assigned to the rule ALL_TRUSTED.  My false negatives were helped
greatly from this.

Finally, I would look at using some of the SARE rulesets and SURBL.  You
may want to take a look at the Other Rules page as well, several
rulesets on that page (especially backhair.cf, chickenpox.cf, weeds.cf,
and 99_FVGT_Tripwire.cf) do a wonderful job at pushing those spams that
are on the border over to the tagged/deleted side.

Tim Donahue

Re: my girlfriend is getting ticked :)

Posted by Nels Lindquist <nl...@maei.ca>.
On 30 Mar 2005 at 15:27, Matthew Lenz wrote:

<snip>
 
> here is an example of the headers from a spam that wasn't caught

> X-Spam-Status: No, score=4.1 required=5.0 tests=BAYES_99,HTML_80_90,
>         HTML_FONT_BIG,HTML_MESSAGE,HTML_TITLE_EMPTY,MIME_HTML_ONLY,
>         MSGID_FROM_MTA_ID autolearn=no version=3.0.2

> Ideas where to start (other than having her change her email address
> hehe)

The first thing I did upon installing SA 3.x and running it for a few 
days was to restore some sanity to the BAYES_* scores.

The GA has a tendency to tune down the scores assigned for extreme 
bayes results because they tend to cluster with other positive tests 
(like SURBLs).

That has the unfortunate side effect that when a message comes 
through which for whatever reason fails to trigger much besides 
BAYES_99 (as your example false negative did), the assigned 
score will be lower than it should be if you trust bayes, which you 
should be *more* inclined to do for the extreme cases than not.

The default 3.x scores are as follows:

score BAYES_00 0 0 -1.665 -2.599
score BAYES_05 0 0 -0.925 -0.413
score BAYES_20 0 0 -0.730 -1.951
score BAYES_40 0 0 -0.276 -1.096
score BAYES_50 0 0 1.567 0.001
score BAYES_60 0 0 3.515 0.372
score BAYES_80 0 0 3.608 2.087
score BAYES_95 0 0 3.514 2.063
score BAYES_99 0 0 4.070 1.886

Notice that for the fourth column (bayes + network tests enabled) 
BAYES_99 actually scores *lower* than BAYES_80!
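
The comparison above can be checked mechanically. This is a minimal Python sketch that copies the default table quoted in this post and lists every adjacent pair where a higher Bayes bucket gets a lower score; the helper name `inversions` is made up for the example.

```python
# Default SpamAssassin 3.0.x BAYES_* scores as quoted above:
# (rule, bayes only, bayes + network tests).
DEFAULT_SCORES = [
    ("BAYES_00", -1.665, -2.599),
    ("BAYES_05", -0.925, -0.413),
    ("BAYES_20", -0.730, -1.951),
    ("BAYES_40", -0.276, -1.096),
    ("BAYES_50", 1.567, 0.001),
    ("BAYES_60", 3.515, 0.372),
    ("BAYES_80", 3.608, 2.087),
    ("BAYES_95", 3.514, 2.063),
    ("BAYES_99", 4.070, 1.886),
]


def inversions(scores, column):
    """Return (lower_bucket, higher_bucket) pairs where the score drops."""
    pairs = []
    for prev, cur in zip(scores, scores[1:]):
        if cur[column] < prev[column]:
            pairs.append((prev[0], cur[0]))
    return pairs


bayes_only = inversions(DEFAULT_SCORES, 1)
bayes_net = inversions(DEFAULT_SCORES, 2)
print(bayes_only)  # [('BAYES_80', 'BAYES_95')]
print(bayes_net)   # BAYES_05/20, BAYES_80/95 and BAYES_95/99 all drop
```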

I've added the following lines into my local spamassassin 
configuration, based on the scores from SA 2.6x and my own 
experience:

score        BAYES_00 0 0 -4.901 -4.900
score        BAYES_05 0 0 -0.925 -2.599
score        BAYES_20 0 0 -0.730 -1.951
score        BAYES_40 0 0 -0.276 -1.096
score        BAYES_50 0 0 1.567 0.001
score        BAYES_60 0 0 3.515 1.592
score        BAYES_80 0 0 3.608 2.087
score        BAYES_95 0 0 3.514 3.514
score        BAYES_99 0 0 4.070 5.400

Making this single change would have caught your sample false 
negative based solely on the BAYES_99 result.
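
As a rough worked example in Python: this assumes the missed message was scored with the bayes + network column (the caught message's report shows BAYES_99 at 1.9, which matches 1.886), and it starts from the rounded total in the header, so the result is approximate.

```python
REQUIRED = 5.0          # required=5.0 from the X-Spam-Status header
reported_total = 4.1    # score=4.1 on the missed spam (already rounded)
old_bayes_99 = 1.886    # default 3.x score, bayes + network tests
new_bayes_99 = 5.400    # adjusted score from the table above

# Swap the old BAYES_99 contribution for the new one.
rescored = reported_total - old_bayes_99 + new_bayes_99

print(round(rescored, 1))          # 7.6
print(rescored >= REQUIRED)        # True
print(new_bayes_99 >= REQUIRED)    # True: BAYES_99 alone crosses 5.0
```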

----
Nels Lindquist <*>
Information Systems Manager
Morningstar Air Express Inc.