Posted to users@spamassassin.apache.org by Karsten Bräckelmann <gu...@rudersport.de> on 2010/09/16 21:30:54 UTC

Re: Identifying the real problem (was: Re: Blacklist for spam-words)

On Thu, 2010-09-16 at 11:32 -0700, franc wrote:
> > ... Do you train *both*, spam *and* ham? Any chance these
> > have been trained incorrectly before? What Bayes score do they actually
> > get? The X-Spam-Status header would be sufficient to see.
> > 
> > The few lines of 'sa-learn --dump magic' would be good, too. Oh, and you
> > are training Bayes as the same user SA checks the mail for, right?
> 
> Yes, I trained both. By the way, I use SpamAssassin with amavis.
> This is my bayes result:

So you trained (manually) as the amavis user, using the system-wide
Bayes DB, right?

> ~# sa-learn --dbpath /var/lib/amavis/.spamassassin/bayes --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0       3270          0  non-token data: nspam
> 0.000          0       8809          0  non-token data: nham
> 0.000          0     120576          0  non-token data: ntokens

You need to train on more spam.
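
To put numbers on that imbalance, here is a minimal sketch that reads the
counts out of the dump quoted above. It assumes the four-column layout shown
in that output; parse_magic is a hypothetical helper, not part of
SpamAssassin.

```python
# Parse `sa-learn --dump magic` output and compare ham vs. spam counts.
# Assumption: the column layout quoted in this thread; parse_magic is a
# hypothetical helper for illustration.

DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       3270          0  non-token data: nspam
0.000          0       8809          0  non-token data: nham
0.000          0     120576          0  non-token data: ntokens
"""


def parse_magic(text):
    """Return {label: value} for every 'non-token data' line."""
    counts = {}
    for line in text.splitlines():
        fields, sep, label = line.partition("non-token data:")
        if not sep:
            continue  # not a magic line, skip it
        # The third numeric column holds the value for this label.
        counts[label.strip()] = int(fields.split()[2])
    return counts


counts = parse_magic(DUMP)
ratio = counts["nham"] / counts["nspam"]
print(f"nspam={counts['nspam']} nham={counts['nham']} ham/spam={ratio:.2f}")
```

With the figures above that is roughly 2.7 learned ham messages for every
learned spam message.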

> I know that just blacklisting some words is not really the solution. So I
> lowered the spam threshold in the amavis conf:
> 
> $sa_tag_level_deflt  = undef;
> $sa_tag2_level_deflt = 6.31;	
> $sa_kill_level_deflt = 15; 		
> $sa_dsn_cutoff_level = 25; 		
> 
> A typical score of a "Uhren"-mail is:
> 
> X-Virus-Scanned: Debian amavisd-new at ew6.org
> X-Amavis-Alert: BAD HEADER, Duplicate header field: "Cc"
> X-Spam-Flag: NO
> X-Spam-Score: 12.989

Err... a SA score of ~13 and status not spam. *sigh*  See, you just
needed to identify your real problem. *THIS* is it.

The SA default spam threshold is 5. Everything exceeding that threshold
is classified spam. Five. So this example would have been caught no
problem by vanilla SA.

The scores of the individual rules have been set with that default
threshold of 5 in mind. Raising it *slightly* is OK, if you want to stay
even more on the FP-safe side. Raising it like the above shows is just
plain wrong. And it is the reason for your problem of not catching this
spam.
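
A minimal sketch of the effect, using the score from the quoted message. It
assumes a message counts as spam once its score reaches the threshold;
is_spam is a hypothetical helper, not SA or amavis code.

```python
# Compare one score against several spam thresholds.
# Assumption: "score >= threshold" means spam; is_spam is hypothetical.

SA_DEFAULT_THRESHOLD = 5.0


def is_spam(score, required=SA_DEFAULT_THRESHOLD):
    """Return True when the score reaches the required threshold."""
    return score >= required


score = 12.989  # X-Spam-Score from the quoted message
for required in (5.0, 6.31, 15.0):
    verdict = "spam" if is_spam(score, required) else "not spam"
    print(f"required={required}: {verdict}")
```

Only the threshold of 15 lets this message through as "not spam".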

> X-Spam-Level: ************
> X-Spam-Status: No, score=12.989 required=15 tests=[BAYES_99=3.5,
> 	DNS_FROM_OPENWHOIS=1.13, HTML_MESSAGE=0.001, PYZOR_CHECK=3.7,
> 	RCVD_IN_PBL=0.905, RCVD_IN_SORBS_HTTP=0.001, RCVD_IN_SORBS_WEB=0.619,
> 	RCVD_IN_XBL=3.033, RDNS_NONE=0.1]

No URI DNSBL hits here, but that does not necessarily indicate an issue.
There are DNSBL hits (the RCVD_IN_* rules), so DNS works for you.

BAYES_99 means the Bayes sub-system considers it spam with a value of
0.99 or higher -- where 0.0 means ham, 0.5 neutral, and 1.0 is the
highest, pure evil spam. Bayes has been sufficiently trained on this
kind of spam.

This also means that Bayes obviously already considers the words you wanted
to blacklist as spam -- and that results in a partial score of 3.5 (of
5.0 by default, again) for Bayes alone. That's 70% of the way to being
marked as spam...
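
The arithmetic can be checked against the quoted header. This is a rough
sketch that assumes the tests=[NAME=score, ...] layout shown above;
rule_scores is a hypothetical helper.

```python
import re

# X-Spam-Status value as quoted in this thread, folded onto one string.
HEADER = (
    "No, score=12.989 required=15 tests=[BAYES_99=3.5, "
    "DNS_FROM_OPENWHOIS=1.13, HTML_MESSAGE=0.001, PYZOR_CHECK=3.7, "
    "RCVD_IN_PBL=0.905, RCVD_IN_SORBS_HTTP=0.001, RCVD_IN_SORBS_WEB=0.619, "
    "RCVD_IN_XBL=3.033, RDNS_NONE=0.1]"
)


def rule_scores(status):
    """Return {rule_name: score} from the tests=[...] part of the header."""
    tests = re.search(r"tests=\[(.*?)\]", status, re.S).group(1)
    pairs = re.findall(r"([A-Z0-9_]+)=(-?[\d.]+)", tests)
    return {name: float(value) for name, value in pairs}


scores = rule_scores(HEADER)
total = sum(scores.values())
bayes_share = scores["BAYES_99"] / 5.0  # share of the default threshold

print(f"sum of rule scores: {total:.3f}")
print(f"BAYES_99 alone covers {bayes_share:.0%} of the default threshold")
```

The rule scores add up to the reported 12.989, and BAYES_99 on its own
accounts for 70% of the default threshold of 5.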

> So with "$sa_tag2_level_deflt = 6.31" it is OK. Before, I had 15. Above 6.31
> the mails are put directly into the spam folder, so with IMAP the user can
> still look at them.

Not an Amavis user -- isn't 6.31 the amavis default? Why did you raise
the threshold in the first place!? Again, that is (was) your problem.


> Anyway, do you think I need to update to 3.3.x, or is 3.2 still OK?

3.2 is less effective than 3.3, but as long as you're still happy with
the results, there is no immediate need to upgrade. Using a sane spam
threshold, mind you. You would have seen pretty much the exact same
"problem" with SA 3.3 and the threshold raised to 15.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: Identifying the real problem (was: Re: Blacklist for spam-words)

Posted by Benny Pedersen <me...@junc.org>.
On tor 16 sep 2010 23:19:34 CEST, franc wrote
> OK, until I am sure there are no more FPs, I have now set the thresholds to
> -, 5, 10, 15, so between 5 and 10 mail is delivered into the spam folder, and
> from 10 on it is bounced.

Rejected, please, i.e. don't accept and then bounce.

-- 
xpoint http://www.unicom.com/pw/reply-to-harmful.html


Re: Identifying the real problem

Posted by mouss <mo...@ml.netoyen.net>.
On 17/09/2010 00:34, Karsten Bräckelmann wrote:
> [snip]
>> I had in amavis-conf:
>>
>> $final_spam_destiny       = D_BOUNCE;
>> $final_banned_destiny     = D_BOUNCE;
>>
>> should be much better like this:
>>
>> $final_spam_destiny       = D_REJECT;
>> $final_banned_destiny     = D_REJECT;
>>
>> D_BOUNCE was the default, so I used it. But you are quite right, bouncing
>> is outdated (according to the Postfix book by Heinlein), and I have switched
>> to reject now. Thanks again!
> Thank you for fixing this. :)  One less backscatter source on the net.
>
>

Not sure. If his amavisd runs after the mail was queued (for example, if it
is run as a content_filter in Postfix), then D_REJECT will cause _his_
MTA to send a bounce, hence the backscatter.

So most probably he is still a potential backscatter source.
Unless he is using amavisd-new to filter mail during the SMTP
transaction (with the remote/foreign client), which is uncommon, the
only possible choices are pass, quarantine or discard.
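
A small sketch of that rule of thumb. The stage and action names are
illustrative labels, not amavisd-new or Postfix settings, and safe_actions is
a hypothetical helper.

```python
# Which spam actions avoid backscatter, depending on when the filter runs.
# Assumptions: "pre-queue" = filtering during the SMTP transaction with the
# remote client; "post-queue" = filtering after the MTA accepted the mail.

def safe_actions(filter_stage):
    """Return the set of actions that cannot cause backscatter."""
    always_safe = {"pass", "quarantine", "discard"}
    if filter_stage == "pre-queue":
        # The remote client is still connected, so a rejection is
        # answered directly within the SMTP session.
        return always_safe | {"reject"}
    if filter_stage == "post-queue":
        # The message is already accepted; a "reject" now makes the
        # local MTA send a bounce to a possibly forged sender.
        return always_safe
    raise ValueError(f"unknown filter stage: {filter_stage!r}")


for stage in ("pre-queue", "post-queue"):
    print(stage, sorted(safe_actions(stage)))
```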


Re: Identifying the real problem (was: Re: Blacklist for spam-words)

Posted by franc <fr...@gmx.net>.
The next thing I just discovered is:

$final_bad_header_destiny = D_PASS;

With this rule, every mail whose Subject contains 8-bit characters is sent to
the quarantine folder.
I didn't know this, and now I am discovering many emails in the quarantine
which were not spam at all :-)

I commented it out:

# $final_bad_header_destiny = D_PASS;

and I think the bad-header mails are now delivered to the mailbox and not
into the void. I hope.


-- 
View this message in context: http://old.nabble.com/Blacklist-for-spam-words-tp29726548p29733698.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: Identifying the real problem (was: Re: Blacklist for spam-words)

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Thu, 2010-09-16 at 15:10 -0700, franc wrote:
> > I seriously hope you just mis-worded that. Bounce!? That would be after
> > *accepting* a message, and spam will generally be bounced to a
> > forged, innocent bystander -- not the spammer. So please, tell me you
> > actually meant to say REJECT. That is, not accepted by the MX.
> 
> No, I didn't know any better; I really had D_BOUNCE!

Well, I don't really know Amavis, so I don't know what this does
precisely, but in general...

Bounce, also known as backscatter in the context of spam -- just in case
you need more search terms. ;)

The important difference is that REJECTing on the MX (the SMTP server
facing the outside, evil network) will just not ACCEPT the message. Once you
have accepted a message, you take responsibility for it. You are free to
review that crap, or even route it straight to the bit bucket. It's yours,
and the ball is in your court. However, bouncing it "back" to some address
you cannot possibly know to be the real sender...
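
As a toy model of that difference (not real SMTP code; the function name,
host name and address are made up for illustration):

```python
# Who ends up dealing with a refused message: reject vs. bounce.
# Toy model only; all names and addresses here are hypothetical.

def failure_notice_goes_to(action, connecting_host, envelope_sender):
    """Return the party that has to handle the failure."""
    if action == "reject":
        # The message is never accepted; the connecting host sees the
        # error in the SMTP session and has to deal with it.
        return connecting_host
    if action == "bounce":
        # The message was accepted, then a new mail is sent to the
        # envelope sender, which spammers routinely forge.
        return envelope_sender
    raise ValueError(f"unknown action: {action!r}")


spam = {"connecting_host": "bot.example.net",
        "envelope_sender": "innocent@example.org"}
print("reject ->", failure_notice_goes_to("reject", **spam))
print("bounce ->", failure_notice_goes_to("bounce", **spam))
```

With a reject, the problem stays with whoever connected. With a bounce, it
lands on the address in the envelope, forged or not.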


> I had in amavis-conf:
> 
> $final_spam_destiny       = D_BOUNCE;
> $final_banned_destiny     = D_BOUNCE;
> 
> should be much better like this:
> 
> $final_spam_destiny       = D_REJECT;
> $final_banned_destiny     = D_REJECT;
> 
> D_BOUNCE was the default, so I used it. But you are quite right, bouncing
> is outdated (according to the Postfix book by Heinlein), and I have switched
> to reject now. Thanks again!

Thank you for fixing this. :)  One less backscatter source on the net.




Re: Identifying the real problem (was: Re: Blacklist for spam-words)

Posted by franc <fr...@gmx.net>.
> I seriously hope you just mis-worded that. Bounce!? That would be after
> *accepting* a message, and spam will generally be bounced to a
> forged, innocent bystander -- not the spammer. So please, tell me you
> actually meant to say REJECT. That is, not accepted by the MX.

No, I didn't know any better; I really had D_BOUNCE!

I had in amavis-conf:

$final_spam_destiny       = D_BOUNCE;
$final_banned_destiny     = D_BOUNCE;

should be much better like this:

$final_spam_destiny       = D_REJECT;
$final_banned_destiny     = D_REJECT;

D_BOUNCE was the default, so I used it. But you are quite right, bouncing
is outdated (according to the Postfix book by Heinlein), and I have switched
to reject now. Thanks again!



Re: Identifying the real problem (was: Re: Blacklist for spam-words)

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Thu, 2010-09-16 at 14:19 -0700, franc wrote:
> OK, until I am sure there are no more FPs, I have now set the thresholds to
> -, 5, 10, 15, so between 5 and 10 mail is delivered into the spam folder, and
> from 10 on it is bounced.
> 
> I think after a while I will know whether I can use 2, 5, 6.31, 10 or
> something like that.

Well, I would (and actually do on a couple systems still happily running
3.2) use the default threshold of 5.

That is for classifying as spam, just as you do, with delivery into a
dedicated spam folder for users to review the stuff and rescue FPs --
though honestly, the only one I've seen in years is the occasional PayPal
general terms and conditions update.

FWIW, a threshold of 2 would be too low, and will result in FPs.

I guess I would be too paranoid to reject on a threshold of 10. I used
to think 15, but recently tend to lean towards 12 as the cut-off.
Anyway... ;)
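
Put together, the two cut-offs could be sketched like this. The values are
simply the ones mentioned in this thread, the "score >= level" comparison is
an assumption, and disposition is a hypothetical helper rather than an amavis
or SA function.

```python
# Two-tier handling: tag as spam at one level, reject at a higher one.
# Assumptions: thresholds taken from this thread; comparison is ">=".

def disposition(score, spam_level=5.0, reject_level=12.0):
    """Return what happens to a message with the given score."""
    if score >= reject_level:
        return "reject at SMTP time"
    if score >= spam_level:
        return "deliver to spam folder"
    return "deliver to inbox"


for score in (1.2, 6.5, 12.989):
    print(score, "->", disposition(score))
```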

I seriously hope you just mis-worded that. Bounce!? That would be after
*accepting* a message, and spam will generally be bounced to a
forged, innocent bystander -- not the spammer. So please, tell me you
actually meant to say REJECT. That is, not accepted by the MX.


> Thank you for the hints!

NP. And just for next time, if you're having issues with some particular
software, try to explain the issue. After figuring out the root cause,
the collective audience most likely can tell you what to do.

Asking how to do $something, which does not directly tackle your issue,
usually will only serve as a band-aid. Not a fix.




Re: Identifying the real problem (was: Re: Blacklist for spam-words)

Posted by franc <fr...@gmx.net>.
OK, until I am sure there are no more FPs, I have now set the thresholds to
-, 5, 10, 15, so between 5 and 10 mail is delivered into the spam folder, and
from 10 on it is bounced.

I think after a while I will know whether I can use 2, 5, 6.31, 10 or
something like that.

Thank you for the hints!