Posted to users@spamassassin.apache.org by Aran <ar...@organicdesign.co.nz> on 2009/04/01 23:15:53 UTC

Best practices and procedures for good results?

Hi, I've been using spamassassin with exim4 for a few months now, and after
the first month of building up the Bayesian rules it began working very well,
letting only about 1 spam in 50 through.

We have it set up so that we file our false positives and negatives in
"Spam" and "Not spam" IMAP folders. Each day sa-learn runs across them and
processes and deletes them, and we run sa-update from a monthly cron job.

This has worked well for a few months, but recently it has gone downhill and
now lets about 80% of spam through :-( The Bayesian learning accumulates
tokens every day as usual, but to no effect. It's learned 6000 spams and 2500
hams with 130K tokens.
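
For anyone wanting to check those counts, here is a minimal Python sketch, assuming the numbers come from `sa-learn --dump magic`. The sample text is illustrative only: it is laid out the way that command typically prints and filled with the counts from this post, not captured from a real system. `parse_dump_magic` and `bayes_is_trained` are hypothetical helper names. The threshold of 200 spam and 200 ham is SpamAssassin's default minimum (`bayes_min_spam_num`, `bayes_min_ham_num`) before Bayes is used at all.

```python
import re

# Illustrative sample only: laid out like `sa-learn --dump magic` output,
# filled in with the counts quoted in this post (not a real capture).
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       6000          0  non-token data: nspam
0.000          0       2500          0  non-token data: nham
0.000          0     130000          0  non-token data: ntokens
"""

# Four whitespace-separated columns, then "non-token data: <name>".
# The third column carries the value we care about.
LINE_RE = re.compile(r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+?)\s*$")


def parse_dump_magic(text):
    """Return {name: value} for every 'non-token data' line in the dump."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            stats[match.group(2)] = int(match.group(1))
    return stats


def bayes_is_trained(stats, min_spam=200, min_ham=200):
    """True if both learned counts reach the minimums (200 is the default)."""
    return stats.get("nspam", 0) >= min_spam and stats.get("nham", 0) >= min_ham


stats = parse_dump_magic(SAMPLE_DUMP)
print(stats["nspam"], stats["nham"], stats["ntokens"], bayes_is_trained(stats))
```

With 6000 spams and 2500 hams learned, both counts are far above the minimum, so the training volume itself is unlikely to be the limiting factor.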

I'm just wondering what your advice is on the best practices and procedures
to have in place to ensure that we can build up good filtering results, and
ensure that it remains good over time.


Re: Best practices and procedures for good results?

Posted by Aran <ar...@organicdesign.co.nz>.
No, you don't need to say anything :-( Our packages were being held back
due to some dependencies. It's fixed now and updated to 3.2.5, and it
seems to be working very well again! Thanks for your help, it's much
appreciated.

Karsten Bräckelmann wrote:
> On Thu, 2009-04-02 at 13:23 +1300, Aran wrote:
>   
>> SpamAssassin version 3.1.7-deb3
>>     
>                        ^^^^^
>   
>>   running on Perl version 5.10.0
>>
>> Karsten Bräckelmann wrote:
>>     
>>> On Thu, 2009-04-02 at 11:51 +1300, Aran wrote:
>>>   
>>>       
>>>> We use the most recent version and keep it up to date.
>>>>         
>>> That's not a version. I've seen that refer to a years old release...
>>>       
>
> Do I really need to say anything about this?
>
> Anyway, there's no possible reason I can think of, that could drop spam
> detection rate as drastically as down to 20%, which wouldn't also result
> in massive FPs. Well, other than a major configuration screw up or
> flaming all your Perl modules.
>
> That said, running Perl 5.10, yet still sticking to some ancient code as
> SA 3.1.x does indeed sound strange. Did you by chance recently update
> your Perl, and forgot to update all those Perl modules you installed?
>
>
> Once again, not following best-practices does NOT explain it. There's
> something way more serious broken on your system. And without more
> information provided by you, it probably will be impossible to find out
> for us...
>
>
>   


Re: Best practices and procedures for good results?

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Thu, 2009-04-02 at 13:23 +1300, Aran wrote:
> SpamAssassin version 3.1.7-deb3
                       ^^^^^
>   running on Perl version 5.10.0
> 
> Karsten Bräckelmann wrote:
> > On Thu, 2009-04-02 at 11:51 +1300, Aran wrote:
> >   
> > > We use the most recent version and keep it up to date.
> >
> > That's not a version. I've seen that refer to a years old release...

Do I really need to say anything about this?

Anyway, there's no possible reason I can think of, that could drop spam
detection rate as drastically as down to 20%, which wouldn't also result
in massive FPs. Well, other than a major configuration screw up or
flaming all your Perl modules.

That said, running Perl 5.10 yet still sticking to code as ancient as
SA 3.1.x does indeed sound strange. Did you by chance recently update
your Perl and forget to update all those Perl modules you installed?


Once again, not following best practices does NOT explain it. There's
something way more serious broken on your system. And without more
information provided by you, it will probably be impossible for us to
find out...


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: Best practices and procedures for good results?

Posted by LuKreme <kr...@kreme.com>.
On 1-Apr-2009, at 18:23, Aran wrote:
> SpamAssassin version 3.1.7-deb3


This is not even close to being the latest version.  This is *years*  
old.

Still, like Karsten said, your problem is likely something else.  Have  
you provided a pastebin of any false negatives yet?

-- 
Ah, you're a Penguin too? Pilgrim, my son. Pilgrim. Yes, of the
	Hare Krishnas. Hairy Fishnuts.


Re: Best practices and procedures for good results?

Posted by John Hardin <jh...@impsec.org>.
On Thu, 2 Apr 2009, Aran wrote:

> SpamAssassin version 3.1.7-deb3
>  running on Perl version 5.10.0

You should upgrade. Current is 3.2.5, and older SAs perform more poorly as 
time passes due to the changing nature of spam.
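
A quick way to make that comparison mechanical, as a rough Python sketch: it parses the version banner quoted above and compares it with 3.2.5, the release named as current in this thread. `parse_sa_version` is a hypothetical helper, and the regular expression assumes the banner always has the `SpamAssassin version X.Y.Z` shape shown here.

```python
import re


def parse_sa_version(text):
    """Extract (major, minor, patch) from a 'SpamAssassin version X.Y.Z' banner."""
    match = re.search(r"SpamAssassin version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no SpamAssassin version found")
    return tuple(int(part) for part in match.groups())


# Banner exactly as reported earlier in this thread.
REPORTED = "SpamAssassin version 3.1.7-deb3\n  running on Perl version 5.10.0"

# Release named as current in this thread (April 2009).
CURRENT = (3, 2, 5)

installed = parse_sa_version(REPORTED)
print(installed, "outdated" if installed < CURRENT else "current")
```

Tuples compare element by element, so (3, 1, 7) sorts before (3, 2, 5) without any string tricks. The distribution suffix (`-deb3`) is ignored on purpose.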

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   You are in a maze of twisty little protocols,
   all written by Microsoft.
----------------------------------------------------------------------
  Today: April Fools' day

Re: Best practices and procedures for good results?

Posted by Aran <ar...@organicdesign.co.nz>.
SpamAssassin version 3.1.7-deb3
  running on Perl version 5.10.0

Karsten Bräckelmann wrote:
> On Thu, 2009-04-02 at 11:51 +1300, Aran wrote:
>   
>> We use the most recent version and keep it up to date.
>>     
>
> That's not a version. I've seen that refer to a years old release...
>
>
>   
>> You say there's a very long list of practices and that the list is way
>> too long - well where is this list? why is it so hard to configure?
>>     
>
> That list would be in my (well, our collectives ;) head. There's not a
> single way how to integrate SA, how to run SA, and there's no single
> demand. So there's no single "do this to be happy".
>
> More seriously, what I really said is, that failing to follow best
> practices hardly is the reason for 80% of your spam in-stream slipping
> through. Your problem is something else.
>
> Frankly, a vanilla SA install will perform better.
>
> IMHO even a vanilla SA, no best-practices tweaking whatsoever, with 80%
> spam caught indicates a severely broken setup. Let alone all of a sudden
> going down to 20%, with 80% FNs. See why I told you that's not your
> problem?
>
>
>   


Re: Best practices and procedures for good results?

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Thu, 2009-04-02 at 11:51 +1300, Aran wrote:
> We use the most recent version and keep it up to date.

That's not a version. I've seen that refer to a years old release...


> You say there's a very long list of practices and that the list is way
> too long - well where is this list? why is it so hard to configure?

That list would be in my (well, our collective ;) head. There's no
single way to integrate SA or to run SA, and there's no single set of
requirements. So there's no single "do this to be happy".

More seriously, what I really said is, that failing to follow best
practices hardly is the reason for 80% of your spam in-stream slipping
through. Your problem is something else.

Frankly, a vanilla SA install will perform better.

IMHO even a vanilla SA, no best-practices tweaking whatsoever, with 80%
spam caught indicates a severely broken setup. Let alone all of a sudden
going down to 20%, with 80% FNs. See why I told you that's not your
problem?


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: Best practices and procedures for good results?

Posted by John Hardin <jh...@impsec.org>.
On Thu, 2 Apr 2009, Aran wrote:

> We use the most recent version and keep it up to date. We only have 5 
> users and all of us care about the spam and file it properly. It was 
> working really well and in the last month has failed worse and worse. I 
> don't see any patterns in the spams that get through, just many 
> different kinds, like viagra, meds and weight loss etc all the usual - 
> and they seem similar to the ones that get filtered...

Can you post a few of the FNs (that you'd expect to have filtered) to a 
pastebin somewhere for us to see?

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   It's easy to be noble with other people's money.
                                    -- John McKay, _The Welfare State:
                                       No Mercy for the Middle Class_
-----------------------------------------------------------------------
  Today: April Fools' day

Re: Best practices and procedures for good results?

Posted by Aran <ar...@organicdesign.co.nz>.
We use the most recent version and keep it up to date. We only have 5
users and all of us care about the spam and file it properly. It was
working really well and in the last month has failed worse and worse. I
don't see any patterns in the spams that get through, just many
different kinds, like viagra, meds and weight loss etc all the usual -
and they seem similar to the ones that get filtered...

You say there's a very long list of practices and that the list is way
too long - well, where is this list? Why is it so hard to configure? How
does someone running a server who wishes to use spamassassin know how
best to set it up? I just want to create a procedure so that whenever I
set up a new server I can follow it to install spamassassin into exim4
and know the server will be able to filter spam reasonably well - I
don't need it to work brilliantly, just quite well would be good.

Here's my current procedure, perhaps I need to add a lot to it yet but
what exactly?
http://www.organicdesign.co.nz/Configure_mail_server#Spam_Assassin

Thanks,
Aran

Karsten Bräckelmann wrote:
> On Thu, 2009-04-02 at 10:15 +1300, Aran wrote:
>   
>> Hi, I've been using spamassassin with exim4 for a few months now and after
>> the first month of building up the Bayesian rules it began working very well
>> only letting about 1 spam in 50 through.
>>     
>
> And your SA version is...?
>
>
>   
>> We have it set up such that we file our false positives and negatives in
>> "Spam" and "Not spam" IMAP folders and each day sa-learn runs across them
>> and processes/deletes them and we run sa-update on a monthly cronjob.
>>     
>
> Do *ALL* your users do that? Are there perhaps some users who just don't
> care, delete missed spam, and you may end up with falsely learned spam?
>
> Also, I'd recommend running sa-update not less than once a week, daily
> is just fine. But that's just a side note and not your problem.
>
>
>   
>> This has worked well for a few months but recently it has gone downhill and
>> now lets about 80% of spam through :-( the Bayesian learning accumulates
>> tokens every day as usual but to no effect. It's learned 6000 spams and 2500
>> hams with 130K tokens.
>>     
>
> Please keep in mind that Bayes is just one sub-system of SA.
>
> Anyway, 80% of spam slipping through indicates some *REAL*, gross issues
> somewhere. Even a 3+ years old SA 3.1.x without updates and without
> Bayes should perform much, much better than that.
>
> Alas, your post doesn't have any information, about what could possibly
> have gone wrong.
>
>
>   
>> I'm just wondering what your advice is on the best practices and procedures
>> to have in place to ensure that we can build up good filtering results, and
>> ensure that it remains good over time.
>>     
>
> That list would be too long...
>
> Anyway, it isn't your problem. An accuracy problem like THAT is not
> related to missing some best practices. Sounds more like a heavily
> borked install or config.
>
> For example, do you whitelist your own domain?
>
>
> Checking SA headers and rules' hit of spam that slipped through, what do
> you see? Any pattern, anything that sticks out?
>
>
>   



Re: Best practices and procedures for good results?

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Thu, 2009-04-02 at 10:15 +1300, Aran wrote:
> Hi, I've been using spamassassin with exim4 for a few months now and after
> the first month of building up the Bayesian rules it began working very well
> only letting about 1 spam in 50 through.

And your SA version is...?


> We have it set up such that we file our false positives and negatives in
> "Spam" and "Not spam" IMAP folders and each day sa-learn runs across them
> and processes/deletes them and we run sa-update on a monthly cronjob.

Do *ALL* your users do that? Are there perhaps some users who just don't
care, delete missed spam, and you may end up with falsely learned spam?

Also, I'd recommend running sa-update not less than once a week, daily
is just fine. But that's just a side note and not your problem.
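
As a rough illustration of such a daily job, here is a minimal Python sketch. It only assumes the `sa-update` command and the `--spam` / `--ham` options of `sa-learn`; the folder paths are hypothetical placeholders, and `build_maintenance_commands` and `run_all` are made-up helper names. By default it prints the commands instead of running them.

```python
import subprocess


def build_maintenance_commands(spam_dir, ham_dir):
    """Return the daily commands: rule update first, then Bayes training."""
    return [
        ["sa-update"],
        ["sa-learn", "--spam", spam_dir],
        ["sa-learn", "--ham", ham_dir],
    ]


def run_all(commands, dry_run=True):
    """Print each command (dry run) or execute it and collect exit codes."""
    results = []
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
            results.append(None)
        else:
            # check=False: sa-update exits non-zero when there is nothing new
            # to install, so a non-zero code is not necessarily a failure.
            results.append(subprocess.run(cmd, check=False).returncode)
    return results


# Hypothetical Maildir paths for the "Spam" and "Not spam" IMAP folders.
commands = build_maintenance_commands(
    "/var/mail/example/.Spam/cur",
    "/var/mail/example/.Not spam/cur",
)
run_all(commands)  # dry run: prints the three commands
```

Scheduling this script once a day from cron would cover both the update and the training. Deleting the learned messages afterwards is left out on purpose.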


> This has worked well for a few months but recently it has gone downhill and
> now lets about 80% of spam through :-( the Bayesian learning accumulates
> tokens every day as usual but to no effect. It's learned 6000 spams and 2500
> hams with 130K tokens.

Please keep in mind that Bayes is just one sub-system of SA.

Anyway, 80% of spam slipping through indicates some *REAL*, gross issues
somewhere. Even a 3+ years old SA 3.1.x without updates and without
Bayes should perform much, much better than that.

Alas, your post doesn't have any information about what could possibly
have gone wrong.


> I'm just wondering what your advice is on the best practices and procedures
> to have in place to ensure that we can build up good filtering results, and
> ensure that it remains good over time.

That list would be too long...

Anyway, it isn't your problem. An accuracy problem like THAT is not
related to missing some best practices. Sounds more like a heavily
borked install or config.

For example, do you whitelist your own domain?


When you check the SA headers and rule hits on spam that slipped through,
what do you see? Any pattern, anything that sticks out?
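
To make that check less tedious across many messages, here is a minimal Python sketch that pulls the score, threshold and rule names out of an `X-Spam-Status` header value. The sample value is made up for illustration, and it assumes the header is written in SpamAssassin's usual `score=... required=... tests=...` layout; an exim4 setup that writes its own headers would need a different pattern. `parse_spam_status` is a hypothetical helper name.

```python
import re

# Made-up example header value, in the layout SpamAssassin normally writes.
SAMPLE = ("No, score=-96.5 required=5.0 tests=BAYES_99,HTML_MESSAGE,\n"
          "\tUSER_IN_WHITELIST autolearn=no version=3.2.5")

NUMBER = r"(-?\d+(?:\.\d+)?)"


def parse_spam_status(value):
    """Split an X-Spam-Status value into verdict, score, threshold and rules."""
    flat = " ".join(value.split())  # unfold continuation lines
    score = re.search(r"\bscore=" + NUMBER, flat)
    required = re.search(r"\brequired=" + NUMBER, flat)
    # The rule list runs until the next "key=" field or the end of the value.
    tests = re.search(r"\btests=(.*?)(?=\s+\w+=|$)", flat)
    names = re.sub(r"\s+", "", tests.group(1)).split(",") if tests else []
    return {
        "verdict": flat.split(",", 1)[0],
        "score": float(score.group(1)) if score else None,
        "required": float(required.group(1)) if required else None,
        # "none" is assumed to be the placeholder when no rule hit.
        "tests": [name for name in names if name and name != "none"],
    }


status = parse_spam_status(SAMPLE)
print(status["score"], status["tests"])
if "USER_IN_WHITELIST" in status["tests"]:
    print("whitelist rule fired: check the whitelist_from entries")
```

In this made-up case Bayes is sure the message is spam, yet a whitelist hit drags the score far below the threshold, which is the kind of pattern the whitelist question above is probing for.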


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}