Posted to users@spamassassin.apache.org by howard chen <ho...@gmail.com> on 2008/07/01 06:51:10 UTC

Spam Rate

Hello,

I am new to SA. I have read through some of the FAQ and wiki, but so far
I can't find the average percentage of spam detected by SA. I know it is
not the same for everyone, but I want to get a feel for the general
statistics (if you don't mind sharing).

1. What spam detection rate would you expect if I am using the default
3.2 configuration?

2. If fine-tuned according to the wiki, e.g. running sa-update and adding
more rule sets, what percentage would you expect then?

3. Does the percentage vary between SA versions, e.g. 3.0, 3.1 and 3.2?

Thanks...

Re: Spam Rate

Posted by howard chen <ho...@gmail.com>.
Hi

On Tue, Jul 1, 2008 at 3:42 PM, Matus UHLAR - fantomas
<uh...@fantomas.sk> wrote:
> What do you mean "spam rate"? the amount of spam your mailservers will
> receive is quite independent on version of spamassassin.
>


Yes, I would like to know whether this information can be found somewhere,
or whether anyone can share theirs.

Sure, it varies from case to case, but it would give us some
benchmarking data to see whether we are underperforming by much.


Thanks.

Re: Spam Rate

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 01.07.08 12:51, howard chen wrote:
> I am new to SA, I have read through some of the faq and wiki, so far
> can't find the average spam rate % detected by SA. I know it is not
> the same for everyone, but I want to get the feel of general
> statistics (If you don't mind to share)
> 
> 1. How many Spam detection rate if I am using default 3.2
> configuration you would expect?
> 
> 2. If fine tuned according to the wiki, e.g. running sa-update, more
> rules set, how many % you would expect then?
> 
> 3. Is the % vary from SA version? e.g. 3.0, 3.1 and 3.2?

What do you mean by "spam rate"? The amount of spam your mail servers
receive is quite independent of the version of SpamAssassin.

What is most important is the percentage of false positives and false
negatives. Newer versions tend to behave better, but proper configuration
(plugins, trust_path) can still improve on that.

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Boost your system's speed by 500% - DEL C:\WINDOWS\*.*

Re: Spam Rate

Posted by Martin Gregorie <ma...@gregorie.org>.
> 1. How many Spam detection rate if I am using default 3.2
> configuration you would expect?
> 
I run the default configuration with language filtering (DE FR GB are
OK_Languages) plus some personally developed rules. Bayes is purely
auto-learn. 

The share of non-spam varies from day to day, within the range 20% to 40%.
  
I've been on SA 3.2 since I started collecting statistics, so have no
idea about earlier versions.

Martin



Re: Spam Rate

Posted by jdow <jd...@earthlink.net>.
From: "Matt Kettler" <mk...@verizon.net>
Sent: Tuesday, 2008, July 01 01:55


> Certainly. Using an older release of SA against recent spam will result 
> in significantly lower detection rates. The code really does matter 
> quite a lot to detection rate. Things like tweaks to the HTML parser 
> that deal with spammer obfuscations and improve accuracy are made in the 
> code, not the rules. If you're using an older SA, you're missing out on 
> these tweaks.

Speaking of which, I ran across what might be a new obfuscation
today.

http://XXX.YYd5carnelian0&#178;5.YYcom/ZZZZ

Remove the YYs and use random letters for XXX. ZZZZ was more or less
random stuff. The gibberish, "&#178;", was translated into "2".
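
For anyone who wants to reproduce that translation outside a mail client, here is a minimal Python sketch. It assumes the "2" comes from decoding the character reference (which gives U+00B2, superscript two) and then applying Unicode compatibility normalization (NFKC), which folds it to a plain "2". That is only one plausible explanation, and this is not how SA's own HTML parser is implemented. The host name in the sample is made up.

```python
import html
import unicodedata


def decode_obfuscated_host(host: str) -> str:
    """Decode HTML character references, then fold compatibility characters."""
    decoded = html.unescape(host)  # "&#178;" becomes U+00B2 (superscript two)
    return unicodedata.normalize("NFKC", decoded)  # U+00B2 folds to "2"


# Hypothetical host shaped like the example above (not a real domain).
sample = "example0&#178;5.invalid"
print(decode_obfuscated_host(sample))
```

A rule that only looks at the raw text sees "&#178;" rather than a digit, which would explain why this kind of trick needs handling in the parser code.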

{^_^}   Joanne


Re: Spam Rate

Posted by Matt Kettler <mk...@verizon.net>.
howard chen wrote:
> Hello,
>
> I am new to SA, I have read through some of the faq and wiki, so far
> can't find the average spam rate % detected by SA. I know it is not
> the same for everyone, but I want to get the feel of general
> statistics (If you don't mind to share)
>
> 1. How many Spam detection rate if I am using default 3.2
> configuration you would expect?
>   
Depends on your settings, i.e. whether you are using Bayes and network 
tests or not. However, anywhere from 92% to 98% of spam should be 
detected out of the box.

See also: STATISTICS-set*.txt in the rules directory of the tarball for 
your release.

> 2. If fine tuned according to the wiki, e.g. running sa-update, more
> rules set, how many % you would expect then?
>   
Well, depends on how you tune. You can easily make SA have 100% 
detection rate for spam, but your false-positive (FP) rate will also be 
100% :)

That said, a well tuned, well trained, well maintained SA should be able 
to detect 99% of spam with a less than 0.1% FP rate.
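
The trade-off between detection rate and FP rate can be sketched in a few lines of Python. The scores below are invented purely for illustration and are not measurements from any corpus; the function name `rates` is also just a placeholder. The sketch assumes a message is flagged when its score is at or above the threshold.

```python
def rates(spam_scores, ham_scores, threshold):
    """Return (detection_rate, false_positive_rate) for a score threshold.

    A message counts as flagged when its score is >= threshold.
    """
    caught = sum(1 for s in spam_scores if s >= threshold)
    false_pos = sum(1 for s in ham_scores if s >= threshold)
    return caught / len(spam_scores), false_pos / len(ham_scores)


# Made-up scores for illustration only; not real measurements.
spam_scores = [12.4, 8.1, 6.0, 5.2, 4.3, 15.9, 7.7, 3.1]
ham_scores = [-1.2, 0.0, 0.4, 1.9, 2.5, -0.3, 5.5, 0.8]

for threshold in (-5.0, 5.0, 10.0):
    detection, fp = rates(spam_scores, ham_scores, threshold)
    print(f"threshold {threshold:5.1f}: detection {detection:.0%}, FP {fp:.0%}")
```

With the threshold set below every score, everything is flagged, so both rates hit 100%. Raising it lowers the FP rate at the cost of missed spam.
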
> 3. Is the % vary from SA version? e.g. 3.0, 3.1 and 3.2?
>   
Certainly. Using an older release of SA against recent spam will result 
in significantly lower detection rates. The code really does matter 
quite a lot to detection rate. Things like tweaks to the HTML parser 
that deal with spammer obfuscations and improve accuracy are made in the 
code, not the rules. If you're using an older SA, you're missing out on 
these tweaks.

Also, generally speaking, sa-updates aren't made for older release 
families. There's usually a period of overlap when a new family comes 
out where both the current and previous versions get updates pushed, but 
that generally comes to a stop once development shifts full-bore to the 
next release.

At this point anyone using 3.1.x is stuck with rules from October 2007. 
The 3.2.x rules (at the time of this writing) were last updated on June 
16, 2008.

> Thanks...
>
>   


Re: Spam Rate

Posted by Alex Woick <al...@wombaz.de>.
> the same for everyone, but I want to get the feel of general
> statistics (If you don't mind to share)
> 
> 1. How many Spam detection rate if I am using default 3.2
> configuration you would expect?

> 2. If fine tuned according to the wiki, e.g. running sa-update, more
> rules set, how many % you would expect then?

Every calculation depends on how much spam you already reject at MTA 
level. If you pass everything on to SA, you have a higher spam:ham ratio 
than if you reject the trivial spam at MTA level and pass on only 
the difficult-to-detect ones.
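
To make that concrete, here is a rough Python sketch with invented counts (none of these numbers come from a real system, and `summarize` is a hypothetical helper). It assumes the MTA rejects only spam and that the same few hard messages slip past SA either way.

```python
def summarize(spam_total, ham_total, spam_rejected_by_mta, spam_missed_by_sa):
    """Compare what SA sees with the end-to-end result.

    All arguments are message counts. Assumes the MTA stage rejects no ham.
    """
    spam_to_sa = spam_total - spam_rejected_by_mta
    return {
        # Fraction of the mail handed to SA that is spam.
        "spam_share_at_sa": spam_to_sa / (spam_to_sa + ham_total),
        # Fraction of the spam SA actually saw that it flagged.
        "sa_detection_rate": (spam_to_sa - spam_missed_by_sa) / spam_to_sa,
        # Fraction of all incoming spam stopped by MTA and SA together.
        "end_to_end_catch_rate": (spam_total - spam_missed_by_sa) / spam_total,
    }


# Invented numbers: 1000 spam and 100 ham arrive, 10 spam end up in the inbox.
no_mta_filter = summarize(1000, 100, spam_rejected_by_mta=0, spam_missed_by_sa=10)
with_mta_filter = summarize(1000, 100, spam_rejected_by_mta=800, spam_missed_by_sa=10)
print(no_mta_filter)
print(with_mta_filter)
```

The end-to-end catch rate is the same in both cases, but SA's own percentage looks lower once the MTA has removed the easy spam, so figures from different sites are only comparable if you know what was filtered before SA.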

I am running a tiny (5 users) Postfix system that rejects many malformed 
delivery attempts as well as unknown senders or recipients. It also does 
greylisting and RBL checking (dsbl, spamhaus). Approximately 30% of all 
mail is accepted and given to SA. Out of 100 mails, I get ~1 false 
negative, which yields 99% accuracy for spam detection. It has been many 
months since I got the last false positive, so I would say a false 
positive rate of about 0.01%.
This configuration is already very well tuned with Bayes learning, ZMI 
rules for German spam, sought rules, and of course DCC, Razor and Pyzor. 
sa-update is run once a day.

For the default ruleset I would guess an accuracy of perhaps 95-97% 
and the same false positive rate as above.

> 3. Is the % vary from SA version? e.g. 3.0, 3.1 and 3.2?

Older versions yield significantly lower accuracy, since the structure 
of spam changes every week and the code of SA is modified constantly 
to accommodate this. In many cases, simple rule changes are not 
sufficient to catch up.

Tschau
Alex