Posted to users@spamassassin.apache.org by howard chen <ho...@gmail.com> on 2008/07/01 06:51:10 UTC
Spam Rate
Hello,
I am new to SA. I have read through some of the FAQ and wiki, but so far
I can't find the average spam detection rate (%) for SA. I know it is not
the same for everyone, but I want to get a feel for the general
statistics (if you don't mind sharing).
1. What spam detection rate would you expect with the default 3.2
configuration?
2. If it is fine-tuned according to the wiki, e.g. by running sa-update and adding
more rule sets, what % would you expect then?
3. Does the % vary between SA versions, e.g. 3.0, 3.1 and 3.2?
Thanks...
Re: Spam Rate
Posted by howard chen <ho...@gmail.com>.
Hi
On Tue, Jul 1, 2008 at 3:42 PM, Matus UHLAR - fantomas
<uh...@fantomas.sk> wrote:
> What do you mean by "spam rate"? The amount of spam your mailservers
> receive is quite independent of the version of SpamAssassin.
>
Yes, I would like to know whether this information can be found somewhere, or whether anyone can share it.
Sure, it varies from case to case, but it would give us some
benchmarking data to see whether we are underperforming too much.
Thanks.
Re: Spam Rate
Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 01.07.08 12:51, howard chen wrote:
> I am new to SA. I have read through some of the FAQ and wiki, but so far
> I can't find the average spam detection rate (%) for SA. I know it is not
> the same for everyone, but I want to get a feel for the general
> statistics (if you don't mind sharing).
>
> 1. What spam detection rate would you expect with the default 3.2
> configuration?
>
> 2. If it is fine-tuned according to the wiki, e.g. by running sa-update and adding
> more rule sets, what % would you expect then?
>
> 3. Does the % vary between SA versions, e.g. 3.0, 3.1 and 3.2?
What do you mean by "spam rate"? The amount of spam your mailservers
receive is quite independent of the version of SpamAssassin.
What matters most is the percentage of false positives and false
negatives. Newer versions tend to behave better, but proper configuration
(plugins, trust_path) can improve things further.
--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Boost your system's speed by 500% - DEL C:\WINDOWS\*.*
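A minimal Python sketch of the two numbers Matus singles out, assuming you have hand-checked counts for a sample of mail (the counts in the usage line are hypothetical, not measurements). The false-negative rate is missed spam divided by all spam, the false-positive rate is flagged ham divided by all ham, and the "detection rate" asked about earlier is simply one minus the false-negative rate. The class name is illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hand-checked counts for a sample of mail (hypothetical class)."""
    spam_caught: int   # spam that was flagged (true positives)
    spam_missed: int   # spam that slipped through (false negatives)
    ham_flagged: int   # legitimate mail wrongly flagged (false positives)
    ham_passed: int    # legitimate mail delivered normally (true negatives)

    def false_negative_rate(self) -> float:
        # Share of all spam that was not flagged.
        return self.spam_missed / (self.spam_caught + self.spam_missed)

    def false_positive_rate(self) -> float:
        # Share of all ham that was wrongly flagged.
        return self.ham_flagged / (self.ham_flagged + self.ham_passed)

    def detection_rate(self) -> float:
        # Share of all spam that was flagged.
        return 1.0 - self.false_negative_rate()


if __name__ == "__main__":
    # Made-up sample: 1000 spam, 1000 ham.
    sample = Confusion(spam_caught=970, spam_missed=30, ham_flagged=1, ham_passed=999)
    print(f"detection rate: {sample.detection_rate():.1%}")
    print(f"FN rate: {sample.false_negative_rate():.1%}")
    print(f"FP rate: {sample.false_positive_rate():.1%}")
```

Keeping the two rates separate matters because they have different denominators: a mail flow that is 90% spam can show a high overall "accuracy" while still flagging a painful share of its ham.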
Re: Spam Rate
Posted by Martin Gregorie <ma...@gregorie.org>.
> 1. What spam detection rate would you expect with the default 3.2
> configuration?
>
I run the default configuration with language filtering (ok_languages
set to DE, FR and GB) plus some rules I developed myself. Bayes uses
auto-learning only.
The share of non-spam varies from day to day, within the range 20% to 40%.
I've been on SA 3.2 since I started collecting statistics, so I have no
idea about earlier versions.
Martin
Re: Spam Rate
Posted by jdow <jd...@earthlink.net>.
From: "Matt Kettler" <mk...@verizon.net>
Sent: Tuesday, 2008, July 01 01:55
> Certainly. Using an older release of SA against recent spam will result
> in significantly lower detection rates. The code really does matter
> quite a lot to detection rate. Things like tweaks to the HTML parser
> that deal with spammer obfuscations and improve accuracy are made in the
> code, not the rules. If you're using an older SA, you're missing out on
> these tweaks.
Speaking of which, today I ran across what might be a new obfuscation:
http://XXX.YYd5carnelian0²5.YYcom/ZZZZ
Remove the YYs and substitute random letters for XXX; ZZZZ was more or less
random. The odd character "²" was translated into "2".
{^_^} Joanne
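One plausible explanation for "²" turning into "2", assuming the hostname goes through IDNA 2003 processing: its nameprep step applies Unicode NFKC normalization, which folds U+00B2 SUPERSCRIPT TWO into the plain digit 2. A filter that compares the raw string against a plain-ASCII list would miss the match unless it normalizes first. The blocklist and the function name below are hypothetical, for illustration only.

```python
import unicodedata


def normalize_host(host: str) -> str:
    """Fold compatibility characters (e.g. superscript digits) into their
    plain forms with NFKC, then lowercase, before any list lookup."""
    return unicodedata.normalize("NFKC", host).lower()


# Hypothetical blocklist entry standing in for whatever URI list a filter uses.
BLOCKED = {"d5carnelian025.com"}

if __name__ == "__main__":
    obfuscated = "d5carnelian0\u00b25.com"   # contains U+00B2 SUPERSCRIPT TWO
    print(obfuscated in BLOCKED)                  # raw string does not match
    print(normalize_host(obfuscated) in BLOCKED)  # matches after NFKC folding
```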
Re: Spam Rate
Posted by Matt Kettler <mk...@verizon.net>.
howard chen wrote:
> Hello,
>
> I am new to SA. I have read through some of the FAQ and wiki, but so far
> I can't find the average spam detection rate (%) for SA. I know it is not
> the same for everyone, but I want to get a feel for the general
> statistics (if you don't mind sharing).
>
> 1. What spam detection rate would you expect with the default 3.2
> configuration?
>
It depends on your settings, i.e. whether or not you are using Bayes and network tests.
However, anywhere from 92% to 98% of spam should be detected out of the box.
See also: STATISTICS-set*.txt in the rules directory of the tarball for
your release.
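The settings Matt mentions correspond to SpamAssassin's four score sets: set 0 (no Bayes, no network tests), set 1 (network tests, no Bayes), set 2 (Bayes, no network tests) and set 3 (Bayes and network tests). A small sketch for picking the matching statistics file, assuming the STATISTICS-set*.txt files are numbered by score set; the function names are hypothetical.

```python
def score_set(use_bayes: bool, use_network: bool) -> int:
    """Return the score set index (0-3) for a configuration.

    0: no Bayes, no network tests
    1: network tests, no Bayes
    2: Bayes, no network tests
    3: Bayes and network tests
    """
    return (2 if use_bayes else 0) + (1 if use_network else 0)


def statistics_file(use_bayes: bool, use_network: bool) -> str:
    # Assumes the files are named STATISTICS-set<N>.txt with N = score set.
    return f"STATISTICS-set{score_set(use_bayes, use_network)}.txt"


if __name__ == "__main__":
    print(statistics_file(use_bayes=True, use_network=True))
```

The lower figure in the 92% to 98% range is what you would expect when reading a file for a set without Bayes or network tests.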
> 2. If it is fine-tuned according to the wiki, e.g. by running sa-update and adding
> more rule sets, what % would you expect then?
>
Well, it depends on how you tune. You can easily make SA have a 100%
detection rate for spam, but your false-positive (FP) rate will also be
100% :)
That said, a well-tuned, well-trained, well-maintained SA should be able
to detect 99% of spam with an FP rate below 0.1%.
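To illustrate that trade-off, here is a hedged sketch that sweeps the spam threshold (the role required_score plays in SA) over made-up per-message scores, assuming a message is tagged as spam once its score reaches the threshold. A threshold low enough to catch every spam also flags every ham; raising it lowers both rates. The score lists are invented for illustration.

```python
from __future__ import annotations

# Hypothetical per-message scores, purely for illustration.
SPAM_SCORES = [3.1, 6.2, 8.5, 12.0, 15.4, 4.8, 9.9, 22.0]
HAM_SCORES = [-1.5, 0.2, 1.1, 2.7, 3.4, -0.4, 0.9, 4.6]


def rates_at_threshold(threshold: float) -> tuple[float, float]:
    """Return (detection_rate, fp_rate) when scores >= threshold are tagged."""
    caught = sum(1 for s in SPAM_SCORES if s >= threshold)
    flagged = sum(1 for s in HAM_SCORES if s >= threshold)
    return caught / len(SPAM_SCORES), flagged / len(HAM_SCORES)


if __name__ == "__main__":
    for threshold in (-10.0, 3.0, 5.0, 8.0):
        detection, fp = rates_at_threshold(threshold)
        print(f"threshold {threshold:5.1f}: detection {detection:.1%}, FP {fp:.1%}")
```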
> 3. Does the % vary between SA versions, e.g. 3.0, 3.1 and 3.2?
>
Certainly. Using an older release of SA against recent spam will result
in significantly lower detection rates. The code really does matter
quite a lot to detection rate. Things like tweaks to the HTML parser
that deal with spammer obfuscations and improve accuracy are made in the
code, not the rules. If you're using an older SA, you're missing out on
these tweaks.
Also, generally speaking, sa-updates aren't made for older release
families. There's usually a period of overlap when a new family comes
out where both the current and previous versions get updates pushed, but
that generally comes to a stop once development shifts full-bore to the
next release.
At this point anyone using 3.1.x is stuck with rules from October 2007.
The 3.2.x rules (at the time of this writing) were last updated on June
16, 2008.
> Thanks...
>
>
Re: Spam Rate
Posted by Alex Woick <al...@wombaz.de>.
> the same for everyone, but I want to get a feel for the general
> statistics (if you don't mind sharing).
>
> 1. What spam detection rate would you expect with the default 3.2
> configuration?
> 2. If it is fine-tuned according to the wiki, e.g. by running sa-update and adding
> more rule sets, what % would you expect then?
Every calculation depends on how much spam you already reject at the MTA
level. If you pass everything on to SA, you have a higher spam:ham ratio
than if you reject the trivial spam at the MTA level and pass on only
the difficult-to-detect ones.
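A back-of-the-envelope sketch of this point, with hypothetical counts and assuming the MTA rejects only spam (no legitimate mail): pre-filtering lowers the spam share SA sees and leaves it the harder spam, so the detection rate measured at SA alone can look lower than the rate for the whole mail flow. The function name is illustrative only.

```python
def mail_flow_rates(total_spam: int, total_ham: int,
                    rejected_at_mta: int, sa_missed: int):
    """Return (spam_share_at_sa, sa_detection, overall_detection).

    rejected_at_mta: spam refused before SA sees it (assumed spam only).
    sa_missed: spam that reaches SA and is not flagged.
    """
    spam_at_sa = total_spam - rejected_at_mta
    spam_share_at_sa = spam_at_sa / (spam_at_sa + total_ham)
    sa_detection = (spam_at_sa - sa_missed) / spam_at_sa
    overall_detection = (total_spam - sa_missed) / total_spam
    return spam_share_at_sa, sa_detection, overall_detection


if __name__ == "__main__":
    # Made-up flow: 900 spam and 100 ham attempts, 700 spam rejected by the MTA.
    share, at_sa, overall = mail_flow_rates(900, 100, 700, 4)
    print(f"spam share at SA {share:.0%}, SA detection {at_sa:.0%}, overall {overall:.1%}")
```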
I am running a tiny (5 users) Postfix system that rejects many malformed
delivery attempts as well as unknown senders and recipients. It also does
greylisting and RBL checking (dsbl, spamhaus). Approximately 30% of all
mail is accepted and handed to SA. Out of 100 mails, I get ~1 false
negative, which yields 99% accuracy in spam detection. It has been many
months since I got my last false positive, so I would put the
false-positive rate at about 0.01%.
This configuration is already very well tuned, with Bayes learning, ZMI
rules for German spam, the sought rules, and of course DCC, Razor and Pyzor.
sa-update is run once a day.
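For a daily sa-update run, a cron wrapper mainly needs to interpret the exit status. A minimal sketch, assuming sa-update is on the PATH: per the sa-update documentation, 0 means updates were downloaded and installed and 1 means no fresh updates were available; this sketch treats any other value as an error. The function names are hypothetical.

```python
import subprocess


def describe_sa_update_exit(code: int) -> str:
    """Map an sa-update exit status to a log message."""
    if code == 0:
        # New rules only take effect in spamd after a restart.
        return "updates installed; restart spamd to load them"
    if code == 1:
        return "no new updates"
    return f"sa-update failed with exit status {code}"


def run_daily_update() -> str:
    # Assumes sa-update is installed and on PATH (e.g. invoked from cron).
    result = subprocess.run(["sa-update"], capture_output=True, text=True)
    return describe_sa_update_exit(result.returncode)
```

Calling run_daily_update() from a once-a-day job and logging its return value is enough to notice when updates stop arriving.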
For the default ruleset I would guess an accuracy of perhaps 95-97%,
with the same false-positive rate as above.
> 3. Is the % vary from SA version? e.g. 3.0, 3.1 and 3.2?
Older versions yield significantly lower accuracy, since the structure of
spam changes every week and the SA code is modified constantly
to accommodate this. In many cases, simple rule changes are not
sufficient to catch up.
Bye
Alex