Posted to users@spamassassin.apache.org by Jaka Jančar <ja...@kubje.org> on 2004/07/12 18:47:34 UTC
Why Razor gets only +1.6
Hi,
How come mail listed in Razor gets only +1.6 points? Is Razor not
to be trusted? I'd think that if the hash is listed, it's 100% spam,
and as such it should get something like +100 :/
Another thing: How well does language recognition in SA work? I'd like
to give mail that's written in Slovene, say, -20 points, since it's
almost never spam.
Thank you,
Jaka Jancar
Re: Why Razor gets only +1.6
Posted by Jaka Jančar <ja...@kubje.org>.
Thank you very much for your reply!
Actually I am using 2.62, and this is what I get in a typical spam/virus
message:
* 1.6 RAZOR2_CF_RANGE_51_100 BODY: Razor2 gives confidence between 51 and 100
*     [cf: 100]
* 0.9 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
Basically, would I be making a huge mistake if I raised the score for the
51-100 range to, say, 4?
Kelson Vibber wrote:
> At 09:47 AM 7/12/2004, Jaka Jančar wrote:
>
>> how come that mails listed in Razor get only +1.6 points? Is razor not
>> to be trusted? I'd think, that if the hash is listed, it's 100% spam,
>> and as that it should get like +100 :/
>
>
> I assume you're talking about SA 3.0, since 1.6 doesn't match any of the
> scores in my SA 2.63 install, but assuming they haven't changed the
> *setup*, there are actually three Razor rules that, when combined, can
> score anywhere from 0.9 to 2.4.
>
> There's one rule, RAZOR2_CHECK, that just looks at whether Razor
> considers the message as spam. Then there are two other rules,
> RAZOR2_CF_RANGE_11_50 and RAZOR2_CF_RANGE_51_100, that check the
> confidence level Razor assigns the message. So if Razor thinks there's
> a 90% chance the message is spam, it gets a higher score than if Razor
> thinks there's a 20% chance.
>
> As for Razor's trustworthiness, there have been some issues in the past
> with false positives, but they're generally rare. There was a brief
> spike last week with the release of the new razor program, and someone
> keeps reporting Mandrake Linux security advisories, but for the most
> part I've found it quite effective.
>
> As for how the scores are chosen, check out
> http://wiki.apache.org/spamassassin/HowScoresAreAssigned
>
> Kelson Vibber
> SpeedGate Communications <www.speed.net>
>
>
>
Re: Why Razor gets only +1.6
Posted by Kelson Vibber <ke...@speed.net>.
At 09:47 AM 7/12/2004, Jaka Jančar wrote:
>how come that mails listed in Razor get only +1.6 points? Is razor not to
>be trusted? I'd think, that if the hash is listed, it's 100% spam, and as
>that it should get like +100 :/
I assume you're talking about SA 3.0, since 1.6 doesn't match any of the
scores in my SA 2.63 install, but assuming they haven't changed the
*setup*, there are actually three Razor rules that, when combined, can
score anywhere from 0.9 to 2.4.
There's one rule, RAZOR2_CHECK, that just looks at whether Razor considers
the message as spam. Then there are two other rules, RAZOR2_CF_RANGE_11_50
and RAZOR2_CF_RANGE_51_100, that check the confidence level Razor assigns
the message. So if Razor thinks there's a 90% chance the message is spam,
it gets a higher score than if Razor thinks there's a 20% chance.
As for Razor's trustworthiness, there have been some issues in the past
with false positives, but they're generally rare. There was a brief spike
last week with the release of the new razor program, and someone keeps
reporting Mandrake Linux security advisories, but for the most part I've
found it quite effective.
As for how the scores are chosen, check out
http://wiki.apache.org/spamassassin/HowScoresAreAssigned
Kelson Vibber
SpeedGate Communications <www.speed.net>
Re: Why Razor gets only +1.6
Posted by Jaka Jančar <ja...@kubje.org>.
>> Another thing: How well does language recognition in SA work? I'd like
>> to give mail that's written in Slovene, say, -20 points, since it's
>> almost never spam.
>
>
> It's done using a triplets dictionary IIRC, however I don't think
> there's any support for any kind of "white" language rules.
>
> You could theoretically hack the rule and make the
> UNDESIRED_LANGUAGE_BODY rule into a DESIRED_LANGUAGE_BODY rule instead,
> by negating it and reversing the score. However, you'd have to eliminate
> the original rule to do it (otherwise everything that isn't Slovene
> would get positive points)
>
Could you give me an example of what to modify, if it isn't too much
work.... :)
Re: Why Razor gets only +1.6
Posted by Matt Kettler <mk...@evi-inc.com>.
At 12:47 PM 7/12/2004, Jaka Jančar wrote:
>how come that mails listed in Razor get only +1.6 points? Is razor not to
>be trusted? I'd think, that if the hash is listed, it's 100% spam, and as
>that it should get like +100 :/
If you think razor is 100% spam-only, you obviously haven't used razor very
much. I've had many false positives on razor over the years, mostly on
oddball mailing lists for tech vendors. For example, about 2 years ago,
every product release letter Versalogic sent out was listed in razor.
(Versalogic is a maker of embedded computer boards, and spamming would not
be very useful to them.)
The problem seems to be largely the result of broken spamtraps containing
old addresses that weren't removed from legit mailing lists. Also, more
recently the introduction of e8 has caused a spike in the FP rate due to a
lack of sufficient reports to stabilize the values.
If you don't believe me that razor does have FPs, check the STATISTICS.txt
numbers. The S/Os are pretty high, but they are NOT 1.0. Razor isn't
perfect, and it DOES hit on legitimate mass-mailings at times.
As for the score, keep in mind that RAZOR2_CHECK and RAZOR2_CF_RANGE are
additive. In SA 2.63 an email with a razor CF score of 51 or up will get a
total of 2.451 or 2.148 points, depending on whether or not you use bayes.
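To make the additive behaviour concrete, here is a minimal Python sketch. It uses the two scores from the 2.62 report quoted earlier in this thread (1.6 and 0.9), not the 2.63 values, and assumes a spam threshold of 5. The function and variable names are made up for illustration.

```python
# Scores as reported for SA 2.62 earlier in this thread (illustrative only).
RAZOR_SCORES = {
    "RAZOR2_CHECK": 0.9,
    "RAZOR2_CF_RANGE_51_100": 1.6,
}

THRESHOLD = 5.0  # assumed spam threshold


def total_score(hits, scores):
    """Sum the scores of every rule that hit; the rule scores simply add up."""
    return sum(scores[name] for name in hits)


hits = ["RAZOR2_CHECK", "RAZOR2_CF_RANGE_51_100"]
razor_total = total_score(hits, RAZOR_SCORES)
print(f"Razor rules contribute {razor_total:.1f} of the {THRESHOLD:.0f} points needed")

# Hypothetical change: raise the confidence-range rule to 4, as asked in the thread.
raised = dict(RAZOR_SCORES, RAZOR2_CF_RANGE_51_100=4.0)
print(f"With the range rule at 4: {total_score(hits, raised):.1f}")
```

With these numbers a message flagged only by Razor still needs about 2.5 points from other rules, while a range score of 4 would leave Razor alone just short of the threshold.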
>Another thing: How well does language recognition in SA work? I'd like to
>give mail that's written in Slovene, say, -20 points, since it's almost
>never spam.
It's done using a triplets dictionary IIRC, however I don't think there's
any support for any kind of "white" language rules.
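As a rough illustration of the triplet idea, here is a toy Python sketch that ranks character trigrams and picks the language whose profile is closest (a rank-based "out-of-place" comparison). It is not SpamAssassin's code: the sample sentences, the profile size, and every function name are assumptions made for the example, and a real profile would be built from far more text.

```python
from collections import Counter

# Tiny made-up training samples; real profiles need much larger corpora.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then the other dog",
    "sl": "hitra rjava lisica skoči čez lenega psa in potem še čez drugega psa",
}


def profile(text, size=50):
    """Return the most frequent character trigrams of text, most common first."""
    padded = f" {' '.join(text.lower().split())} "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    ranked = sorted(counts, key=lambda gram: (-counts[gram], gram))
    return ranked[:size]


def distance(doc_profile, lang_profile):
    """Out-of-place distance: sum of rank differences, fixed penalty for misses."""
    ranks = {gram: rank for rank, gram in enumerate(lang_profile)}
    penalty = len(lang_profile)
    return sum(abs(rank - ranks[gram]) if gram in ranks else penalty
               for rank, gram in enumerate(doc_profile))


PROFILES = {lang: profile(text) for lang, text in SAMPLES.items()}


def guess_language(text):
    """Pick the language whose trigram profile is closest to the text's profile."""
    doc = profile(text)
    return min(PROFILES, key=lambda lang: distance(doc, PROFILES[lang]))


print(guess_language("čez lenega psa"))
```

A guess like this is statistical, so short or mixed-language messages can be misclassified, which is one more reason to be careful with a large negative score.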
You could theoretically hack the rule and make the UNDESIRED_LANGUAGE_BODY
rule into a DESIRED_LANGUAGE_BODY rule instead, by negating it and
reversing the score. However, you'd have to eliminate the original rule to
do it (otherwise everything that isn't Slovene would get positive points)
Re: Why Razor gets only +1.6
Posted by Theo Van Dinter <fe...@kluge.net>.
On Wed, Jul 14, 2004 at 11:43:03AM -0700, Kelson Vibber wrote:
> >Meaning what? That there are razor blacklisted messages that are false
> >positives ?
>
> Yes. Yesterday's announcement of the first Fedora Core 3 test release, for
> instance.
For questions like this, looking at the rules/STATISTICS* files from
the SA distro is useful. In this case, set1 (network, no bayes) shows:
OVERALL%    SPAM%     HAM%    S/O   RANK  SCORE  NAME
  495260   343948   151312  0.694   0.00   0.00  (all messages)
 100.000  69.4480  30.5520  0.694   0.00   0.00  (all messages as %)
  47.918  68.8645   0.3033  0.996   1.00   1.55  RAZOR2_CF_RANGE_51_100
  54.243  77.7344   0.8459  0.989   0.99   0.90  RAZOR2_CHECK
   6.294   8.8711   0.4362  0.953   0.81   0.56  RAZOR2_CF_RANGE_11_50
which basically says that out of the 344k spam checked, razor caught
~77.7% of them, but also caught ~0.85% of the 151k ham, for a rough
overall accuracy (S/O) of ~98.9%.
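The S/O column can be reproduced from the SPAM% and HAM% columns. The sketch below assumes S/O is computed as SPAM% / (SPAM% + HAM%), which matches the three Razor rows above; the function name is made up for illustration.

```python
def spam_over_overall(spam_pct, ham_pct):
    """Share of a rule's hits that landed on spam, using the percentage columns."""
    return spam_pct / (spam_pct + ham_pct)


# (SPAM%, HAM%) copied from the set1 table above.
rows = {
    "RAZOR2_CF_RANGE_51_100": (68.8645, 0.3033),
    "RAZOR2_CHECK": (77.7344, 0.8459),
    "RAZOR2_CF_RANGE_11_50": (8.8711, 0.4362),
}

for name, (spam_pct, ham_pct) in rows.items():
    print(f"{name}: S/O = {spam_over_overall(spam_pct, ham_pct):.3f}")
```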
So, much like Bayes, the messages that are really spam probably hit a
bunch of other rules as well, so the score generator can lower the razor
score to remove some false positives while still having a sum score for
the mails above the threshold of 5.
In short: So far, no anti-spam method is 100% accurate for everyone.
If you have good luck (aka: no false positives), then feel free to up
the score to whatever you want. But don't complain when this bites you
in the a**.
Example: a mailing list I'm on, but rarely post to, decided to do "score
HABEAS_SWE 20", which is up to them, since they apparently don't receive
mails with the SWE in them. However, I use the SWE in my mails, and so
my posts are always flagged as spam when posting to the list.
--
Randomly Generated Tagline:
"Don't bite the mailman." - Dave Matthews
Re: Why Razor gets only +1.6
Posted by Kelson Vibber <ke...@speed.net>.
At 09:58 AM 7/12/2004, Jaka Jančar wrote:
>Meaning what? That there are razor blacklisted messages that are false
>positives ?
Yes. Yesterday's announcement of the first Fedora Core 3 test release, for
instance.
Kelson Vibber
SpeedGate Communications <www.speed.net>
Re: Why Razor gets only +1.6
Posted by Jaka Jančar <ja...@kubje.org>.
I still don't understand it...
>SA's scores are assigned using a genetic algorithm (GA), to optimise
>their efficiency and minimise false positives and false negatives.
Meaning what? That there are Razor-blacklisted messages that are false
positives?
Theo Van Dinter wrote:
> On Mon, Jul 12, 2004 at 06:47:34PM +0200, Jaka Jančar wrote:
>
>>how come that mails listed in Razor get only +1.6 points? Is razor not
>>to be trusted? I'd think, that if the hash is listed, it's 100% spam,
>>and as that it should get like +100 :/
>
>
> http://wiki.apache.org/spamassassin/HowScoresAreAssigned
>
Re: Why Razor gets only +1.6
Posted by Theo Van Dinter <fe...@kluge.net>.
On Mon, Jul 12, 2004 at 06:47:34PM +0200, Jaka Jančar wrote:
> how come that mails listed in Razor get only +1.6 points? Is razor not
> to be trusted? I'd think, that if the hash is listed, it's 100% spam,
> and as that it should get like +100 :/
http://wiki.apache.org/spamassassin/HowScoresAreAssigned
--
Randomly Generated Tagline:
"Yea, it's gone." - Prof. Farr