Posted to users@spamassassin.apache.org by Jaka Jančar <ja...@kubje.org> on 2004/07/12 18:47:34 UTC

Why Razor gets only +1.6

Hi,

how come that mails listed in Razor get only +1.6 points? Is razor not 
to be trusted? I'd think, that if the hash is listed, it's 100% spam, 
and as that it should get like +100 :/

Another thing: How well does language recognition in SA work? I'd like 
to give mail that's written in Slovene, say, -20 points, since it's 
almost never spam.

Thank you,
  Jaka Jancar

Re: Why Razor gets only +1.6

Posted by Jaka Jančar <ja...@kubje.org>.

Thank you very much for your reply!

Actually I am using 2.62, and this is what I get in a typical spam/virus 
message:

*  1.6 RAZOR2_CF_RANGE_51_100 BODY: Razor2 gives confidence between 51 
and 100
*      [cf: 100]
*  0.9 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)

Basically, would I be making a huge mistake if I raised the 51-100% score 
to, say, 4?

Kelson Vibber wrote:

> At 09:47 AM 7/12/2004, Jaka Jančar wrote:
> 
>> how come that mails listed in Razor get only +1.6 points? Is razor not 
>> to be trusted? I'd think, that if the hash is listed, it's 100% spam, 
>> and as that it should get like +100 :/
> 
> 
> I assume you're talking about SA 3.0, since 1.6 doesn't match any of the 
> scores in my SA 2.63 install, but assuming they haven't changed the 
> *setup*, there are actually three Razor rules that, when combined, can 
> score anywhere from 0.9 to 2.4.
> 
> There's one rule, RAZOR2_CHECK, that just looks at whether Razor 
> considers the message as spam.  Then there are two other rules, 
> RAZOR2_CF_RANGE_11_50 and RAZOR2_CF_RANGE_51_100, that check the 
> confidence level Razor assigns the message.  So if Razor thinks there's 
> a 90% chance the message is spam, it gets a higher score than if Razor 
> thinks there's a 20% chance.
> 
> As for Razor's trustworthiness, there have been some issues in the past 
> with false positives, but they're generally rare.  There was a brief 
> spike last week with the release of the new razor program, and someone 
> keeps reporting Mandrake Linux security advisories, but for the most 
> part I've found it quite effective.
> 
> As for how the scores are chosen, check out 
> http://wiki.apache.org/spamassassin/HowScoresAreAssigned
> 
> Kelson Vibber
> SpeedGate Communications <www.speed.net>
> 
> 
>

Re: Why Razor gets only +1.6

Posted by Kelson Vibber <ke...@speed.net>.

At 09:47 AM 7/12/2004, Jaka Jančar wrote:
>how come that mails listed in Razor get only +1.6 points? Is razor not to 
>be trusted? I'd think, that if the hash is listed, it's 100% spam, and as 
>that it should get like +100 :/

I assume you're talking about SA 3.0, since 1.6 doesn't match any of the 
scores in my SA 2.63 install, but assuming they haven't changed the 
*setup*, there are actually three Razor rules that, when combined, can 
score anywhere from 0.9 to 2.4.

There's one rule, RAZOR2_CHECK, that just looks at whether Razor considers 
the message as spam.  Then there are two other rules, RAZOR2_CF_RANGE_11_50 
and RAZOR2_CF_RANGE_51_100, that check the confidence level Razor assigns 
the message.  So if Razor thinks there's a 90% chance the message is spam, 
it gets a higher score than if Razor thinks there's a 20% chance.
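
A minimal sketch of how the three rules combine, written in Python purely for illustration (it is not SpamAssassin code). The scores are copied from the set1 STATISTICS excerpt quoted later in this thread, the threshold of 5 is the one mentioned there, and the `razor_rules` and `razor_score` helpers with their `listed` and `cf` inputs are hypothetical names:

```python
# Illustrative only: rule scores copied from the set1 STATISTICS excerpt
# quoted later in this thread; other SA versions and score sets differ.
SCORES = {
    "RAZOR2_CHECK": 0.90,
    "RAZOR2_CF_RANGE_11_50": 0.56,
    "RAZOR2_CF_RANGE_51_100": 1.55,
}
THRESHOLD = 5.0  # spam threshold mentioned elsewhere in the thread


def razor_rules(listed, cf):
    """Return the Razor rule names that would fire (hypothetical helper)."""
    hits = []
    if listed:
        hits.append("RAZOR2_CHECK")
    if 11 <= cf <= 50:
        hits.append("RAZOR2_CF_RANGE_11_50")
    elif 51 <= cf <= 100:
        hits.append("RAZOR2_CF_RANGE_51_100")
    return hits


def razor_score(listed, cf, scores=SCORES):
    """Sum the scores of the Razor rules that fire; the rules are additive."""
    return round(sum(scores[name] for name in razor_rules(listed, cf)), 3)


if __name__ == "__main__":
    print(razor_score(True, 100))  # listed, confidence 100
    print(razor_score(True, 20))   # listed, confidence 20
    # Hypothetical override: raise the 51-100 rule to 4, as asked in the thread.
    raised = dict(SCORES, RAZOR2_CF_RANGE_51_100=4.0)
    print(razor_score(True, 100, raised) >= THRESHOLD)
```

With these assumed scores, even raising RAZOR2_CF_RANGE_51_100 to 4 gives 4.9 for the two Razor hits together, so a message would still need at least one other rule to reach 5.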

As for Razor's trustworthiness, there have been some issues in the past 
with false positives, but they're generally rare.  There was a brief spike 
last week with the release of the new razor program, and someone keeps 
reporting Mandrake Linux security advisories, but for the most part I've 
found it quite effective.

As for how the scores are chosen, check out 
http://wiki.apache.org/spamassassin/HowScoresAreAssigned

Kelson Vibber
SpeedGate Communications <www.speed.net>

Re: Why Razor gets only +1.6

Posted by Jaka Jančar <ja...@kubje.org>.

>> Another thing: How well does language recognition in SA work? I'd like 
>> to give mail that's written in Slovene, say, -20 points, since it's 
>> almost never spam.
> 
> 
> It's done using a triplets dictionary IIRC, however I don't think 
> there's any support for any kind of "white" language rules.
> 
> You could theoretically hack the rule and make the 
> UNDESIRED_LANGUAGE_BODY rule into a DESIRED_LANGUAGE_BODY rule instead, 
> by negating it and reversing the score. However, you'd have to eliminate 
> the original rule to do it (otherwise everything that isn't Slovene 
> would get positive points)
>
Could you give me an example of what to modify, if it isn't too much 
work.... :)

Re: Why Razor gets only +1.6

Posted by Matt Kettler <mk...@evi-inc.com>.

At 12:47 PM 7/12/2004, Jaka Jančar wrote:
>how come that mails listed in Razor get only +1.6 points? Is razor not to 
>be trusted? I'd think, that if the hash is listed, it's 100% spam, and as 
>that it should get like +100 :/

If you think razor is 100% spam-only you obviously haven't used razor very 
much. I've had many false positives on razor over the years. Mostly on 
oddball mailing lists for tech vendors. For example about 2 years ago, 
every product release letter Versalogic sent out was listed in razor. 
(Versalogic is a maker of embedded computer boards, and spamming would not 
be very useful to them)

The problem seems to largely be the result of broken spamtraps containing 
old addresses that weren't removed from legit mailing lists. Also more 
recently the introduction of e8 has caused a spike in FP rate due to lack 
of sufficient reports to stabilize the values.

If you don't believe me that razor does have FPs, check the STATISTICS.txt 
numbers. The S/O values are pretty high, but they are NOT 1.0. Razor isn't 
perfect, and it DOES hit on legitimate mass-mailings at times.

As for the score, keep in mind that RAZOR2_CHECK and RAZOR2_CF_RANGE are 
additive. In SA 2.63 an email with a razor CF score of 51 or up will get a 
total 2.451 or 2.148 points, depending on whether or not you use bayes.

>Another thing: How well does language recognition in SA work? I'd like to 
>give mail that's written in Slovene, say, -20 points, since it's almost 
>never spam.

It's done using a triplets dictionary IIRC, however I don't think there's 
any support for any kind of "white" language rules.
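
To illustrate what a triplet (trigram) dictionary approach can look like, here is a toy Python sketch. It is not SpamAssassin's implementation: the sample sentences, the cosine-similarity measure and the function names are all assumptions made for the example, and a real profile would be built from far more text.

```python
from collections import Counter
from math import sqrt


def trigrams(text):
    """Count overlapping 3-character sequences in whitespace-normalised text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def similarity(a, b):
    """Cosine similarity between two trigram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def guess_language(text, profiles):
    """Return the language whose profile is most similar to the text."""
    sample = trigrams(text)
    return max(profiles, key=lambda lang: similarity(sample, profiles[lang]))


# Tiny made-up training samples, for illustration only.
PROFILES = {
    "en": trigrams("the quick brown fox jumps over the lazy dog and this is "
                   "the message that you wanted to read"),
    "sl": trigrams("to je sporočilo v slovenskem jeziku in ni neželena pošta "
                   "hvala lepa za vaš odgovor"),
}

if __name__ == "__main__":
    print(guess_language("this is the message", PROFILES))
    print(guess_language("hvala za sporočilo", PROFILES))
```

Short messages give such a classifier very few trigrams to work with, which is one reason a large score tied to the detected language is risky.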

You could theoretically hack the rule and make the UNDESIRED_LANGUAGE_BODY 
rule into a DESIRED_LANGUAGE_BODY rule instead, by negating it and 
reversing the score. However, you'd have to eliminate the original rule to 
do it (otherwise everything that isn't Slovene would get positive points)
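
The inversion described above can be sketched in Python as plain scoring logic. This is not SpamAssassin configuration or code: the function names, the `sl` language code and the 1.0 score for the original rule are assumptions for illustration, and -20 is simply the figure from the question.

```python
DESIRED = {"sl"}  # assumed language code for Slovene


def undesired_language_body(detected, score=1.0):
    """Original behaviour: add points when the language is NOT on the list."""
    return score if detected not in DESIRED else 0.0


def desired_language_body(detected, score=-20.0):
    """Inverted rule: subtract points when the language IS on the list."""
    return score if detected in DESIRED else 0.0


def language_points(detected, keep_original=False):
    """Total language points; keep_original shows why the old rule must go."""
    total = desired_language_body(detected)
    if keep_original:
        total += undesired_language_body(detected)
    return total


if __name__ == "__main__":
    print(language_points("sl"))                      # Slovene mail
    print(language_points("en"))                      # other mail, old rule removed
    print(language_points("en", keep_original=True))  # other mail, old rule kept
```

With the original rule left in place, every non-Slovene message still picks up positive points, which is the side effect the paragraph above warns about.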

Re: Why Razor gets only +1.6

Posted by Theo Van Dinter <fe...@kluge.net>.

On Wed, Jul 14, 2004 at 11:43:03AM -0700, Kelson Vibber wrote:
> >Meaning what? That there are razor blacklisted messages that are false 
> >positives ?
> 
> Yes.  Yesterday's announcement of the first Fedora Core 3 test release, for 
> instance.

For questions like this, looking at the rules/STATISTICS* files from
the SA distro is useful.  In this case, set1 (network, no bayes) shows:

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
 495260   343948   151312    0.694   0.00    0.00  (all messages)
100.000  69.4480  30.5520    0.694   0.00    0.00  (all messages as %)
 47.918  68.8645   0.3033    0.996   1.00    1.55  RAZOR2_CF_RANGE_51_100
 54.243  77.7344   0.8459    0.989   0.99    0.90  RAZOR2_CHECK
  6.294   8.8711   0.4362    0.953   0.81    0.56  RAZOR2_CF_RANGE_11_50

which basically says that out of the 344k spam checked, razor caught
~77.7% of them, but also caught ~0.85% of the 151k ham, for a rough
overall accuracy (S/O) of ~98.9%.
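
As a check on how the columns relate: the S/O figures in the table above are reproduced by dividing SPAM% by the sum of SPAM% and HAM%, which normalises for the different sizes of the spam and ham corpora. A small Python sketch with the three rows copied from the table (the function name is hypothetical, and the formula is inferred from the numbers rather than taken from the SA source):

```python
# (SPAM%, HAM%, S/O) copied from the set1 table above.
ROWS = {
    "RAZOR2_CF_RANGE_51_100": (68.8645, 0.3033, 0.996),
    "RAZOR2_CHECK": (77.7344, 0.8459, 0.989),
    "RAZOR2_CF_RANGE_11_50": (8.8711, 0.4362, 0.953),
}


def spam_over_overall(spam_pct, ham_pct):
    """S/O inferred as SPAM% / (SPAM% + HAM%), rounded to 3 places."""
    return round(spam_pct / (spam_pct + ham_pct), 3)


if __name__ == "__main__":
    # Print the computed S/O next to the value listed in the table.
    for name, (spam_pct, ham_pct, listed_so) in ROWS.items():
        print(name, spam_over_overall(spam_pct, ham_pct), listed_so)
```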

So, much like Bayes, the messages that are really spam probably hit a
bunch of other rules as well, so the score generator can lower the razor
score to remove some false positives while still having a sum score for
the mails above the threshold of 5.

In short: So far, no anti-spam method is 100% accurate for everyone.
If you have good luck (aka: no false positives), then feel free to up
the score to whatever you want.  But don't complain when this bites you
in the a**.

Example: a mailing list I'm on, but rarely post to, decided to do "score
HABEAS_SWE 20", which is up to them, since they apparently don't receive
mails with the SWE in them.  However, I use the SWE in my mails, and so
my posts are always flagged as spam when posting to the list.

-- 
Randomly Generated Tagline:
"Don't bite the mailman." - Dave Matthews

Re: Why Razor gets only +1.6

Posted by Kelson Vibber <ke...@speed.net>.

At 09:58 AM 7/12/2004, Jaka Jančar wrote:
>Meaning what? That there are razor blacklisted messages that are false 
>positives ?

Yes.  Yesterday's announcement of the first Fedora Core 3 test release, for 
instance.

Kelson Vibber
SpeedGate Communications <www.speed.net>

Re: Why Razor gets only +1.6

Posted by Jaka Jančar <ja...@kubje.org>.

I still don't understand it...

 >SA's scores are assigned using a genetic algorithm (GA), to optimise 
 >their efficiency and minimise false positives and false negatives.

Meaning what? That there are razor blacklisted messages that are false 
positives ?

Theo Van Dinter wrote:

> On Mon, Jul 12, 2004 at 06:47:34PM +0200, Jaka Jančar wrote:
> 
>>how come that mails listed in Razor get only +1.6 points? Is razor not 
>>to be trusted? I'd think, that if the hash is listed, it's 100% spam, 
>>and as that it should get like +100 :/
> 
> 
> http://wiki.apache.org/spamassassin/HowScoresAreAssigned
>

Re: Why Razor gets only +1.6

Posted by Theo Van Dinter <fe...@kluge.net>.

On Mon, Jul 12, 2004 at 06:47:34PM +0200, Jaka Jančar wrote:
> how come that mails listed in Razor get only +1.6 points? Is razor not 
> to be trusted? I'd think, that if the hash is listed, it's 100% spam, 
> and as that it should get like +100 :/

http://wiki.apache.org/spamassassin/HowScoresAreAssigned

-- 
Randomly Generated Tagline:
"Yea, it's gone."               - Prof. Farr