Posted to users@spamassassin.apache.org by Michael Monnerie <mi...@is.it-management.at> on 2011/12/13 08:34:39 UTC

Re: Using ZMI_GERMAN ruleset

On Monday, 31 October 2011 Axb wrote:
> tried it and dumped due to low hit rate
> 
> stuff like
> 
> body     ZMIde_JOBSEARCH6 /Dank sehr grossen Angagement, aber auch
> der  Umsetzung verschiedener Inovationen, konnte unsere Firma schon
> nach vier Jahren auf die internationalen Ebene hinaufsteigen/
> 
> is not efficient

It's "efficient" in the sense of "filtering only spam with zero false
positives", which is the top priority for this ruleset. And you picked a
very old and very long rule. Most rules nowadays are just one sentence,
or even only part of one, and that proves very efficient. Rules like
__ZMIde_JOBEARN1-28 bring false positives down to 0, and I'm constantly
adding more.

I've now tried to remove all the old cruft, that is, the single-line rules.
The rule size went from 350KB to 296KB, which should save some RAM and CPU.

-- 
Kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531

Re: Using ZMI_GERMAN ruleset

Posted by Michael Monnerie <mi...@is.it-management.at>.
On Tuesday, 13 December 2011 Axb wrote:
> Patterns longer than 120 characters are not really efficient in terms
> of speed or hit rate. They are very specific to certain campaigns, and
> minimal template changes will render them useless, as in:
> 
> body     __ZMIde_STOCK34 /Wir sind .{0,2}berzeugt, dass der
> Zeitpunkt  sich an einem Unternehmen zu beteiligen, welches
> erfolgreich im Edelmetallhandel t.{0,2}tig ist, nicht besser sein
> k.{0,2}nnte/
> 
> or
> 
> body     __ZMIde_SALE5 /In den letzten 5 Jahren hatte ich .{0,2}ber
> drei  dutzend gut funktionierende Strategien, um die Zahl meiner
> Webseitenbesucher drastisch zu erh.{0,2}hen und dadurch meinen
> Umsatz  anzukurbeln/

Since they get hits, there's no need to change them. Once I get a report
about a sentence that has been modified, I apply the change to the rules.
I need feedback for that, of course.
And it should still be fast in terms of CPU: if (taking rule
__ZMIde_SALE5) there's no "In den l" in the message, the regex shouldn't
have to search very far, right? At least I'd guess it's an optimized
search that compares in 64-bit steps, i.e. 8 chars at once?
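
The reasoning above (no fixed opening text in the message, so no expensive
search) can be sketched as a literal pre-check in front of the full regex.
This is only an illustration in Python, not a description of how
SpamAssassin or Perl's regex engine actually works internally; the names
REQUIRED_LITERAL and rule_hits are made up for the example, and joining the
wrapped rule text with single spaces is an assumption.

```python
import re

# Pattern of rule __ZMIde_SALE5 as quoted in this thread
# (assumption: the wrapped lines are joined with single spaces).
SALE5 = re.compile(
    r"In den letzten 5 Jahren hatte ich .{0,2}ber drei dutzend gut "
    r"funktionierende Strategien, um die Zahl meiner Webseitenbesucher "
    r"drastisch zu erh.{0,2}hen und dadurch meinen Umsatz anzukurbeln"
)

# Fixed text every match must begin with (hypothetical helper constant).
REQUIRED_LITERAL = "In den letzten 5 Jahren hatte ich "


def rule_hits(body: str) -> bool:
    """Return True if the rule pattern matches the message body."""
    # Cheap substring test first: without the literal, no match is possible,
    # so the full regex never has to run on most messages.
    if REQUIRED_LITERAL not in body:
        return False
    return SALE5.search(body) is not None
```

Whether the real engine compares 8 characters at a time is not something
this sketch shows; it only illustrates why a rule with a fixed opening can
be rejected cheaply for most mail.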

Look at the "Krankenkassa" ruleset, which has been very active these
last few weeks. The senders keep modifying their text; I get reports and
modify the rules accordingly.

And not to forget: long sentences mean the chance of a false positive drops.

-- 
Kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531

Re: Using ZMI_GERMAN ruleset

Posted by Axb <ax...@gmail.com>.
On 2011-12-13 8:34, Michael Monnerie wrote:
> On Monday, 31 October 2011 Axb wrote:
>> tried it and dumped due to low hit rate
>>
>> stuff like
>>
>> body     ZMIde_JOBSEARCH6 /Dank sehr grossen Angagement, aber auch
>> der  Umsetzung verschiedener Inovationen, konnte unsere Firma schon
>> nach vier Jahren auf die internationalen Ebene hinaufsteigen/
>>
>> is not efficient
>
> It's "efficient" in the sense of "filtering only spam with zero false
> positives", which is the top priority for this ruleset. And you picked a
> very old and very long rule. Most rules nowadays are just one sentence,
> or even only part of one, and that proves very efficient. Rules like
> __ZMIde_JOBEARN1-28 bring false positives down to 0, and I'm constantly
> adding more.
>
> I've now tried to remove all the old cruft, that is, the single-line rules.
> The rule size went from 350KB to 296KB, which should save some RAM and CPU.
>

Patterns longer than 120 characters are not really efficient in terms
of speed or hit rate. They are very specific to certain campaigns, and
minimal template changes will render them useless, as in:

body     __ZMIde_STOCK34 /Wir sind .{0,2}berzeugt, dass der Zeitpunkt 
sich an einem Unternehmen zu beteiligen, welches erfolgreich im 
Edelmetallhandel t.{0,2}tig ist, nicht besser sein k.{0,2}nnte/

or

body     __ZMIde_SALE5 /In den letzten 5 Jahren hatte ich .{0,2}ber drei 
dutzend gut funktionierende Strategien, um die Zahl meiner 
Webseitenbesucher drastisch zu erh.{0,2}hen und dadurch meinen Umsatz 
anzukurbeln/
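
A minimal sketch of the template-change concern, written in Python rather
than as a SpamAssassin rule. LONG_RULE is the __ZMIde_SALE5 pattern quoted
above (assuming the wrapped lines are joined with single spaces); SHORT_RULE
and both sample messages are hypothetical, constructed only for this
illustration.

```python
import re

# Full-sentence pattern from rule __ZMIde_SALE5 as quoted in this thread.
LONG_RULE = re.compile(
    r"In den letzten 5 Jahren hatte ich .{0,2}ber drei dutzend gut "
    r"funktionierende Strategien, um die Zahl meiner Webseitenbesucher "
    r"drastisch zu erh.{0,2}hen und dadurch meinen Umsatz anzukurbeln"
)

# Hypothetical shorter rule covering only one fragment of the sentence.
SHORT_RULE = re.compile(
    r"Zahl meiner Webseitenbesucher drastisch zu erh.{0,2}hen"
)

# Hypothetical message text built from the rule itself.
original = (
    "In den letzten 5 Jahren hatte ich über drei dutzend gut "
    "funktionierende Strategien, um die Zahl meiner Webseitenbesucher "
    "drastisch zu erhöhen und dadurch meinen Umsatz anzukurbeln."
)

# A one-character template change: "5 Jahren" becomes "6 Jahren".
modified = original.replace("letzten 5 Jahren", "letzten 6 Jahren")

for name, text in (("original", original), ("modified", modified)):
    long_hit = LONG_RULE.search(text) is not None
    short_hit = SHORT_RULE.search(text) is not None
    print(f"{name}: long rule hit={long_hit}, short rule hit={short_hit}")
```

The long pattern stops matching after the single-character change, while
the fragment still hits. The trade-off raised earlier in the thread still
applies: a shorter fragment is more likely to appear in legitimate mail.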