Posted to dev@spamassassin.apache.org by Axb <ax...@gmail.com> on 2012/02/27 15:47:03 UTC

__FILL_THIS_FORM_SHORT* rule slowness

These two rules seem to use a lot of processing time:

__FILL_THIS_FORM_SHORT2    0.8015
__FILL_THIS_FORM_LONG2    0.7652

Nearly a second per rule, for rules that are only used in metas, seems a bit heavy.

John, could you optimize them somewhat?

Thx

Re: __FILL_THIS_FORM_SHORT* rule slowness

Posted by John Hardin <jh...@impsec.org>.
On Mon, 27 Feb 2012, darxus@chaosreigns.com wrote:

> On 02/27, Axb wrote:
>> These two rules seem to use a lot of processing time:
>>
>> __FILL_THIS_FORM_SHORT2    0.8015
>> __FILL_THIS_FORM_LONG2    0.7652
>
> Please remind me, how can I check these times?

With the HitFreqsRuleTiming plugin.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   A superior gunman is one who uses his superior judgment to keep
   himself out of situations that would require the use of his
   superior skills.
-----------------------------------------------------------------------
  15 days until Albert Einstein's 133rd Birthday

Re: __FILL_THIS_FORM_SHORT* rule slowness

Posted by da...@chaosreigns.com.
On 02/27, Axb wrote:
> These two rules seem to use a lot of processing time:
> 
> __FILL_THIS_FORM_SHORT2    0.8015
> __FILL_THIS_FORM_LONG2    0.7652

Please remind me, how can I check these times?

-- 
"Wash daily from nose-tip to tail-tip; drink deeply, but never too deep;
And remember the night is for hunting, and forget not the day is for sleep."
- The Law of the Jungle, Rudyard Kipling
http://www.ChaosReigns.com

Re: __FILL_THIS_FORM_SHORT* rule slowness

Posted by John Hardin <jh...@impsec.org>.
On Tue, 28 Feb 2012, darxus@chaosreigns.com wrote:

> Somebody came into IRC a few days ago saying SA was hanging on some emails.
> It turns out it was just taking "over 2 minutes".  Not sure about the
> hardware.  But when I checked the timings on the two samples he provided, I
> got this:
>
> T        __FILL_THIS_FORM_SHORT2    5.9169    5.9169    1
> T         __FILL_THIS_FORM_LONG2    5.9041    5.9041    1
> T      RAZOR2_CF_RANGE_E8_51_100    5.0010    5.0010    1
> T  __FILL_THIS_FORM_FRAUD_PHISH1    2.1362    2.1362    1
>
> T        __FILL_THIS_FORM_SHORT2    6.9687    6.9687    1
> T         __FILL_THIS_FORM_LONG2    6.8891    6.8891    1
> T  __FILL_THIS_FORM_FRAUD_PHISH1    2.5187    2.5187    1
> T         __FILL_THIS_FORM_LOAN1    1.6947    1.6947    1
>
> The examples are here:
> http://pastebin.com/RJBRQqjJ
> http://pastebin.com/54pptaPb

I will take a look. Thanks!

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   The difference is that Unix has had thirty years of technical
   types demanding basic functionality of it. And the Macintosh has
   had fifteen years of interface fascist users shaping its progress.
   Windows has the hairpin turns of the Microsoft marketing machine
   and that's all.                                    -- Red Drag Diva
-----------------------------------------------------------------------
  14 days until Albert Einstein's 133rd Birthday

Re: __FILL_THIS_FORM_SHORT* rule slowness

Posted by John Hardin <jh...@impsec.org>.
On Tue, 28 Feb 2012, Axb wrote:

> On 02/28/2012 03:27 PM, John Hardin wrote:
>
>>  If there is community consensus that detecting fill-in-the-form spams
>>  isn't valuable enough to justify the overhead of running these
>>  (admittedly complex) rules, then I can disable them in my sandbox; but
>>  I'd like a consensus before doing that.
>
> ruleqa's overlaps should give us more data.
>
> If we can agree on a small group to review hits, I'm pretty sure we'd find
> lots of cases where rules are redundant, merely as a result of autopromoting
> similar rules from various sandboxes.

Oh, definitely. I have noticed that before but haven't put a lot of effort 
into reducing dupes.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   A superior gunman is one who uses his superior judgment to keep
   himself out of situations that would require the use of his
   superior skills.
-----------------------------------------------------------------------
  14 days until Albert Einstein's 133rd Birthday

Re: __FILL_THIS_FORM_SHORT* rule slowness

Posted by Axb <ax...@gmail.com>.
On 02/28/2012 03:27 PM, John Hardin wrote:
> On Tue, 28 Feb 2012, Axb wrote:
>
>>> T __FILL_THIS_FORM_SHORT2 5.9169 5.9169 1
>>> T __FILL_THIS_FORM_LONG2 5.9041 5.9041 1
>>> T RAZOR2_CF_RANGE_E8_51_100 5.0010 5.0010 1
>>> T __FILL_THIS_FORM_FRAUD_PHISH1 2.1362 2.1362 1
>>>
>>> T __FILL_THIS_FORM_SHORT2 6.9687 6.9687 1
>>> T __FILL_THIS_FORM_LONG2 6.8891 6.8891 1
>>> T __FILL_THIS_FORM_FRAUD_PHISH1 2.5187 2.5187 1
>>> T __FILL_THIS_FORM_LOAN1 1.6947 1.6947 1
>>
>> THANKS! you made my day.
>>
>> now: is this ReplaceTags slowness or .....
>
> Replacetags by itself is lightweight, it's just string substitution.
>
>> John,
>> what can you do to solve this? kill the rules?
>
> Well, jeeze, you could give me a chance to look at them first... :)
>
> You're always welcome to zero the scores of these rules if their
> filtering performance for your mail stream doesn't justify their
> overhead. If there is community consensus that detecting
> fill-in-the-form spams isn't valuable enough to justify the overhead of
> running these (admittedly complex) rules, then I can disable them in my
> sandbox; but I'd like a consensus before doing that.

ruleqa's overlaps should give us more data.

If we can agree on a small group to review hits, I'm pretty sure we'd find
lots of cases where rules are redundant, merely as a result of autopromoting
similar rules from various sandboxes.

To be honest with you, I'd rather we put the emphasis on frequent automatic
rule generation than on collecting heaps of low-scoring rules/metas which
aren't doing speed any favours.

Re: __FILL_THIS_FORM_SHORT* rule slowness

Posted by John Hardin <jh...@impsec.org>.
On Tue, 28 Feb 2012, Axb wrote:

>>  T        __FILL_THIS_FORM_SHORT2    5.9169    5.9169    1
>>  T         __FILL_THIS_FORM_LONG2    5.9041    5.9041    1
>>  T      RAZOR2_CF_RANGE_E8_51_100    5.0010    5.0010    1
>>  T  __FILL_THIS_FORM_FRAUD_PHISH1    2.1362    2.1362    1
>>
>>  T        __FILL_THIS_FORM_SHORT2    6.9687    6.9687    1
>>  T         __FILL_THIS_FORM_LONG2    6.8891    6.8891    1
>>  T  __FILL_THIS_FORM_FRAUD_PHISH1    2.5187    2.5187    1
>>  T         __FILL_THIS_FORM_LOAN1    1.6947    1.6947    1
>
> THANKS!  you made my day.
>
> now: is this ReplaceTags slowness or .....

ReplaceTags by itself is lightweight; it's just string substitution.
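
A minimal sketch of why tag substitution is cheap, assuming it happens once when the rules are loaded rather than once per message. The tag table, the `expand_tags` helper and the sample pattern are hypothetical, written in Python for illustration; they are not SpamAssassin's code or the actual FILL_THIS_FORM rules:

```python
import re

# Hypothetical tag table: each tag name maps to a regex fragment.
TAGS = {
    "A": "[a@4]",
    "I": "[i1!|]",
    "O": "[o0]",
}


def expand_tags(pattern, tags):
    """Replace every <TAG> placeholder in pattern with its fragment.

    Unknown placeholders are left untouched.
    """
    def substitute(match):
        return tags.get(match.group(1), match.group(0))

    return re.sub(r"<([A-Z]+)>", substitute, pattern)


# The substitution is plain string work and runs once, before compilation.
raw_pattern = r"\bv<I><A>gr<A>\b"
expanded = expand_tags(raw_pattern, TAGS)
compiled = re.compile(expanded, re.IGNORECASE)

print(expanded)
print(bool(compiled.search("buy V1@gra now")))
```

In this sketch, any slowness at match time would come from the expanded regex itself, not from the substitution step.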

> John,
> what can you do to solve this? kill the rules?

Well, jeeze, you could give me a chance to look at them first... :)

You're always welcome to zero the scores of these rules if their filtering 
performance for your mail stream doesn't justify their overhead. If there 
is community consensus that detecting fill-in-the-form spams isn't 
valuable enough to justify the overhead of running these (admittedly 
complex) rules, then I can disable them in my sandbox; but I'd like 
a consensus before doing that.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   The difference is that Unix has had thirty years of technical
   types demanding basic functionality of it. And the Macintosh has
   had fifteen years of interface fascist users shaping its progress.
   Windows has the hairpin turns of the Microsoft marketing machine
   and that's all.                                    -- Red Drag Diva
-----------------------------------------------------------------------
  14 days until Albert Einstein's 133rd Birthday

Re: __FILL_THIS_FORM_SHORT* rule slowness

Posted by Axb <ax...@gmail.com>.
On 02/28/2012 03:03 PM, darxus@chaosreigns.com wrote:
> On 02/27, John Hardin wrote:
>>> These two rules seem to use a lot of processing time:
>>>
>>> __FILL_THIS_FORM_SHORT2    0.8015
>>> __FILL_THIS_FORM_LONG2    0.7652
>
>> Is this consistent, or do you have specific messages where it's
>> unusually slow? There are some edge cases that have triggered bad
>> behavior in the past that I have corrected for.
>
> Somebody came into IRC a few days ago saying SA was hanging on some emails.
> It turns out it was just taking "over 2 minutes".  Not sure about the
> hardware.  But when I checked the timings on the two samples he provided, I
> got this:
>
> T        __FILL_THIS_FORM_SHORT2    5.9169    5.9169    1
> T         __FILL_THIS_FORM_LONG2    5.9041    5.9041    1
> T      RAZOR2_CF_RANGE_E8_51_100    5.0010    5.0010    1
> T  __FILL_THIS_FORM_FRAUD_PHISH1    2.1362    2.1362    1
>
> T        __FILL_THIS_FORM_SHORT2    6.9687    6.9687    1
> T         __FILL_THIS_FORM_LONG2    6.8891    6.8891    1
> T  __FILL_THIS_FORM_FRAUD_PHISH1    2.5187    2.5187    1
> T         __FILL_THIS_FORM_LOAN1    1.6947    1.6947    1
>
> The examples are here:
> http://pastebin.com/RJBRQqjJ
> http://pastebin.com/54pptaPb
>
> (My timings were taken on an otherwise idle home workstation with an
> "AMD Phenom(tm) II X4 965 Processor".)

THANKS!  you made my day.

now: is this ReplaceTags slowness or .....

John,
what can you do to solve this? kill the rules?

Re: __FILL_THIS_FORM_SHORT* rule slowness

Posted by da...@chaosreigns.com.
On 02/27, John Hardin wrote:
> >These two rules seem to use a lot of processing time:
> >
> >__FILL_THIS_FORM_SHORT2    0.8015
> >__FILL_THIS_FORM_LONG2    0.7652

> Is this consistent, or do you have specific messages where it's
> unusually slow? There are some edge cases that have triggered bad
> behavior in the past that I have corrected for.

Somebody came into IRC a few days ago saying SA was hanging on some emails.
It turns out it was just taking "over 2 minutes".  Not sure about the
hardware.  But when I checked the timings on the two samples he provided, I
got this:

T        __FILL_THIS_FORM_SHORT2    5.9169    5.9169    1
T         __FILL_THIS_FORM_LONG2    5.9041    5.9041    1
T      RAZOR2_CF_RANGE_E8_51_100    5.0010    5.0010    1
T  __FILL_THIS_FORM_FRAUD_PHISH1    2.1362    2.1362    1

T        __FILL_THIS_FORM_SHORT2    6.9687    6.9687    1
T         __FILL_THIS_FORM_LONG2    6.8891    6.8891    1
T  __FILL_THIS_FORM_FRAUD_PHISH1    2.5187    2.5187    1
T         __FILL_THIS_FORM_LOAN1    1.6947    1.6947    1

The examples are here:
http://pastebin.com/RJBRQqjJ
http://pastebin.com/54pptaPb

(My timings were taken on an otherwise idle home workstation with an
"AMD Phenom(tm) II X4 965 Processor".)
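
For anyone comparing runs, a short Python sketch that ranks timing lines like the ones above. It assumes each line is a `T` marker, a rule name, two time values in seconds and a count; the thread does not label the columns, so the field names `total`, `peak` and `count` are guesses, and `parse_timings` is a hypothetical helper, not part of any SpamAssassin tool:

```python
import re
from typing import NamedTuple


class RuleTiming(NamedTuple):
    rule: str
    total: float   # assumed meaning of the first numeric column
    peak: float    # assumed meaning of the second numeric column
    count: int     # assumed meaning of the last column


# Accepts optional leading "> " quote markers so quoted mail text also parses.
LINE_RE = re.compile(
    r"^\s*(?:>\s*)*T\s+(\S+)\s+([\d.]+)\s+([\d.]+)\s+(\d+)\s*$"
)


def parse_timings(text):
    """Return the timing rows found in text, slowest first."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            name, total, peak, count = match.groups()
            rows.append(RuleTiming(name, float(total), float(peak), int(count)))
    return sorted(rows, key=lambda row: row.total, reverse=True)


# First sample from this message.
SAMPLE = """\
T        __FILL_THIS_FORM_SHORT2    5.9169    5.9169    1
T         __FILL_THIS_FORM_LONG2    5.9041    5.9041    1
T      RAZOR2_CF_RANGE_E8_51_100    5.0010    5.0010    1
T  __FILL_THIS_FORM_FRAUD_PHISH1    2.1362    2.1362    1
"""

for row in parse_timings(SAMPLE):
    print(f"{row.total:8.4f}  {row.rule}")
```

Summing the `total` column over the `__FILL_THIS_FORM_*` rows gives a rough idea of how much of the scan time this rule family accounts for on a given message.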

-- 
"A ship in a port is safe, but that's not what ships are built for."
-Grace Murray Hopper
http://www.ChaosReigns.com

Re: __FILL_THIS_FORM_SHORT* rule slowness

Posted by Axb <ax...@gmail.com>.
On 02/27/2012 04:11 PM, John Hardin wrote:
> On Mon, 27 Feb 2012, Axb wrote:
>
>> These two rules seem to use a lot of processing time:
>>
>> __FILL_THIS_FORM_SHORT2 0.8015
>> __FILL_THIS_FORM_LONG2 0.7652
>>
>> nearly a second/rule which is only used in metas seems a bit heavy.
>>
>> John, could you optimize them somewhat?
>
> Is this consistent, or do you have specific messages where it's
> unusually slow? There are some edge cases that have triggered bad
> behavior in the past that I have corrected for.

I don't have any saved samples I could hand out, but I mostly see it in
large messages from corporate users with lots of replies
(people using email as IM).
The common denominator is Outlook/Exchange and many "parts".

Re: __FILL_THIS_FORM_SHORT* rule slowness

Posted by John Hardin <jh...@impsec.org>.
On Mon, 27 Feb 2012, Axb wrote:

> These two rules seem to use a lot of processing time:
>
> __FILL_THIS_FORM_SHORT2    0.8015
> __FILL_THIS_FORM_LONG2    0.7652
>
> nearly a second/rule which is only used in metas seems a bit heavy.
>
> John, could you optimize them somewhat?

Is this consistent, or do you have specific messages where it's unusually 
slow? There are some edge cases that have triggered bad behavior in the 
past that I have corrected for.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  15 days until Albert Einstein's 133rd Birthday