Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2007/09/11 00:47:40 UTC

[Bug 5644] New: Message sends BodyEval::check_stock_info into hard loop

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5644

           Summary: Message sends BodyEval::check_stock_info into hard loop
           Product: Spamassassin
           Version: 3.2.1
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: major
          Priority: P3
         Component: Plugins
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: windhamg@email.arizona.edu


The attached message appears to be causing the _check_stock_info() routine in
Plugin/BodyEval.pm to go into a hard loop (consuming all available CPU until
killed).  The problem appears to be related to a massive amount of whitespace in
the body of the message; when this text is removed the problem disappears.

We are running SA 3.2.1 under Perl 5.8.8 on OpenSuSE 10.1



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5644] Message sends BodyEval::check_stock_info into hard loop

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5644


Bug 5644 depends on bug 5717, which changed state.

Bug 5717 Summary: revert 'rawbody' rule type to split paragraph blocks at 1024 chars, to avoid DOS problems
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5717

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED






[Bug 5644] Message sends BodyEval::check_stock_info into hard loop

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5644





------- Additional Comments From jm@jmason.org  2007-11-13 00:53 -------
(In reply to comment #4)
> On a general note: I'm observing occasional similar degenerate cases
> (as are also reported on a mailing list from time to time) ever since
> the change was made from one-line-at-a-time rule application to
> per-paragraph rule application. Such cases are not frequent, but when
> they hit, it is not unusual for them to cause a massive disruption in
> mail flow, mostly because such mail comes in multiple similar instances
> at about the same time. Admittedly it is often the mainstream SARE rules
> that take the worst hit, but the problem is not exclusive to SARE rules.
> 
> When SpamAssassin takes longer than a client is willing to wait
> (depending on the setup), a timed-out mail may stay in an MTA queue
> for a retry, aggravating the situation.
> 
> The situation is quite unfortunate. If someone wanted to cause
> a DoS, it should not be too hard to target a couple of problematic
> rules and devise a crafted message to purposely cause lengthy regexp
> evaluation. I wonder if this is a good situation for the reputation of
> a service that more and more folks depend upon to run mostly unattended.
> 
> Apart from reverting to per-line regexps (at the expense of accuracy),
> I don't have a good solution. Perhaps limiting paragraphs in size,
> maybe compressing spans of 3+ occurrences of same characters before
> applying rules, ... ?

I think we should discuss reverting back to per-line regexps, as
I agree with your thoughts regarding reliability etc. 

Shall I open a bug?




[Bug 5644] Message sends BodyEval::check_stock_info into hard loop

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5644





------- Additional Comments From jm@jmason.org  2007-09-11 02:44 -------
the rule still gets good hits:

MSECS    SPAM%                              HAM%                             S/O    RANK  SCORE  NAME
0.00000  0.5299 (3543 of 668466 messages)   0.0137 (18 of 131397 messages)   0.975  0.81  4.20   TVD_STOCK1

http://ruleqa.spamassassin.org/?daterev=20070910-r574178-n&rule=%2FTVD_STOCK&srcpath=&g=Change

so we can't just delete it...




[Bug 5644] Message sends BodyEval::check_stock_info into hard loop

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5644





------- Additional Comments From Mark.Martinec@ijs.si  2007-11-12 16:44 -------
Indeed it is quite terrible. Version 3.3 produces the following timing report:

01:19:26.771 112.544 0.001 [65209] dbg: timing: total 111805 ms -
init: 6131 (5.5%), parse: 31 (0.0%), extract_message_metadata: 373 (0.3%),
get_uri_detail_list: 32 (0.0%), tests_pri_-1000: 55 (0.0%),
tests_pri_-950: 4 (0.0%), tests_pri_-900: 5 (0.0%),
tests_pri_-400: 1051 (0.9%), check_bayes: 814 (0.7%),
tests_pri_0: 103731 (92.8%), check_spf: 365 (0.3%), poll_dns_idle: 97 (0.1%),
check_dkim_signature: 309 (0.3%), check_razor2: 840 (0.8%),
check_pyzor: 0.09 (0.0%), check_dcc: 333 (0.3%), tests_pri_100: 6 (0.0%),
tests_pri_500: 291 (0.3%), tests_pri_1000: 83 (0.1%), total_awl: 77 (0.1%), 
check_awl: 21 (0.0%), check_awl_reput: 3 (0.0%), update_awl: 3 (0.0%)

About a minute and 44 seconds (103731 ms) of CPU-intensive grinding for
priority-0 rules.
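
The share and duration of the priority-0 phase can be re-derived from the
two numbers in the report. A minimal sketch in Python (used here only for
brevity; it is not SpamAssassin code, and summarize_phase is a hypothetical
helper name):

```python
def summarize_phase(phase_ms: float, total_ms: float) -> tuple:
    """Return (share of total in percent, whole minutes, leftover seconds)."""
    share_pct = round(100.0 * phase_ms / total_ms, 1)
    minutes, seconds = divmod(phase_ms / 1000.0, 60)
    return share_pct, int(minutes), round(seconds, 1)


# Values copied from the "timing:" line in the report:
# tests_pri_0 = 103731 ms, total = 111805 ms.
print(summarize_phase(103731, 111805))
```

The percentage agrees with the 92.8% printed in the report for tests_pri_0.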

On a general note: I'm observing occasional similar degenerate cases
(as are also reported on a mailing list from time to time) ever since
the change was made from one-line-at-a-time rule application to
per-paragraph rule application. Such cases are not frequent, but when
they hit, it is not unusual for them to cause a massive disruption in
mail flow, mostly because such mail comes in multiple similar instances
at about the same time. Admittedly it is often the mainstream SARE rules
that take the worst hit, but the problem is not exclusive to SARE rules.

When SpamAssassin takes longer than a client is willing to wait
(depending on the setup), a timed-out mail may stay in an MTA queue
for a retry, aggravating the situation.

The situation is quite unfortunate. If someone wanted to cause
a DoS, it should not be too hard to target a couple of problematic
rules and devise a crafted message to purposely cause lengthy regexp
evaluation. I wonder if this is a good situation for the reputation of
a service that more and more folks depend upon to run mostly unattended.

Apart from reverting to per-line regexps (at the expense of accuracy),
I don't have a good solution. Perhaps limiting paragraphs in size,
maybe compressing spans of 3+ occurrences of same characters before
applying rules, ... ?
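
To make the run-compression idea concrete, here is a rough sketch. It is
written in Python purely for illustration and is not SpamAssassin code;
compress_runs and its keep parameter are hypothetical names, and truncating
each run to three characters (rather than, say, one) is an assumption about
what "compressing" would mean.

```python
import re


def compress_runs(text: str, keep: int = 3) -> str:
    """Truncate every run of one repeated character to at most `keep` copies."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # (.) captures a single character (DOTALL so newlines count as well);
    # \1{keep,} then requires at least `keep` further copies of it, so only
    # runs longer than `keep` are rewritten.
    pattern = re.compile(r"(.)\1{%d,}" % keep, re.DOTALL)
    return pattern.sub(lambda m: m.group(1) * keep, text)


# A body with a huge whitespace run shrinks to something a rule can scan quickly.
body = "Symbol" + " " * 50000 + ": ABCD"
print(len(body), len(compress_runs(body)))
```

The trade-off is that rules would then see text that differs from the
original message, so any rule that counts repeated characters would need
to be reviewed.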






[Bug 5644] Message sends BodyEval::check_stock_info into hard loop

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5644





------- Additional Comments From windhamg@email.arizona.edu  2007-09-10 15:49 -------
Created an attachment (id=4120)
 --> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=4120&action=view)
message that triggers the loop in _check_stock_info()





[Bug 5644] Message sends BodyEval::check_stock_info into hard loop

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5644


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




------- Additional Comments From jm@jmason.org  2008-03-05 08:53 -------
this is fixed in 3.3.0 by the fix for bug 5717, which splits the 'rawbody'
representation into chunks of between 1 and 2 KB.
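
For readers who want a picture of what such chunking might look like, here
is a minimal sketch in Python. It is not the code from bug 5717:
split_chunks is a hypothetical helper, and the exact rule used here (cut
just after the first whitespace at or beyond 1024 characters, or hard-cut
at 2048) is only an assumption based on the "1-2KB" description.

```python
def split_chunks(text: str, min_len: int = 1024, max_len: int = 2048) -> list:
    """Split text into pieces of at most max_len characters.

    Every piece except possibly the last is at least min_len long. The cut
    is placed just after the first whitespace character found at or beyond
    min_len, or at max_len if no whitespace is found in that window.
    """
    if not 0 < min_len <= max_len:
        raise ValueError("need 0 < min_len <= max_len")
    chunks = []
    start = 0
    while len(text) - start > max_len:
        cut = start + max_len  # hard cut when the window holds no whitespace
        for i in range(start + min_len - 1, start + max_len):
            if text[i].isspace():
                cut = i + 1  # keep the whitespace with the left-hand chunk
                break
        chunks.append(text[start:cut])
        start = cut
    if start < len(text):
        chunks.append(text[start:])
    return chunks


# A message body that is one enormous whitespace run is bounded per chunk.
pieces = split_chunks(" " * 100000)
print(len(pieces), max(len(p) for p in pieces))
```

Bounding the input length bounds the work any single regexp match can do,
at the cost of missing patterns that straddle a chunk boundary.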




[Bug 5644] Message sends BodyEval::check_stock_info into hard loop

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5644





------- Additional Comments From matt@nightrealms.com  2007-11-08 11:36 -------
Created an attachment (id=4185)
 --> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=4185&action=view)
Avoid problem by applying regexp to one line at a time

Avoid pathological cases where the regexp takes a massive amount of time by
applying the regexp to only one line at a time.  This change prevents the rule
from catching instances where the whitespace in "keyword\s*:\s*value" contains
a newline; I don't know if this happens often enough to be a problem.
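
The trade-off can be seen in a small sketch. This is Python for
illustration only, not the attached Perl patch; FIELD is a hypothetical
stand-in for the "keyword\s*:\s*value" pattern, and its keyword list is
made up rather than taken from _check_stock_info().

```python
import re

# Hypothetical stand-in pattern; the real keyword list is not reproduced here.
FIELD = re.compile(r"\b(symbol|price|target)\s*:\s*(\S+)", re.IGNORECASE)


def count_fields_whole(body: str) -> int:
    """Apply the pattern to the whole body, so \\s* may span newlines."""
    return len(FIELD.findall(body))


def count_fields_per_line(body: str) -> int:
    """Apply the pattern to each line separately, as the patch describes."""
    return sum(len(FIELD.findall(line)) for line in body.splitlines())


body = "Symbol: ABCD\nPrice\n: 1.25\nTarget : 5.00\n"
print(count_fields_whole(body), count_fields_per_line(body))
```

In this sample the "Price" field is split across a newline, so the
per-line version finds one fewer match than the whole-body version.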





[Bug 5644] Message sends BodyEval::check_stock_info into hard loop

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5644


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |5717




------- Additional Comments From jm@jmason.org  2007-11-13 02:21 -------
ok, opened, bug 5717.


