Posted to dev@spamassassin.apache.org by Matt Elson <me...@fastmail.net> on 2011/03/22 14:58:42 UTC

Fwd: Bug 6558 Investigations/Info

Hey all,

I've been doing some investigation of Bug 6558 (__PILL_PRICE_[1-3] + 
Compiled Rulesets == endless loop) on my end and want to share the 
results - I'm not super familiar with SpamAssassin's code base, so 
apologies if I misread anything or am totally off.

Long story short since this email has gotten a little bit lengthy, I 
think the problem lies in a function being created around line 97 in 
OneLineBodyRuleType that Rule2XSBody later uses in *some* cases.

Lengthy analysis below:

At this point, I have a test machine (x64) where I have removed *all 
rules* except the rules I'm testing and disabled every plugin except 
Rule2XSBody.pm and Check.pm.

First, I've played around with the regexes and found that something as 
simple as:

body        LOCAL_TEST         /pill/
tflags      LOCAL_TEST         multiple

will cause the problem (when run on the short artificial email attached 
to the bugzilla).

Interestingly enough, if I make this case insensitive

body        LOCAL_TEST         /pill/i
tflags      LOCAL_TEST         multiple

The problem goes away.

So at that point I started poking around the code for Rule2XSBody 
because I was curious... and this is where I'm probably a bit out of my 
depth.  But, it looks like the reason the case insensitive rule does 
*not* hit the problem is that the result of the CompiledRegexps scan 
is flagged as "non lossy" (l=0) and gets caught by the if statement 
around line 243 in Rule2XSBody.pm.  Case sensitive rules are flagged as 
lossy (l=1) by CompiledRegexps and have to move on.  They get up to the 
stanza at line 261 - if (!&{$fn} ($scanner, $line) && $do_dbg).. and 
this is where things are getting stuck for me.  This is where it got 
interesting - when I added in my debugging and ran through the original 
__PILL_PRICE_[1-3] rules that created it - they're all flagged as lossy.
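
To make the two paths concrete, here is a rough sketch of how I read
that logic.  All names in it (handle_scan_hits, the hash layouts) are
made up for illustration and are not the actual Rule2XSBody.pm code; the
only point is that a lossy result has to go through the per-rule
function while a non-lossy one does not.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: $scan_result maps rule name -> lossy flag (l),
# $verify_fn maps rule name -> the generated per-rule test function.
sub handle_scan_hits {
    my ($scan_result, $verify_fn, $line, $hits) = @_;
    for my $rule (sort keys %$scan_result) {
        if (!$scan_result->{$rule}) {
            # l=0: the scanner result is trusted; record the hit directly.
            $hits->{$rule}++;
        } else {
            # l=1: only a candidate; the real regex has to confirm it.
            $verify_fn->{$rule}->($line, $hits);
        }
    }
    return $hits;
}

my %verify = (
    CASE_SENSITIVE => sub {
        my ($line, $hits) = @_;
        $hits->{CASE_SENSITIVE}++ if $line =~ /pill/;
    },
);
my $hits = handle_scan_hits(
    { CASE_INSENSITIVE => 0, CASE_SENSITIVE => 1 },
    \%verify, 'buy a pill today', {},
);
print join(', ', map { "$_=$hits->{$_}" } sort keys %$hits), "\n";
```

If that reading is right, only rules on the l=1 branch ever reach the
generated function, which would fit with the case insensitive rule
never getting stuck.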

$fn seems to be a dynamically created function that Rule2XSBody (by way 
of OneLineBodyRuleType.pm) creates - unfortunately I can't quite 
decipher the code  - line 142 in OneLineBodyRuleType.pm is where it's 
made.  While I can't make out what the function's supposed to do, it is 
worth noting that when the rule it's being created for has a tflag of 
"multiple", the function has a while condition: i.e.

while ($_[1] =~ '.$pat.'g) {

Whereas if the tflag is NOT multiple, it's just an if condition

if ($_[1] =~ '.$pat.') {
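
A minimal sketch of what I think that generation step amounts to.  The
helper build_test_source and the {hits} counter are my own stand-ins,
not the real OneLineBodyRuleType.pm code; it just builds the sub's
source as a string and picks "while .../g" or "if" depending on whether
the rule is flagged multiple.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the real OneLineBodyRuleType.pm code: builds the
# source text of a per-rule test sub, choosing "while .../g" for rules
# flagged "multiple" and a plain "if" otherwise.
sub build_test_source {
    my ($rulename, $pat, $is_multiple) = @_;
    my $cond = $is_multiple
        ? 'while ($_[1] =~ ' . $pat . 'g) {'
        : 'if ($_[1] =~ ' . $pat . ') {';
    return join "\n",
        "sub ${rulename}_one_line_body_test {",
        '    pos $_[1] = 0;',
        "    $cond",
        '        $_[0]->{hits}++;   # stand-in for got_hit()',
        '    }',
        '}';
}

my $multi_src  = build_test_source('LOCAL_TEST',      '/pill/', 1);
my $single_src = build_test_source('LOCAL_TEST_ONCE', '/pill/', 0);

# Compile both generated subs, then call them on the same line.
eval "$multi_src; $single_src; 1" or die $@;
my ($multi, $single) = ({ hits => 0 }, { hits => 0 });
my $line = 'pill pill pill';
main->can('LOCAL_TEST_one_line_body_test')->($multi, $line);
main->can('LOCAL_TEST_ONCE_one_line_body_test')->($single, $line);
print "multiple: $multi->{hits}, single: $single->{hits}\n";
```

In plain Perl like this the "multiple" variant counts every match and
then stops, so whatever keeps the real function spinning is not visible
from the while/if choice alone.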

I'm not quite sure what's supposed to break out of the while loop, but 
I'm fairly sure it never exits and that this is where everything's 
getting stuck.  I changed the "while" to an "if" just to test this 
theory, and once I did, the problem went away for me completely, on all 
regexes, both my simple /pill/ and the more elaborate original ones (and 
rewrites).  I'd imagine this is not a real solution, but it's good for 
testing (simple patch attached in case I was unclear about the change).

This doesn't quite explain why the problem doesn't emerge for everyone 
using compiled rules (though maybe the difference is whether or not the 
CompiledRegexpsModule is flagging the rules as lossy; that might differ 
from architecture to architecture and environment to environment and 
when the rules are NOT lossy, they don't get to the bit of code that 
seems to be causing the problem).

For further information, here's what the dynamic function looks like 
when I spit it out with some debugging.

  sub JUST_PILLS_one_line_body_test { {
       pos $_[1] = 0;

#line 1 "/var/lib/spamassassin/3.003001/local.cf, rule JUST_PILLS,"
       while ($_[1] =~ /pill/g) {
         my $self = $_[0];
         $self->got_hit(q{JUST_PILLS}, "BODY: ", ruletype => "one_line_body");

         dbg("rules: ran one_line_body rule JUST_PILLS ======> got hit: \"" . ($&|| "negative match") . "\"");

       }
       } }


(notice that this is the debug statement that you see repeated over and 
over; the comments just before the call to ${fn} suggest that this is 
running the real regex).

Like I said, I'm having trouble making sense of it ($_ was never a 
friend of mine) and for the life of me I don't know how the loop is 
supposed to end.
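
For reference, a small sketch of how a while (... =~ /.../g) loop
normally ends in plain Perl, as far as I understand the documented
behaviour: each successful match moves pos() forward and the loop stops
when no further match is found.  The second half is an artificial way
of reproducing the symptom (resetting pos() by hand inside the body);
that is purely an assumption for the demo and not a diagnosis of what
the compiled ruleset is doing.

```perl
use strict;
use warnings;

my $line = 'cheap pill, blue pill, red pill';

# Normal case: each successful /g match in scalar context advances
# pos($line); the loop ends when no further match is found.
my @positions;
while ($line =~ /pill/g) {
    push @positions, pos($line);
}
print "matches found: ", scalar(@positions), "\n";

# Artificial reproduction of the symptom: if the search position is reset
# inside the body, the same first match is found again on every pass.
my $iterations = 0;
while ($line =~ /pill/g) {
    pos($line) = 0;              # simulate the search position being lost
    last if ++$iterations >= 5;  # safety cap so this demo terminates
}
print "iterations before safety cap: $iterations\n";
```

Without the safety cap the second loop would never finish, which looks
a lot like the repeated debug line above.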

Another little hack I tried that seems to fix it (though goodness knows 
at what cost) is to add an s at the front

(i.e. making it while $_[1] =~ s/pill/g).

Again, not suggesting that as a real solution since modifying variables 
arbitrarily seems.. unwise, but maybe it will help troubleshoot/debug 
further.
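
A sketch of why a substitution in the loop condition would terminate,
assuming the hack boils down to something like s/pill// with an empty
replacement (that exact form is an assumption here): every pass removes
one match, so the line eventually runs out of matches no matter what
happens to pos().

```perl
use strict;
use warnings;

my $line = 'pill pill pill';
my $hits = 0;

# Assumed form of the hack: each pass deletes one match from the line,
# so the condition must eventually fail.
while ($line =~ s/pill//) {
    $hits++;
}
print "hits: $hits, remaining line: '$line'\n";
```

The cost shows up in the remaining line: the matched text is gone.
Since $_[1] is an alias for the caller's argument in Perl, doing this
inside the generated function would change the caller's copy of the line
too, which any rule run afterwards would then see.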

Anyway, hope this helps out!

Matt

-- 
  Matt Elson
  melson@fastmail.net