Posted to dev@spamassassin.apache.org by Matt Elson <me...@fastmail.net> on 2011/03/22 14:58:42 UTC
Fwd: Bug 6558 Investigations/Info
Hey all,
I've been doing some investigation of Bug 6558 (__PILL_PRICE_[1-3] +
Compiled Rulesets == endless loop) on my end and want to share the
results. I'm not super familiar with SpamAssassin's code base, so
apologies if I misread anything or am totally off.
Long story short, since this email has gotten a little bit lengthy: I
think the problem lies in a function being created around line 97 in
OneLineBodyRuleType that Rule2XSBody later uses in *some* cases.
Lengthy analysis below:
At this point, I have a test machine (x64) where I have removed *all
rules* except the rules I'm testing and disabled every plugin except
Rule2XSBody.pm and Check.pm.
First, I've played around with the regexes and found that something as
simple as:
body LOCAL_TEST /pill/
tflags LOCAL_TEST multiple
will cause the problem (when run on the short artificial email attached
to the bugzilla).
Interestingly enough, if I make this case insensitive
body LOCAL_TEST /pill/i
tflags LOCAL_TEST multiple
The problem goes away.
So at that point I started poking around the code for Rule2XSBody
because I was curious... and this is where I'm probably a bit out of my
depth. But it looks like the reason the case insensitive rule does
*not* hit the problem is that the result of the CompiledRegexps scan
is flagged as "non lossy" (l=0) and gets handled by the if statement
around line 243 in Rule2XSBody.pm. Case sensitive rules are flagged as
lossy (l=1) by CompiledRegexps and have to move on. They get up to the
stanza at line 261 - if (!&{$fn} ($scanner, $line) && $do_dbg).. and
this is where things are getting stuck for me. This is where it got
interesting - when I added in my debugging and ran through the original
__PILL_PRICE_[1-3] rules that created it - they're all flagged as lossy.
$fn seems to be a dynamically created function that Rule2XSBody (by way
of OneLineBodyRuleType.pm) creates - unfortunately I can't quite
decipher the code - line 142 in OneLineBodyRuleType.pm is where it's
made. While I can't make out what the function's supposed to do, it is
worth noting that when the rule it's being created for has a tflag of
"multiple", the function has a while condition: i.e.
while ($_[1] =~ '.$pat.'g) {
Whereas if the tflag is NOT multiple, it's just an if condition
if ($_[1] =~ '.$pat.') {
I'm not quite sure what's supposed to break out of the while loop, but
I'm fairly sure it's not terminating correctly and that this is where
everything's getting stuck. I changed the "while" to an "if" just to
test this theory, and once I did, the problem went away for me
completely, on all regexes: both my simple /pill/ and the more elaborate
original ones (and rewrites). I'd imagine that's not a real solution,
but it's good for testing. (Simple patch attached in case I was unclear
about the change.)
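
To make the difference between the two generated forms concrete, here is
a minimal standalone sketch (not SpamAssassin code). The helper name
count_hits and the sample line are made up for illustration; only the
"if" versus "while ... /g" shapes come from the generated function.

```perl
use strict;
use warnings;

# Hypothetical helper: count hits the way the generated test would,
# either once ("if") or once per occurrence ("while ... /g").
sub count_hits {
    my ($line, $multiple) = @_;
    my $hits = 0;
    pos($line) = 0;    # start scanning from the beginning, as the generated code does
    if ($multiple) {
        # Scalar-context //g: each success resumes from pos($line).
        while ($line =~ /pill/g) { $hits++ }
    }
    else {
        # Plain match: at most one hit per line.
        if ($line =~ /pill/) { $hits++ }
    }
    return $hits;
}

my $sample = "pill pill pill";
print count_hits($sample, 0), "\n";    # 1
print count_hits($sample, 1), "\n";    # 3
```

So swapping "while" for "if" hides the hang, but it also turns a
"multiple" rule into a single-hit rule, which is why it is only useful
as a test.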
This doesn't quite explain why the problem doesn't emerge for everyone
using compiled rules (though maybe the difference is whether or not the
CompiledRegexpsModule is flagging the rules as lossy; that might differ
from architecture to architecture and environment to environment and
when the rules are NOT lossy, they don't get to the bit of code that
seems to be causing the problem).
For further information, here's what the dynamic function looks like
when I spit it out with some debugging.
sub JUST_PILLS_one_line_body_test { {
  pos $_[1] = 0;
#line 1 "/var/lib/spamassassin/3.003001/local.cf, rule JUST_PILLS,"
  while ($_[1] =~ /pill/g) {
    my $self = $_[0];
    $self->got_hit(q{JUST_PILLS}, "BODY: ", ruletype => "one_line_body");
    dbg("rules: ran one_line_body rule JUST_PILLS ======> got hit: \"" . ($&|| "negative match") . "\"");
  }
} }
(notice that that's the debug statement that you see repeated over and
over; the comments before ${fn} is called suggest that this is running
the real regex).
Like I said, I'm having trouble making sense of it ($_ was never a
friend of mine) and for the life of me I don't know how the loop is
supposed to end.
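
For what it's worth, a "while (... =~ /pat/g)" loop in Perl normally
ends because each successful match moves pos() of the string past the
match, and the first failed match ends the loop. The sketch below shows
that, plus one way such a loop can spin forever: something in the loop
body rewinding pos(). The rewind here is simulated and is purely an
assumption for illustration; I have not shown that anything in
SpamAssassin does this. The iterations helper and its $limit guard are
hypothetical.

```perl
use strict;
use warnings;

# Returns the number of loop iterations, capped at $limit so the
# demonstration itself cannot hang.
sub iterations {
    my ($text, $reset_pos_in_body, $limit) = @_;
    my $n = 0;
    pos($text) = 0;
    while ($text =~ /pill/g) {
        # After a successful scalar-context //g match, pos($text) points
        # just past the match, so the next test resumes from there.
        last if ++$n >= $limit;
        # Simulated side effect (an assumption, not observed in
        # SpamAssassin): something rewinds the match position.
        pos($text) = 0 if $reset_pos_in_body;
    }
    return $n;
}

print iterations("cheap pill, blue pill", 0, 1000), "\n";    # 2
print iterations("cheap pill, blue pill", 1, 1000), "\n";    # 1000
```

If the real loop never ends, one thing worth checking is whether pos()
of the scanned line is still advancing between iterations.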
Another little hack I did that seems to fix it (though goodness knows at
what cost) is to add an s in front of the pattern
(i.e. making it while $_[1] =~ s/pill/g).
Again, not suggesting that as a real solution since modifying variables
arbitrarily seems.. unwise, but maybe it will help troubleshoot/debug
further.
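
A rough sketch of why a substitution makes the loop stop, and what it
costs. It assumes an empty replacement, which is not necessarily what
the hack above ended up running, and hits_via_substitution is a made-up
stand-in for the generated function. The loop no longer depends on
pos() at all: every pass deletes one occurrence, so the pattern
eventually stops matching. But $_[1] is an alias for the caller's
variable, so the caller's line is edited in place.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the generated test, with the match replaced
# by a substitution. The empty replacement is an assumption.
sub hits_via_substitution {
    my $hits = 0;
    # $_[1] aliases the caller's variable, so s/// edits it in place.
    while ($_[1] =~ s/pill//) { $hits++ }
    return $hits;
}

my $line = "cheap pill, blue pill";
my $hits = hits_via_substitution(undef, $line);
print "$hits\n";      # 2
print "[$line]\n";    # [cheap , blue ]
```

Any rule that runs on the same line afterwards would see the altered
text, which is the "cost" part.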
Anyway, hope this helps out!
Matt
--
Matt Elson
melson@fastmail.net