You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2006/03/09 12:52:25 UTC
[Bug 4822] New: make 'rawbody' rules match the full message string, not line-by-line
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
Summary: make 'rawbody' rules match the full message string, not
line-by-line
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Platform: Other
OS/Version: other
Status: NEW
Severity: major
Priority: P5
Component: Libraries
AssignedTo: dev@spamassassin.apache.org
ReportedBy: jm@jmason.org
let's make a bug!
Loren: (?)
> > If nothing else, I am for simply changing the way rawbody rules are
> > evaluated... Because the current line by line evaluation is too
> > restrictive, and using a handfull of rules and meta'ing them together to
> > match something that wraps across multiple lines is kludgly at best.
Justin:
> That is definitely a good idea.
Justin:
> Are there any rawbody rules left anywhere that this would break? I think
> it's likely to be only an improvement.
Theo: 'Hard to say, though I tend to agree. In our case, there are few
rawbody rules (26), and fewer which aren't evals (18). There's only one
(HTML_TINY_FONT) which has a ".*" which would need some help, and via
discussion about the HTML*TINY* rules it could either be replaced or
removed without issue.'
....
Just so we're all clear... It seems like the proposal would be to change
M::SA::Message::get_decoded_body_text_array() such that:
push(@{$self->{text_decoded}},
split_into_array_of_short_lines($parts[$pt]->decode()));
becomes (corrected by Justin):
my $text = $parts[$pt]->decode();
$text =~ tr/ \t\n\r\x0b\xa0/ /s; # whitespace => space
$self->{text_decoded} = [ $text ];
...
Justin:
> It does introduce the danger of algorithmic complexity attacks
> if .* is used instead of .{0,20} though -- but we may be able to help
> this if we spot that kind of thing in --lint.
Theo: '<shrug> I worry more about full than rawbody in this case since the
full text is always going to be larger than rawbody, so the potential
for problems is greater. Even with the above code, the decoded portion
is split to be under 1k, full is the size of the message.'
On Wed, Mar 08, 2006 at 11:24:59PM +0000, Justin Mason wrote:
> I think it'd be without split_into_array_of_short_lines() -- we
> want to offer the entire body as a string, not split at all.
Theo: 'Well, here's the rule differences from original to the non-split version:
< 0.149 0.1540 0.1215 0.559 BLANK_LINES_70_80
> 0.006 0.0072 0.0000 1.000 BLANK_LINES_70_80
< 0.003 0.0036 0.0000 1.000 BLANK_LINES_80_90
> 0.000 0.0000 0.0000 0.500 BLANK_LINES_80_90
< 0.021 0.0107 0.0810 0.117 HIDE_WIN_STATUS
> 0.024 0.0143 0.0810 0.150 HIDE_WIN_STATUS
< 1.041 1.2213 0.0202 0.984 INTERRUPTUS
> 1.272 1.4936 0.0202 0.987 INTERRUPTUS
< 0.030 0.0251 0.0607 0.292 OBFUSCATING_COMMENT
> 0.043 0.0322 0.1012 0.242 OBFUSCATING_COMMENT
< 0.003 0.0000 0.0202 0.000 __HS_FUNNY_BODY_FROM
> 0.000 0.0000 0.0000 0.500 __HS_FUNNY_BODY_FROM
< 0.003 0.0036 0.0000 1.000 __HS_NONDEFAULT_OE_QUOTE
> 0.055 0.0036 0.3441 0.010 __HS_NONDEFAULT_OE_QUOTE
< 0.009 0.0000 0.0607 0.000 __HS_ODD_ORIGINAL_MESSAGE
> 0.000 0.0000 0.0000 0.500 __HS_ODD_ORIGINAL_MESSAGE
< 3.055 0.3080 18.5830 0.016 __HS_QUOTE
> 0.222 0.1110 0.8502 0.116 __HS_QUOTE
< 0.015 0.0179 0.0000 1.000 __OBFUSCATING_COMMENT_A
> 0.030 0.0358 0.0000 1.000 __OBFUSCATING_COMMENT_A
< 0.186 0.0537 0.9312 0.055 __OBFUSCATING_COMMENT_B
> 0.332 0.2185 0.9717 0.184 __OBFUSCATING_COMMENT_B
in general, good rules are improved, bad rules stay bad rules. Timing is
basically unchanged. So it's fine with me to move forward on making
the change for 3.2.'
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4822] make 'rawbody' rules match the full message string, not line-by-line
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
------- Additional Comments From jm@jmason.org 2006-11-03 10:34 -------
this has been in trunk for months now, without issues -- and I for one have been
running it "live" on my server. I think we can close this bug.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4822] make 'rawbody' rules match the full message string, not line-by-line
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
------- Additional Comments From felicity@apache.org 2006-03-10 20:45 -------
(In reply to comment #11)
> Question on this change vs QP encoding.
>
> Will the QP be correctly removed from rawbody, and will raw lines ending in =
> correctly be joined to the next line?
The parts will be decoded, yes.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4822] make 'rawbody' rules match the full message string, not line-by-line
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
------- Additional Comments From duncf@debian.org 2006-03-10 02:47 -------
-0.999 for 3.1 branch (technically not a veto)
We *can't* make backwards incompatible changes in point releases. That's the
point of a point release: bug fixes only.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4822] make 'rawbody' rules match the full message string, not line-by-line
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
------- Additional Comments From lwilton@earthlink.net 2006-03-10 05:58 -------
Question on this change vs QP encoding.
Will the QP be correctly removed from rawbody, and will raw lines ending in =
correctly be joined to the next line?
One of the standard spammer tricks over the last 8 months or so is to use QP
and break the key phrases at arbitrary points with =\n. If we still have to
deal with this in the concatenated rawbody it won't be good, since the bloody
=\n can be pretty much anywhere, and all checks would have to be fuzzy matches.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4822] make 'rawbody' rules match the full message string, not line-by-line
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
------- Additional Comments From lwilton@earthlink.net 2006-03-09 13:03 -------
> Loren: (?)
Dallas, just to correct the attribution.
>> Are there any rawbody rules left anywhere that this would break? I think
I know of a decent handful of SARE rules where I've taken advantage of knowing
the rawbody rule started at the front of a line, and this change will probably
break some of those rules. <<Shrug.>> We'll deal with it, not a problem.
Getting this change in is FAR more important than worrying about having to
tweak a few rules as a consequence of the change.
(For that matter, we have any number of 'full' rules that will now be able to
be changed to rawbody rules and probably get increased hitrates in the bargan.
Good deal all around.)
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4822] make 'rawbody' rules match the full message string, not line-by-line
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
------- Additional Comments From jgmyers@proofpoint.com 2006-03-09 17:25 -------
+1 on the version without whitespace canonicalization.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4822] make 'rawbody' rules match the full message string, not line-by-line
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
felicity@apache.org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mail@lohrere.com
------- Additional Comments From felicity@apache.org 2006-12-03 19:04 -------
*** Bug 5219 has been marked as a duplicate of this bug. ***
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4822] make 'rawbody' rules match the full message string, not line-by-line
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
------- Additional Comments From felicity@apache.org 2006-03-09 22:06 -------
v3.2:
Sending lib/Mail/SpamAssassin/Message.pm
Transmitting file data .
Committed revision 384629.
depending on the results we may want to look at applying to 3.1, but I don't know how comfortable I/we
would be with that.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4822] make 'rawbody' rules match the full message string, not line-by-line
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
------- Additional Comments From jm@jmason.org 2006-03-10 10:09 -------
I agree that this should not be put in 3.1.1. 3.1.2, not sure yet, but
definitely not 3.1.1 due to its lateness.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4822] make 'rawbody' rules match the full message string, not line-by-line
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
------- Additional Comments From felicity@apache.org 2006-03-10 07:13 -------
(In reply to comment #13)
> I'm thinking that if it does get included in 3.1, it should wait until 3.1.2
> (which hopefully isn't too far off given that there are still problems with spamd).
yes, absolutely. as far as I'm concerned, 3.1.1 is ready to be released -- the last patch needs the requisite
amount of time for review before it can be applied, and that'll be tomorrow. and I really want to see us
not wait 6 months for a 3.1.2, damn it. ;)
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4822] make 'rawbody' rules match the full message string, not line-by-line
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
------- Additional Comments From felicity@apache.org 2006-12-03 20:32 -------
*** Bug 5219 has been marked as a duplicate of this bug. ***
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4822] make 'rawbody' rules match the full message string, not line-by-line
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
------- Additional Comments From felicity@apache.org 2006-03-09 15:37 -------
(In reply to comment #2)
> the rawbody rule started at the front of a line, and this change will probably
> break some of those rules.
hrm. just had a thought: if the mime-part is essentially decoded but left as a big scalar, people will want
to remember to use /m and /s as appropriate (see perlre POD but /m allows ^ and $ to match inside the
string around newlines, and /s allows "." to match "\n".)
otherwise, the patch to do this is trivial, I'll put it up in a minute for people to discuss before we apply it.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4822] make 'rawbody' rules match the full message string, not line-by-line
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
------- Additional Comments From spamassassin@dostech.ca 2006-03-10 07:00 -------
I'm thinking that if it does get included in 3.1, it should wait until 3.1.2
(which hopefully isn't too far off given that there are still problems with spamd).
The decision/change is awfully last minute for 3.1.1.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4822] make 'rawbody' rules match the full message string, not line-by-line
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
------- Additional Comments From felicity@apache.org 2006-03-09 15:42 -------
Created an attachment (id=3403)
--> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=3403&action=view)
suggested patch
just take the split out of the function, leaves the whole decoded mime part as
a scalar.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4822] make 'rawbody' rules match the full message string, not line-by-line
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
------- Additional Comments From quinlan@pathname.com 2006-03-10 19:14 -------
-1 for 3.1. No way can we change this on stable branch. Someone could easily
have a rule go bonkers performance-wise or go down the tubes efficacy-wise.
For 3.2, I think we need to seriously consider performance implications. I
like the idea of being able to match more than a line at a time, but consider
the performance problems we have had doing complex regular expressions (not
just .*) on long strings.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4822] make 'rawbody' rules match the full message string, not line-by-line
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
------- Additional Comments From duncf@debian.org 2006-03-10 06:55 -------
> "as owner or at least author of probably 98% of the rawbody rules that will be
affected by this change"
You can't possibly know how many are out there that are not public. Neither can
I. That's the point. :-)
(I would have vetoed this, except I don't feel I've done enough work around here
recently to merit my right to veto... So if those that do the work around here
feel this should go in 3.1.1, by all means, out-vote me.)
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4822] make 'rawbody' rules match the full message string, not line-by-line
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|Undefined |3.2.0
------- Additional Comments From jm@jmason.org 2006-03-09 11:53 -------
aiming at 3.2.0.
+1 from me on the proposed change.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4822] make 'rawbody' rules match the full message string, not line-by-line
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
------- Additional Comments From sidney@sidney.com 2006-03-09 23:31 -------
> we may want to look at applying to 3.1
It's time to decide that now if it is going to happen. I'm not clear as to how
much this would affect people upgrading from 3.1 to 3.1.1 -- As I understand it
would only have a positive effect on the standard rules but might mess up some
number of third party rules?
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4822] make 'rawbody' rules match the full message string, not line-by-line
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Additional Comments From jm@jmason.org 2006-11-03 10:34 -------
er, close this bug, I said
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4822] make 'rawbody' rules match the full message string, not line-by-line
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
------- Additional Comments From jm@jmason.org 2006-03-09 21:59 -------
+1 on 3403
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4822] make 'rawbody' rules match the full message string, not line-by-line
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
------- Additional Comments From lwilton@earthlink.net 2006-03-10 05:54 -------
> -0.999 for 3.1 branch (technically not a veto)
> We *can't* make backwards incompatible changes in point releases. That's the
> point of a point release: bug fixes only.
While I agree with the principle, as owner or at least author of probably 98%
of the rawbody rules that will be affected by this change, I'm strongly in
favor of getting this in early and often to every release possible. There is a
good chance that a lot of the rules where I took advantage of single-line
evaluation will "just work" anyway with this change, and fixing the remainder
will be trivial.
Thus, I'll vote (pretending I have a vote) +.999 to counter Duncan. :-)
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.