You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by bu...@spamassassin.apache.org on 2019/07/01 11:50:25 UTC
[Bug 7729] New: body rules to match body only, not including the
Subject
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
Bug ID: 7729
Summary: body rules to match body only, not including the
Subject
Product: Spamassassin
Version: unspecified
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Rules
Assignee: dev@spamassassin.apache.org
Reporter: guenther@rudersport.de
Target Milestone: Undefined
The current body rule does not only match the actual body, but also includes
the Subject as pseudo-part of the body.
While this behaviour is clearly documented, people do get caught be this (meta
rule to match some phrases in the body AND subject). It is non-trivial to write
a rule matching the body only, or actually matching both the body and the
Subject.
M:SA:Conf
"The message Subject header is considered part of the body
and becomes the first paragraph when running the rules."
We should implement a tflags test flag for body rules to exclude the Subject
from matching.
Along with the next major branch 4.x we may consider making this the default,
with Subject prepending a tflags option.
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] [review] body rules to match body only, not including the
Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
Henrik Krohns <ap...@hege.li> changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|body rules to match body |[review] body rules to
|only, not including the |match body only, not
|Subject |including the Subject
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] [review] body rules to match body only, not including the
Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
--- Comment #21 from Kevin A. McGrail <km...@apache.org> ---
Thanks Henrik. I appreciate all the extra work!
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] [review] body rules to match body only, not including the
Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
--- Comment #10 from Kevin A. McGrail <km...@apache.org> ---
Thanks for the clarification. I rescind my veto if the patch is different in
that 3.4 requires a tflag not to eval the subject as part of the body and 4.0
by default requires a tlag to eval the subject, that's ok.
How do you predict that each will handle the tflags for the other? Assuming
harmlessly ignored?
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] body rules to match body only, not including the Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
--- Comment #2 from Henrik Krohns <ap...@hege.li> ---
Only thing required to fix after that would probably be the schemeless uri
parser, which currently grabs uris and emails from the Subject. It's a one line
fix to code to add it back to it.
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] [review] body rules to match body only, not including the
Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
--- Comment #7 from Henrik Krohns <ap...@hege.li> ---
Kevin, are you allowed to decide things with only 2 votes? It's currently +1
-1.
The latest patch is 100% working, guaranteed.
Please someone else vote.
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] [review] body rules to match body only, not including the
Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
--- Comment #8 from Henrik Krohns <ap...@hege.li> ---
Created attachment 5667
--> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5667&action=edit
Nosubject patch for 3.4
Patch for 3.4 attached. It 100% identical, just the sub has_tflags_nosubject {
1 } had to be put in place manually.
3.4 and trunk code are 100% identical the way the eval stuff works. There is
nothing this patch does differently between them.
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] [review] body rules to match body only, not including the
Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
RW <rw...@googlemail.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rwmaillists@googlemail.com
--- Comment #11 from RW <rw...@googlemail.com> ---
(In reply to Kevin A. McGrail from comment #10)
> 4.0 by default requires a tlag to eval the subject, that's ok.
I think that's a bad idea. Few body rules need the subject excluded, most
benefit from it, so practically all body rules will end-up needing a tflags
line.
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] [review] body rules to match body only, not including the
Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
--- Comment #15 from Henrik Krohns <ap...@hege.li> ---
It seems "nosubject" is accepted for trunk, committing to get it out of the
way.
Sending trunk/lib/Mail/SpamAssassin/Conf.pm
Sending trunk/lib/Mail/SpamAssassin/Plugin/Check.pm
Transmitting file data ..done
Committing transaction...
Committed revision 1864791.
For 3.4 it's already +2 I think, I'll let Kevin say his final words, but I
think it's accepted regardless?
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] [review] body rules to match body only, not including the
Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
John Hardin <jh...@impsec.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jhardin@impsec.org
--- Comment #13 from John Hardin <jh...@impsec.org> ---
-1 for removing body-matches-subject-too behavior entirely from any version
+1 for adding "tflags nosubject" to 4.0
Undecided about adding it to 3.4 as well; if it breaks 3.3 and earlier then
definitely -1, but if it's just ignored then the concern becomes the effects of
a rules getting interpreted (slightly) differently by different versions.
If we feel we want it in 3.4 then something like
"can(Mail::SpamAssassin::Conf::body_tflag_nosubject)" might be a good idea.
Since rules updates are generated from trunk, then that can() might be a good
idea regardless...
so: +1 for adding it to 3.4 along with
can(Mail::SpamAssassin::Conf::body_tflag_nosubject)
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] [review] body rules to match body only, not including the
Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
--- Comment #18 from Henrik Krohns <ap...@hege.li> ---
- 3.4 is not touched
- trunk has nosubject flag
I will commit 3.4 now unless there's any real objections, there's already +1
from me and John.
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] [review] body rules to match body only, not including the
Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
Henrik Krohns <ap...@hege.li> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|NEW |RESOLVED
--- Comment #20 from Henrik Krohns <ap...@hege.li> ---
All done now.
I'm not working on anything on 3.4 anymore, so Kevin, feel free to try to put
rc4 out. :-)
Sending spamassassin-3.4/UPGRADE
Sending spamassassin-3.4/lib/Mail/SpamAssassin/Conf.pm
Sending spamassassin-3.4/lib/Mail/SpamAssassin/Plugin/Check.pm
Transmitting file data ...done
Committing transaction...
Committed revision 1864877.
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] [review] body rules to match body only, not including the
Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
--- Comment #19 from Kevin A. McGrail <km...@apache.org> ---
Yes, I'm +1 on both after discussion. I had hoped to change the behavior for
4.0 but such is life.
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] body rules to match body only, not including the Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
--- Comment #3 from Henrik Krohns <ap...@hege.li> ---
Ok vetoing my vote. Tested which rules hit subject, and there are maaaany.
Thinking further, some people will probably stick with 3.4 for years and years,
it would be nightmare to just remove Subject and try to maintain compatibility
for all..
Instead here is the extremely simple code required to implement a "nosubject"
tflag. Just add a small documentation blurt, and it can be committed to 3.4.3
very safely, please vote, +1 me.
--- lib/Mail/SpamAssassin/Plugin/Check.pm (revision 1864688)
+++ lib/Mail/SpamAssassin/Plugin/Check.pm (working copy)
@@ -827,6 +827,9 @@
$loopid++;
my ($max) = $conf->{tflags}->{$rulename} =~ /\bmaxhits=(\d+)\b/;
$max = untaint_var($max);
+ if ($conf->{tflags}->{$rulename} =~ /\bnosubject\b/) {
+ $sub .= "\n".'shift @_;'; # remove subject line
+ }
$sub .= '
$hits = 0;
body_'.$loopid.': foreach my $l (@_) {
@@ -844,6 +847,9 @@
else {
# omitting the "pos" call, "body_loopid" label, use of while()
# instead of if() etc., shaves off 8 perl OPs.
+ if (($conf->{tflags}->{$rulename}||'') =~ /\bnosubject\b/) {
+ $sub .= "\n".'shift @_;'; # remove subject line
+ }
$sub .= '
foreach my $l (@_) {
'.$self->hash_line_for_rule($pms, $rulename).'
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] body rules to match body only, not including the Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
Karsten Bräckelmann <gu...@rudersport.de> changed:
What |Removed |Added
----------------------------------------------------------------------------
Hardware|PC |All
Target Milestone|Undefined |4.0.0
OS|Linux |All
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] [review] body rules to match body only, not including the
Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
Henrik Krohns <ap...@hege.li> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.0.0 |3.4.3
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] [review] body rules to match body only, not including the
Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
--- Comment #9 from Henrik Krohns <ap...@hege.li> ---
Honestly I don't care if 3.4 is nuked, but this patch is super simple. If we
leave 3.4 without the feature, we might as well never use it, since 3.4 will be
around forever.
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] [review] body rules to match body only, not including the
Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
--- Comment #17 from Kevin A. McGrail <km...@apache.org> ---
Henrik, can you recap what the current state is for 3.4 and for 4.0? Are both
the same and you have a tflag for nosubject?
I just want to be clear on testing so I can get 3.4.3 to continue moving along.
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] [review] body rules to match body only, not including the
Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
--- Comment #22 from Kevin A. McGrail <km...@apache.org> ---
Just a note that with rc4 or greater for 3.4.3, it worked as expected.
Here's how I tested:
Add this rule and an email with subject example and example not in the body
#Testing Rule for NoSubject Rules - See note 58246
if (version >= 3.004003)
#SHOULD HIT
body NOSUBJECT_TEST_HIT /example/i
describe NOSUBJECT_TEST_HIT This should hit on an email
with example in the subject but not in the body because
subjects are automatically prepending for testing.
#SHOULD NOT HIT
body NOSUBJECT_TEST_FAIL /example/i
describe NOSUBJECT_TEST_FAIL This should NOT hit on an email
with example in the subject not not in the body beca
use the tflag nosubject will stop the automatic prepending of subjects for
testing.
tflags NOSUBJECT_TEST_FAIL nosubject
endif
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] [review] body rules to match body only, not including the
Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
Kevin A. McGrail <km...@apache.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |kmcgrail@apache.org
Target Milestone|3.4.3 |4.0.0
--- Comment #6 from Kevin A. McGrail <km...@apache.org> ---
There is no way this should go into 3.4 to be clear. Changing the milestone and
I will -1 vote for a 3.4. TOTALLY +1 for 4.0 though I need to test the patch
more and I don't have 4.0 setup at the moment, just 3.4 RC's.
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] body rules to match body only, not including the Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
--- Comment #4 from Henrik Krohns <ap...@hege.li> ---
Slightly revised patch, should not modify @_ as it's used elsewhere..
Eval string will include two extra nosubj codelines if flagged
if ($scoresptr->{q{FOO}}) {
my $nosubj = 1; ### IF FLAGGED
foreach my $l (@_) {
if ($nosubj) { $nosubj = 0; next; } ### IF FLAGGED
if ($l =~ /$qrptr->{q{FOO}}/o) {
$self->got_hit(q{FOO}, "BODY: ", ruletype => "body");
last;
}
}
}
--- lib/Mail/SpamAssassin/Plugin/Check.pm (revision 1864688)
+++ lib/Mail/SpamAssassin/Plugin/Check.pm (working copy)
@@ -821,6 +821,12 @@
dbg("rules-all: running body rule %s", q{'.$rulename.'});
';
}
+ my $nosubject = ($conf->{tflags}->{$rulename}||'') =~ /\bnosubject\b/;
+ if ($nosubject) {
+ $sub .= '
+ my $nosubj = 1;
+ ';
+ }
if (($conf->{tflags}->{$rulename}||'') =~ /\bmultiple\b/)
{
# support multiple matches
@@ -830,6 +836,13 @@
$sub .= '
$hits = 0;
body_'.$loopid.': foreach my $l (@_) {
+ ';
+ if ($nosubject) {
+ $sub .= '
+ if ($nosubj) { $nosubj = 0; next; }
+ ';
+ }
+ $sub .= '
pos $l = 0;
'.$self->hash_line_for_rule($pms, $rulename).'
while ($l =~ /$qrptr->{q{'.$rulename.'}}/go'. ($max? ' && $hits++ <
'.$max:'') .') {
@@ -846,6 +859,13 @@
# instead of if() etc., shaves off 8 perl OPs.
$sub .= '
foreach my $l (@_) {
+ ';
+ if ($nosubject) {
+ $sub .= '
+ if ($nosubj) { $nosubj = 0; next; }
+ ';
+ }
+ $sub .= '
'.$self->hash_line_for_rule($pms, $rulename).'
if ($l =~ /$qrptr->{q{'.$rulename.'}}/o) {
$self->got_hit(q{'.$rulename.'}, "BODY: ", ruletype => "body");
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] [review] body rules to match body only, not including the
Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
--- Comment #16 from Henrik Krohns <ap...@hege.li> ---
Forgot docs..
Sending trunk/UPGRADE
Transmitting file data .done
Committing transaction...
Committed revision 1864792.
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] [review] body rules to match body only, not including the
Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
--- Comment #14 from Henrik Krohns <ap...@hege.li> ---
Already has the has_tflags_nosubject as mentioned.
No SpamAssassin version lints tflags, you can put any crap you want in them and
it won't croak.
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] [review] body rules to match body only, not including the
Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
--- Comment #12 from Henrik Krohns <ap...@hege.li> ---
(In reply to Kevin A. McGrail from comment #10)
> Thanks for the clarification. I rescind my veto if the patch is different
> in that 3.4 requires a tflag not to eval the subject as part of the body and
> 4.0 by default requires a tlag to eval the subject, that's ok.
No, I think I will always vote -1 for 4.0 if you are suggesting to drop subject
by default.
People are still using even 3.3, this subject stuff has been in so long there
that it would be probably silly to change now. And even sillier to commit 3.4
with a "nosubject" flag and 4.0 with a "subject" flag.
I think very few rules actually would NOT want to match subject. So it makes
sense to have a "nosubject" flag, instead of "subject" flag.
But I'm working on so many things right now, it's hard to concentrate, I
reserve the right to change my mind later. :-)
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] body rules to match body only, not including the Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
--- Comment #5 from Henrik Krohns <ap...@hege.li> ---
Created attachment 5666
--> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5666&action=edit
Complete tflags nosubject patch
Here complete patch along with documentation and
M::SA::Conf::has_tflags_nosubject.
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7729] body rules to match body only, not including the Subject
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729
Henrik Krohns <ap...@hege.li> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |apache@hege.li
--- Comment #1 from Henrik Krohns <ap...@hege.li> ---
Yes it seems quite pointless shortcut to "maybe" reduce the amount of duplicate
rules.
It makes no sense to add separate tflag for this. If one wants to match
Subject, then it should be a separate header rule!
I vote +1 to remove legacy subject matching from body.
--
You are receiving this mail because:
You are the assignee for the bug.