You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by bu...@spamassassin.apache.org on 2019/07/01 11:50:25 UTC

[Bug 7729] New: body rules to match body only, not including the Subject

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

            Bug ID: 7729
           Summary: body rules to match body only, not including the
                    Subject
           Product: Spamassassin
           Version: unspecified
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Rules
          Assignee: dev@spamassassin.apache.org
          Reporter: guenther@rudersport.de
  Target Milestone: Undefined

The current body rule does not only match the actual body, but also includes
the Subject as pseudo-part of the body.

While this behaviour is clearly documented, people do get caught be this (meta
rule to match some phrases in the body AND subject). It is non-trivial to write
a rule matching the body only, or actually matching both the body and the
Subject.

M:SA:Conf
  "The message Subject header is considered part of the body
   and becomes the first paragraph when running the rules."

We should implement a tflags test flag for body rules to exclude the Subject
from matching.

Along with the next major branch 4.x we may consider making this the default,
with Subject prepending a tflags option.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] [review] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

Henrik Krohns <ap...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|body rules to match body    |[review] body rules to
                   |only, not including the     |match body only, not
                   |Subject                     |including the Subject

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] [review] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #21 from Kevin A. McGrail <km...@apache.org> ---
Thanks Henrik.  I appreciate all the extra work!

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] [review] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #10 from Kevin A. McGrail <km...@apache.org> ---
Thanks for the clarification.  I rescind my veto if the patch is different in
that 3.4 requires a tflag not to eval the subject as part of the body and 4.0
by default requires a tlag to eval the subject, that's ok.

How do you predict that each will handle the tflags for the other?  Assuming
harmlessly ignored?

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #2 from Henrik Krohns <ap...@hege.li> ---
Only thing required to fix after that would probably be the schemeless uri
parser, which currently grabs uris and emails from the Subject. It's a one line
fix to code to add it back to it.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] [review] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #7 from Henrik Krohns <ap...@hege.li> ---
Kevin, are you allowed to decide things with only 2 votes? It's currently +1
-1.

The latest patch is 100% working, guaranteed.

Please someone else vote.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] [review] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #8 from Henrik Krohns <ap...@hege.li> ---
Created attachment 5667
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5667&action=edit
Nosubject patch for 3.4

Patch for 3.4 attached. It 100% identical, just the sub has_tflags_nosubject {
1 } had to be put in place manually.

3.4 and trunk code are 100% identical the way the eval stuff works. There is
nothing this patch does differently between them.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] [review] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

RW <rw...@googlemail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rwmaillists@googlemail.com

--- Comment #11 from RW <rw...@googlemail.com> ---
(In reply to Kevin A. McGrail from comment #10)

> 4.0 by default requires a tlag to eval the subject, that's ok.

I think that's a bad idea. Few body rules need the subject excluded, most
benefit from it, so practically all body rules will end-up needing a tflags
line.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] [review] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #15 from Henrik Krohns <ap...@hege.li> ---
It seems "nosubject" is accepted for trunk, committing to get it out of the
way.

Sending        trunk/lib/Mail/SpamAssassin/Conf.pm
Sending        trunk/lib/Mail/SpamAssassin/Plugin/Check.pm
Transmitting file data ..done
Committing transaction...
Committed revision 1864791.

For 3.4 it's already +2 I think, I'll let Kevin say his final words, but I
think it's accepted regardless?

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] [review] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

John Hardin <jh...@impsec.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jhardin@impsec.org

--- Comment #13 from John Hardin <jh...@impsec.org> ---
-1 for removing body-matches-subject-too behavior entirely from any version

+1 for adding "tflags nosubject" to 4.0

Undecided about adding it to 3.4 as well; if it breaks 3.3 and earlier then
definitely -1, but if it's just ignored then the concern becomes the effects of
a rules getting interpreted (slightly) differently by different versions.

If we feel we want it in 3.4 then something like
"can(Mail::SpamAssassin::Conf::body_tflag_nosubject)" might be a good idea.

Since rules updates are generated from trunk, then that can() might be a good
idea regardless...

so: +1 for adding it to 3.4 along with
can(Mail::SpamAssassin::Conf::body_tflag_nosubject)

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] [review] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #18 from Henrik Krohns <ap...@hege.li> ---
- 3.4 is not touched
- trunk has nosubject flag

I will commit 3.4 now unless there's any real objections, there's already +1
from me and John.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] [review] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

Henrik Krohns <ap...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #20 from Henrik Krohns <ap...@hege.li> ---

All done now.

I'm not working on anything on 3.4 anymore, so Kevin, feel free to try to put
rc4 out. :-)


Sending        spamassassin-3.4/UPGRADE
Sending        spamassassin-3.4/lib/Mail/SpamAssassin/Conf.pm
Sending        spamassassin-3.4/lib/Mail/SpamAssassin/Plugin/Check.pm
Transmitting file data ...done
Committing transaction...
Committed revision 1864877.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] [review] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #19 from Kevin A. McGrail <km...@apache.org> ---
Yes, I'm +1 on both after discussion.  I had hoped to change the behavior for
4.0 but such is life.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #3 from Henrik Krohns <ap...@hege.li> ---
Ok vetoing my vote. Tested which rules hit subject, and there are maaaany.
Thinking further, some people will probably stick with 3.4 for years and years,
it would be nightmare to just remove Subject and try to maintain compatibility
for all..

Instead here is the extremely simple code required to implement a "nosubject"
tflag. Just add a small documentation blurt, and it can be committed to 3.4.3
very safely, please vote, +1 me.

--- lib/Mail/SpamAssassin/Plugin/Check.pm (revision 1864688)
+++ lib/Mail/SpamAssassin/Plugin/Check.pm (working copy)
@@ -827,6 +827,9 @@
       $loopid++;
       my ($max) = $conf->{tflags}->{$rulename} =~ /\bmaxhits=(\d+)\b/;
       $max = untaint_var($max);
+      if ($conf->{tflags}->{$rulename} =~ /\bnosubject\b/) {
+        $sub .= "\n".'shift @_;'; # remove subject line
+      }
       $sub .= '
       $hits = 0;
       body_'.$loopid.': foreach my $l (@_) {
@@ -844,6 +847,9 @@
     else {
       # omitting the "pos" call, "body_loopid" label, use of while()
       # instead of if() etc., shaves off 8 perl OPs.
+      if (($conf->{tflags}->{$rulename}||'') =~ /\bnosubject\b/) {
+        $sub .= "\n".'shift @_;'; # remove subject line
+      }
       $sub .= '
       foreach my $l (@_) {
         '.$self->hash_line_for_rule($pms, $rulename).'

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

Karsten Bräckelmann <gu...@rudersport.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Hardware|PC                          |All
   Target Milestone|Undefined                   |4.0.0
                 OS|Linux                       |All

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] [review] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

Henrik Krohns <ap...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.0.0                       |3.4.3

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] [review] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #9 from Henrik Krohns <ap...@hege.li> ---
Honestly I don't care if 3.4 is nuked, but this patch is super simple. If we
leave 3.4 without the feature, we might as well never use it, since 3.4 will be
around forever.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] [review] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #17 from Kevin A. McGrail <km...@apache.org> ---
Henrik, can you recap what the current state is for 3.4 and for 4.0?  Are both
the same and you have a tflag for nosubject?

I just want to be clear on testing so I can get 3.4.3 to continue moving along.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] [review] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #22 from Kevin A. McGrail <km...@apache.org> ---
Just a note that with rc4 or greater for 3.4.3, it worked as expected.

Here's how I tested:
Add this rule and an email with subject example and example not in the body

#Testing Rule for NoSubject Rules - See note 58246
if (version >= 3.004003)
        #SHOULD HIT
        body            NOSUBJECT_TEST_HIT      /example/i
        describe        NOSUBJECT_TEST_HIT      This should hit on an email
with example in the subject but not in the body because
subjects are automatically prepending for testing.

        #SHOULD NOT HIT
        body            NOSUBJECT_TEST_FAIL     /example/i
        describe        NOSUBJECT_TEST_FAIL     This should NOT hit on an email
with example in the subject not not in the body beca
use the tflag nosubject will stop the automatic prepending of subjects for
testing.
        tflags          NOSUBJECT_TEST_FAIL     nosubject
endif

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] [review] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

Kevin A. McGrail <km...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kmcgrail@apache.org
   Target Milestone|3.4.3                       |4.0.0

--- Comment #6 from Kevin A. McGrail <km...@apache.org> ---
There is no way this should go into 3.4 to be clear. Changing the milestone and
I will -1 vote for a 3.4.  TOTALLY +1 for 4.0 though I need to test the patch
more and I don't have 4.0 setup at the moment, just 3.4 RC's.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #4 from Henrik Krohns <ap...@hege.li> ---
Slightly revised patch, should not modify @_ as it's used elsewhere..

Eval string will include two extra nosubj codelines if flagged

    if ($scoresptr->{q{FOO}}) {
      my $nosubj = 1; ### IF FLAGGED
      foreach my $l (@_) {
        if ($nosubj) { $nosubj = 0; next; } ### IF FLAGGED
        if ($l =~ /$qrptr->{q{FOO}}/o) {
          $self->got_hit(q{FOO}, "BODY: ", ruletype => "body");
          last;
        }
      }
    }


--- lib/Mail/SpamAssassin/Plugin/Check.pm (revision 1864688)
+++ lib/Mail/SpamAssassin/Plugin/Check.pm (working copy)
@@ -821,6 +821,12 @@
       dbg("rules-all: running body rule %s", q{'.$rulename.'});
       ';
     }
+    my $nosubject = ($conf->{tflags}->{$rulename}||'') =~ /\bnosubject\b/;
+    if ($nosubject) {
+      $sub .= '
+      my $nosubj = 1;
+      ';
+    }
     if (($conf->{tflags}->{$rulename}||'') =~ /\bmultiple\b/)
     {
       # support multiple matches
@@ -830,6 +836,13 @@
       $sub .= '
       $hits = 0;
       body_'.$loopid.': foreach my $l (@_) {
+      ';
+      if ($nosubject) {
+        $sub .= '
+        if ($nosubj) { $nosubj = 0; next; }
+        ';
+      }
+      $sub .= '
         pos $l = 0;
         '.$self->hash_line_for_rule($pms, $rulename).'
         while ($l =~ /$qrptr->{q{'.$rulename.'}}/go'. ($max? ' && $hits++ <
'.$max:'') .') {
@@ -846,6 +859,13 @@
       # instead of if() etc., shaves off 8 perl OPs.
       $sub .= '
       foreach my $l (@_) {
+      ';
+      if ($nosubject) {
+        $sub .= '
+        if ($nosubj) { $nosubj = 0; next; }
+        ';
+      }
+      $sub .= '
         '.$self->hash_line_for_rule($pms, $rulename).'
         if ($l =~ /$qrptr->{q{'.$rulename.'}}/o) {
           $self->got_hit(q{'.$rulename.'}, "BODY: ", ruletype => "body");

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] [review] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #16 from Henrik Krohns <ap...@hege.li> ---
Forgot docs..

Sending        trunk/UPGRADE
Transmitting file data .done
Committing transaction...
Committed revision 1864792.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] [review] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #14 from Henrik Krohns <ap...@hege.li> ---
Already has the has_tflags_nosubject as mentioned.

No SpamAssassin version lints tflags, you can put any crap you want in them and
it won't croak.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] [review] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #12 from Henrik Krohns <ap...@hege.li> ---
(In reply to Kevin A. McGrail from comment #10)
> Thanks for the clarification.  I rescind my veto if the patch is different
> in that 3.4 requires a tflag not to eval the subject as part of the body and
> 4.0 by default requires a tlag to eval the subject, that's ok.

No, I think I will always vote -1 for 4.0 if you are suggesting to drop subject
by default.

People are still using even 3.3, this subject stuff has been in so long there
that it would be probably silly to change now. And even sillier to commit 3.4
with a "nosubject" flag and 4.0 with a "subject" flag.

I think very few rules actually would NOT want to match subject. So it makes
sense to have a "nosubject" flag, instead of "subject" flag.

But I'm working on so many things right now, it's hard to concentrate, I
reserve the right to change my mind later. :-)

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #5 from Henrik Krohns <ap...@hege.li> ---
Created attachment 5666
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5666&action=edit
Complete tflags nosubject patch

Here complete patch along with documentation and
M::SA::Conf::has_tflags_nosubject.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7729] body rules to match body only, not including the Subject

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

Henrik Krohns <ap...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |apache@hege.li

--- Comment #1 from Henrik Krohns <ap...@hege.li> ---
Yes it seems quite pointless shortcut to "maybe" reduce the amount of duplicate
rules.

It makes no sense to add separate tflag for this. If one wants to match
Subject, then it should be a separate header rule!

I vote +1 to remove legacy subject matching from body.

-- 
You are receiving this mail because:
You are the assignee for the bug.