You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2014/06/27 20:22:22 UTC

[Bug 7063] New: Test for text/plain claiming to be ASCII but isn't

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

            Bug ID: 7063
           Summary: Test for text/plain claiming to be ASCII but isn't
           Product: Spamassassin
           Version: 3.4.0
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Libraries
          Assignee: dev@spamassassin.apache.org
          Reporter: philipp@redfish-solutions.com

Created attachment 5210
  --> https://issues.apache.org/SpamAssassin/attachment.cgi?id=5210&action=edit
Initial attempt at test logic

If a MIME part claims to be text/plain or text/plain;charset=us-ascii and the
Content-Transfer-Encoding is 7bit (either explicitly or by default), then we
should enforce the actual text being only TAB, NL, SPACE through TILDE, i.e.
all 7bit characters excluding NO-WS-CTL (per RFC-2822).

All mainstream MTA's get this right.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

Quanah Gibson-Mount <qu...@zimbra.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |quanah@zimbra.com

--- Comment #14 from Quanah Gibson-Mount <qu...@zimbra.com> ---
(In reply to Philip Prindeville from comment #12)
> Created attachment 5216 [details]
> Don't need /a qualifier after all if we use octal escapes

Shouldn't these lines be part of a dbg statement?

Something like:

if(would_log('dbg', 'check')) {
           my $str = $_;
           $str =~ s/[\x00\x0d\x80-\xff]+/'<' . unpack('H*', $&) . '>'/eag;
          dbg("check_for_ascii_text_illegal: Saw match " . $str . "\n");
}

Not sure "check" is the right facility, but seems like it.

For the dbg line:

dbg("check: check_for_ascii_text_illegal: Saw match " . $str . "\n");

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

paul.stead@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |paul.stead@gmail.com

--- Comment #42 from paul.stead@gmail.com ---
Created attachment 5224
  --> https://issues.apache.org/SpamAssassin/attachment.cgi?id=5224&action=edit
Should have been caught?

Hi,

I assumed this patch would catch emails similar to the attached - it doesn't
seem to fire. The rule is firing for other emails through my system though -
has the method been adjusted to get around this patch or is it my patch
application? (3.3x)

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

Philip Prindeville <ph...@redfish-solutions.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #5210|0                           |1
        is obsolete|                            |

--- Comment #4 from Philip Prindeville <ph...@redfish-solutions.com> ---
Created attachment 5212
  --> https://issues.apache.org/SpamAssassin/attachment.cgi?id=5212&action=edit
Redux of patch

Posted wrong (buggy) earlier version of patch.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

Philip Prindeville <ph...@redfish-solutions.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #5215|0                           |1
        is obsolete|                            |

--- Comment #12 from Philip Prindeville <ph...@redfish-solutions.com> ---
Created attachment 5216
  --> https://issues.apache.org/SpamAssassin/attachment.cgi?id=5216&action=edit
Don't need /a qualifier after all if we use octal escapes

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #41 from Quanah Gibson-Mount <qu...@zimbra.com> ---
(In reply to Philip Prindeville from comment #40)
> (In reply to Quanah Gibson-Mount from comment #39)
> > I've seen it hit twice so far in the last 12 hours or so.  Both seem to be
> > valid email.  Example, postfix list digest:
> > 
> > Jul 12 07:14:53 edge02 amavis[59326]: (59326-09) spam-tag,
> > <ow...@cloud9.net> -> <xy...@zimbra.com>, No, score=1.251
> > tagged_above=-10 required=3 tests=[BAYES_00=-0.05,
> > PP_MIME_FAKE_ASCII_TEXT=1, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01,
> > RP_MATCHES_RCVD=-0.8, SPF_PASS=-0.001, URI_HEX=1.122] autolearn=no
> > autolearn_force=no
> 
> Can you attach the FP's to this bug so I can figure out why we're seeing
> them?  Thanks.

I'll see if I can get copies from the users.  I.e., I'm running this on my
production mail system, the emails are not necessarily ones I receive. ;)

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

Philip Prindeville <ph...@redfish-solutions.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #5214|0                           |1
        is obsolete|                            |

--- Comment #9 from Philip Prindeville <ph...@redfish-solutions.com> ---
Created attachment 5215
  --> https://issues.apache.org/SpamAssassin/attachment.cgi?id=5215&action=edit
Fix typo and include NL (the message hasn't been chomp'd)

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #13 from Philip Prindeville <ph...@redfish-solutions.com> ---
Kevin: any chance of this getting into the next release?

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

RW <rw...@googlemail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rwmaillists@googlemail.com

--- Comment #7 from RW <rw...@googlemail.com> ---
(In reply to Philip Prindeville from comment #0)
> Created attachment 5210 [details]
> Initial attempt at test logic
> 
> If a MIME part claims to be text/plain or text/plain;charset=us-ascii and
> the Content-Transfer-Encoding is 7bit (either explicitly or by default),
> then we should enforce the actual text being only TAB, NL, SPACE through
> TILDE, i.e. all 7bit characters excluding NO-WS-CTL (per RFC-2822).

That's not how body text is defined in RFC-2822.

The definition of 7bit in mime is in RFC 2045: 

"2.7.  7bit Data

   "7bit data" refers to data that is all represented as relatively
   short lines with 998 octets or less between CRLF line separation
   sequences [RFC-821].  No octets with decimal values greater than 127
   are allowed and neither are NULs (octets with decimal value 0).  CR
   (decimal value 13) and LF (decimal value 10) octets only occur as
   part of CRLF line separation sequences."

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #33 from Quanah Gibson-Mount <qu...@zimbra.com> ---
(In reply to Quanah Gibson-Mount from comment #32)
> New patch loads successfully, thanks!

So I guess the last issue is that rule updates will kill this, right?  Or once
this is committed, will it be part of rule updates for 3.4.0?

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #15 from Philip Prindeville <ph...@redfish-solutions.com> ---
(In reply to Quanah Gibson-Mount from comment #14)
> (In reply to Philip Prindeville from comment #12)
> > Created attachment 5216 [details]
> > Don't need /a qualifier after all if we use octal escapes
> 
> Shouldn't these lines be part of a dbg statement?

Well, yes, except that Plugin/MIMEEval.pm doesn't include Logger.pm by default.

> Something like:
> 
> if(would_log('dbg', 'check')) {
>            my $str = $_;
>            $str =~ s/[\x00\x0d\x80-\xff]+/'<' . unpack('H*', $&) . '>'/eag;
>           dbg("check_for_ascii_text_illegal: Saw match " . $str . "\n");
> }
> 
> Not sure "check" is the right facility, but seems like it.

It would be 'eval', I think.  But would_log() is also part of that module.

> For the dbg line:
> 
> dbg("check: check_for_ascii_text_illegal: Saw match " . $str . "\n");

BodyEval.pm and HeaderEval.pm both use 'eval:' as their facility.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #34 from Kevin A. McGrail <km...@pccc.com> ---
For trunk:


svn commit -m 'Added code for check_for_ascii_text_illegal in MIMEEval and
added test rule to sandbox'        
Sending        lib/Mail/SpamAssassin/Plugin/MIMEEval.pm
Adding         rulesrc/sandbox/kmcgrail/20_bug_7063.cf
Transmitting file data ..
Committed revision 1607729.


NOTE: Rule added with additional plugin check to ensure MIMEEval is enabled.

ifplugin Mail::SpamAssassin::Plugin::MIMEEval
  if
can(Mail::SpamAssassin::Plugin::MIMEEval::has_check_for_ascii_text_illegal)
    body     PP_MIME_FAKE_ASCII_TEXT 
eval:check_for_ascii_text_illegal('mime_fake_ascii_text')
    describe PP_MIME_FAKE_ASCII_TEXT  MIME text/plain claims to be ASCII but
isn't
    score    PP_MIME_FAKE_ASCII_TEXT  1.0
    tflags   PP_MIME_FAKE_ASCII_TEXT  publish
  endif
endif

Next week, I'll check on RuleQA and consider adding this on our production
servers for more feedback.

But the 1.0 should be a ceiling for masscheck to determine the best score if it
is promotable.

Regards,
KAM

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #47 from Philip Prindeville <ph...@redfish-solutions.com> ---
(In reply to Kevin A. McGrail from comment #46)
> Reopening because the associated rule appears to have introduced a saw
> ampersand problem which can slow down regexp in perl prior to 5.17.7
> 
> This is caused by line 405
> lib/Mail/SpamAssassin/Plugin/MIMEEval.pm
> $str =~ s/[\x00\x0d\x80-\xff]+/'<' . unpack('H*', $&) . '>'/eg;
> 
> Can we re-engineer this line without the $&?
> 
> Philip?  This is a blocker to 3.4.1.

Would parens and $1 be faster?

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #39 from Quanah Gibson-Mount <qu...@zimbra.com> ---
I've seen it hit twice so far in the last 12 hours or so.  Both seem to be
valid email.  Example, postfix list digest:

Jul 12 07:14:53 edge02 amavis[59326]: (59326-09) spam-tag,
<ow...@cloud9.net> -> <xy...@zimbra.com>, No, score=1.251
tagged_above=-10 required=3 tests=[BAYES_00=-0.05, PP_MIME_FAKE_ASCII_TEXT=1,
RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RP_MATCHES_RCVD=-0.8,
SPF_PASS=-0.001, URI_HEX=1.122] autolearn=no autolearn_force=no

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

Philip Prindeville <ph...@redfish-solutions.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #5216|0                           |1
        is obsolete|                            |

--- Comment #19 from Philip Prindeville <ph...@redfish-solutions.com> ---
Created attachment 5217
  --> https://issues.apache.org/SpamAssassin/attachment.cgi?id=5217&action=edit
Commit-ready diff against svn trunk including test

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #6 from Philip Prindeville <ph...@redfish-solutions.com> ---
(In reply to Kevin A. McGrail from comment #2)
> I'm interested in seeing what these does on ruleqa.  i.e. how much
> real-world ham has this indicator.

Make sure you try the right version... doh!

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #40 from Philip Prindeville <ph...@redfish-solutions.com> ---
(In reply to Quanah Gibson-Mount from comment #39)
> I've seen it hit twice so far in the last 12 hours or so.  Both seem to be
> valid email.  Example, postfix list digest:
> 
> Jul 12 07:14:53 edge02 amavis[59326]: (59326-09) spam-tag,
> <ow...@cloud9.net> -> <xy...@zimbra.com>, No, score=1.251
> tagged_above=-10 required=3 tests=[BAYES_00=-0.05,
> PP_MIME_FAKE_ASCII_TEXT=1, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01,
> RP_MATCHES_RCVD=-0.8, SPF_PASS=-0.001, URI_HEX=1.122] autolearn=no
> autolearn_force=no

Can you attach the FP's to this bug so I can figure out why we're seeing them? 
Thanks.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #43 from Steadramon <pa...@gmail.com> ---
Seen https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7068

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #22 from Philip Prindeville <ph...@redfish-solutions.com> ---
(In reply to Quanah Gibson-Mount from comment #20)
> Comment on attachment 5217 [details]
> Commit-ready diff against svn trunk including test
> 
> The new patch seems to miss the use statement for the Logger module?

It turns out that 3.4.0 (which I was testing against) didn't yet have it but
the latest commit to trunk added it in:

r1567777 | kmcgrail | 2014-02-12 15:32:19 -0700 (Wed, 12 Feb 2014) | 1 line

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #23 from Quanah Gibson-Mount <qu...@zimbra.com> ---
(In reply to Philip Prindeville from comment #22)
> (In reply to Quanah Gibson-Mount from comment #20)
> > Comment on attachment 5217 [details]
> > Commit-ready diff against svn trunk including test
> > 
> > The new patch seems to miss the use statement for the Logger module?
> 
> It turns out that 3.4.0 (which I was testing against) didn't yet have it but
> the latest commit to trunk added it in:

Ah, cool.  I want to apply this to my 3.4.0 installation, so I'll just add it
in manually. :)

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #26 from Philip Prindeville <ph...@redfish-solutions.com> ---
(In reply to Quanah Gibson-Mount from comment #25)
> (In reply to Philip Prindeville from comment #24)
> > (In reply to Quanah Gibson-Mount from comment #23)
> > 
> > > Ah, cool.  I want to apply this to my 3.4.0 installation, so I'll just add
> > > it in manually. :)
> > 
> > Great! Let us know how well it works out for you.
> 
> Sadly it doesn't:
> 
> Starting amavisd...Bareword found where operator expected at
> /opt/zimbra/zimbramon/lib/Mail/SpamAssassin/Plugin/MIMEEval.pm line 360,
> near "s/[\x00\x0d\x80-\xff]+/'<' . unpack('H*', $&) . '>'/eag"
> fetch_modules: error loading optional module
> Mail/SpamAssassin/Plugin/MIMEEval.pm:
>   syntax error at
> /opt/zimbra/zimbramon/lib/Mail/SpamAssassin/Plugin/MIMEEval.pm line 360,
> near "s/[\x00\x0d\x80-\xff]+/'<' . unpack('H*', $&) . '>'/eag"
>   Compilation failed in require at /opt/zimbra/amavisd/sbin/amavisd line 207.
> done.

Bugger.  I forgot to remove the 'a' qualifier from the regex: turns out we
don't need it.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

Philip Prindeville <ph...@redfish-solutions.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #5217|0                           |1
        is obsolete|                            |

--- Comment #29 from Philip Prindeville <ph...@redfish-solutions.com> ---
Created attachment 5218
  --> https://issues.apache.org/SpamAssassin/attachment.cgi?id=5218&action=edit
Per Quanah's comment we don't need the /a qualifier

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #17 from Philip Prindeville <ph...@redfish-solutions.com> ---
(In reply to Quanah Gibson-Mount from comment #16)

> I see, none of the plugins load Logger.pm.  Oh well. :)

Quite the opposite: all of them seem to EXCEPT for MIMEEval.pm. Not sure why
that is.

% egrep -L '^use.*Logger;$' Plugin/*
Plugin/AntiVirus.pm
Plugin/FreeMail.pm
Plugin/HTMLEval.pm
Plugin/MIMEEval.pm
Plugin/Test.pm
Plugin/WhiteListSubject.pm
%

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #37 from Kevin A. McGrail <km...@pccc.com> ---
(In reply to Quanah Gibson-Mount from comment #36)
> Ok, I just added this to my own local rules file then:
> 
> ifplugin Mail::SpamAssassin::Plugin::MIMEEval
>   if
> can(Mail::SpamAssassin::Plugin::MIMEEval::has_check_for_ascii_text_illegal)
>     body PP_MIME_FAKE_ASCII_TEXT     
> eval:check_for_ascii_text_illegal('mime_fake_ascii_text')
>     describe PP_MIME_FAKE_ASCII_TEXT  MIME text/plain claims to be ASCII but
> isn't
>     score    PP_MIME_FAKE_ASCII_TEXT  1.0
>   endif
> endif
> 
> thanks!

Yep.  Please report back your experience.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

Philip Prindeville <ph...@redfish-solutions.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |philipp@redfish-solutions.c
                   |                            |om

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

Kevin A. McGrail <km...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #49 from Kevin A. McGrail <km...@pccc.com> ---
(In reply to Philip Prindeville from comment #48)
> Created attachment 5261 [details]
> Replace $& with $1 and parens
> 
> According to:
> 
> http://www.effectiveperlprogramming.com/2010/06/detect-regular-expression-
> match-variable-setting-in-your-code/
> 
> using parens and $1 is the way to go.

Looks like a plan, thanks.

svn commit -m 'removing sawampersand thanks to Philip Prindeville - bug 7063'
lib/Mail/SpamAssassin/Plugin/MIMEEval.pm 
Sending        lib/Mail/SpamAssassin/Plugin/MIMEEval.pm
Transmitting file data .
Committed revision 1646143.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #45 from Kevin A. McGrail <km...@pccc.com> ---
Per bug 7068 comment 6, the eval for this rule doesn't require the name of the
rule.

Removing that and improving the documentation.

 svn commit -m 'Cleaned up documentation and removed rule name parameter that
was not needed on the rule'
Sending        lib/Mail/SpamAssassin/Plugin/MIMEEval.pm
Sending        rulesrc/sandbox/kmcgrail/20_bug_7063.cf
Transmitting file data ..
Committed revision 1614685.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #28 from Quanah Gibson-Mount <qu...@zimbra.com> ---
(In reply to Philip Prindeville from comment #27)
> Also, what version of Perl are you running on?
> 
> Fedora 20 uses 5.18.2 so I didn't notice the 5.14-ism was still there.

I support multiple platforms (RHEL6, RHEL7, UBUNTU10, UBUNTU12, UBUNTU14, and
SLES11).  They have a variety of perl versions:

[build@zre-rhel6-64 ~]$ perl --version

This is perl, v5.10.1 (*) built for x86_64-linux-thread-multi

[build@zre-rhel7-64 ~]$ perl --version

This is perl 5, version 16, subversion 3 (v5.16.3) built for
x86_64-linux-thread-multi

build@zre-sles11-64:~> perl --version

This is perl, v5.10.0 built for x86_64-linux-thread-multi

build@zre-ubuntu10-64:~$ perl --version

This is perl, v5.10.1 (*) built for x86_64-linux-gnu-thread-multi

build@zre-ubuntu12-64:~$ perl --version

This is perl 5, version 14, subversion 2 (v5.14.2) built for
x86_64-linux-gnu-thread-multi

build@zre-ubuntu14-64:~$ perl --version

This is perl 5, version 18, subversion 2 (v5.18.2) built for
x86_64-linux-gnu-thread-multi

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

Kevin A. McGrail <km...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #35 from Kevin A. McGrail <km...@pccc.com> ---
(In reply to Quanah Gibson-Mount from comment #33)
> (In reply to Quanah Gibson-Mount from comment #32)
> > New patch loads successfully, thanks!
> 
> So I guess the last issue is that rule updates will kill this, right?  Or
> once this is committed, will it be part of rule updates for 3.4.0?

It will become part of the update rule package assuming it meets S/O thresholds
for rule promotion.

But the rule will only fire for those running trunk because the code doesn't
exist otherwise.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

Kevin A. McGrail <km...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---
   Target Milestone|Undefined                   |3.4.1
           Severity|normal                      |blocker

--- Comment #46 from Kevin A. McGrail <km...@pccc.com> ---
Reopening because the associated rule appears to have introduced a saw
ampersand problem which can slow down regexp in perl prior to 5.17.7

This is caused by line 405
lib/Mail/SpamAssassin/Plugin/MIMEEval.pm
$str =~ s/[\x00\x0d\x80-\xff]+/'<' . unpack('H*', $&) . '>'/eg;

Can we re-engineer this line without the $&?

Philip?  This is a blocker to 3.4.1.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #18 from Quanah Gibson-Mount <qu...@zimbra.com> ---
(In reply to Philip Prindeville from comment #17)
> (In reply to Quanah Gibson-Mount from comment #16)
> 
> > I see, none of the plugins load Logger.pm.  Oh well. :)
> 
> Quite the opposite: all of them seem to EXCEPT for MIMEEval.pm. Not sure why
> that is.
> 
> % egrep -L '^use.*Logger;$' Plugin/*
> Plugin/AntiVirus.pm
> Plugin/FreeMail.pm
> Plugin/HTMLEval.pm
> Plugin/MIMEEval.pm
> Plugin/Test.pm
> Plugin/WhiteListSubject.pm
> %

Heh, oops, had my grep wrong.  Maybe make it part of your patch then? ;)

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #3 from Philip Prindeville <ph...@redfish-solutions.com> ---
(In reply to Kevin A. McGrail from comment #2)
> Do you have a rule you wrote that uses this too?

body L_FAKE_ASCII               
eval:check_for_ascii_text_illegal('L_FAKE_ASCII')
describe L_FAKE_ASCII           Claims to be ascii but isn't
score L_FAKE_ASCII              6.5


> We might need to write a has_xyz function to conditionalize the rule.

Where xyz would be testing for... what?

Only requirement I can think of is:

require v5.14.0;


> I'm interested in seeing what these does on ruleqa.  i.e. how much
> real-world ham has this indicator.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #38 from Kevin A. McGrail <km...@pccc.com> ---
Rule was promoted and hit the 1.0 score ceiling in ruleqa with a pretty good
s/o but based largely on AXB's corpora.  Definitely some interesting results!

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #21 from Kevin A. McGrail <km...@pccc.com> ---
Will take a look at committing this shortly.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #24 from Philip Prindeville <ph...@redfish-solutions.com> ---
(In reply to Quanah Gibson-Mount from comment #23)

> Ah, cool.  I want to apply this to my 3.4.0 installation, so I'll just add
> it in manually. :)

Great! Let us know how well it works out for you.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #11 from Philip Prindeville <ph...@redfish-solutions.com> ---
(In reply to Karsten Bräckelmann from comment #10)
> Quoting the patch, attachment 5215 [details]:
> 
> > # Note: we rely on the /a modifier in Perl 5.14 and later
> 
> I am against raising the Perl dependency that drastically.
> 
> > if (m/[\x00\x0d\x80-\xff]+/a) {
> 
> The /a modifier applies to Posix character classes only, I believe. It does
> not have any impact on arbitrary hex escaped char ranges.
> 
> Without actually testing, the /a modifier should not be necessary and thus
> removed. Which also removes the dependency on Perl 5.14. ;)

Trying it now. Will update the patch if this works.

Any reason why Plugin/MIMEEval.pm doesn't include Logger?

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

Philip Prindeville <ph...@redfish-solutions.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #5213|0                           |1
        is obsolete|                            |

--- Comment #8 from Philip Prindeville <ph...@redfish-solutions.com> ---
Created attachment 5214
  --> https://issues.apache.org/SpamAssassin/attachment.cgi?id=5214&action=edit
Fixed to allow all 7-bit chars except NUL, CR, NL

Fixing per comment #7.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #27 from Philip Prindeville <ph...@redfish-solutions.com> ---
Also, what version of Perl are you running on?

Fedora 20 uses 5.18.2 so I didn't notice the 5.14-ism was still there.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

Philip Prindeville <ph...@redfish-solutions.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|3.4.0                       |SVN Trunk (Latest Devel
                   |                            |Version)

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #44 from Philip Prindeville <ph...@redfish-solutions.com> ---
(In reply to Steadramon from comment #43)
> Seen https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7068

Yeah, 7068 should catch this.

I run it with a threshold of 0.05 (5%).

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #36 from Quanah Gibson-Mount <qu...@zimbra.com> ---
(In reply to Kevin A. McGrail from comment #35)
> (In reply to Quanah Gibson-Mount from comment #33)
> > (In reply to Quanah Gibson-Mount from comment #32)
> > > New patch loads successfully, thanks!
> > 
> > So I guess the last issue is that rule updates will kill this, right?  Or
> > once this is committed, will it be part of rule updates for 3.4.0?
> 
> It will become part of the update rule package assuming it meets S/O
> thresholds for rule promotion.
> 
> But the rule will only fire for those running trunk because the code doesn't
> exist otherwise.

Ok, I just added this to my own local rules file then:

ifplugin Mail::SpamAssassin::Plugin::MIMEEval
  if
can(Mail::SpamAssassin::Plugin::MIMEEval::has_check_for_ascii_text_illegal)
    body PP_MIME_FAKE_ASCII_TEXT     
eval:check_for_ascii_text_illegal('mime_fake_ascii_text')
    describe PP_MIME_FAKE_ASCII_TEXT  MIME text/plain claims to be ASCII but
isn't
    score    PP_MIME_FAKE_ASCII_TEXT  1.0
  endif
endif

thanks!

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #10 from Karsten Bräckelmann <gu...@rudersport.de> ---
Quoting the patch, attachment 5215:

> # Note: we rely on the /a modifier in Perl 5.14 and later

I am against raising the Perl dependency that drastically.

> if (m/[\x00\x0d\x80-\xff]+/a) {

The /a modifier applies to Posix character classes only, I believe. It does not
have any impact on arbitrary hex escaped char ranges.

Without actually testing, the /a modifier should not be necessary and thus
removed. Which also removes the dependency on Perl 5.14. ;)

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #20 from Quanah Gibson-Mount <qu...@zimbra.com> ---
Comment on attachment 5217
  --> https://issues.apache.org/SpamAssassin/attachment.cgi?id=5217
Commit-ready diff against svn trunk including test

The new patch seems to miss the use statement for the Logger module?

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #25 from Quanah Gibson-Mount <qu...@zimbra.com> ---
(In reply to Philip Prindeville from comment #24)
> (In reply to Quanah Gibson-Mount from comment #23)
> 
> > Ah, cool.  I want to apply this to my 3.4.0 installation, so I'll just add
> > it in manually. :)
> 
> Great! Let us know how well it works out for you.

Sadly it doesn't:

Starting amavisd...Bareword found where operator expected at
/opt/zimbra/zimbramon/lib/Mail/SpamAssassin/Plugin/MIMEEval.pm line 360, near
"s/[\x00\x0d\x80-\xff]+/'<' . unpack('H*', $&) . '>'/eag"
fetch_modules: error loading optional module
Mail/SpamAssassin/Plugin/MIMEEval.pm:
  syntax error at
/opt/zimbra/zimbramon/lib/Mail/SpamAssassin/Plugin/MIMEEval.pm line 360, near
"s/[\x00\x0d\x80-\xff]+/'<' . unpack('H*', $&) . '>'/eag"
  Compilation failed in require at /opt/zimbra/amavisd/sbin/amavisd line 207.
done.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #31 from Kevin A. McGrail <km...@pccc.com> ---
OK, I'm working on a commit.

Please hold off on more tweaks for a few minutes ;-)

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

Philip Prindeville <ph...@redfish-solutions.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #5212|0                           |1
        is obsolete|                            |

--- Comment #5 from Philip Prindeville <ph...@redfish-solutions.com> ---
Created attachment 5213
  --> https://issues.apache.org/SpamAssassin/attachment.cgi?id=5213&action=edit
2nd redux of patch

We need to ignore the enclosing (header) Content-Type and only look at the
leaves' Content-Type.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

Philip Prindeville <ph...@redfish-solutions.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |axb.lists@gmail.com

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #16 from Quanah Gibson-Mount <qu...@zimbra.com> ---
(In reply to Philip Prindeville from comment #15)
> (In reply to Quanah Gibson-Mount from comment #14)
> > (In reply to Philip Prindeville from comment #12)
> > > Created attachment 5216 [details]
> > > Don't need /a qualifier after all if we use octal escapes
> > 
> > Shouldn't these lines be part of a dbg statement?
> 
> Well, yes, except that Plugin/MIMEEval.pm doesn't include Logger.pm by
> default.

I see, none of the plugins load Logger.pm.  Oh well. :)

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #32 from Quanah Gibson-Mount <qu...@zimbra.com> ---
New patch loads successfully, thanks!

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #48 from Philip Prindeville <ph...@redfish-solutions.com> ---
Created attachment 5261
  --> https://issues.apache.org/SpamAssassin/attachment.cgi?id=5261&action=edit
Replace $& with $1 and parens

According to:

http://www.effectiveperlprogramming.com/2010/06/detect-regular-expression-match-variable-setting-in-your-code/

using parens and $1 is the way to go.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #1 from Philip Prindeville <ph...@redfish-solutions.com> ---
Created attachment 5211
  --> https://issues.apache.org/SpamAssassin/attachment.cgi?id=5211&action=edit
Actual message which inspired test

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

Kevin A. McGrail <km...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kmcgrail@pccc.com

--- Comment #2 from Kevin A. McGrail <km...@pccc.com> ---
Do you have a rule you wrote that uses this too?

We might need to write a has_xyz function to conditionalize the rule.

I'm interested in seeing what these does on ruleqa.  i.e. how much real-world
ham has this indicator.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7063] Test for text/plain claiming to be ASCII but isn't

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063

--- Comment #30 from Quanah Gibson-Mount <qu...@zimbra.com> ---
(In reply to Philip Prindeville from comment #29)
> Created attachment 5218 [details]
> Per Quanah's comment we don't need the /a qualifier

Thanks!  I'll grab and test. :)

-- 
You are receiving this mail because:
You are the assignee for the bug.