Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2012/06/12 14:53:40 UTC

[Bug 6805] New: subject_is_all_caps

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6805

          Priority: P2
            Bug ID: 6805
          Assignee: dev@spamassassin.apache.org
           Summary: subject_is_all_caps
          Severity: normal
    Classification: Unclassified
                OS: Windows XP
          Reporter: stoked10@hotmail.com
          Hardware: PC
            Status: NEW
           Version: unspecified
         Component: Plugins
           Product: Spamassassin

I'd like to suggest adding:

$subject =~ s/[RE:|FWD:|FW:]//g; (or something similar)

to somewhere around line 901 in HeaderEval.pm

I had an issue with a subject like "RE:5" coming back as SUBJ_ALL_CAPS because,
after stripping out everything but a-zA-Z, you are left with a subject of "RE".

Thanks!
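
A note on the substitution suggested above: in Perl, square brackets form a
character class, so [RE:|FWD:|FW:] matches any single one of the characters
R, E, F, W, D, ':' and '|' rather than the strings "RE:", "FWD:" or "FW:".
The sketch below contrasts the two readings. The sample subject and variable
names are made up for illustration, and the anchored alternation is only one
possible way to express the intent, not the code that was eventually
committed.

```perl
use strict;
use warnings;

# Hypothetical sample subject, not taken from the bug report.
my $subject = "RE: FREE WIDGETS";

# Character class: deletes every R, E, F, W, D, ':' and '|' anywhere.
(my $by_class = $subject) =~ s/[RE:|FWD:|FW:]//g;

# Alternation anchored at the start: removes only a leading prefix.
(my $by_prefix = $subject) =~ s/^(?:RE|FWD|FW):\s*//;

print "class:  '$by_class'\n";    # '  IGTS'
print "prefix: '$by_prefix'\n";   # 'FREE WIDGETS'
```

With the character class, letters inside ordinary words are removed as well,
which would skew any later all-caps comparison.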

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6805] subject_is_all_caps

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6805

Kevin A. McGrail <km...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|3.4.0                       |3.4.1

--- Comment #12 from Kevin A. McGrail <km...@pccc.com> ---
Moving all open bugs where target is defined and 3.4.0 or lower to 3.4.1 target


[Bug 6805] subject_is_all_caps

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6805

--- Comment #5 from Jake <st...@hotmail.com> ---
I'm not certain what you need from me for a patch, but I'm using this:

$subject =~ s/[RE:|FWD:|FW:|AW:]//g;

in HeaderEval.pm. 

My example was a bad one; maybe a better one is "RE: 123456789" or even
"RE:RE:RE:RE: 123". Basically, I believe RE should not be a part of the subject
line evaluation. This addition also helps if you ever get a non-encoded
special or foreign character subject. In that case you pass the greater than 10
test, but when all characters but a-zA-Z are stripped, you are usually only
left with "RE" or "FWD", which trips the all caps test.

Another potentially less desirable fix would be to move the "$subject =~
s/[^a-zA-Z]//g;" line higher in the sub before the check for 10. 
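
The order of operations described above can be sketched roughly as follows.
This is a simplified stand-in based only on what this thread says about the
check (a minimum length of 10, then stripping everything but a-zA-Z); it is
not the actual HeaderEval.pm code, and the function name is hypothetical.

```perl
use strict;
use warnings;

# Simplified stand-in for the check discussed here, NOT the real
# HeaderEval.pm code: length test first, then strip non-letters.
sub all_caps_sketch {
    my ($subject) = @_;
    return 0 if length($subject) < 10;   # assumed minimum length
    $subject =~ s/[^a-zA-Z]//g;          # keep letters only
    return (length($subject) && $subject eq uc($subject)) ? 1 : 0;
}

print all_caps_sketch("RE:5"), "\n";            # 0: shorter than 10
print all_caps_sketch("RE: 123456789"), "\n";   # 1: only "RE" is left
print all_caps_sketch("Re: 123456789"), "\n";   # 0: "Re" is mixed case
```

Because the length test runs on the unstripped subject, a long run of digits
or other non-letters lets a bare "RE" decide the result.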

-Jake


[Bug 6805] subject_is_all_caps

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6805

--- Comment #8 from Kevin A. McGrail <km...@pccc.com> ---
(In reply to comment #7)
> I went for:
>   $subject =~ s/^(?:(?:Re|Fwd|Fw|Aw|Antwort|Sv):\s*)+//i;
> 
> 
> trunk (3.4):
>   Bug 6805: subject_is_all_caps - strip prefixes like Re:, Fwd:
>   Sending lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
> Committed revision 1354506.

Sorry, I was working on this too.  What about just ignoring up to 7 a-z chars
in the beginning anchored by a colon?


[Bug 6805] subject_is_all_caps

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6805

Kevin A. McGrail <km...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kmcgrail@pccc.com

--- Comment #2 from Kevin A. McGrail <km...@pccc.com> ---
(In reply to comment #1)
> Sorry, AW: can also be added to that. Thanks

Jake, I'm open to adding something for that.  Perhaps it would trigger a
BLANK_FWD_RE_SUBJECT rule or something like that instead.

Can you work on a patch to submit?


[Bug 6805] subject_is_all_caps

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6805

--- Comment #6 from Jake <st...@hotmail.com> ---
(In reply to comment #3)
> I think it would be a good idea to strip standard prefixes whatever the
> case. I don't see the point of a subject that starts with Re: and continues
> in capitals being let off because of the "e".
> 

That's an excellent point! I altered my line to:

$subject =~ tr/RE:|FWD:|FW:|AW://gi;
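
A caution on the line above: tr/// is Perl's transliteration operator and
works on individual characters, not on strings such as "RE:". As far as I
know it accepts only the c, d, s and r modifiers, so the /gi shown here would
likely be rejected at compile time. The sketch below shows what a plain tr///
does and how a case-insensitive prefix strip could be written with s///
instead. The sample subject is hypothetical.

```perl
use strict;
use warnings;

my $subject = "Re: HELLO THERE";   # hypothetical sample

# tr/// with an empty replacement list and no /d only counts matches;
# the string itself is left unchanged.
(my $after_tr = $subject) =~ tr/RE:FWA//;

# A case-insensitive prefix strip needs s/// with the /i modifier.
(my $after_s = $subject) =~ s/^(?:RE|FWD|FW|AW):\s*//i;

print "'$after_tr'\n";   # 'Re: HELLO THERE'
print "'$after_s'\n";    # 'HELLO THERE'
```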


[Bug 6805] subject_is_all_caps

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6805

John Wilcock <jo...@tradoc.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |john@tradoc.fr

--- Comment #9 from John Wilcock <jo...@tradoc.fr> ---
(In reply to comment #8)
> (In reply to comment #7)
> > I went for:
> >   $subject =~ s/^(?:(?:Re|Fwd|Fw|Aw|Antwort|Sv):\s*)+//i;
> > 
> > 
> > trunk (3.4):
> >   Bug 6805: subject_is_all_caps - strip prefixes like Re:, Fwd:
> >   Sending lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
> > Committed revision 1354506.
> 
> Sorry, I was working on this too.  What about just ignoring up to 7 a-z
> chars in the beginning anchored by a colon?

Sounds like a good idea to allow for other languages. Also note that some
French MUAs (older versions of Outlook Express IIRC, perhaps also Lotus) put a
space between the "Re" (or "Tr" (transfert)) and the colon, in line with normal
French punctuation rules.
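
To illustrate the French spacing point, here is a hypothetical variant of the
pattern quoted above that allows optional whitespace before the colon and
adds the Tr and Wg prefixes mentioned in this thread. It is a sketch of one
possible extension, not what was committed, and the helper name is made up.

```perl
use strict;
use warnings;

# Hypothetical variant of the committed pattern: optional whitespace
# before the colon, plus the Tr and Wg prefixes. Not the shipped code.
my $prefix_re = qr/^(?:(?:Re|Fwd|Fw|Aw|Antwort|Sv|Tr|Wg)\s*:\s*)+/i;

sub strip_prefixes {
    my ($subject) = @_;
    $subject =~ s/$prefix_re//;
    return $subject;
}

print strip_prefixes("Re : Bonjour"), "\n";       # Bonjour
print strip_prefixes("TR : RE: Facture"), "\n";   # Facture
print strip_prefixes("Trade: offer"), "\n";       # Trade: offer
```

Because the prefix must be followed directly by optional whitespace and a
colon, an ordinary word such as "Trade" is left alone.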


[Bug 6805] subject_is_all_caps

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6805

Jake <st...@hotmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stoked10@hotmail.com

--- Comment #1 from Jake <st...@hotmail.com> ---
Sorry, AW: can also be added to that. Thanks


[Bug 6805] subject_is_all_caps

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6805

--- Comment #11 from Kevin A. McGrail <km...@pccc.com> ---
(In reply to comment #10)
> > What about just ignoring up to 7 a-z chars in the beginning
> > anchored by a colon?
> 
> Checking my mailboxes I see cases like:
>   FYI: ...
>   SOLVED: ...
>   URGENT: ...
>   picnic: directions
>   Undeliverable: mailing list ... 
>   Rejected: ... mailing list memberships reminder
>   poll: what is ...
>   Geo::IP::Record file handle kept open
> 
> Don't know, 7 a-z chars may or may not be too liberal.
> 
> > Sounds like a good idea to allow for other languages. Also note that some
> > French MUAs (older versions of Outlook Express IIRC, perhaps also Lotus) put
> > a space between the "Re" (or "Tr" (transfert)) and the colon, in line with
> > normal French punctuation rules.
> 
> Thanks.
> 
> > Sorry, I was working on this too.
> 
> No prob, go ahead and finish what you had in mind.

Good points above and definitely some good science.  I need to look at my spam
folder for spams that have subjects that might fit as well.

Well, I had two avenues I was working on, both of which collide, so I'll just
reiterate them here.

One, should we be modifying the subject, and two, should we be modifying
SUBJ_ALL_CAPS to ignore the first X characters anchored by ":"? The anchor can
likely be ": ", so a space is required, if I had to guess, but I haven't checked
subjects.


[Bug 6805] subject_is_all_caps

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6805

--- Comment #10 from Mark Martinec <Ma...@ijs.si> ---
> What about just ignoring up to 7 a-z chars in the beginning
> anchored by a colon?

Checking my mailboxes I see cases like:
  FYI: ...
  SOLVED: ...
  URGENT: ...
  picnic: directions
  Undeliverable: mailing list ... 
  Rejected: ... mailing list memberships reminder
  poll: what is ...
  Geo::IP::Record file handle kept open

Don't know, 7 a-z chars may or may not be too liberal.
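
One way to see how liberal the idea is: the sketch below applies one possible
reading of "up to 7 a-z chars anchored by a colon" to some of the subjects
listed above. The exact pattern is an assumption, since none was posted in
this thread, and the helper name is hypothetical.

```perl
use strict;
use warnings;

# One possible reading of "up to 7 a-z chars anchored by a colon".
# The exact pattern is an assumption; none was posted in this thread.
sub strip_short_prefix {
    my ($subject) = @_;
    $subject =~ s/^[a-z]{1,7}:\s*//i;
    return $subject;
}

for my $s ("FYI: ...", "picnic: directions",
           "Undeliverable: mailing list ...",
           "Rejected: ... mailing list memberships reminder",
           "Geo::IP::Record file handle kept open") {
    printf "%-50s -> %s\n", $s, strip_short_prefix($s);
}
```

Under this reading, "FYI:" and "picnic:" are stripped, "Undeliverable:" and
"Rejected:" are too long to match, and the module name loses its leading
"Geo:" so that the subject begins with ":IP::Record". Requiring whitespace
after the colon would avoid that last case.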

> Sounds like a good idea to allow for other languages. Also note that some
> French MUAs (older versions of Outlook Express IIRC, perhaps also Lotus) put
> a space between the "Re" (or "Tr" (transfert)) and the colon, in line with
> normal French punctuation rules.

Thanks.

> Sorry, I was working on this too.

No prob, go ahead and finish what you had in mind.


Re: [Bug 6805] subject_is_all_caps

Posted by Axb <ax...@gmail.com>.
On 06/27/2012 04:00 PM, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6805
>
> Mark Martinec <Ma...@ijs.si> changed:
>
>             What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Version|unspecified                 |3.3.2
>     Target Milestone|Undefined                   |3.4.0
>
> --- Comment #7 from Mark Martinec <Ma...@ijs.si> ---
> I went for:
>    $subject =~ s/^(?:(?:Re|Fwd|Fw|Aw|Antwort|Sv):\s*)+//i;
>
>
> trunk (3.4):
>    Bug 6805: subject_is_all_caps - strip prefixes like Re:, Fwd:
>    Sending lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
> Committed revision 1354506.
>

while you're at it:
WG:  (german "Weiterleitung" / Forward)
TR: french?

Axb



[Bug 6805] subject_is_all_caps

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6805

Mark Martinec <Ma...@ijs.si> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|unspecified                 |3.3.2
   Target Milestone|Undefined                   |3.4.0

--- Comment #7 from Mark Martinec <Ma...@ijs.si> ---
I went for:
  $subject =~ s/^(?:(?:Re|Fwd|Fw|Aw|Antwort|Sv):\s*)+//i;


trunk (3.4):
  Bug 6805: subject_is_all_caps - strip prefixes like Re:, Fwd:
  Sending lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
Committed revision 1354506.
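
For reference, the behaviour of the substitution quoted above can be checked
with a small script. The wrapper function and the sample subjects are made up
for this example; only the regular expression itself comes from the comment.

```perl
use strict;
use warnings;

# The substitution quoted in comment #7, wrapped in a helper whose
# name is made up for this example.
sub strip_reply_prefixes {
    my ($subject) = @_;
    $subject =~ s/^(?:(?:Re|Fwd|Fw|Aw|Antwort|Sv):\s*)+//i;
    return $subject;
}

print strip_reply_prefixes("RE:RE:RE:RE: 123"), "\n";   # 123
print strip_reply_prefixes("Fwd: Re: HELLO"), "\n";     # HELLO
print strip_reply_prefixes("WG: HELLO"), "\n";          # WG: HELLO
```

The outer group with "+" handles stacked prefixes in one pass, the /i
modifier covers any capitalisation, and prefixes not in the list are left
untouched.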


[Bug 6805] subject_is_all_caps

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6805

RW <rw...@googlemail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rwmaillists@googlemail.com

--- Comment #3 from RW <rw...@googlemail.com> ---
I think it would be a good idea to strip standard prefixes whatever the case. I
don't see the point of a subject that starts with Re: and continues in capitals
being let off because of the "e".

The specific example of "RE:5" shouldn't be a problem because it's less than
the minimum of 10 characters.


[Bug 6805] subject_is_all_caps

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6805

--- Comment #4 from Kevin A. McGrail <km...@pccc.com> ---
(In reply to comment #3)
> I think it would be a good idea to strip standard prefixes whatever the
> case. I don't see the point of a subject that starts with Re: and continues
> in capitals being let off because of the "e".
> 
> The specific example of "RE:5" shouldn't be a problem because it's less
> than the minimum of 10 characters.

My thoughts were to switch the case on his patch, so tr instead of s, but we
can look at some improvements and get some more eyes on that code!
