Posted to dev@spamassassin.apache.org by bu...@spamassassin.apache.org on 2021/04/29 16:25:35 UTC

[Bug 7901] New: Direct usage of UTF-8 in subject triggering SUBJ_ILLEGAL_CHARS and SUBJECT_NEEDS_ENCODING rules

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7901

            Bug ID: 7901
           Summary: Direct usage of UTF-8 in subject triggering
                    SUBJ_ILLEGAL_CHARS and SUBJECT_NEEDS_ENCODING rules
           Product: Spamassassin
           Version: unspecified
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Rules
          Assignee: dev@spamassassin.apache.org
          Reporter: msta+sa@cinkciarz.pl
  Target Milestone: Undefined

Hi there.

RFC 6532 [1], which allows UTF-8 to be used directly in email subjects, has
been available for almost 10 years.

It's 2021 now, but SpamAssassin still raises the score of email messages that
are fully valid and compliant with IETF standards, basically punishing people
who just want to write emails in their own language - in my case, Polish.

I don't like it when someone tries to block my ability to write in Polish, yet
today I had to fight with the mailing team in my company - they refused to send
a valid message in Polish because it triggered the SUBJ_ILLEGAL_CHARS (1.1)
and SUBJECT_NEEDS_ENCODING (0.1) rules. Why? Because I wanted to send an email
with the word "Security" ("Bezpieczeństwo") in the subject.

Could you please look into this and update those rules? People outside the US
deserve to feel like first-class citizens, too - it's one of the basic reasons
why we have UTF-8 available.

[1] https://tools.ietf.org/html/rfc6532
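
To make the two subject forms concrete, here is a minimal Python sketch (not
SpamAssassin code) that builds the same subject once as an RFC 2047
encoded-word and once as raw UTF-8 bytes, the form RFC 6532 permits. The
helper name has_non_ascii is an assumption made for illustration only.

```python
from email.header import Header, decode_header

subject = "Bezpieczeństwo"

# RFC 2047 form: the header value is wrapped in an encoded-word,
# so only ASCII characters appear on the wire.
encoded = Header(subject, charset="utf-8").encode()

# RFC 6532 form: the UTF-8 bytes are placed in the header as-is.
raw = subject.encode("utf-8")


def has_non_ascii(header_bytes: bytes) -> bool:
    """Hypothetical helper: True if any byte is outside 7-bit ASCII."""
    return any(b > 0x7F for b in header_bytes)


# Decoding the encoded-word recovers the original text.
decoded_bytes, charset = decode_header(encoded)[0]
round_trip = decoded_bytes.decode("utf-8")
```

Both forms carry the same text; only the raw form contains bytes above 0x7F,
which is what a check written for ASCII-only headers would react to.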

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7901] Direct usage of UTF-8 in subject triggering SUBJ_ILLEGAL_CHARS and SUBJECT_NEEDS_ENCODING rules

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7901

Henrik Krohns <ap...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
                 CC|                            |apache@hege.li
             Status|NEW                         |RESOLVED

--- Comment #1 from Henrik Krohns <ap...@hege.li> ---
I've now disabled the rules for SpamAssassin 3.4. The checks should work
properly with trunk/4.0, which has much better UTF-8 support.

Should go out with sa-update in a day or two.

Sending        trunk/rules/20_head_tests.cf
Transmitting file data .done
Committing transaction...
Committed revision 1889300.


[Bug 7901] Direct usage of UTF-8 in subject triggering SUBJ_ILLEGAL_CHARS and SUBJECT_NEEDS_ENCODING rules

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7901

Michał Staruch <ms...@cinkciarz.pl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |msta+sa@cinkciarz.pl

--- Comment #2 from Michał Staruch <ms...@cinkciarz.pl> ---
Thank you very much for the quick reaction.

In the commit I didn't notice anything about the SUBJECT_NEEDS_ENCODING rule.
Does it work as a kind of cascade, so no change is needed, or is this second
rule a more complex problem that is out of scope for this bug?


[Bug 7901] Direct usage of UTF-8 in subject triggering SUBJ_ILLEGAL_CHARS and SUBJECT_NEEDS_ENCODING rules

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7901

--- Comment #6 from Henrik Krohns <ap...@hege.li> ---
If someone is desperate for the rules, just use trunk, add them back in
local.cf, or even use a much simpler regex that just checks for the illegal
chars. Tweaking some random "ratio" is probably an exercise in wasting time.
Most of those legacy eval functions seem ripe for deprecation.
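
As a rough illustration of the "much simpler regex" idea, here is a Python
sketch rather than an actual SpamAssassin rule. It assumes that "illegal"
means either bytes that are not valid UTF-8 or ASCII control characters other
than TAB; that definition, and the name subject_has_illegal_chars, are
assumptions and not what the shipped rules do.

```python
import re

# Assumed definition of "illegal": C0 control characters except TAB,
# plus DEL. An unfolded Subject should not contain a newline either.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0a-\x1f\x7f]")


def subject_has_illegal_chars(raw_subject: bytes) -> bool:
    """Hypothetical check: flag invalid UTF-8 or control characters."""
    try:
        text = raw_subject.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes that do not form valid UTF-8, e.g. unlabeled ISO-8859-2.
        return True
    return _CONTROL_CHARS.search(text) is not None
```

With this definition a valid UTF-8 subject such as "Bezpieczeństwo" passes,
while the same word sent as unlabeled ISO-8859-2 bytes is flagged.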


[Bug 7901] Direct usage of UTF-8 in subject triggering SUBJ_ILLEGAL_CHARS and SUBJECT_NEEDS_ENCODING rules

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7901

--- Comment #7 from Henrik Krohns <ap...@hege.li> ---
PS. The real reason for disabling them was that I wasn't sure I could control
the scores per version properly with ifs here and there. I have zero interest
in playing with the GA. :-)


[Bug 7901] Direct usage of UTF-8 in subject triggering SUBJ_ILLEGAL_CHARS and SUBJECT_NEEDS_ENCODING rules

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7901

--- Comment #3 from Henrik Krohns <ap...@hege.li> ---
SUBJECT_NEEDS_ENCODING depends on the disabled rules, so it will not hit.


[Bug 7901] Direct usage of UTF-8 in subject triggering SUBJ_ILLEGAL_CHARS and SUBJECT_NEEDS_ENCODING rules

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7901

--- Comment #4 from Michał Staruch <ms...@cinkciarz.pl> ---
Okay, I understand - thanks for the clarification.


[Bug 7901] Direct usage of UTF-8 in subject triggering SUBJ_ILLEGAL_CHARS and SUBJECT_NEEDS_ENCODING rules

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7901

RW <rw...@googlemail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rwmaillists@googlemail.com

--- Comment #5 from RW <rw...@googlemail.com> ---

Removing these rules from 3.x seems a bit premature; they may be empirically
useful for some. It probably depends on where your mail comes from, but I've
never seen a UTF-8 Subject without MIME encoding in legitimate mail.
SUBJ_ILLEGAL_CHARS has an S/O of 0.976 for me. Scoring them down to 0.001 seems
better.
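
For readers unfamiliar with the S/O figure, the Python sketch below shows one
common way such a ratio is computed. It assumes S/O means the rule's spam hit
rate divided by the sum of its spam and ham hit rates; the function name
so_ratio and its parameters are hypothetical, and the thread does not give
the counts behind the 0.976 value.

```python
from typing import Optional


def so_ratio(spam_hits: int, spam_total: int,
             ham_hits: int, ham_total: int) -> Optional[float]:
    """Assumed definition: spam hit rate / (spam hit rate + ham hit rate).

    Returns None when the rule never hit in either corpus.
    """
    if spam_total <= 0 or ham_total <= 0:
        raise ValueError("both corpora must contain at least one message")
    spam_rate = spam_hits / spam_total
    ham_rate = ham_hits / ham_total
    if spam_rate + ham_rate == 0:
        return None
    return spam_rate / (spam_rate + ham_rate)
```

Under this definition a value near 1.0 means the rule hits almost only spam
in that corpus, and a value near 0.5 means it does not separate spam from ham.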
