Posted to commits@spamassassin.apache.org by he...@apache.org on 2021/05/30 05:14:19 UTC
svn commit: r1890317 - in /spamassassin/trunk: UPGRADE
lib/Mail/SpamAssassin/Conf.pm lib/Mail/SpamAssassin/Util/DependencyInfo.pm
Author: hege
Date: Sun May 30 05:14:19 2021
New Revision: 1890317
URL: http://svn.apache.org/viewvc?rev=1890317&view=rev
Log:
Enable normalize_charset by default (Bug 7656)
Modified:
spamassassin/trunk/UPGRADE
spamassassin/trunk/lib/Mail/SpamAssassin/Conf.pm
spamassassin/trunk/lib/Mail/SpamAssassin/Util/DependencyInfo.pm
Modified: spamassassin/trunk/UPGRADE
URL: http://svn.apache.org/viewvc/spamassassin/trunk/UPGRADE?rev=1890317&r1=1890316&r2=1890317&view=diff
==============================================================================
--- spamassassin/trunk/UPGRADE (original)
+++ spamassassin/trunk/UPGRADE Sun May 30 05:14:19 2021
@@ -2,6 +2,12 @@
Note for Users Upgrading to SpamAssassin 4.0.0
----------------------------------------------
+- Setting normalize_charset is now enabled by default. Note that rules
+ should not expect specific non-UTF8 or UTF8 encoding in body. Matching is
+ done against raw bytes, which may vary depending on normalize_charset
+ setting and whether decoding to UTF8 was successful.
+ See: http://wiki.apache.org/spamassassin/WritingRulesAdvanced
+
- Meta rules no longer use priority values, they are evaluated dynamically
when the rules they depend on are finished. (Bug 7735)
Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Conf.pm
URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Conf.pm?rev=1890317&r1=1890316&r2=1890317&view=diff
==============================================================================
--- spamassassin/trunk/lib/Mail/SpamAssassin/Conf.pm (original)
+++ spamassassin/trunk/lib/Mail/SpamAssassin/Conf.pm Sun May 30 05:14:19 2021
@@ -1250,7 +1250,7 @@ Select the locales to allow from the lis
type => $CONF_TYPE_STRING,
});
-=item normalize_charset ( 0 | 1 ) (default: 0)
+=item normalize_charset ( 0 | 1 ) (default: 1)
Whether to decode non-UTF-8 and non-ASCII textual parts and recode them
to UTF-8 before the text is given over to rules processing. The character
@@ -1272,7 +1272,7 @@ it will be used if it is available.
push (@cmds, {
setting => 'normalize_charset',
- default => 0,
+ default => 1,
type => $CONF_TYPE_BOOL,
code => sub {
my ($self, $key, $value, $line) = @_;
@@ -3182,6 +3182,12 @@ non-text MIME parts are stripped, and th
Quoted-Printable or Base-64-encoded format if necessary. Parts declared as
text/html will be rendered from HTML to text.
+Body is processed as a raw byte string, which means Unicode-specific regex
+features like \p{} can NOT be used for matching. The normalize_charset
+setting will also affect how raw bytes are presented. Rules in .cf files
+should be written portably - to match "a with umlaut" character, look for
+both LATIN1 and UTF8 raw byte variants: /(?:\xE4|\xC3\xA4)/
+
All body paragraphs (double-newline-separated blocks of text) are turned into a
single line with line breaks removed and whitespace normalized. Any lines longer
than 2kB are split into shorter separate lines (from a boundary when
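
The portability advice added to Conf.pm in the hunk above can be illustrated with a small byte-level sketch. It is written in Python only to show the idea; SpamAssassin rules use Perl regexes, and the names body_matches, PORTABLE, and UTF8_ONLY are hypothetical, not part of SpamAssassin. The sample word is likewise just an illustration.

```python
import re

# Pattern taken from the Conf.pm text: the Latin-1 byte for "a with umlaut"
# or the two-byte UTF-8 sequence for the same character.
PORTABLE = re.compile(rb"(?:\xE4|\xC3\xA4)")

# A pattern that assumes the body is always UTF-8 (for comparison only).
UTF8_ONLY = re.compile(rb"\xC3\xA4")


def body_matches(raw_body: bytes) -> bool:
    """Return True if the raw body bytes contain the character in either encoding."""
    return PORTABLE.search(raw_body) is not None


# The same text as raw bytes in two encodings, as a rule might see it
# depending on normalize_charset and on whether decoding succeeded.
text = "Ger\u00e4t"
latin1_body = text.encode("latin-1")  # contains the single byte 0xE4
utf8_body = text.encode("utf-8")      # contains the byte pair 0xC3 0xA4

print(body_matches(latin1_body), body_matches(utf8_body))
print(UTF8_ONLY.search(latin1_body) is not None)
```

The portable pattern matches both byte strings, while the UTF-8-only pattern misses the Latin-1 body. That gap is what the documentation change warns rule writers about.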
Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Util/DependencyInfo.pm
URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Util/DependencyInfo.pm?rev=1890317&r1=1890316&r2=1890317&view=diff
==============================================================================
--- spamassassin/trunk/lib/Mail/SpamAssassin/Util/DependencyInfo.pm (original)
+++ spamassassin/trunk/lib/Mail/SpamAssassin/Util/DependencyInfo.pm Sun May 30 05:14:19 2021
@@ -252,11 +252,10 @@ our @OPTIONAL_MODULES = (
{
module => 'Encode::Detect::Detector',
version => 0,
- desc => 'If you plan to use the normalize_charset config setting to
- decode message parts from their declared character set into Unicode, and
- such decoding fails, the Encode::Detect::Detector module (when available)
- may be consulted to provide an alternative guess on a character set of a
- problematic message part.',
+ desc => 'If normalize_charset decoding of message parts from their
+ declared character set into Unicode fails, the Encode::Detect::Detector
+ module (when available) may be consulted to provide an alternative guess
+ on a character set of a problematic message part.',
},
{
module => 'Net::Patricia',