Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2011/06/23 19:26:55 UTC
[Bug 6625] New: Bayes SQL schema treats bayes_token.token as char
instead of binary, fails chset checks
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6625
Bug #: 6625
Summary: Bayes SQL schema treats bayes_token.token as char
instead of binary, fails chset checks
Product: Spamassassin
Version: 3.3.2
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Documentation
AssignedTo: dev@spamassassin.apache.org
ReportedBy: Mark.Martinec@ijs.si
Classification: Unclassified
Panagiotis Christias wrote on the SA users mailing list on 2011-06-21:
> I faced the same problem today. In my case, MySQL was configured to
> use utf8 by default:
>
> # my.cnf
> [client]
> default-character-set=utf8
> [mysqld]
> character-set-server=utf8
> collation-server=utf8_unicode_ci
> init_connect='set collation_connection = utf8_unicode_ci;'
>
> After commenting out the utf8 definitions and reverting back
> to latin1 "sa-learn --restore" worked fine.
As it turns out this is not the same problem as Bug 6624,
but an entirely independent one.
Lawrence writes on 2011-06-22:
> Ignore my last suggestion of starting from scratch. Try commenting out
> these lines (or similar ones) if present in /etc/my.cnf and restarting
> MySQL before attempting again
>
> default-character-set=utf8
> character-set-server=utf8
> collation-server=utf8_unicode_ci
> init_connect='set collation_connection = utf8_unicode_ci;'
Benny Pedersen posted his SQL schema, which fixes the underlying
problem instead of covering it up:
> CREATE TABLE IF NOT EXISTS `bayes_token` (
> `id` int(11) NOT NULL DEFAULT '0',
> `token` binary(5) NOT NULL,
> ...
> );
I commented:
> Yes, the binary or varbinary is the key to a solution here.
> Mucking with utf-8 vs latin-1 is just covering but not solving
> the most glaring problem here, namely that a token must not be
> associated with any character set, as it does not obey any
> such rules, nor should it be treated case-insensitively
> (as char is, which is possibly a reason for more than two
> record changes as reported by Dave). Will take a closer look...
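To illustrate the case-insensitivity point made in the comment above, here is a minimal sketch. It uses Python's bundled sqlite3 module rather than MySQL, so it only mimics the behaviour: a TEXT column with the NOCASE collation stands in for a case-insensitive char column, and a BLOB column stands in for binary. The helper name count_distinct_tokens and the sample tokens are made up for illustration.

```python
import sqlite3


def count_distinct_tokens(column_type, tokens):
    """Insert tokens into a one-column table keyed on the token and
    return how many rows were actually stored."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE bayes_token (token {column_type} PRIMARY KEY)")
    stored = 0
    for tok in tokens:
        try:
            conn.execute("INSERT INTO bayes_token (token) VALUES (?)", (tok,))
            stored += 1
        except sqlite3.IntegrityError:
            # The key collided with a token that is already present.
            pass
    conn.close()
    return stored


# Two tokens that differ only in letter case (hypothetical sample data).
char_like = count_distinct_tokens("TEXT COLLATE NOCASE", ["abcde", "ABCDE"])
binary_like = count_distinct_tokens("BLOB", [b"abcde", b"ABCDE"])

print("case-insensitive text column stored:", char_like)
print("binary column stored:", binary_like)
```

With the case-insensitive text column the second token collides with the first, whereas the binary column keeps both as distinct keys.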
So in summary: as the bayes_token.token field will receive just
plain octets (binary data, not ASCII or any other characters),
it must not be associated with any character set in SQL.
Treating a string as char or varchar may imply SQL checks
for data compliance with the chosen charset, and it implies collation
and case-insensitive matching (which is another hidden
bug here). As it happens, MySQL is stricter with UTF-8 checks
but rather lax with Latin-1 checks, which is why the suggested
workaround (avoiding UTF-8) is a poor one: it
happens to work most of the time with current versions of MySQL
but may break at any time (e.g. control characters are not valid
Latin-1 characters).
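The difference between the two charsets can be sketched outside the database. The following assumes nothing about MySQL internals: it uses Python's own codecs as a stand-in, where the "latin-1" codec accepts every byte value (comparable to the lax checking described above) and the "utf-8" codec rejects malformed sequences. The function name valid_in_charset and the sample token are hypothetical.

```python
def valid_in_charset(token: bytes, charset: str) -> bool:
    """Return True if the octets decode cleanly in the given charset."""
    try:
        token.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


# A made-up 5-octet token; 0xE2 followed by 0x28 is malformed UTF-8.
token = bytes([0xE2, 0x28, 0xA1, 0x00, 0xFF])

utf8_ok = valid_in_charset(token, "utf-8")
latin1_ok = valid_in_charset(token, "latin-1")

# How many of the 256 possible single octets are valid UTF-8 on their own?
single_octets_valid_utf8 = sum(
    valid_in_charset(bytes([b]), "utf-8") for b in range(256)
)

print("valid as UTF-8:", utf8_ok)
print("accepted as Latin-1 by a lax decoder:", latin1_ok)
print("single octets valid as UTF-8:", single_octets_valid_utf8)
```

Only half of all octet values are valid UTF-8 by themselves, so arbitrary binary tokens stored in a UTF-8 column are likely to be rejected, while a lax Latin-1 check lets them through.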
Btw, the bayes_pg.sql schema for PostgreSQL already has this fix!
Attached is a trivial but essential fix for the bayes_mysql.sql.
--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
[Bug 6625] Bayes SQL schema treats bayes_token.token as char instead
of binary, fails chset checks
Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6625
Mark Martinec <Ma...@ijs.si> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|Undefined                   |3.4.0
[Bug 6625] Bayes SQL schema treats bayes_token.token as char instead
of binary, fails chset checks
Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6625
--- Comment #1 from Mark Martinec <Ma...@ijs.si> 2011-06-23 17:30:52 UTC ---
trunk:
Bug 6625: Bayes SQL schema treats bayes_token.token as char
instead of binary, fails chset checks
Sending sql/bayes_mysql.sql
Committed revision 1139007.
[Bug 6625] Bayes SQL schema treats bayes_token.token as char instead
of binary, fails chset checks
Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6625
--- Comment #2 from Mark Martinec <Ma...@ijs.si> 2011-06-23 17:31:50 UTC ---
Created attachment 4925
--> https://issues.apache.org/SpamAssassin/attachment.cgi?id=4925
Change a data type of bayes_token.token to binary
[Bug 6625] Bayes SQL schema treats bayes_token.token as char instead
of binary, fails chset checks
Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6625
Mark Martinec <Ma...@ijs.si> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
--- Comment #3 from Mark Martinec <Ma...@ijs.si> 2011-09-21 00:31:52 UTC ---
closing, fixed for 3.4