Posted to dev@manifoldcf.apache.org by "Markus Schuch (JIRA)" <ji...@apache.org> on 2019/01/24 23:57:00 UTC

[jira] [Commented] (CONNECTORS-1572) Support 4-byte characters for Strings stored in MySQL/MariaDB by default

    [ https://issues.apache.org/jira/browse/CONNECTORS-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751709#comment-16751709 ] 

Markus Schuch commented on CONNECTORS-1572:
-------------------------------------------

Added a 🎮 character to the repository connector description string in {{RSSSimpleCrawlTester}}, which successfully reproduces the error.
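
To illustrate why this particular character triggers the error: it needs 4 bytes in UTF-8, while MySQL's {{utf8}} charset (utf8mb3) stores at most 3 bytes per character. A minimal sketch of that check; the helper names {{utf8_width}} and {{needs_utf8mb4}} are made up for this illustration and are not part of ManifoldCF:

```python
def utf8_width(ch: str) -> int:
    """Number of bytes needed to encode a single character in UTF-8."""
    return len(ch.encode("utf-8"))


def needs_utf8mb4(text: str) -> bool:
    """True if any character in text needs 4 bytes in UTF-8.

    Such characters do not fit into MySQL's 3-byte utf8 (utf8mb3) charset.
    """
    return any(utf8_width(ch) == 4 for ch in text)


if __name__ == "__main__":
    print(utf8_width("🎮"))                  # the character used in the test
    print(needs_utf8mb4("RSS connector 🎮"))
    print(needs_utf8mb4("plain description"))
```

A string for which {{needs_utf8mb4}} returns true is the kind of value that fails against a database created with the {{utf8}} charset.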

After switching to {{utf8mb4}} the test still fails. The reason is the varchar columns of length 255, which now exceed the InnoDB index length limit:
{quote}
InnoDB has a maximum index length of 767 bytes for tables that use COMPACT or REDUNDANT row format, so for utf8mb3 or utf8mb4 columns, you can index a maximum of 255 or 191 characters, respectively
{quote}
(https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-conversion.html)
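
The numbers in the quote follow from dividing the 767-byte limit by the maximum number of bytes per character. A small worked sketch of that arithmetic; the constant and function names are chosen for this illustration only:

```python
# Index length limit for COMPACT/REDUNDANT row format, taken from the quote above.
INNODB_MAX_INDEX_BYTES = 767


def max_indexable_chars(bytes_per_char: int, limit: int = INNODB_MAX_INDEX_BYTES) -> int:
    """Largest number of characters whose worst-case size fits into the limit."""
    return limit // bytes_per_char


def index_bytes(length: int, bytes_per_char: int) -> int:
    """Worst-case number of bytes an index on a varchar(length) column can need."""
    return length * bytes_per_char


if __name__ == "__main__":
    print(max_indexable_chars(3))   # utf8mb3
    print(max_indexable_chars(4))   # utf8mb4
    # A varchar(255) column under utf8mb4 versus the limit:
    print(index_bytes(255, 4), ">", INNODB_MAX_INDEX_BYTES)
```

With 4 bytes per character a varchar(255) index can need 1020 bytes, which is why the existing column definitions no longer fit under the 767-byte limit.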

A solution is to configure the database with the following settings:

{code}
innodb_file_format = barracuda
innodb_file_per_table = 1
innodb_large_prefix = 1
innodb_default_row_format = DYNAMIC
{code}
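
A rough sketch of how a configuration file could be checked for these four settings. It assumes the settings live in a {{[mysqld]}} section of an ini-style file (the comment above does not say where they go), and the function name {{missing_settings}} is hypothetical. It only compares exact spellings, so equivalent forms such as {{ON}} instead of {{1}}, hyphenated option names, or {{!include}} directives are not handled:

```python
import configparser

# The four settings listed above, with values normalized to lower case.
REQUIRED = {
    "innodb_file_format": "barracuda",
    "innodb_file_per_table": "1",
    "innodb_large_prefix": "1",
    "innodb_default_row_format": "dynamic",
}


def missing_settings(cnf_text: str, section: str = "mysqld") -> dict:
    """Return the required settings that are absent or set to a different value.

    Assumption: the settings are expected in the given section of an
    ini-style config file. An empty dict means all four are present.
    """
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(cnf_text)
    found = dict(parser[section]) if parser.has_section(section) else {}
    return {
        key: value
        for key, value in REQUIRED.items()
        if (found.get(key) or "").lower() != value
    }


if __name__ == "__main__":
    sample = "[mysqld]\ninnodb_file_per_table = 1\n"
    for key, value in missing_settings(sample).items():
        print(f"missing or different: {key} = {value}")
```

The check is purely textual; it does not tell whether the running server actually uses these values.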

The JDBC driver must also be updated (to 5.1.47+ or 8.0.13+) so that it works correctly with {{utf8mb4}} automatically. Otherwise the server setting {{character_set_server = utf8mb4}} must also be set.
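
The version condition can be written down as a small check. This is only one reading of the numbers above, namely that the 5.1 line needs at least 5.1.47 and everything newer needs at least 8.0.13; the function name {{driver_handles_utf8mb4}} is made up for this sketch:

```python
def driver_handles_utf8mb4(version: str) -> bool:
    """True if the driver version is 5.1.47+ (5.x line) or 8.0.13+.

    Assumption: version is a dotted string such as "5.1.47" with at least
    three numeric parts; suffixes like "-bin" are not handled.
    """
    major, minor, patch = (int(part) for part in version.split(".")[:3])
    parsed = (major, minor, patch)
    if major == 5:
        return parsed >= (5, 1, 47)
    return parsed >= (8, 0, 13)


if __name__ == "__main__":
    for candidate in ("5.1.46", "5.1.47", "8.0.12", "8.0.13"):
        print(candidate, driver_handles_utf8mb4(candidate))
```

For a version where this returns false, the {{character_set_server = utf8mb4}} server setting mentioned above would be needed as well.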

> Support 4-byte characters for Strings stored in MySQL/MariaDB by default
> ------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1572
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1572
>             Project: ManifoldCF
>          Issue Type: Improvement
>            Reporter: Markus Schuch
>            Assignee: Markus Schuch
>            Priority: Major
>         Attachments: CONNECTORS-1572.patch
>
>
> DBInterfaceMySQL creates the database with the {{utf8}} charset, which does not support 4-byte characters in varchar columns. This can be a problem if a String stored in the database (e.g. a version string) contains such a character, e.g. an emoji.
> We should create the database with the {{utf8mb4}} charset and the {{utf8mb4_bin}} collation and document this setting to support this situation better.
> http://mail-archives.apache.org/mod_mbox/manifoldcf-dev/201901.mbox/%3c27b8bf38-6822-61cc-cdbc-54dce5262217@web.de%3e


