Posted to oak-issues@jackrabbit.apache.org by "Ian Boston (JIRA)" <ji...@apache.org> on 2016/02/26 13:46:18 UTC

[jira] [Commented] (OAK-2920) RDBDocumentStore: fail init when database config seems to be inadequate

    [ https://issues.apache.org/jira/browse/OAK-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168925#comment-15168925 ] 

Ian Boston commented on OAK-2920:
---------------------------------

If the DB config is broken, content that expects UTF-8 in the path will fail to import, because the IDs will be rejected as duplicates. For instance, any application that stores i18n content in the repository and needs to work with a language whose characters take more than one byte in UTF-8 (e.g. German) will fail. ID duplicates are easy to detect. Data corruption within JCR properties is much harder to detect, because a user working with Oak through a web UI could suspect any of the links between the browser and the DB as the source of the UTF-8 corruption.

Taking MySQL as an example: without utf8, characters in common use in EU countries can't be stored as JCR properties (see http://www.periodni.com/unicode_utf-8_encoding.html ). Without utf8mb4, supplementary UTF-8 characters can't be stored as JCR properties (see http://www.i18nguy.com/unicode/supplementary-test.html ).
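
To make the utf8 versus utf8mb4 distinction concrete, here is a small sketch that prints how many bytes a few characters need in UTF-8. MySQL's utf8 character set stores at most 3 bytes per character, so anything that needs 4 bytes requires utf8mb4. The class and method names are made up for illustration and are not part of Oak.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: shows which code points exceed MySQL's 3-byte "utf8" limit. */
class Utf8Widths {

    /** Number of bytes the given code point occupies when encoded as UTF-8. */
    static int utf8Bytes(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True if the code point fits into MySQL's 3-byte "utf8" character set. */
    static boolean fitsThreeByteUtf8(int codePoint) {
        return utf8Bytes(codePoint) <= 3;
    }

    public static void main(String[] args) {
        // 'A', u-umlaut, Euro sign, and a supplementary character (U+1F600)
        int[] samples = {0x41, 0xFC, 0x20AC, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X: %d byte(s), fits 3-byte utf8: %b%n",
                    cp, utf8Bytes(cp), fitsThreeByteUtf8(cp));
        }
    }
}
```

The German umlaut needs 2 bytes, the Euro sign 3, and the supplementary character 4, which is why the last one is the useful probe for a misconfigured MySQL setup.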

For those reasons, any database or JDBC connection that is misconfigured is likely to cause considerable problems in production and probably won't work with most modern applications that have been internationalised or need to mention the Euro (€).

One approach to detecting this is to write a row containing supplementary UTF-8 characters to the nodes table, commit it, read the same row back and verify that the data survived the round trip, and finally delete the row. The ID of the row should be something that Oak would never use and that has a low probability of colliding with other Oak instances in the same cluster (e.g. a millisecond timestamp, as in 21313412313:utf8test). If there is a concern about tables other than the nodes table, those can be tested as well.
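
A minimal sketch of that round-trip check, assuming a tiny RowStore interface as a stand-in for the nodes table. The interface, the class name and the in-memory store are all hypothetical and not RDBDocumentStore APIs; a real check would do the write, commit, read and delete over JDBC. The in-memory store pushes the data through a configurable charset so that a lossy configuration can be simulated without a database.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the round-trip check; none of these names are Oak APIs. */
class Utf8RoundTripCheck {

    /** Minimal stand-in for the nodes table: id -> data. */
    interface RowStore {
        void write(String id, String data);
        String read(String id);
        void delete(String id);
    }

    // Probe with a 2-byte (u-umlaut), a 3-byte (Euro) and a 4-byte
    // supplementary (U+1F600) character, as measured in UTF-8.
    static final String PROBE =
            "\u00fc \u20ac " + new String(Character.toChars(0x1F600));

    /** Writes the probe row, reads it back, and always deletes it again. */
    static boolean survivesRoundTrip(RowStore store) {
        // Millisecond timestamp plus suffix, as suggested for the row ID.
        String id = System.currentTimeMillis() + ":utf8test";
        store.write(id, PROBE);
        try {
            return PROBE.equals(store.read(id));
        } finally {
            store.delete(id);
        }
    }

    /** In-memory store that encodes data with the given charset (simulation only). */
    static RowStore inMemory(Charset charset) {
        Map<String, byte[]> rows = new HashMap<>();
        return new RowStore() {
            public void write(String id, String data) {
                rows.put(id, data.getBytes(charset));
            }
            public String read(String id) {
                byte[] bytes = rows.get(id);
                return bytes == null ? null : new String(bytes, charset);
            }
            public void delete(String id) {
                rows.remove(id);
            }
        };
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 store ok: "
                + survivesRoundTrip(inMemory(StandardCharsets.UTF_8)));
        System.out.println("ISO-8859-1 store ok: "
                + survivesRoundTrip(inMemory(StandardCharsets.ISO_8859_1)));
    }
}
```

The UTF-8 store passes, while the ISO-8859-1 store fails because the Euro sign and the supplementary character are replaced on write. Deleting in a finally block keeps the probe row from lingering even if the read throws.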

A switch should be provided to allow those who have managed to run Oak in production with a misconfigured database to at least keep running while they correct the issue. For MySQL this might be as simple as correcting the JDBC URL to include utf8mb4 encoding.
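
One way such a switch could look, as a sketch only: fail startup by default when the check does not pass, but downgrade to a warning when a system property is set. The property name example.rdb.allowInadequateEncoding and the class are invented for illustration and are not existing Oak options.

```java
/** Hypothetical startup policy; the property name is invented for illustration. */
class EncodingCheckPolicy {

    // Assumed name, not an existing Oak option.
    static final String ALLOW_PROPERTY = "example.rdb.allowInadequateEncoding";

    /**
     * Returns null if the check passed, a warning message if startup may
     * proceed despite a failed check, and throws if startup must be refused.
     */
    static String enforce(boolean checkPassed, boolean allowInadequate) {
        if (checkPassed) {
            return null;
        }
        String problem = "database did not preserve supplementary UTF-8 characters";
        if (allowInadequate) {
            return "WARNING: " + problem + "; continuing because "
                    + ALLOW_PROPERTY + " is set";
        }
        throw new IllegalStateException(
                problem + "; set -D" + ALLOW_PROPERTY + "=true to start anyway");
    }

    public static void main(String[] args) {
        // Reads the JVM system property; false unless it is set to "true".
        boolean allow = Boolean.getBoolean(ALLOW_PROPERTY);
        try {
            // Simulate a failed check to show both outcomes of the switch.
            System.out.println(enforce(false, allow));
        } catch (IllegalStateException e) {
            System.out.println("Refusing to start: " + e.getMessage());
        }
    }
}
```

Making failure the default and the override explicit means existing deployments can keep running, but only after someone has consciously acknowledged the problem.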

> RDBDocumentStore: fail init when database config seems to be inadequate
> -----------------------------------------------------------------------
>
>                 Key: OAK-2920
>                 URL: https://issues.apache.org/jira/browse/OAK-2920
>             Project: Jackrabbit Oak
>          Issue Type: Sub-task
>          Components: rdbmk
>            Reporter: Julian Reschke
>            Priority: Minor
>              Labels: resilience
>
> It has been suggested that the implementation should fail to start (rather than warn) when it detects a DB configuration that is likely to cause problems (such as wrt character encoding or collation sequences)


