
Posted to issues@ignite.apache.org by "Ilya Kasnacheev (JIRA)" <ji...@apache.org> on 2019/01/07 23:55:00 UTC

[jira] [Commented] (IGNITE-10732) Incorrect file.encoding leads to inconsistent SqlFieldsQuery results between nodes

    [ https://issues.apache.org/jira/browse/IGNITE-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736499#comment-16736499 ] 

Ilya Kasnacheev commented on IGNITE-10732:
------------------------------------------

[~dpavlov] Even if the encoding is consistent between nodes, a non-UTF-8 file.encoding can still corrupt data: when Unicode strings are encoded into an 8-bit charset and then decoded again, characters that the charset cannot represent turn into '?'. Therefore, we should keep the warning. Users may still run with such an encoding, but they need to be aware of the risk. We already print quite a few warnings, even with the default configuration.
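The data-corruption point can be illustrated with a small, self-contained Java sketch. It assumes `StandardCharsets.ISO_8859_1` as a stand-in for an 8-bit Windows code page (the comment does not name a specific charset), and `encodingWarning` is a hypothetical helper whose message text is illustrative, not Ignite's actual startup warning.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class LossyRoundTripDemo {
    // Encodes a string with the given charset and decodes it again.
    // String.getBytes(Charset) replaces characters the charset cannot
    // represent with the charset's replacement byte, which is '?' here.
    static String roundTrip(String value, Charset charset) {
        return new String(value.getBytes(charset), charset);
    }

    // Hypothetical startup check: returns a warning when the default charset
    // is not UTF-8. The message text is illustrative, not Ignite's log output.
    static Optional<String> encodingWarning(Charset defaultCharset) {
        if (StandardCharsets.UTF_8.equals(defaultCharset)) {
            return Optional.empty();
        }
        return Optional.of("Default charset is " + defaultCharset.name()
                + "; characters outside it will be replaced with '?'."
                + " Consider -Dfile.encoding=UTF-8.");
    }

    public static void main(String[] args) {
        // Cyrillic "Privet", written as Unicode escapes so this file compiles
        // the same way regardless of javac's source encoding.
        String cyrillic = "\u041f\u0440\u0438\u0432\u0435\u0442";

        String viaLatin1 = roundTrip(cyrillic, StandardCharsets.ISO_8859_1);
        System.out.println(viaLatin1);                  // ??????
        System.out.println(viaLatin1.equals(cyrillic)); // false

        // The same string survives a UTF-8 round trip unchanged.
        System.out.println(roundTrip(cyrillic, StandardCharsets.UTF_8).equals(cyrillic)); // true

        encodingWarning(Charset.defaultCharset()).ifPresent(System.err::println);
    }
}
```

The loss happens at encode time, so no choice of decoder on the receiving side can recover the original characters; this is why the warning matters even when all nodes agree on the encoding.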

Maybe we should also check for actual inconsistency in the cluster and reject joining nodes whose file.encoding is inconsistent with that of the existing nodes. I think this will require a separate ticket.
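A join-time check like the one proposed could look roughly like this sketch. `joinAllowed` and `resolve` are hypothetical helpers operating on plain encoding names; how the joining node's file.encoding would actually be exchanged during Ignite discovery is not specified here. Names are resolved with `Charset.forName`, so aliases such as `utf8` and `UTF-8` compare equal.

```java
import java.nio.charset.Charset;
import java.util.List;

public class EncodingJoinCheck {
    // Resolves a charset name (including aliases such as "utf8" or "Cp1252")
    // to a Charset; returns null for null, illegal or unsupported names.
    static Charset resolve(String name) {
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException.
            return null;
        }
    }

    // Hypothetical join-time rule: admit the joining node only if its
    // file.encoding resolves to the same charset as every existing node's.
    static boolean joinAllowed(List<String> existingEncodings, String joiningEncoding) {
        Charset joining = resolve(joiningEncoding);
        if (joining == null) {
            return false; // Unknown encoding: reject rather than risk corruption.
        }
        for (String existing : existingEncodings) {
            if (!joining.equals(resolve(existing))) {
                return false;
            }
        }
        return true; // Also true for the first node, when the list is empty.
    }

    public static void main(String[] args) {
        List<String> cluster = List.of("UTF-8", "UTF-8");
        System.out.println(joinAllowed(cluster, "utf8"));     // true: alias of UTF-8
        System.out.println(joinAllowed(cluster, "Cp1252"));   // false
        System.out.println(joinAllowed(List.of(), "Cp1252")); // true: empty cluster
    }
}
```

Comparing resolved `Charset` objects rather than raw strings avoids rejecting nodes that merely spell the same encoding differently.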

> Incorrect file.encoding leads to inconsistent SqlFieldsQuery results between nodes
> ----------------------------------------------------------------------------------
>
>                 Key: IGNITE-10732
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10732
>             Project: Ignite
>          Issue Type: Bug
>          Components: sql
>    Affects Versions: 2.4
>            Reporter: Ilya Kasnacheev
>            Assignee: Ilya Kasnacheev
>            Priority: Critical
>              Labels: windows
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When doing 
> {code}
> cache.query(new SqlFieldsQuery("SELECT _key FROM Cache"))
> {code}
> the resulting Unicode values may differ depending on whether they come from a Windows or a Linux node.
> Linux nodes will mostly use UTF-8, but Windows nodes will use the local CpNNNN encoding to encode query results, as bizarre as it may sound.
> Windows <-> Windows and Linux <-> Linux will get correct results, but Windows <-> Linux will get broken strings.
> Note that if the cluster has both Windows and Linux nodes and the cache is REPLICATED, results will differ between subsequent queries!
> There is a workaround for this: set the -Dfile.encoding=UTF-8 JVM argument on Windows.
> There is probably an underlying problem in H2, but since a non-UTF-8 file.encoding is dangerous (it affects String.getBytes()), I think we should peg it to UTF-8.
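The Windows <-> Linux mismatch described in the issue can be reproduced without Ignite: encode a string with one node's charset and decode the bytes with another's. This is a hedged sketch, not Ignite's or H2's actual serialization path; `transfer` is a hypothetical helper, and `ISO_8859_1` is assumed as a stand-in for a Windows CpNNNN code page.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CrossNodeDecodingDemo {
    // Models one node turning a string into bytes with its charset and
    // another node turning those bytes back into a string with its own.
    static String transfer(String value, Charset sender, Charset receiver) {
        byte[] wire = value.getBytes(sender);
        return new String(wire, receiver);
    }

    public static void main(String[] args) {
        // "cafe" with an accented final e, written as a Unicode escape.
        String value = "caf\u00e9";
        Charset linuxLike = StandardCharsets.UTF_8;
        // Assumption: ISO-8859-1 stands in for a Windows CpNNNN code page.
        Charset windowsLike = StandardCharsets.ISO_8859_1;

        // Same charset on both ends: the value survives.
        System.out.println(transfer(value, linuxLike, linuxLike).equals(value));     // true
        System.out.println(transfer(value, windowsLike, windowsLike).equals(value)); // true

        // UTF-8 sender, 8-bit receiver: the two UTF-8 bytes (0xC3 0xA9) of the
        // accented e are read back as two separate characters.
        System.out.println(transfer(value, linuxLike, windowsLike).equals("caf\u00c3\u00a9")); // true

        // 8-bit sender, UTF-8 receiver: the lone byte 0xE9 is malformed UTF-8
        // and is replaced with U+FFFD.
        System.out.println(transfer(value, windowsLike, linuxLike).equals("caf\ufffd")); // true
    }
}
```

Running every node with -Dfile.encoding=UTF-8 corresponds to using `linuxLike` on both ends, which is why the workaround avoids the broken strings.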



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)