Posted to dev@drill.apache.org by "benj (Jira)" <ji...@apache.org> on 2019/12/20 14:32:00 UTC

[jira] [Created] (DRILL-7493) convert_fromJSON and unicode

benj created DRILL-7493:
---------------------------

             Summary: convert_fromJSON and unicode
                 Key: DRILL-7493
                 URL: https://issues.apache.org/jira/browse/DRILL-7493
             Project: Apache Drill
          Issue Type: Bug
          Components: Functions - Drill
    Affects Versions: 1.16.0
            Reporter: benj


Transform a JSON string (containing a \uxxxx escape) into a JSON structure:
{code:sql}
apache drill> SELECT x_str, convert_fromJSON(x_str) AS x_array 
FROM (SELECT '["test=\u0014=test"]' x_str);
+----------------------+----------------------+
|        x_str         |       x_array        |
+----------------------+----------------------+
| ["test=\u0014=test"] | ["test=\u0014=test"] |
+----------------------+----------------------+
{code}
Use the JSON structure:
{code:sql}
apache drill> SELECT x_str
, x_array
, x_array[0] AS x_array0 
FROM(SELECT x_str, convert_fromJSON(x_str) AS x_array
FROM (SELECT '["test=\u0014=test"]' x_str));
+----------------------+----------------------+-------------+
|        x_str         |       x_array        |  x_array0   |
+----------------------+----------------------+-------------+
| ["test=\u0014=test"] | ["test=\u0014=test"] | test==test |
+----------------------+----------------------+-------------+
{code}
Note that the \u0014 character is interpreted in x_array0.

When the split function is applied to x_array0, the resulting array contains the \uxxxx escape in its uninterpreted form:
{code:sql}
apache drill> SELECT x_str
, x_array
, x_array[0] AS x_array0
, split(x_array[0],',') AS x_array0_split 
FROM(SELECT x_str, convert_fromJSON(x_str) AS x_array 
FROM (SELECT '["test=\u0014=test"]' x_str));
+----------------------+----------------------+-------------+----------------------+
|        x_str         |       x_array        |  x_array0   |    x_array0_split    |
+----------------------+----------------------+-------------+----------------------+
| ["test=\u0014=test"] | ["test=\u0014=test"] | test==test | ["test=\u0014=test"] |
+----------------------+----------------------+-------------+----------------------+
{code}
It is not possible to use convert_fromJSON on a string built from the interpreted \uxxxx value:
{code:sql}
SELECT x_str
, x_array
, x_array[0] AS x_array0
, split(x_array[0],',') AS x_array0_split
, convert_fromJSON('["' || x_array[0] || '"]') AS convertJSONerror 
FROM(SELECT x_str, convert_fromJSON(x_str) AS x_array 
FROM (SELECT '["test=\u0014=test"]' x_str));
Error: DATA_READ ERROR: Illegal unquoted character ((CTRL-CHAR, code 20)): has to be escaped using backslash to be included in string value
 at [Source: (org.apache.drill.exec.vector.complex.fn.DrillBufInputStream); line: 1, column: 9]
{code}
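This does not work: the string looks the same as the original, but the \uxxxx escape has unfortunately already been interpreted, so the rebuilt string contains the raw control character (code 20) instead of the escape.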
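
The same round-trip can be sketched outside Drill. The snippet below is only an illustration and makes one loud assumption: it uses Python's standard json module as a stand-in for the JSON parser behind convert_fromJSON, so it shows what the JSON grammar requires, not how Drill is implemented. The variable names mirror the columns in the queries above and are otherwise arbitrary.

```python
import json

# Same literal as x_str in the report. The raw string keeps the backslash,
# so \u0014 is six separate characters here, not a control character.
x_str = r'["test=\u0014=test"]'

# Parsing decodes the escape into the single control character U+0014.
x_array0 = json.loads(x_str)[0]
print(repr(x_array0))

# Naive re-wrapping, analogous to '["' || x_array[0] || '"]' in the query:
# the raw control character ends up inside the JSON string, which JSON forbids.
naive = '["' + x_array0 + '"]'
try:
    json.loads(naive)
    naive_ok = True
except json.JSONDecodeError as exc:
    naive_ok = False
    print("rejected:", exc)

# Re-escaping the value before wrapping restores a valid JSON document.
rewrapped = '[' + json.dumps(x_array0) + ']'
print(rewrapped)
roundtrip = json.loads(rewrapped)
```

Under that assumption, the naive concatenation fails for the same reason as the DATA_READ ERROR above, while re-escaping the control character first gives back the original text. Whether an equivalent re-escaping step is available in Drill SQL is not established by this report.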



