You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "benj (Jira)" <ji...@apache.org> on 2020/01/07 16:24:00 UTC
[jira] [Created] (DRILL-7515) ORDER BY clause produce error on GROUP BY with array field manager with any_value

benj created DRILL-7515:
---------------------------

             Summary: ORDER BY clause produce error on GROUP BY with array field manager with any_value
                 Key: DRILL-7515
                 URL: https://issues.apache.org/jira/browse/DRILL-7515
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Data Types
    Affects Versions: 1.17.0
            Reporter: benj


With a parquet containing an array field, for example:
{code:sql}
apache drill 1.17> CREATE TABLE dfs.TEST.`example_any_pqt` AS (SELECT 'foo' AS a, 'bar' b, split('foo,bar',',') as c);

apache drill 1.17> SELECT *, typeof(c) AS type, sqltypeof(c) AS sql_type FROM dfs.TEST.`example_any_pqt`;
+-----+-----+---------------+---------+----------+
|  a  |  b  |       c       |  type   | sql_type |
+-----+-----+---------------+---------+----------+
| foo | bar | ["foo","bar"] | VARCHAR | ARRAY    |
+-----+-----+---------------+---------+----------+
{code}
The next request work well
{code:sql}
apache drill 1.17> SELECT * FROM 
(SELECT a, any_value(c) FROM dfs.TEST.`example_any_pqt` GROUP BY a)
ORDER BY a;
+-----+---------------+
|  a  |    EXPR$1     |
+-----+---------------+
| foo | ["foo","bar"] |
+-----+---------------+
{code}
But the next request (with the same struct as the previous request) failed
{code:sql}
apache drill 1.17> SELECT * FROM 
(SELECT a, b, any_value(c) FROM dfs.TEST.`example_any_pqt` GROUP BY a, b)
ORDER BY a;
Error: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External Sort. Please enable Union type.

Previous schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (VARCHAR:OPTIONAL)], [`EXPR$2` (NULL:OPTIONAL)]], selectionVector=NONE]
Incoming schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (VARCHAR:OPTIONAL)], [`EXPR$2` (VARCHAR:REPEATED), children=([`$data$` (VARCHAR:REQUIRED)])]], selectionVector=NONE]
Fragment 0:0
{code}
Note that the same request +without the order by+ works well. It's also possible to use intermediate table and apply the ORDER BY in a second time.
{code:sql}
apache drill 1.17> SELECT * FROM 
(SELECT a, b, any_value(c) FROM dfs.TEST.`example_any_pqt` GROUP BY a, b);
+-----+-----+---------------+
|  a  |  b  |    EXPR$2     |
+-----+-----+---------------+
| foo | bar | ["foo","bar"] |
+-----+-----+---------------+

apache drill 1.17> CREATE TABLE dfs.TEST.`ok_pqt` AS (SELECT * FROM (SELECT a, b, any_value(c) FROM dfs.TEST.`example_any_pqt` GROUP BY a, b));
+----------+---------------------------+
| Fragment | Number of records written |
+----------+---------------------------+
| 0_0      | 1                         |
+----------+---------------------------+
apache drill 1.17> SELECT * FROM dfs.TEST.`ok_pqt` ORDER BY a;
+-----+-----+---------------+
|  a  |  b  |    EXPR$2     |
+-----+-----+---------------+
| foo | bar | ["foo","bar"] |
+-----+-----+---------------+
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)