Posted to issues@drill.apache.org by "Vova Vysotskyi (Jira)" <ji...@apache.org> on 2020/01/10 13:17:00 UTC
[jira] [Commented] (DRILL-7515) ORDER BY clause produce error on GROUP BY with array field manager with any_value

    [ https://issues.apache.org/jira/browse/DRILL-7515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012854#comment-17012854 ] 

Vova Vysotskyi commented on DRILL-7515:
---------------------------------------

It looks like the issue is in {{StreamingAggBatch}} and the way it handles complex aggregate functions. It adds null vectors for complex results to the container, and once actual data is obtained, it creates writers that replace these null vectors. Perhaps an empty batch with OK_NEW_SCHEMA status was returned between these two stages: sort accepted that batch and its schema, then failed when a batch with data (and a different schema) arrived.
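
To make the suspected sequence concrete, here is a minimal toy model in Python. It is only a sketch of the hypothesis above and not Drill code: {{ToySort}}, {{toy_streaming_agg}}, {{Batch}}, {{SchemaChangeError}} and {{run}} are hypothetical names invented for illustration, and the type strings are copied from the error message in the issue description. It assumes the downstream sort locks onto the first schema it sees and rejects any later batch whose schema differs.

```python
# Toy model only: none of these names are Drill classes or APIs.
from dataclasses import dataclass
from typing import List, Optional, Tuple

Schema = Tuple[Tuple[str, str], ...]  # ((column name, type), ...)


@dataclass
class Batch:
    status: str        # "OK_NEW_SCHEMA" or "OK"
    schema: Schema
    rows: List[tuple]


class SchemaChangeError(Exception):
    pass


class ToySort:
    """Locks onto the first schema it sees and rejects any other schema."""

    def __init__(self) -> None:
        self.schema: Optional[Schema] = None
        self.rows: List[tuple] = []

    def consume(self, batch: Batch) -> None:
        if self.schema is None:
            self.schema = batch.schema
        elif batch.schema != self.schema:
            raise SchemaChangeError(
                f"previous={self.schema} incoming={batch.schema}")
        self.rows.extend(batch.rows)

    def result(self) -> List[tuple]:
        return sorted(self.rows, key=lambda r: r[0])


def toy_streaming_agg(emit_empty_batch_first: bool) -> List[Batch]:
    """Return the batches a hypothetical aggregator sends downstream."""
    placeholder = (("a", "VARCHAR"), ("b", "VARCHAR"), ("EXPR$2", "NULL"))
    actual = (("a", "VARCHAR"), ("b", "VARCHAR"),
              ("EXPR$2", "VARCHAR:REPEATED"))
    batches: List[Batch] = []
    if emit_empty_batch_first:
        # Stage 1: the placeholder null vector is still in the container.
        batches.append(Batch("OK_NEW_SCHEMA", placeholder, []))
    # Stage 2: writers have replaced the null vector with the real one.
    status = "OK" if batches else "OK_NEW_SCHEMA"
    batches.append(Batch(status, actual, [("foo", "bar", ["foo", "bar"])]))
    return batches


def run(emit_empty_batch_first: bool) -> List[tuple]:
    sort = ToySort()
    for batch in toy_streaming_agg(emit_empty_batch_first):
        sort.consume(batch)
    return sort.result()
```

In this model, {{run(False)}} returns the single row, while {{run(True)}} raises {{SchemaChangeError}} because the empty batch carries the NULL placeholder type and the data batch carries the repeated VARCHAR type. That matches the "Previous schema" / "Incoming schema" pair in the reported error, but whether Drill actually emits such an intermediate batch still needs to be confirmed in the code.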

> ORDER BY clause produce error on GROUP BY with array field manager with any_value
> ---------------------------------------------------------------------------------
>
>                 Key: DRILL-7515
>                 URL: https://issues.apache.org/jira/browse/DRILL-7515
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.17.0
>            Reporter: benj
>            Priority: Major
>
> With a parquet containing an array field, for example:
> {code:sql}
> apache drill 1.17> CREATE TABLE dfs.TEST.`example_any_pqt` AS (SELECT 'foo' AS a, 'bar' b, split('foo,bar',',') as c);
> apache drill 1.17> SELECT *, typeof(c) AS type, sqltypeof(c) AS sql_type FROM dfs.TEST.`example_any_pqt`;
> +-----+-----+---------------+---------+----------+
> |  a  |  b  |       c       |  type   | sql_type |
> +-----+-----+---------------+---------+----------+
> | foo | bar | ["foo","bar"] | VARCHAR | ARRAY    |
> +-----+-----+---------------+---------+----------+
> {code}
> The following query works well:
> {code:sql}
> apache drill 1.17> SELECT * FROM 
> (SELECT a, any_value(c) FROM dfs.TEST.`example_any_pqt` GROUP BY a)
> ORDER BY a;
> +-----+---------------+
> |  a  |    EXPR$1     |
> +-----+---------------+
> | foo | ["foo","bar"] |
> +-----+---------------+
> {code}
> But the following query (with the same structure as the previous one) fails:
> {code:sql}
> apache drill 1.17> SELECT * FROM 
> (SELECT a, b, any_value(c) FROM dfs.TEST.`example_any_pqt` GROUP BY a, b)
> ORDER BY a;
> Error: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External Sort. Please enable Union type.
> Previous schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (VARCHAR:OPTIONAL)], [`EXPR$2` (NULL:OPTIONAL)]], selectionVector=NONE]
> Incoming schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (VARCHAR:OPTIONAL)], [`EXPR$2` (VARCHAR:REPEATED), children=([`$data$` (VARCHAR:REQUIRED)])]], selectionVector=NONE]
> Fragment 0:0
> {code}
> Note that the same query +without the ORDER BY+ works well. It's also possible to write the result to an intermediate table and apply the ORDER BY in a second step.
> {code:sql}
> apache drill 1.17> SELECT * FROM 
> (SELECT a, b, any_value(c) FROM dfs.TEST.`example_any_pqt` GROUP BY a, b);
> +-----+-----+---------------+
> |  a  |  b  |    EXPR$2     |
> +-----+-----+---------------+
> | foo | bar | ["foo","bar"] |
> +-----+-----+---------------+
> apache drill 1.17> CREATE TABLE dfs.TEST.`ok_pqt` AS (SELECT * FROM (SELECT a, b, any_value(c) FROM dfs.TEST.`example_any_pqt` GROUP BY a, b));
> +----------+---------------------------+
> | Fragment | Number of records written |
> +----------+---------------------------+
> | 0_0      | 1                         |
> +----------+---------------------------+
> apache drill 1.17> SELECT * FROM dfs.TEST.`ok_pqt` ORDER BY a;
> +-----+-----+---------------+
> |  a  |  b  |    EXPR$2     |
> +-----+-----+---------------+
> | foo | bar | ["foo","bar"] |
> +-----+-----+---------------+
> {code}


