You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "benj (Jira)" <ji...@apache.org> on 2019/11/21 09:06:00 UTC
[jira] [Updated] (DRILL-7379) Planning error
[ https://issues.apache.org/jira/browse/DRILL-7379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
benj updated DRILL-7379:
------------------------
Attachment: example.parquet
> Planning error
> --------------
>
> Key: DRILL-7379
> URL: https://issues.apache.org/jira/browse/DRILL-7379
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.16.0
> Reporter: benj
> Priority: Major
> Attachments: example.parquet
>
>
> With data as:
> {code:sql}
> SELECT id, tags FROM ....`example_parquet`;
> +--------+------------------------------------+
> | id | tags |
> +--------+------------------------------------+
> | 7b8808 | ["peexe","signed","overlay"] |
> | 55a4ae | ["peexe","signed","upx","overlay"] |
> +--------+------------------------------------+
> {code}
> The next request is OK
> {code:sql}
> SELECT id, flatten(tags) tag
> FROM (
> SELECT id, any_value(tags) tags
> FROM ....`example_parquet`
> GROUP BY id
> ) LIMIT 2;
> +--------+--------+
> | id | tag |
> +--------+--------+
> | 55a4ae | peexe |
> | 55a4ae | signed |
> +--------+--------+
> {code}
> But unexpectedly, the next query failed:
> {code:sql}
> SELECT tag, count(*)
> FROM (
> SELECT flatten(tags) tag
> FROM (
> SELECT id, any_value(tags) tags
> FROM ....`example_parquet`
> GROUP BY id
> )
> ) GROUP BY tag;
> Error: SYSTEM ERROR: UnsupportedOperationException: Map, Array, Union or repeated scalar type should not be used in group by, order by or in a comparison operator. Drill does not support compare between MAP:REPEATED and MAP:REPEATED.
> /* Or other error with another set of data :
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors:
>
> Error in expression at index 0. Error: Missing function implementation: [hash32asdouble(MAP-REPEATED, INT-REQUIRED)]. Full expression: null..
> */
> {code}
> These errors are incomprehensible because, the aggregate is on VARCHAR.
> More, the request works if decomposed in 2 request with with the creation of an intermediate table like below:
> {code:sql}
> CREATE TABLE ....`tmp.parquet` AS (
> SELECT id, flatten(tags) tag
> FROM (
> SELECT id, any_value(tags) tags
> FROM ....`example_parquet`
> GROUP BY id
> ));
> SELECT tag, count(*) c FROM ....`tmp_parquet` GROUP BY tag;
> +---------+---+
> | tag | c |
> +---------+---+
> | overlay | 2 |
> | peexe | 2 |
> | signed | 2 |
> | upx | 1 |
> +---------+---+
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)