Posted to dev@hive.apache.org by "anishek (JIRA)" <ji...@apache.org> on 2019/03/29 05:02:00 UTC
[jira] [Created] (HIVE-21539) GroupBy + where clause on same column results in incorrect query rewrite
anishek created HIVE-21539:
------------------------------
Summary: GroupBy + where clause on same column results in incorrect query rewrite
Key: HIVE-21539
URL: https://issues.apache.org/jira/browse/HIVE-21539
Project: Hive
Issue Type: Bug
Components: HiveServer2
Affects Versions: 4.0.0
Reporter: anishek
{code}
create table a (i int, j string);
insert into a values ( 1, 'a'),(2,'b');
explain extended select min(j) from a where j='a' group by j;
+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| OPTIMIZED SQL: SELECT MIN(TRUE) AS `_o__c0` |
| FROM `default`.`a` |
| WHERE `j` = 'a' |
| GROUP BY TRUE |
| STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| STAGE PLANS: |
| Stage: Stage-1 |
| Tez |
| DagId: anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
| Edges: |
| Reducer 2 <- Map 1 (SIMPLE_EDGE) |
| DagName: anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
| Vertices: |
| Map 1 |
| Map Operator Tree: |
| TableScan |
| alias: a |
| filterExpr: (j = 'a') (type: boolean) |
| Statistics: Num rows: 2 Data size: 170 Basic stats: COMPLETE Column stats: COMPLETE |
| GatherStats: false |
| Filter Operator |
| isSamplingPred: false |
| predicate: (j = 'a') (type: boolean) |
| Statistics: Num rows: 1 Data size: 85 Basic stats: COMPLETE Column stats: COMPLETE |
| Select Operator |
| Statistics: Num rows: 1 Data size: 85 Basic stats: COMPLETE Column stats: COMPLETE |
| Group By Operator |
| aggregations: min(true) |
| keys: true (type: boolean) |
| mode: hash |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE |
| Reduce Output Operator |
| key expressions: _col0 (type: boolean) |
| null sort order: a |
| sort order: + |
| Map-reduce partition columns: _col0 (type: boolean) |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE |
| tag: -1 |
| value expressions: _col1 (type: boolean) |
| auto parallelism: true |
| Path -> Alias: |
| hdfs://localhost:9000/tmp/hive/warehouse/a [a] |
| Path -> Partition: |
| hdfs://localhost:9000/tmp/hive/warehouse/a |
| Partition |
| base file name: a |
| input format: org.apache.hadoop.mapred.TextInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
| properties: |
| COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}} |
| bucket_count -1 |
| bucketing_version 2 |
| column.name.delimiter , |
| columns i,j |
| columns.comments |
| columns.types int:string |
| file.inputformat org.apache.hadoop.mapred.TextInputFormat |
| file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
| location hdfs://localhost:9000/tmp/hive/warehouse/a |
| name default.a |
| numFiles 3 |
| numRows 2 |
| rawDataSize 6 |
| serialization.ddl struct a { i32 i, string j} |
| serialization.format 1 |
| serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| totalSize 16 |
| transient_lastDdlTime 1552903148 |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| input format: org.apache.hadoop.mapred.TextInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
| properties: |
| COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}} |
| bucket_count -1 |
| bucketing_version 2 |
| column.name.delimiter , |
| columns i,j |
| columns.comments |
| columns.types int:string |
| file.inputformat org.apache.hadoop.mapred.TextInputFormat |
| file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
| location hdfs://localhost:9000/tmp/hive/warehouse/a |
| name default.a |
| numFiles 3 |
| numRows 2 |
| rawDataSize 6 |
| serialization.ddl struct a { i32 i, string j} |
| serialization.format 1 |
| serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| totalSize 16 |
| transient_lastDdlTime 1552903148 |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| name: default.a |
| name: default.a |
| Truncated Path -> Alias: |
| /a [a] |
| Reducer 2 |
| Needs Tagging: false |
| Reduce Operator Tree: |
| Group By Operator |
| aggregations: min(VALUE._col0) |
| keys: KEY._col0 (type: boolean) |
| mode: mergepartial |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE |
| Select Operator |
| expressions: _col1 (type: boolean) |
| outputColumnNames: _col0 |
| Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE |
| File Output Operator |
| compressed: false |
| GlobalTableId: 0 |
| directory: hdfs://localhost:9000/tmp/hive/anagarwal/20f7b890-606b-4815-a56e-ab3384ef58f5/hive_2019-03-18_15-35-35_644_3057456177912469405-1/-mr-10001/.hive-staging_hive_2019-03-18_15-35-35_644_3057456177912469405-1/-ext-10002 |
| NumFilesPerFileSink: 1 |
| Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE |
| Stats Publishing Key Prefix: hdfs://localhost:9000/tmp/hive/anagarwal/20f7b890-606b-4815-a56e-ab3384ef58f5/hive_2019-03-18_15-35-35_644_3057456177912469405-1/-mr-10001/.hive-staging_hive_2019-03-18_15-35-35_644_3057456177912469405-1/-ext-10002/ |
| table: |
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| properties: |
| columns _col0 |
| columns.types boolean |
| escape.delim \ |
| hive.serialization.extend.additional.nesting.levels true |
| serialization.escape.crlf true |
| serialization.format 1 |
| serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| TotalFiles: 1 |
| GatherStats: false |
| MultiFileSpray: false |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
| |
+----------------------------------------------------+
{code}
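The query is rewritten with *true* substituted for the column j: the optimized SQL computes MIN(TRUE) and groups by TRUE, and the plan's output column type is boolean instead of string.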
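For comparison, here is a minimal sketch of the result the original query would be expected to produce. It runs the same statements against SQLite through Python's standard sqlite3 module. SQLite is only a stand-in here and is not part of the report, the column type text replaces Hive's string, and the helper name expected_rows is hypothetical.

```python
import sqlite3


def expected_rows():
    """Run the reported query on an in-memory SQLite database (an assumed stand-in for Hive)."""
    conn = sqlite3.connect(":memory:")
    try:
        # Same table and data as in the report; 'text' stands in for Hive's 'string'.
        conn.execute("create table a (i int, j text)")
        conn.executemany("insert into a values (?, ?)", [(1, "a"), (2, "b")])
        # The query from the report, without 'explain extended'.
        cursor = conn.execute("select min(j) from a where j = 'a' group by j")
        return cursor.fetchall()
    finally:
        conn.close()


rows = expected_rows()
print(rows)  # [('a',)]
```

Under these assumptions the query returns the string value of j for the single matching group, whereas the rewritten plan above aggregates the boolean constant true.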