You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Yongzhi Chen (JIRA)" <ji...@apache.org> on 2016/03/03 18:07:18 UTC
[jira] [Updated] (HIVE-13200) Aggregation functions returning empty rows on partitioned columns

     [ https://issues.apache.org/jira/browse/HIVE-13200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yongzhi Chen updated HIVE-13200:
--------------------------------
    Attachment: HIVE-13200.1.patch

When select only on partition column with aggregation function, the MetadataOnlyOptimizer is used for the metadata is good enough to provide the needed values for the partition column. But it is not true, when need skip header and footer which should operate on real rows. 
The empty return for the select on column b is because the MetadataOnlyOptimizer creates OneNullRowRecordReader which alway return empty return after first skip. 
The patch fixes the issue by do not use MetadataOnlyOptimizer when need skip headers/footers. 

> Aggregation functions returning empty rows on partitioned columns
> -----------------------------------------------------------------
>
>                 Key: HIVE-13200
>                 URL: https://issues.apache.org/jira/browse/HIVE-13200
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>    Affects Versions: 1.0.0, 2.0.0
>            Reporter: Yongzhi Chen
>            Assignee: Yongzhi Chen
>         Attachments: HIVE-13200.1.patch
>
>
> Running aggregation functions like MAX, MIN, DISTINCT against partitioned columns will return empty rows if table has property: 'skip.header.line.count'='1'
> Reproduce:
> {noformat}
> DROP TABLE IF EXISTS test;
> CREATE TABLE test (a int) 
> PARTITIONED BY (b int) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' 
> TBLPROPERTIES('skip.header.line.count'='1');
> INSERT OVERWRITE TABLE test PARTITION (b = 1) VALUES (1), (2), (3), (4);
> INSERT OVERWRITE TABLE test PARTITION (b = 2) VALUES (1), (2), (3), (4);
> SELECT * FROM test;
> SELECT DISTINCT b FROM test;
> SELECT MAX(b) FROM test;
> SELECT DISTINCT a FROM test;
> {noformat}
> The output:
> {noformat}
> 0: jdbc:hive2://localhost:10000/default> SELECT * FROM test;
> +---------+---------+--+
> | test.a  | test.b  |
> +---------+---------+--+
> | 2       | 1       |
> | 3       | 1       |
> | 4       | 1       |
> | 2       | 2       |
> | 3       | 2       |
> | 4       | 2       |
> +---------+---------+--+
> 6 rows selected (0.631 seconds)
> 0: jdbc:hive2://localhost:10000/default> SELECT DISTINCT b FROM test;
> +----+--+
> | b  |
> +----+--+
> +----+--+
> No rows selected (47.229 seconds)
> 0: jdbc:hive2://localhost:10000/default> SELECT MAX(b) FROM test;
> +-------+--+
> |  _c0  |
> +-------+--+
> | NULL  |
> +-------+--+
> 1 row selected (49.508 seconds)
> 0: jdbc:hive2://localhost:10000/default> SELECT DISTINCT a FROM test;
> +----+--+
> | a  |
> +----+--+
> | 2  |
> | 3  |
> | 4  |
> +----+--+
> 3 rows selected (46.859 seconds)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)