You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Pala M Muthaia (JIRA)" <ji...@apache.org> on 2013/12/13 00:41:07 UTC

[jira] [Updated] (HIVE-6028) Partition predicate literals are not interpreted correctly.

     [ https://issues.apache.org/jira/browse/HIVE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pala M Muthaia updated HIVE-6028:
---------------------------------

    Description: 
When parsing/analyzing query, hive treats partition predicate value as int instead of string. This breaks down and leads to incorrect result when the partition predicate value starts with int 0, e.g: hour=00, hour=05 etc.

The following repro illustrates the bug:
-- create test table and partition, populate with some data
create table test_partition_pred(col1 int) partitioned by (hour STRING);
insert into table test_partition_pred partition (hour=00) select 21 FROM  some_table limit 1;

-- this query returns incorrect results, i.e. just empty set.
select * from test_partition_pred where hour=00;
OK

-- this query returns correct result. Note predicate value is string literal
select * from test_partition_pred where hour='00';
OK
21	00


Note:
1. The type of the partition column is defined as string, not int.
2. This is a regression in Hive 0.12. This used to work in Hive 0.11
3. Not an issue when the partition value starts with integer other than 0, e.g hour=10, hour=11 etc.
4. As seen above, workaround is to use string literal hour='00' etc.

This should not be too bad if in the failing case hive complains that partition hour=0 is not found, or complains literal type doesn't match column type. Instead hive silently pushes it down as filter clause, and query succeeds with empty set as result.

We found this out in our production tables partitioned by hour, only a few days after it started occurring, when there were empty data sets for partitions hour=00 to hour=09.





  was:
When parsing/analyzing query, hive treats partition predicate value as int instead of string. This breaks down and leads to incorrect result when the partition predicate value starts with int 0, e.g: hour=00, hour=05 etc.

The following repro illustrates the bug:
-- create test table and partition, populate with some data
create table test_partition_pred(col1 int) partitioned by (hour STRING);
insert into table test_partition_pred partition (hour=00) select 21 FROM  some_table limit 1;

-- this query returns incorrect results, i.e. just empty set.
select * from test_partition_pred where hour=00;
OK

-- this query returns correct result. Note predicate value is string literal
select * from test_partition_pred where hour='00';
OK
21	00

-- explain plan illustrates how the query was interpreted. Particularly the partition predicate is pushed down as regular filter clause, with hour=0 as predicate. 
explain select * from test_partition_pred where hour=00;

ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME test_partition_pred))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (= (TOK_TABLE_OR_COL hour) 00))))

STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: test_partition_pred
          Filter Operator
            predicate:
                expr: (hour = 0)
                type: boolean
            Select Operator
              expressions:
                    expr: col1
                    type: int
                    expr: hour
                    type: string
              outputColumnNames: _col0, _col1
              ListSink

-- comparing plan for query with correct result
explain select * from test_partition_pred where hour='00';

ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME test_partition_pred))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (= (TOK_TABLE_OR_COL hour) '00'))))

STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: test_partition_pred
          Select Operator
            expressions:
                  expr: col1
                  type: int
                  expr: hour
                  type: string
            outputColumnNames: _col0, _col1
            ListSink

Note:
1. The type of the partition column is defined as string, not int.
2. This is a regression in Hive 0.12. This used to work in Hive 0.11
3. Not an issue when the partition value starts with integer other than 0, e.g hour=10, hour=11 etc.
4. As seen above, workaround is to use string literal hour='00' etc.

This should not be too bad if in the failing case hive complains that partition hour=0 is not found, or complains literal type doesn't match column type. Instead hive silently pushes it down as filter clause, and query succeeds with empty set as result.

We found this out in our production tables partitioned by hour, only a few days after it started occurring, when there were empty data sets for partitions hour=00 to hour=09.






> Partition predicate literals are not interpreted correctly.
> -----------------------------------------------------------
>
>                 Key: HIVE-6028
>                 URL: https://issues.apache.org/jira/browse/HIVE-6028
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Pala M Muthaia
>         Attachments: Hive-6028-explain-plan.txt
>
>
> When parsing/analyzing query, hive treats partition predicate value as int instead of string. This breaks down and leads to incorrect result when the partition predicate value starts with int 0, e.g: hour=00, hour=05 etc.
> The following repro illustrates the bug:
> -- create test table and partition, populate with some data
> create table test_partition_pred(col1 int) partitioned by (hour STRING);
> insert into table test_partition_pred partition (hour=00) select 21 FROM  some_table limit 1;
> -- this query returns incorrect results, i.e. just empty set.
> select * from test_partition_pred where hour=00;
> OK
> -- this query returns correct result. Note predicate value is string literal
> select * from test_partition_pred where hour='00';
> OK
> 21	00
> Note:
> 1. The type of the partition column is defined as string, not int.
> 2. This is a regression in Hive 0.12. This used to work in Hive 0.11
> 3. Not an issue when the partition value starts with integer other than 0, e.g hour=10, hour=11 etc.
> 4. As seen above, workaround is to use string literal hour='00' etc.
> This should not be too bad if in the failing case hive complains that partition hour=0 is not found, or complains literal type doesn't match column type. Instead hive silently pushes it down as filter clause, and query succeeds with empty set as result.
> We found this out in our production tables partitioned by hour, only a few days after it started occurring, when there were empty data sets for partitions hour=00 to hour=09.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)