You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Nicholas Brenwald (JIRA)" <ji...@apache.org> on 2015/08/19 14:30:45 UTC

[jira] [Commented] (HIVE-11603) IndexOutOfBoundsException thrown when accessing a union all subquery and filtering on a column which does not exist in all underlying tables

    [ https://issues.apache.org/jira/browse/HIVE-11603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702941#comment-14702941 ] 

Nicholas Brenwald commented on HIVE-11603:
------------------------------------------

Query plan from branch-1:
{code}
:~/$ hive
Logging initialized using configuration in jar:file:~/branch-1/hive/packaging/target/apache-hive-1.3.0-SNAPSHOT-bin/apache-hive-1.3.0-SNAPSHOT-bin/lib/hive-common-1.3.0-SNAPSHOT.jar!/hive-log4j.properties
hive> EXPLAIN SELECT COUNT(*) from v1 WHERE c2 = 0;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: t1
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
            Filter Operator
              predicate: false (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Union
                Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: PARTIAL
                Select Operator
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: PARTIAL
                  Group By Operator
                    aggregations: count()
                    mode: hash
                    outputColumnNames: _col0
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL
                    Reduce Output Operator
                      sort order: 
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL
                      value expressions: _col0 (type: bigint)
          TableScan
            alias: t2
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Filter Operator
              predicate: (c2 = 0) (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Select Operator
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                Union
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: PARTIAL
                  Select Operator
                    Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: PARTIAL
                    Group By Operator
                      aggregations: count()
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL
                      Reduce Output Operator
                        sort order: 
                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL
                        value expressions: _col0 (type: bigint)
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}

Query plan from branch master 
{code}
:~/$ hive
Logging initialized using configuration in jar:file:~/master/hive/packaging/target/apache-hive-2.0.0-SNAPSHOT-bin/apache-hive-2.0.0-SNAPSHOT-bin/lib/hive-common-2.0.0-SNAPSHOT.jar!/hive-log4j2.xml
hive> EXPLAIN SELECT COUNT(*) from v1 WHERE c2 = 0;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: t1
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Filter Operator
              predicate: false (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Select Operator
                expressions: c1 (type: string), null (type: int)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                Union
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  Select Operator
                    Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                    Group By Operator
                      aggregations: count()
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        sort order: 
                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col0 (type: bigint)
          TableScan
            alias: t2
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Filter Operator
              predicate: (c2 = 0) (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Select Operator
                expressions: c1 (type: string), 0 (type: int)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                Union
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  Select Operator
                    Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                    Group By Operator
                      aggregations: count()
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        sort order: 
                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col0 (type: bigint)
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}

> IndexOutOfBoundsException thrown when accessing a union all subquery and filtering on a column which does not exist in all underlying tables
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11603
>                 URL: https://issues.apache.org/jira/browse/HIVE-11603
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.3.0
>         Environment: Hadoop 2.6
>            Reporter: Nicholas Brenwald
>            Priority: Minor
>             Fix For: 2.0.0
>
>
> Create two empty tables t1 and t2
> {code}
> CREATE TABLE t1(c1 STRING);
> CREATE TABLE t2(c1 STRING, c2 INT);
> {code}
> Create a view on these two tables
> {code}
> CREATE VIEW v1 AS 
> SELECT c1, c2 
> FROM (
>     SELECT c1, CAST(NULL AS INT) AS c2 FROM t1
>     UNION ALL
>     SELECT c1, c2 FROM t2
> ) x;
> {code}
> Then run
> {code}
> SELECT COUNT(*) from v1 
> WHERE c2 = 0;
> {code}
> We expect to get a result of zero, but instead the query fails with stack trace:
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> 	at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> 	at java.util.ArrayList.get(ArrayList.java:411)
> 	at org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:86)
> 	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
> 	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
> 	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
> 	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
> 	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
> 	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
> 	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
> 	at org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:442)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:119)
> 	... 22 more
> {code}
> Workarounds include disabling ppd,
> {code}
> set hive.optimize.ppd=false;
> {code}
> Or changing the view so that column c2 is null cast to double:
> {code}
> CREATE VIEW v1_workaround AS 
> SELECT c1, c2 
> FROM (
>     SELECT c1, CAST(NULL AS DOUBLE) AS c2 FROM t1
>     UNION ALL
>     SELECT c1, c2 FROM t2
> ) x;
> {code}
> The problem seems to occur in branch-1.1, branch-1.2, branch-1 but seems to be resolved in master (2.0.0)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)