You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Nicholas Brenwald (JIRA)" <ji...@apache.org> on 2015/08/19 14:30:45 UTC
[jira] [Commented] (HIVE-11603) IndexOutOfBoundsException thrown
when accessing a union all subquery and filtering on a column which does
not exist in all underlying tables
[ https://issues.apache.org/jira/browse/HIVE-11603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702941#comment-14702941 ]
Nicholas Brenwald commented on HIVE-11603:
------------------------------------------
Query plan from branch-1:
{code}
:~/$ hive
Logging initialized using configuration in jar:file:~/branch-1/hive/packaging/target/apache-hive-1.3.0-SNAPSHOT-bin/apache-hive-1.3.0-SNAPSHOT-bin/lib/hive-common-1.3.0-SNAPSHOT.jar!/hive-log4j.properties
hive> EXPLAIN SELECT COUNT(*) from v1 WHERE c2 = 0;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: t1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Filter Operator
predicate: false (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: PARTIAL
Select Operator
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: PARTIAL
Group By Operator
aggregations: count()
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL
value expressions: _col0 (type: bigint)
TableScan
alias: t2
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator
predicate: (c2 = 0) (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Union
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: PARTIAL
Select Operator
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: PARTIAL
Group By Operator
aggregations: count()
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL
value expressions: _col0 (type: bigint)
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
Query plan from branch master
{code}
:~/$ hive
Logging initialized using configuration in jar:file:~/master/hive/packaging/target/apache-hive-2.0.0-SNAPSHOT-bin/apache-hive-2.0.0-SNAPSHOT-bin/lib/hive-common-2.0.0-SNAPSHOT.jar!/hive-log4j2.xml
hive> EXPLAIN SELECT COUNT(*) from v1 WHERE c2 = 0;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: t1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator
predicate: false (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
expressions: c1 (type: string), null (type: int)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Union
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Group By Operator
aggregations: count()
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: bigint)
TableScan
alias: t2
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator
predicate: (c2 = 0) (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
expressions: c1 (type: string), 0 (type: int)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Union
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Group By Operator
aggregations: count()
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: bigint)
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
> IndexOutOfBoundsException thrown when accessing a union all subquery and filtering on a column which does not exist in all underlying tables
> --------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-11603
> URL: https://issues.apache.org/jira/browse/HIVE-11603
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.3.0
> Environment: Hadoop 2.6
> Reporter: Nicholas Brenwald
> Priority: Minor
> Fix For: 2.0.0
>
>
> Create two empty tables t1 and t2
> {code}
> CREATE TABLE t1(c1 STRING);
> CREATE TABLE t2(c1 STRING, c2 INT);
> {code}
> Create a view on these two tables
> {code}
> CREATE VIEW v1 AS
> SELECT c1, c2
> FROM (
> SELECT c1, CAST(NULL AS INT) AS c2 FROM t1
> UNION ALL
> SELECT c1, c2 FROM t2
> ) x;
> {code}
> Then run
> {code}
> SELECT COUNT(*) from v1
> WHERE c2 = 0;
> {code}
> We expect to get a result of zero, but instead the query fails with stack trace:
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.get(ArrayList.java:411)
> at org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:86)
> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
> at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
> at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
> at org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:442)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:119)
> ... 22 more
> {code}
> Workarounds include disabling ppd,
> {code}
> set hive.optimize.ppd=false;
> {code}
> Or changing the view so that column c2 is null cast to double:
> {code}
> CREATE VIEW v1_workaround AS
> SELECT c1, c2
> FROM (
> SELECT c1, CAST(NULL AS DOUBLE) AS c2 FROM t1
> UNION ALL
> SELECT c1, c2 FROM t2
> ) x;
> {code}
> The problem seems to occur in branch-1.1, branch-1.2, branch-1 but seems to be resolved in master (2.0.0)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)