Posted to dev@hive.apache.org by "Alain Schröder (JIRA)" <ji...@apache.org> on 2015/03/25 16:42:52 UTC

[jira] [Created] (HIVE-10083) SMBJoin fails in case one table is uninitialized

Alain Schröder created HIVE-10083:
-------------------------------------

             Summary: SMBJoin fails in case one table is uninitialized
                 Key: HIVE-10083
                 URL: https://issues.apache.org/jira/browse/HIVE-10083
             Project: Hive
          Issue Type: Bug
          Components: Logical Optimizer
    Affects Versions: 0.13.1
         Environment: MapR Hive 0.13
            Reporter: Alain Schröder
            Priority: Minor


We get an IndexOutOfBoundsException from an SMBJoin when exactly one of the two tables used in the JOIN is uninitialized (contains no data). Everything works if both tables are uninitialized or both are initialized.

{code}
2015-03-24 09:12:58,967 ERROR [main]: ql.Driver (SessionState.java:printError(545)) - FAILED: IndexOutOfBoundsException Index: 0, Size: 0
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at java.util.ArrayList.rangeCheck(ArrayList.java:635)
        at java.util.ArrayList.get(ArrayList.java:411)
        at org.apache.hadoop.hive.ql.optimizer.AbstractBucketJoinProc.fillMappingBigTableBucketFileNameToSmallTableBucketFileNames(AbstractBucketJoinProc.java:486)
        at org.apache.hadoop.hive.ql.optimizer.AbstractBucketJoinProc.convertMapJoinToBucketMapJoin(AbstractBucketJoinProc.java:429)
        at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.convertJoinToBucketMapJoin(AbstractSMBJoinProc.java:540)
        at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.convertJoinToSMBJoin(AbstractSMBJoinProc.java:549)
        at org.apache.hadoop.hive.ql.optimizer.SortedMergeJoinProc.process(SortedMergeJoinProc.java:51)
{code}

The simplest way to reproduce it:

{code}
SET hive.enforce.sorting=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition=true;
SET mapreduce.reduce.import.limit=-1;

SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;
SET hive.auto.convert.join=true;
SET hive.auto.convert.sortmerge.join=true;
SET hive.auto.convert.sortmerge.join.noconditionaltask=true;

CREATE DATABASE IF NOT EXISTS tmp;
USE tmp;

CREATE  TABLE `test1` (
  `foo` bigint )
CLUSTERED BY (
  foo)
SORTED BY (
  foo ASC)
INTO 384 BUCKETS
STORED AS ORC;

CREATE  TABLE `test2`(
  `foo` bigint )
CLUSTERED BY (
  foo)
SORTED BY (
  foo ASC)
INTO 384 BUCKETS
STORED AS ORC;

-- Initialize only ONE of the two tables with some data.
INSERT INTO TABLE test1 SELECT foo FROM table_with_some_content LIMIT 100;

SELECT t1.foo, t2.foo
FROM test1 t1 INNER JOIN test2 t2 
ON (t1.foo = t2.foo);
{code}

I took a look at the method fillMappingBigTableBucketFileNameToSmallTableBucketFileNames in AbstractBucketJoinProc.java. It does not seem to have changed between our MapR Hive 0.13 and the current snapshot, so this bug is probably also present in the current version.
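The stack trace ends in ArrayList.get with "Index: 0, Size: 0", i.e. the optimizer asked for the first element of an empty list of file names. Below is a hypothetical, heavily simplified Java model of mapping big-table bucket files to small-table bucket files. The class and method names are invented and this is NOT Hive's actual code or a proposed patch. It assumes the small table's bucket count comes from table metadata (384 in the repro) while its file list is empty because nothing was ever inserted, and it shows how such a lookup fails and one possible guard.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a big-table -> small-table bucket file
// mapping for a bucket map join. Names are invented; this is NOT Hive's code.
public class BucketMappingSketch {

    // Maps big-table bucket file i to small-table bucket file (i % smallBucketNum).
    // Assumes smallBucketNum comes from table metadata and that smallFiles
    // normally holds exactly smallBucketNum entries. If the small table was
    // never written to, smallFiles is empty and get(...) throws
    // IndexOutOfBoundsException, the same failure mode as in the report.
    public static Map<String, List<String>> mapUnguarded(
            List<String> bigFiles, int smallBucketNum, List<String> smallFiles) {
        Map<String, List<String>> mapping = new LinkedHashMap<>();
        for (int i = 0; i < bigFiles.size(); i++) {
            List<String> matches = new ArrayList<>();
            matches.add(smallFiles.get(i % smallBucketNum));
            mapping.put(bigFiles.get(i), matches);
        }
        return mapping;
    }

    // One possible guard: an empty small table has no files to join against,
    // so every big-table bucket maps to an empty list instead of failing.
    // Other size mismatches are still not handled here.
    public static Map<String, List<String>> mapGuarded(
            List<String> bigFiles, int smallBucketNum, List<String> smallFiles) {
        if (!smallFiles.isEmpty()) {
            return mapUnguarded(bigFiles, smallBucketNum, smallFiles);
        }
        Map<String, List<String>> mapping = new LinkedHashMap<>();
        for (String bigFile : bigFiles) {
            mapping.put(bigFile, Collections.<String>emptyList());
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Four buckets instead of 384 to keep the output short.
        List<String> bigFiles = Arrays.asList("000000_0", "000001_0", "000002_0", "000003_0");
        List<String> emptySmall = Collections.emptyList();
        try {
            mapUnguarded(bigFiles, 4, emptySmall);
            System.out.println("unguarded: no exception");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unguarded: IndexOutOfBoundsException");
        }
        System.out.println("guarded: " + mapGuarded(bigFiles, 4, emptySmall));
    }
}
```

Running main prints "unguarded: IndexOutOfBoundsException" followed by "guarded: {000000_0=[], 000001_0=[], 000002_0=[], 000003_0=[]}". Treating a missing small-table file as "no matching rows" keeps the inner join result correct (it is empty either way), but how the real optimizer should handle this case is for the Hive code to decide.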



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)