You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Per Ullberg (JIRA)" <ji...@apache.org> on 2015/05/29 16:45:18 UTC

[jira] [Created] (HIVE-10867) ArrayIndexOutOfBoundsException LazyBinaryUtils.byteArrayToLong with Hive on Tez

Per Ullberg created HIVE-10867:
----------------------------------

             Summary: ArrayIndexOutOfBoundsException LazyBinaryUtils.byteArrayToLong with Hive on Tez
                 Key: HIVE-10867
                 URL: https://issues.apache.org/jira/browse/HIVE-10867
             Project: Hive
          Issue Type: Bug
          Components: Hive, Tez
    Affects Versions: 0.14.0
         Environment: Hortwonworks distribution 2.2.4-2
Hive 0.14.0
Tez 0.5.2.2.2.4.2-2 on cluster
Tez 0.7.0 in local setup
            Reporter: Per Ullberg


Hi, 

The following query runs fine on map reduce engine but when setting the hive.exection.engine to tez it produces an ArrayIndexOutOfBoundsException.

Query
{code}
create external table table_1 (id string, date string, amount bigint);
insert into table table_1 values (305,'2013-03-02',3790);

create external table table_2 (id string);
insert into table table_2 VALUES (305);

create external table table_3 (id string, date_3 string, amount_3 bigint);
insert into table table_3 values (305,'2013-03-01',-1600);

create external table table_4 (id bigint, str_4 string, amount_4 bigint);

create table table_5
as
  SELECT
    c.diff
  FROM (
    SELECT
      id AS id,
      date AS create_date,
      -amount AS diff
    FROM table_1
    UNION ALL
    SELECT
      p.id AS id,
      p.str_4 AS create_date,
      -p.amount_4 AS diff
    FROM table_4 p
    UNION ALL
    SELECT
      id,
      create_date,
      diff
    FROM (
      SELECT
        i.id AS id,
        tp.date_3 AS create_date,
        cast(amount_3 as double) AS diff
      FROM table_3 tp
      INNER JOIN table_2 i ON cast(tp.id as string) = cast(i.id as string)
    ) fees
  ) c
INNER JOIN table_2 i ON cast(c.id as string) = cast(i.id as string);
{code}

Results with map reduce engine:
{code}
hive> select * from table_5;
OK
-1600.0
-3790.0
Time taken: 0.061 seconds, Fetched: 2 row(s)
{code}

Exception with tez engine:
{code}
Status: Failed
Vertex failed, vertexName=Reducer 4, vertexId=vertex_1432809678493_0891_4_06, diagnostics=[Task failed, taskId=task_1432809678493_0891_4_06_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"305"},"value":{"_col1":-1600.0}}
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"305"},"value":{"_col1":-1600.0}}
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:337)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:218)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:168)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
	... 13 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 6
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.byteArrayToLong(LazyBinaryUtils.java:84)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryDouble.init(LazyBinaryDouble.java:43)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201)
	at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:98)
	at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
	at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
	at org.apache.hadoop.hive.ql.exec.JoinUtil.computeValues(JoinUtil.java:193)
	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getFilteredValue(CommonJoinOperator.java:408)
	at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.processOp(CommonMergeJoinOperator.java:162)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:328)
	... 16 more
{code}


Secondly, adding a column to table_5 gets rid of the Exception, but instead the result set is corrupted when using the tez engine. This is even more scary! 

Query
{code}
create table table_5
as
  SELECT
    c.create_date,
    c.diff
  FROM (
    SELECT
      id AS id,
      date AS create_date,
      -amount AS diff
    FROM table_1
    UNION ALL
    SELECT
      p.id AS id,
      p.str_4 AS create_date,
      -p.amount_4 AS diff
    FROM table_4 p
    UNION ALL
    SELECT
      id,
      create_date,
      diff
    FROM (
      SELECT
        i.id AS id,
        tp.date_3 AS create_date,
        cast(amount_3 as double) AS diff
      FROM table_3 tp
      INNER JOIN table_2 i ON cast(tp.id as string) = cast(i.id as string)
    ) fees
  ) c
INNER JOIN table_2 i ON cast(c.id as string) = cast(i.id as string);
{code} 

Result:
{code}
hive> select * from with_mr.table_5;
OK
2013-03-02	-3790.0
2013-03-01	-1600.0
Time taken: 8.107 seconds, Fetched: 2 row(s)
hive> select * from with_tez.table_5;
OK
2013-03-01	-1600.0
2013-03-02	-1.6968199793927886E-279
Time taken: 0.047 seconds, Fetched: 2 row(s)
{code}

This ticket could possibly be related to https://issues.apache.org/jira/browse/HIVE-9517?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)