Posted to dev@hive.apache.org by "Benjamin BONNET (JIRA)" <ji...@apache.org> on 2016/08/27 20:12:20 UTC

[jira] [Created] (HIVE-14660) ArrayIndexOutOfBoundsException on delete

Benjamin BONNET created HIVE-14660:
--------------------------------------

             Summary: ArrayIndexOutOfBoundsException on delete
                 Key: HIVE-14660
                 URL: https://issues.apache.org/jira/browse/HIVE-14660
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
    Affects Versions: 1.2.1
            Reporter: Benjamin BONNET


Hi,

DELETE on an ACID table may fail with an ArrayIndexOutOfBoundsException.
The bug occurs in the reduce phase when there are fewer reducers than table buckets.

To reproduce, create a simple ACID table:

{code:sql}
CREATE TABLE test (`cle` bigint,`valeur` string)
 PARTITIONED BY (`annee` string)
 CLUSTERED BY (cle) INTO 5 BUCKETS
 TBLPROPERTIES ('transactional'='true');
{code}

Populate it with rows spread across all buckets, using random values and a few partitions.
Force the number of reducers to be lower than the number of buckets:
{code:sql}
set mapred.reduce.tasks=1;
{code}
Then execute a DELETE that removes many rows from all the buckets:
{code:sql}
DELETE FROM test WHERE valeur<'some_value';
{code}
You will then get an ArrayIndexOutOfBoundsException:
{code}
2016-08-22 21:21:02,500 [FATAL] [TezChild] |tez.ReduceRecordSource|: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":119,"bucketid":0,"rowid":0}},"value":{"_col0":"4"}}
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352)
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:274)
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:252)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:769)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)
        ... 17 more
{code}
After adding logging to FileSinkOperator, one can see that the operator handles buckets 0, 1, 2, 3, 4, then 0 again, and fails at line 769. Each time the bucket changes, the operator moves forward by one position in an array of 5 elements (the number of buckets). So when bucket 0 comes up for the second time, the index runs past the end of the array.
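
To illustrate the failure mode described above, here is a minimal standalone sketch. It is not the FileSinkOperator code: the class and method names (BucketWriterSketch, countByTransition, countByBucketId) are hypothetical, and the per-bucket writers are reduced to simple row counters. It only assumes what the logs show, namely that a single reducer can see the bucket sequence 0, 1, 2, 3, 4, 0.

```java
import java.util.Arrays;

// Hypothetical stand-in for the per-bucket writer array; not Hive code.
class BucketWriterSketch {

    // Failing pattern: move to the next slot on every bucket change.
    // This only works if each bucket id appears in a single contiguous run.
    static int[] countByTransition(int[] bucketIds, int numBuckets) {
        int[] rowsPerSlot = new int[numBuckets];
        int slot = -1;
        int previousBucket = -1;
        for (int bucketId : bucketIds) {
            if (bucketId != previousBucket) {
                slot++; // a 6th change of bucket gives slot == 5 with 5 buckets
                previousBucket = bucketId;
            }
            rowsPerSlot[slot]++; // throws ArrayIndexOutOfBoundsException here
        }
        return rowsPerSlot;
    }

    // One possible alternative (an assumption, not the actual fix):
    // index the array by the bucket id itself.
    static int[] countByBucketId(int[] bucketIds, int numBuckets) {
        int[] rowsPerSlot = new int[numBuckets];
        for (int bucketId : bucketIds) {
            rowsPerSlot[bucketId]++;
        }
        return rowsPerSlot;
    }

    public static void main(String[] args) {
        int[] seenByReducer = {0, 1, 2, 3, 4, 0}; // sequence observed in the logs
        System.out.println(Arrays.toString(countByBucketId(seenByReducer, 5)));
        try {
            countByTransition(seenByReducer, 5);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("overflow on second visit to bucket 0");
        }
    }
}
```

With one reducer per bucket, each reducer would see a single run of one bucket id, which may be why the problem only shows up when reducers are fewer than buckets.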

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)