Posted to user@hive.apache.org by Bernard Quizon <be...@cheetahdigital.com> on 2020/07/14 09:44:59 UTC

Intermittent ArrayIndexOutOfBoundsException on Hive Merge

Hi.

I'm using Hive 3.1.0 (Tez Execution Engine) and I'm running into
intermittent errors when doing Hive Merge.

To clarify, the same Hive Merge query, run against the same source and
destination tables, succeeds roughly 60% of the time.

By the way, both the source and destination tables have columns with complex
data types such as ARRAY<STRING> and MAP<STRING, STRING>.
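
For illustration, the merge has roughly this shape (table and column names
here are placeholders, not the real schema):

MERGE INTO target_tbl AS t
USING source_tbl AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET
  tags = s.tags,    -- ARRAY<STRING>
  attrs = s.attrs   -- MAP<STRING, STRING>
WHEN NOT MATCHED THEN INSERT
  VALUES (s.id, s.tags, s.attrs);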


Here's the error:

TaskAttempt 0 failed, info=
» Error: Error while running task ( failure ) : attempt_1594345704665_28139_1_06_000007_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 4)
  at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
  at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
  at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
  at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
  at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
  at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
  at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
  at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
  at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
  at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
  at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 4)
  at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:396)
  at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:249)
  at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318)
  at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
  ... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 4)
  at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:493)
  at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:387)
  ... 19 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
  at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(BytesColumnVector.java:187)
  at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storePrimitiveRowColumn(VectorDeserializeRow.java:588)
  at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeComplexFieldRowColumn(VectorDeserializeRow.java:778)
  at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeMapRowColumn(VectorDeserializeRow.java:855)
  at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeRowColumn(VectorDeserializeRow.java:941)
  at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.deserialize(VectorDeserializeRow.java:1360)
  at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:470)
  ... 20 more

Does anyone know a workaround for this?

Thanks,
Bernard

Re: Intermittent ArrayIndexOutOfBoundsException on Hive Merge

Posted by Bernard Quizon <be...@cheetahdigital.com>.
Hi, Aaron.

Thank you, your suggestion seems to have resolved the issue: so far I haven't
seen a failure since turning off vectorization. I don't think this is the best
long-term fix, though, since disabling vectorization has performance
implications.
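
A narrower workaround we may try (untested on our side, though both settings
below are standard Hive configs) is to keep map-side vectorization and
disable it only for the reduce side, since the failing code path is in
ReduceRecordSource:

-- keep map-side vectorization on; turn off only the reduce side
SET hive.vectorized.execution.enabled=true;
SET hive.vectorized.execution.reduce.enabled=false;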

Thanks,
Bernard

On Tue, Jul 14, 2020 at 10:06 PM Aaron Grubb <aa...@kaden.ai> wrote:

> This is just a suggestion, but I recently ran into an issue with vectorized
> query execution and a map column type, specifically when inserting into an
> HBase table with a map-to-column-family setup. Try using "set
> hive.vectorized.execution.enabled=false;"
>
> Thanks,
>
> Aaron
>
> [remainder of quoted thread trimmed; the full messages appear elsewhere in this archive]


-- 

Bernard Quizon

Staff Engineer



RE: Intermittent ArrayIndexOutOfBoundsException on Hive Merge

Posted by Aaron Grubb <aa...@kaden.ai>.
This is just a suggestion, but I recently ran into an issue with vectorized query execution and a map column type, specifically when inserting into an HBase table with a map-to-column-family setup. Try using "set hive.vectorized.execution.enabled=false;"
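
For example, scoped to the session that runs the problem query, so it can be
re-enabled afterwards:

-- work around the failure by disabling vectorized execution for this session
SET hive.vectorized.execution.enabled=false;
-- ... run the MERGE here ...
SET hive.vectorized.execution.enabled=true;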

Thanks,
Aaron


From: Bernard Quizon <be...@cheetahdigital.com>
Sent: Tuesday, July 14, 2020 9:57 AM
To: user@hive.apache.org
Subject: Re: Intermittent ArrayIndexOutOfBoundsException on Hive Merge

[quoted message trimmed; the full text appears elsewhere in this thread]

Re: Intermittent ArrayIndexOutOfBoundsException on Hive Merge

Posted by Bernard Quizon <be...@cheetahdigital.com>.
Hi.

I see that this piece of code, from ReduceRecordSource.processVectorGroup, is the source of the error:

final int maxSize =
    (vectorizedTestingReducerBatchSize > 0 ?
        Math.min(vectorizedTestingReducerBatchSize, batch.getMaxSize()) :
        batch.getMaxSize());
Preconditions.checkState(maxSize > 0);
int rowIdx = 0;
int batchBytes = keyBytes.length;
try {
  for (Object value : values) {
    if (rowIdx >= maxSize ||
        (rowIdx > 0 && batchBytes >= BATCH_BYTES)) {

      // Batch is full AND we have at least 1 more row...
      batch.size = rowIdx;
      if (handleGroupKey) {
        reducer.setNextVectorBatchGroupStatus(/* isLastGroupBatch */ false);
      }
      reducer.process(batch, tag);

      // Reset just the value columns and value buffer.
      for (int i = firstValueColumnOffset; i < batch.numCols; i++) {
        // Note that reset also resets the data buffer for bytes column vectors.
        batch.cols[i].reset();
      }
      rowIdx = 0;
      batchBytes = keyBytes.length;
    }
    if (valueLazyBinaryDeserializeToRow != null) {
      // Deserialize value into vector row columns.
      BytesWritable valueWritable = (BytesWritable) value;
      byte[] valueBytes = valueWritable.getBytes();
      int valueLength = valueWritable.getLength();
      batchBytes += valueLength;

      valueLazyBinaryDeserializeToRow.setBytes(valueBytes, 0, valueLength);
      valueLazyBinaryDeserializeToRow.deserialize(batch, rowIdx);
    }
    rowIdx++;
  }


`valueLazyBinaryDeserializeToRow.deserialize(batch, rowIdx)` throws the
exception because `rowIdx` reaches 1024, when it should be 1023 at most.
But it seems to me that `maxSize` can never exceed 1024 (the default batch
size), and the check at the top of the loop flushes the batch and resets
`rowIdx` to 0 once it reaches `maxSize`, so why would `rowIdx` ever be
>= 1024 in `valueLazyBinaryDeserializeToRow.deserialize(batch, rowIdx)`?
Am I missing something here?
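
As a side note, `EXPLAIN VECTORIZATION DETAIL` (available since Hive 2.3)
should confirm which vertices of the merge actually run vectorized; a sketch,
with placeholder table names:

EXPLAIN VECTORIZATION DETAIL
MERGE INTO target_tbl AS t
USING source_tbl AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET attrs = s.attrs;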

Thanks,
Bernard

On Tue, Jul 14, 2020 at 5:44 PM Bernard Quizon <
bernard.quizon@cheetahdigital.com> wrote:

> [original message quoted in full; trimmed]