Posted to user@hive.apache.org by Kristopher Kane <kk...@gmail.com> on 2018/02/26 21:54:58 UTC

Hive 1.2.1 (HDP) ArrayIndexOutOfBounds for highly compressed ORC files

I have a table backed by a single, highly compressed ORC file, generated
from Hive DDL.  The raw data size is reported as 120 GB, and ORC/Snappy
compresses it down to 990 MB (ORC with no compression is still only
1.3 GB).  Hive on MR is throwing an ArrayIndexOutOfBoundsException like
the following:

Diagnostic Messages for this Task:
Error: java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error
while processing row {} json row data redacted.
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive
Runtime Error while processing row {} json row data redacted.

at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:565)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.ArrayIndexOutOfBoundsException
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:416)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
at org.apache.hadoop.hive.ql.exec.UnionOperator.process(UnionOperator.java:141)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555)
... 9 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1453)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1349)
at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
at org.apache.hadoop.io.BytesWritable.write(BytesWritable.java:188)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1149)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:610)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:555)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:398)
... 15 more


We have been on 1.2.1 for a long time and are only now running into
this, but I think this compression ratio is a new thing.

There are a couple of cases of this in the cluster right now.  In other
cases that contain a GROUP BY, I am able to decrease the input split
sizes (down from a 1 GB min and 2 GB max) and the query then works.
I've been suspicious that some OOM is being swallowed, and that a
job-kill error is surfacing as this row-processing exception.
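
For reference, here is a minimal sketch of the kind of split-size tuning
involved, written against the stock Hadoop Configuration API.  The
property names are the standard MapReduce split settings and the
256 MB / 512 MB values are purely illustrative assumptions, not the
exact knobs and values from our job configs; in a Hive session the same
properties would normally be applied with "set <property>=<value>;".

import org.apache.hadoop.conf.Configuration;

public class SplitSizeTuning {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // The splits above were 1 GB min / 2 GB max.  Shrinking them
        // produces more, smaller map tasks, so each mapper pushes less
        // data through its map-output sort buffer.
        conf.setLong("mapreduce.input.fileinputformat.split.minsize", 256L * 1024 * 1024); // illustrative
        conf.setLong("mapreduce.input.fileinputformat.split.maxsize", 512L * 1024 * 1024); // illustrative
        System.out.println("min=" + conf.get("mapreduce.input.fileinputformat.split.minsize")
            + " max=" + conf.get("mapreduce.input.fileinputformat.split.maxsize"));
    }
}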

Re: Hive 1.2.1 (HDP) ArrayIndexOutOfBounds for highly compressed ORC files

Posted by Kristopher Kane <kk...@gmail.com>.
Gopal. That was exactly it.

As always, a succinct, accurate answer.

Thanks,
-Kris


Re: Hive 1.2.1 (HDP) ArrayIndexOutOfBounds for highly compressed ORC files

Posted by Gopal Vijayaraghavan <go...@apache.org>.
Hi,

> Caused by: java.lang.ArrayIndexOutOfBoundsException
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1453)

In general, HDP-specific issues tend to get more attention on HCC, but this is a pretty old issue stemming from MapReduce being designed for fairly low-memory JVMs.

The io.sort.mb size is the reason for this crash: the sort buffer has a wrap-around corner case that is triggered once the buffer is larger than 1 GB.

As odd as this might sound, if you have fewer splits the sort buffer wouldn't wrap around enough times to generate a negative offset.

You can lower mapreduce.task.io.sort.mb to 1024 MB or below as a slower workaround.
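
To make the wrap-around concrete, here is a minimal sketch.  It is not the actual MapTask code; the midpoint computation is only an illustration of the same class of signed 32-bit arithmetic done on byte offsets into the circular sort buffer.

public class SortBufferOverflowSketch {
    public static void main(String[] args) {
        // Illustrative offsets into a hypothetical 1.5 GB sort buffer
        // (io.sort.mb well above 1024).  Each value fits in an int on its own.
        int bufStart = 1200 * 1024 * 1024;  // spill start, past the 1 GB mark
        int bufEnd   = 1400 * 1024 * 1024;  // spill end, also past the 1 GB mark

        // Adding two such offsets exceeds Integer.MAX_VALUE and wraps to a
        // negative number (the negative offset described above).  A negative
        // index into the buffer byte[] is what shows up as the
        // ArrayIndexOutOfBoundsException in Buffer.write() in the trace above.
        int naiveMid = (bufStart + bufEnd) / 2;
        System.out.println("naive midpoint   = " + naiveMid);  // negative

        // Widening to long before the arithmetic avoids the wrap-around.
        int safeMid = (int) (((long) bufStart + bufEnd) / 2);
        System.out.println("widened midpoint = " + safeMid);   // positive, fits in an int
    }
}

With the buffer capped at 1024 MB, any two in-buffer offsets sum to less than 2^31, so this particular wrap can't happen.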

I ran into this issue in 2013 and started working on optimizing sort for larger buffers in MapReduce (MAPREDUCE-4755), but ended up rewriting the entire thing and then adding it to Tez.

Cheers,
Gopal