Posted to user@hive.apache.org by Wojciech Indyk <wo...@gmail.com> on 2014/08/11 08:36:19 UTC

ORC String error

Hi!
I use CDH 5.1.0 (5.0.3 until recently) with Hive 0.12.
I created an ORC table with Snappy compression, consisting of some integer
and string columns. I imported a few ~40GB gz files into HDFS, mapped them
as an external table, and then inserted the data from the external table
into the ORC table.
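
Roughly, the DDL and load looked like the sketch below (table and column
names here are simplified placeholders, not the real ones; the delimiter of
the gz files is also just an example):

    -- external table over the imported gz files (placeholder names)
    CREATE EXTERNAL TABLE logs_ext (
      id BIGINT,
      url STRING,
      refererurl STRING
      -- ... more int and string columns
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/data/logs_gz';

    -- target ORC table with Snappy compression
    CREATE TABLE logs_orc (
      id BIGINT,
      url STRING,
      refererurl STRING
      -- ... more int and string columns
    )
    STORED AS ORC
    TBLPROPERTIES ("orc.compress"="SNAPPY");

    -- load from the external table into the ORC table
    INSERT OVERWRITE TABLE logs_orc
    SELECT * FROM logs_ext;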

Unfortunately, when I try to process the two string columns (url and
refererurl) I get the following error:
Error: java.io.IOException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 625920
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:305)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:198)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:184)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 625920
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:303)
    ... 11 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 625920
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDictionaryTreeReader.next(RecordReaderImpl.java:1060)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.next(RecordReaderImpl.java:892)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1193)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2240)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:105)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:56)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
    ... 15 more

The error occurs for one mapper per file. E.g. I have two 40GB files in my
ORC table and Hive creates 300 mappers for a query; only 2 mappers fail with
the error above (each with a different array index).
When I process other columns (both int and string types), the processing
finishes correctly. I see the error is related to StringTreeReader. What is
the default delimiter for ORC columns? Maybe the delimiter occurs inside the
offending string record? But I don't think that should cause an
ArrayIndexOutOfBoundsException...
Is there any limit on string length in ORC? I know there is a default 128MB
stripe size for ORC, but I don't expect a string as huge as 100MB.
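
If it helps, I can check the maximum string lengths against the original
external (gz-backed) table, which does not go through the ORC reader at all;
a rough sketch, using the same placeholder names as above:

    -- sanity check on the source data, bypassing the ORC reader
    SELECT MAX(LENGTH(url))        AS max_url_len,
           MAX(LENGTH(refererurl)) AS max_ref_len
    FROM logs_ext;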

Kind regards
Wojciech Indyk