Posted to issues@hive.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2015/04/07 03:57:12 UTC
[jira] [Comment Edited] (HIVE-10161) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug)

    [ https://issues.apache.org/jira/browse/HIVE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482405#comment-14482405 ] 

Sergey Shelukhin edited comment on HIVE-10161 at 4/7/15 1:56 AM:
-----------------------------------------------------------------

When multiple RGs include the same partial CB (because the ORC end boundary is an estimate), the first one reads the length, determines that this is an incomplete CB, and bails. The second one then blindly tries to read the length, but the compressed block is now offset by 3 bytes from the original read. Boom! Fixed that; also fixed a small issue with early unlocking.
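
To illustrate the failure mode described above, here is a minimal sketch, not the actual patch. It assumes the 3-byte offset corresponds to the 3-byte ORC compression chunk header, which per the ORC format specification stores (compressedLength * 2 + isOriginal) as a little-endian value. The class and method names (ChunkHeaderSketch, tryReadChunkLength) are hypothetical and do not exist in Hive. The point is that a reader that finds an incomplete CB rewinds to the header start, so a later reader of the same range does not parse payload bytes as a length.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; this is not code from Hive or ORC.
class ChunkHeaderSketch {
    // ORC compression chunk headers are 3 bytes long.
    static final int HEADER_SIZE = 3;

    /**
     * Tries to read one chunk header at the buffer's current position.
     * Returns the compressed length if the whole chunk is available and
     * leaves the position just past the header. Returns -1 if the header
     * or the chunk body is incomplete, and in that case restores the
     * original position so the next caller sees the header again.
     */
    static int tryReadChunkLength(ByteBuffer buf) {
        if (buf.remaining() < HEADER_SIZE) {
            return -1; // not even a full header; position untouched
        }
        int start = buf.position();
        int b0 = buf.get() & 0xff;
        int b1 = buf.get() & 0xff;
        int b2 = buf.get() & 0xff;
        // Little-endian 3-byte value; the low bit is the isOriginal flag.
        int header = b0 | (b1 << 8) | (b2 << 16);
        int length = header >>> 1;
        if (buf.remaining() < length) {
            // Incomplete chunk: rewind instead of leaving the buffer
            // 3 bytes past the header for the next reader.
            buf.position(start);
            return -1;
        }
        return length;
    }

    public static void main(String[] args) {
        // Header 0x14 encodes length 10, but only 2 payload bytes follow.
        ByteBuffer partial = ByteBuffer.wrap(new byte[] {0x14, 0, 0, 1, 2});
        System.out.println(tryReadChunkLength(partial)); // first reader bails
        System.out.println(tryReadChunkLength(partial)); // second reader sees the same header
    }
}
```

Without the rewind, the second call would interpret the bytes after the header as a new length, which is one way to end up with a nonsensical "needed" size like the one in the stack trace below.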


was (Author: sershe):
When multiple RGs include the same partial CB (due to ORC end boundary being an estimate), the first one reads the length, determines that this is an incomplete RG and bails, the 2nd one tries to blindly read the length but the compressed block is now offset by 3 bytes from the original read. Boom! Fixed that, also fixed some small issue with early unlocking.

> LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug)
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-10161
>                 URL: https://issues.apache.org/jira/browse/HIVE-10161
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: llap
>            Reporter: Gopal V
>            Assignee: Sergey Shelukhin
>             Fix For: llap
>
>
> The EncodedReaderImpl will die when reading from the cache if the data was written by the regular ORC writer:
> {code}
> Caused by: java.io.IOException: java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 3919246
>         at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:249)
>         at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:201)
>         at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:140)
>         at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:96)
>         at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
>         ... 22 more
> Caused by: java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 3919246
>         at org.apache.hadoop.hive.ql.io.orc.InStream.addOneCompressionBuffer(InStream.java:780)
>         at org.apache.hadoop.hive.ql.io.orc.InStream.uncompressStream(InStream.java:628)
>         at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:309)
>         at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:278)
>         at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:48)
>         at org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
>         ... 4 more
> ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1424502260528_1945_1_00 [Map 1] killed/failed due to:null]
> {code}
> Turning off hive.llap.io.enabled makes the error go away.


