Posted to issues@hive.apache.org by "Ayush Saxena (Jira)" <ji...@apache.org> on 2023/04/19 23:03:00 UTC

[jira] [Commented] (HIVE-27128) Exception "Can't finish byte read from uncompressed stream DATA position" when querying ORC table

    [ https://issues.apache.org/jira/browse/HIVE-27128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714334#comment-17714334 ] 

Ayush Saxena commented on HIVE-27128:
-------------------------------------

Committed to master.
Thanks [~difin] for the contribution, and [~scarlin] and Attila for the reviews!

> Exception "Can't finish byte read from uncompressed stream DATA position" when querying ORC table
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-27128
>                 URL: https://issues.apache.org/jira/browse/HIVE-27128
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Dmitriy Fingerman
>            Assignee: Dmitriy Fingerman
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The following exception occurs when querying an ORC table:
> {code:java}
> Caused by: java.io.EOFException: Can't finish byte read from uncompressed stream DATA position: 393216 length: 393216 range: 23 offset: 376832 position: 16384 limit: 16384
> 	at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1550)
> 	at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1566)
> 	at org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1662)
> 	at org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1508)
> 	at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.nextVector(EncodedTreeReaderFactory.java:305)
> 	at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:196)
> 	at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:66)
> 	at org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:122)
> 	at org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:42)
> 	at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:608)
> 	at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:434)
> 	at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:282)
> 	at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:279)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
> 	at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:279)
> 	at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:118)
> 	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> 	at org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer$CpuRecordingCallable.call(EncodedDataConsumer.java:88)
> 	at org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer$CpuRecordingCallable.call(EncodedDataConsumer.java:73) {code}
> I created a q-test that reproduces this issue:
> [https://github.com/difin/hive/commits/orc_read_err_qtest]
> This issue occurs in Hive starting from the commit that upgraded the ORC version to 1.6.7.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)