Posted to issues@orc.apache.org by "Syed Shameerur Rahman (Jira)" <ji...@apache.org> on 2020/07/22 05:14:00 UTC

[jira] [Comment Edited] (ORC-626) Reading Struct Column Having Multiple Fields With Same Name Causes java.io.EOFException

    [ https://issues.apache.org/jira/browse/ORC-626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161957#comment-17161957 ] 

Syed Shameerur Rahman edited comment on ORC-626 at 7/22/20, 5:13 AM:
---------------------------------------------------------------------

[~jcamachorodriguez] [~hashutosh] [~pgaref] Could you please review?


was (Author: srahman):
[~jcamachorodriguez] [~hashutosh] Could you please review?

> Reading Struct Column Having Multiple Fields With Same Name Causes java.io.EOFException
> ---------------------------------------------------------------------------------------
>
>                 Key: ORC-626
>                 URL: https://issues.apache.org/jira/browse/ORC-626
>             Project: ORC
>          Issue Type: Bug
>            Reporter: Syed Shameerur Rahman
>            Priority: Major
>
> *Steps To Repro In Hive:*
> {code:java}
> set hive.fetch.task.conversion=none;
> set orc.force.positional.evolution=true;
> create table complex_orc(device struct<a:string,a:string,b:string>) stored as orc;
> insert into complex_orc select named_struct("a","123","a","823","b","23");
> select * from complex_orc;
> {code}
> *Fails with the following exception:*
> {code:java}
> Caused by: java.io.EOFException: Read past end of RLE integer from compressed stream Stream for column 3 kind LENGTH position: 6 length: 6 range: 0 offset: 16 limit: 16 range 0 = 0 to 6 uncompressed: 3 to 3
> 	at org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:61)
> 	at org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:323)
> 	at org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
> 	at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1299)
> 	at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1336)
> 	at org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1434)
> 	at org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1280)
> 	at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:1836)
> 	at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1818)
> 	at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1149)
> {code}
> This is caused by ORC-54, which changed schema evolution to match fields by name rather than by index. Setting *orc.force.positional.evolution* forces positional schema evolution, but the positional level is hardcoded to 1 (for non-ACID). Even though it doesn't make sense to have multiple fields with the same name in a struct, this breaks backward compatibility with Hive 1.2 / Hive 2.1.
> [~omalley] Can you please share the idea behind setting the *positional level* to 1? Is it really required when orc.force.positional.evolution is set? Can't we just do positional schema evolution at all levels when orc.force.positional.evolution is set?
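
To illustrate why name-based matching is ambiguous for struct<a:string,a:string,b:string>, here is a minimal, self-contained sketch. It is a simplified model and not ORC's actual SchemaEvolution code: the class FieldMatchingSketch and the methods matchByName and matchByPosition are hypothetical names introduced only for this example. It assumes name matching is done through a name-to-index lookup, which is one plausible way duplicate names can collapse onto a single file field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not taken from the ORC code base.
class FieldMatchingSketch {

    // Name-based matching: each reader field is looked up by name in the file schema.
    // A name-to-index map can hold only one index per name, so duplicate names
    // in the file schema overwrite each other.
    static List<Integer> matchByName(List<String> readerFields, List<String> fileFields) {
        Map<String, Integer> indexByName = new HashMap<>();
        for (int i = 0; i < fileFields.size(); i++) {
            indexByName.put(fileFields.get(i), i); // a later duplicate replaces the earlier entry
        }
        List<Integer> mapping = new ArrayList<>();
        for (String name : readerFields) {
            mapping.add(indexByName.getOrDefault(name, -1)); // -1 means no matching file field
        }
        return mapping;
    }

    // Positional matching: reader field i maps to file field i, regardless of names.
    static List<Integer> matchByPosition(List<String> readerFields, List<String> fileFields) {
        List<Integer> mapping = new ArrayList<>();
        for (int i = 0; i < readerFields.size(); i++) {
            mapping.add(i < fileFields.size() ? i : -1); // -1 means the file has no field here
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Field names of struct<a:string,a:string,b:string>, used as both reader and file schema.
        List<String> fields = Arrays.asList("a", "a", "b");
        System.out.println("by name:     " + matchByName(fields, fields));
        System.out.println("by position: " + matchByPosition(fields, fields));
    }
}
```

In this model, matching by name sends both reader fields named a to the same file field (the mapping is [1, 1, 2]), while matching by position keeps all three fields distinct ([0, 1, 2]). That is the behaviour the issue asks for at every nesting level when orc.force.positional.evolution is set.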


