Posted to issues@hive.apache.org by "Jesus Camacho Rodriguez (JIRA)" <ji...@apache.org> on 2018/09/07 00:18:00 UTC

[jira] [Updated] (HIVE-16480) ORC file with empty array<double> and array<float> fails to read

     [ https://issues.apache.org/jira/browse/HIVE-16480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-16480:
-------------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.2.1
                   2.1.2
           Status: Resolved  (was: Patch Available)

bq. This patch applies to branch-2.1 and branch-2.2. In branch-2.3 and above Hive uses the ORC project artifacts, so we'll need to release from ORC. Once the patch goes in, we should start that process.

Patch has been pushed to branch-2.1 and branch-2.2. A new issue can be created to consume the new ORC release from branch-2.3 and fix this in 2.3.x. Closing this issue. Thanks [~owen.omalley]

> ORC file with empty array<double> and array<float> fails to read
> ----------------------------------------------------------------
>
>                 Key: HIVE-16480
>                 URL: https://issues.apache.org/jira/browse/HIVE-16480
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.1.1, 2.2.0
>            Reporter: David Capwell
>            Assignee: Owen O'Malley
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.1.2, 2.2.1
>
>
> We have a schema that contains an array<double>.  We were unable to read the file, and digging into ORC, it seems the issue occurs when the array is empty.
> Here is the stack trace
> {code:title=EmptyList.log|borderStyle=solid}
> ERROR 2017-04-19 09:29:17,075 [main] [EmptyList] [line 56] Failed to work with type float 
> java.io.IOException: Error reading file: /var/folders/t8/t5x1031d7mn17f6xpwnkkv_40000gn/T/1492619355819-0/file-float.orc
>   at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1052) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:135) ~[hive-exec-2.1.1.jar:2.1.1]
>   at EmptyList.emptyList(EmptyList.java:49) ~[test-classes/:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_121]
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_121]
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_121]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) [junit-4.12.jar:4.12]
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) [junit-4.12.jar:4.12]
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) [junit-4.12.jar:4.12]
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) [junit-4.12.jar:4.12]
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) [junit-4.12.jar:4.12]
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363) [junit-4.12.jar:4.12]
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137) [junit-4.12.jar:4.12]
>   at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) [junit-rt.jar:na]
>   at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51) [junit-rt.jar:na]
>   at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237) [junit-rt.jar:na]
>   at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) [junit-rt.jar:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_121]
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_121]
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_121]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147) [idea_rt.jar:na]
> Caused by: java.io.EOFException: Read past EOF for compressed stream Stream for column 1 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
>   at org.apache.orc.impl.SerializationUtils.readFully(SerializationUtils.java:118) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.orc.impl.SerializationUtils.readFloat(SerializationUtils.java:78) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.orc.impl.TreeReaderFactory$FloatTreeReader.nextVector(TreeReaderFactory.java:619) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.orc.impl.TreeReaderFactory$TreeReader.nextBatch(TreeReaderFactory.java:154) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) ~[hive-orc-2.1.1.jar:2.1.1]
>   ... 29 common frames omitted
>  INFO 2017-04-19 09:29:17,091 [main] [WriterImpl] [line 205] ORC writer created for path: /var/folders/t8/t5x1031d7mn17f6xpwnkkv_40000gn/T/1492619355819-0/file-double.orc with stripeSize: 67108864 blockSize: 268435456 compression: ZLIB bufferSize: 262144 
>  INFO 2017-04-19 09:29:17,100 [main] [ReaderImpl] [line 357] Reading ORC rows from /var/folders/t8/t5x1031d7mn17f6xpwnkkv_40000gn/T/1492619355819-0/file-double.orc with {include: null, offset: 0, length: 9223372036854775807} 
>  INFO 2017-04-19 09:29:17,101 [main] [RecordReaderImpl] [line 142] Schema on read not provided -- using file schema array<double> 
> ERROR 2017-04-19 09:29:17,104 [main] [EmptyList] [line 56] Failed to work with type double 
> java.io.IOException: Error reading file: /var/folders/t8/t5x1031d7mn17f6xpwnkkv_40000gn/T/1492619355819-0/file-double.orc
>   at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1052) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:135) ~[hive-exec-2.1.1.jar:2.1.1]
>   at EmptyList.emptyList(EmptyList.java:49) ~[test-classes/:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_121]
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_121]
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_121]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) [junit-4.12.jar:4.12]
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) [junit-4.12.jar:4.12]
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) [junit-4.12.jar:4.12]
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) [junit-4.12.jar:4.12]
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) [junit-4.12.jar:4.12]
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363) [junit-4.12.jar:4.12]
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137) [junit-4.12.jar:4.12]
>   at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) [junit-rt.jar:na]
>   at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51) [junit-rt.jar:na]
>   at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237) [junit-rt.jar:na]
>   at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) [junit-rt.jar:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_121]
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_121]
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_121]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147) [idea_rt.jar:na]
> Caused by: java.io.EOFException: Read past EOF for compressed stream Stream for column 1 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
>   at org.apache.orc.impl.SerializationUtils.readFully(SerializationUtils.java:118) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.orc.impl.SerializationUtils.readLongLE(SerializationUtils.java:101) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.orc.impl.SerializationUtils.readDouble(SerializationUtils.java:97) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.orc.impl.TreeReaderFactory$DoubleTreeReader.nextVector(TreeReaderFactory.java:713) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.orc.impl.TreeReaderFactory$TreeReader.nextBatch(TreeReaderFactory.java:154) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) ~[hive-orc-2.1.1.jar:2.1.1]
>   ... 29 common frames omitted
> {code}
> If you create an ORC file with one row as follows
> {code}
> orc.addRow(Lists.newArrayList());
> {code}
> and then try to read it
> {code}
> VectorizedRowBatch batch = reader.getSchema().createRowBatch();
> while (rows.nextBatch(batch)) { }
> {code}
> you will produce the above stack trace.
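
The {{Caused by}} lines in both traces report "Read past EOF for compressed stream ... length: 0": when every array in the file is empty, the child column's DATA stream has zero length, yet the element reader still attempts a primitive read from it. The failure mode can be sketched in miniature with plain {{java.io}} (this is an illustrative analogy, not the ORC code itself; the class and message are made up here):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Sketch: reading a primitive from a zero-length stream throws EOFException,
// analogous to SerializationUtils.readFloat() being called against the empty
// DATA stream of the array's element column.
public class EmptyStreamDemo {
    public static void main(String[] args) throws IOException {
        DataInputStream empty =
                new DataInputStream(new ByteArrayInputStream(new byte[0]));
        try {
            empty.readFloat(); // zero bytes available -> EOF
            System.out.println("read succeeded");
        } catch (EOFException e) {
            System.out.println("EOFException: read past end of zero-length stream");
        }
    }
}
```

The fix on the ORC side is to avoid touching the element stream when no elements are present, rather than to tolerate the EOF.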



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)