Posted to issues@orc.apache.org by "Owen O'Malley (Jira)" <ji...@apache.org> on 2019/10/01 02:45:00 UTC

[jira] [Commented] (ORC-555) IllegalArgumentException when reading files with compressed footers bigger than 16k

    [ https://issues.apache.org/jira/browse/ORC-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941480#comment-16941480 ] 

Owen O'Malley commented on ORC-555:
-----------------------------------

I was able to reproduce the problem in a test case, and I have fixed both it and a related problem.

> IllegalArgumentException when reading files with compressed footers bigger than 16k
> -----------------------------------------------------------------------------------
>
>                 Key: ORC-555
>                 URL: https://issues.apache.org/jira/browse/ORC-555
>             Project: ORC
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>            Reporter: Shardul Mahadik
>            Assignee: Owen O'Malley
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I am using {{orc-core::nohive}} to read an ORC file that was generated by an older version of ORC (probably through Hive 1.1). I can read this file with ORC 1.5.5, but not with ORC 1.6.
> Code:
> {code:java}
> final Reader orcReader = OrcFile.createReader(new Path("/Users/smahadik/orcFailure.orc"),
>     OrcFile.readerOptions(new Configuration()));
> System.out.println(orcReader.getNumberOfRows());
> {code}
> Stacktrace:
> {code:java}
> java.io.IOException: Problem reading file footer /Users/smahadik/orcFailure.orc
> 	at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:716)
> 	at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:500)
> 	at org.apache.orc.OrcFile.createReader(OrcFile.java:365)
> 	at example.testFileFooterReadFailure(TestOrcMetrics.java:16)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> 	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> 	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
> 	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
> 	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> Caused by: java.lang.IllegalArgumentException
> 	at java.nio.Buffer.position(Buffer.java:244)
> 	at org.apache.orc.impl.InStream$CompressedStream.setCurrent(InStream.java:453)
> 	at org.apache.orc.impl.InStream$CompressedStream.reset(InStream.java:440)
> 	at org.apache.orc.impl.InStream$CompressedStream.<init>(InStream.java:426)
> 	at org.apache.orc.impl.InStream.create(InStream.java:843)
> 	at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:706)
> 	... 25 more
> {code}
> Unfortunately, I cannot share the data file that triggers the failure. I am not familiar with the ORC codebase, so I am not sure what is actually happening here, but I will keep digging for more information.
> Here is what I know so far. The error occurs at https://github.com/apache/orc/blob/d10142c49fa4d4bdc9d187195a34377f60d486b1/java/core/src/java/org/apache/orc/impl/InStream.java#L453 because the limit of the {{compressed}} buffer is less than the position the code is trying to set. Execution goes through this recently changed if condition in {{ReaderImpl}}: https://github.com/apache/orc/blob/d10142c49fa4d4bdc9d187195a34377f60d486b1/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L691
> The {{extra}} value is around 3k, so the code seems to switch from the original buffer with a 16k limit to a new buffer with a 3k limit. This smaller buffer is passed to https://github.com/apache/orc/blob/d10142c49fa4d4bdc9d187195a34377f60d486b1/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L706, and the read eventually fails.
> Values of some variables at line 706:
> size = 309950950
> readSize = 16384
> psLen = 26
> psOffset = 309950923
> tailSize = 20314
> footerSize = 3650
> metadataSize = 16637
> extra = 3930
> buffer = data range [309930636, 309934566), size: 3930 type: array-backed
> buffer.next = data range [309934566, 309950950), size: 16384 type: array-backed
> stripeStatSize = 0
> Does anyone have any insight into what might be happening and how we can debug this?
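
The reported values are internally consistent, and the arithmetic can be checked with a small standalone sketch. This is not ORC code: the class name FooterTailSketch and its helper methods are hypothetical, and the formulas are inferred from the numbers quoted above rather than copied from ReaderImpl. The last helper only illustrates the java.nio behavior named in the stack trace: Buffer.position throws IllegalArgumentException when the requested position exceeds the buffer's limit. Using the metadata size as the requested position is an assumption made for illustration, not a confirmed root cause.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; names and formulas are assumptions inferred from the report.
class FooterTailSketch {
    // Values copied from the variables listed in the report.
    static final long SIZE = 309_950_950L;   // file length
    static final int READ_SIZE = 16_384;     // bytes read from the end of the file
    static final int PS_LEN = 26;            // postscript length
    static final int FOOTER_SIZE = 3_650;
    static final int METADATA_SIZE = 16_637;

    // Tail = metadata + footer + postscript + 1 byte holding the postscript length.
    static int tailSize() {
        return 1 + PS_LEN + FOOTER_SIZE + METADATA_SIZE;
    }

    // Bytes of the tail that the initial read did not cover.
    static int extra() {
        return Math.max(0, tailSize() - READ_SIZE);
    }

    // Offset of the postscript within the file.
    static long psOffset() {
        return SIZE - 1 - PS_LEN;
    }

    // Offset in the file where the tail begins.
    static long tailStart() {
        return SIZE - tailSize();
    }

    // Returns true if setting `position` on a buffer with the given limit is rejected.
    static boolean positionFails(int limit, int position) {
        ByteBuffer chunk = ByteBuffer.allocate(limit);
        try {
            chunk.position(position);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("tailSize  = " + tailSize());
        System.out.println("extra     = " + extra());
        System.out.println("psOffset  = " + psOffset());
        System.out.println("tailStart = " + tailStart());
        // Assumed scenario: an offset relative to the whole tail (here the
        // metadata size) applied to the first chunk alone, whose limit is `extra`.
        System.out.println("position rejected = "
            + positionFails(extra(), METADATA_SIZE));
    }
}
```

With the reported inputs, the computed tailSize, extra, and psOffset match the values listed in the description, and the computed tail start matches the lower bound of the first buffer's data range. The tail therefore spans two buffer chunks, and any offset larger than 3930 cannot be set on the first chunk alone.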


