Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/12/28 04:28:44 UTC

[GitHub] [orc] cxzl25 opened a new pull request #979: ORC-1065: Fix ReaderImpl.extractFileTail IndexOutOfBoundsException

cxzl25 opened a new pull request #979:
URL: https://github.com/apache/orc/pull/979


   ### What changes were proposed in this pull request?
   Use buffer limit as `readSize` to avoid `IndexOutOfBoundsException`.
   
   ### Why are the changes needed?
   ORC-251 removed `ReaderImpl.extractFileTail`.  
   ORC-685 added `ReaderImpl.extractFileTail` back.  
   
   Since ORC-685, the file length is used as `readSize`. When the buffer is read from the cache, its limit can be smaller than the file length, so indexing the buffer by file length is incorrect and throws `IndexOutOfBoundsException`.
   ```
   long readSize = fileLen != -1? fileLen: buffer.limit();
   int psLen = buffer.get((int) (readSize-1)) & 0xff; 
   ```
   ```
   Caused by: java.lang.IndexOutOfBoundsException
       at java.nio.Buffer.checkIndex(Buffer.java:540)
       at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
       at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:726)
       at org.apache.hadoop.hive.ql.io.orc.LocalCache.getAndValidate(LocalCache.java:103)
       at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$ETLSplitStrategy.getSplits(OrcInputFormat.java:798)
       at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$ETLSplitStrategy.runGetSplitsSync(OrcInputFormat.java:916)
       at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$ETLSplitStrategy.generateSplitWork(OrcInputFormat.java:885)
       at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.scheduleSplits(OrcInputFormat.java:1759)
       at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1703) 
   ```
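
   A minimal, self-contained sketch of the failure mode described above, not the actual patch. It assumes a cached buffer that holds only the last 16 bytes of a 1000-byte file; the class name `TailReadSketch`, the helper names, and the sizes are hypothetical and chosen only for illustration.

   ```java
   import java.nio.ByteBuffer;

   // Hypothetical stand-in for the indexing logic; not ORC's ReaderImpl.
   final class TailReadSketch {

       // Mirrors the snippet above: prefers the file length over the buffer limit.
       static int psLenFromFileLen(ByteBuffer buffer, long fileLen) {
           long readSize = fileLen != -1 ? fileLen : buffer.limit();
           return buffer.get((int) (readSize - 1)) & 0xff;
       }

       // Proposed idea: always index relative to the buffer's own limit.
       static int psLenFromLimit(ByteBuffer buffer) {
           int readSize = buffer.limit();
           return buffer.get(readSize - 1) & 0xff;
       }

       public static void main(String[] args) {
           // Assumed scenario: the cache kept only a 16-byte tail of the file.
           ByteBuffer cachedTail = ByteBuffer.allocate(16);
           cachedTail.put(15, (byte) 23); // last byte = postscript length
           long fileLen = 1000L;          // real file is much longer than the buffer

           System.out.println("limit-based psLen = " + psLenFromLimit(cachedTail));
           try {
               psLenFromFileLen(cachedTail, fileLen); // reads index 999 of a 16-byte buffer
           } catch (IndexOutOfBoundsException e) {
               System.out.println("file-length-based read failed: " + e);
           }
       }
   }
   ```

   When the buffer happens to span the whole file, both variants read the same byte, which may be why the problem only shows up on the cached path.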
   
   ### How was this patch tested?
   local test
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun merged pull request #979: ORC-1065: Fix IndexOutOfBoundsException in ReaderImpl.extractFileTail

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun merged pull request #979:
URL: https://github.com/apache/orc/pull/979


   





[GitHub] [orc] dongjoon-hyun commented on pull request #979: ORC-1065: Fix IndexOutOfBoundsException in ReaderImpl.extractFileTail

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #979:
URL: https://github.com/apache/orc/pull/979#issuecomment-1002262801


   @cxzl25, I added you to the Apache ORC contributor group and assigned ORC-1065 to you. Thank you again.





[GitHub] [orc] dongjoon-hyun commented on pull request #979: ORC-1065: Fix IndexOutOfBoundsException in ReaderImpl.extractFileTail

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #979:
URL: https://github.com/apache/orc/pull/979#issuecomment-1002260421


   cc @pgaref and @williamhyun 





[GitHub] [orc] cxzl25 commented on pull request #979: ORC-1065: Fix ReaderImpl.extractFileTail IndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
cxzl25 commented on pull request #979:
URL: https://github.com/apache/orc/pull/979#issuecomment-1001865038


   We use Spark 3.2.0, Hive 2.3.9, and ORC 1.6.11.
   With `spark.sql.hive.convertMetastoreOrc=false` set in Spark, querying a table for the second time triggers this problem.
   
   The current workaround is to add the following configuration to `hive-site.xml`:
   ```xml
     <property>
       <name>hive.orc.cache.stripe.details.mem.size</name>
       <value>0</value>
     </property>
   ```
   
   ```java
       HIVE_ORC_CACHE_STRIPE_DETAILS_MEMORY_SIZE("hive.orc.cache.stripe.details.mem.size", "256Mb",
           new SizeValidator(), "Maximum size of orc splits cached in the client."),
   ```





[GitHub] [orc] cxzl25 commented on pull request #979: ORC-1065: Fix ReaderImpl.extractFileTail IndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
cxzl25 commented on pull request #979:
URL: https://github.com/apache/orc/pull/979#issuecomment-1001865778


   cc @pgaref @omalley @dongjoon-hyun
   





[GitHub] [orc] dongjoon-hyun commented on pull request #979: ORC-1065: Fix ReaderImpl.extractFileTail IndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #979:
URL: https://github.com/apache/orc/pull/979#issuecomment-1001881070


   It seems that the PR doesn't pass the UTs. Did you test your patch in your production?
   ```
   Error:  Failures: 
   Error:    TestReaderImpl.testOrcTailStripeStats:382 expected: <1980> but was: <-417>
   ```





[GitHub] [orc] cxzl25 commented on pull request #979: ORC-1065: Fix ReaderImpl.extractFileTail IndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
cxzl25 commented on pull request #979:
URL: https://github.com/apache/orc/pull/979#issuecomment-1001863992


   ## main branch
   https://github.com/apache/orc/blob/3a2cb60e4ab6af6305c351fbdb51b98f460f64a0/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L720-L725
   
   ## branch-1.5
   https://github.com/apache/orc/blob/5f88704d9bd36fc55b57a60c2fbbd35980b1b7e5/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L487-L490





[GitHub] [orc] cxzl25 commented on pull request #979: ORC-1065: Fix ReaderImpl.extractFileTail IndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
cxzl25 commented on pull request #979:
URL: https://github.com/apache/orc/pull/979#issuecomment-1001905299


   > Could you add a test case for your code, @cxzl25 ?
   
   OK, let me see how to add a unit test to cover this case.





[GitHub] [orc] dongjoon-hyun edited a comment on pull request #979: ORC-1065: Fix ReaderImpl.extractFileTail IndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #979:
URL: https://github.com/apache/orc/pull/979#issuecomment-1001881070


   It seems that the PR doesn't pass the UTs. Could you check the UT failures?
   ```
   Error:  Failures: 
   Error:    TestReaderImpl.testOrcTailStripeStats:382 expected: <1980> but was: <-417>
   ```

