You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2020/12/31 15:41:34 UTC

[GitHub] [orc] pgaref edited a comment on pull request #586: ORC-703 : Fix RLE encoding bug on large negative integer.

pgaref edited a comment on pull request #586:
URL: https://github.com/apache/orc/pull/586#issuecomment-752990616


   Apologies for the slight delay but this got more complicated than I expected.
   While testing the newly added [Java test](https://github.com/apache/orc/pull/586/files#diff-8f851a9e2ba9c8fd9f357a9df6482df5c264fdfdbb0e7625e9b2fe71ca9366a7R1226) I realised that the values were encoded using **DIRECT** encoding instead of the expected **PATCHED_BASE**.
   
   The reason for picking **DIRECT** encoding was that some of the values exceed the [**BASE_VALUE_LIMIT**](https://github.com/apache/orc/blob/f2201f396cdd0c4e5b15589ec499e477235a553c/java/core/src/java/org/apache/orc/impl/RunLengthIntegerWriterV2.java#L537) that was introduced as part of ORC-616 (setting an upper limit to 56 bits numbers).
   
   More precisely, **RunLengthIntegerWriterV2** checks if Math.abs(min) -- where min is the min input number of the sequence -- is greater than or equal to (1 << 56) bits -- to make sure the value of baseBytes is never above 8.  Seems that this check was never introduced to the Cpp side of things.
   
   That said, I would recommend to port the above logic to the Cpp library to keep things consistent.
   @chaoyli  please let me know if you want to make the change and again thanks for the patience :) 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org