Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/07/02 14:34:29 UTC

[GitHub] [orc] belugabehr opened a new pull request #734: ORC-829: Optimize Serialization percentileBits

belugabehr opened a new pull request #734:
URL: https://github.com/apache/orc/pull/734


   ### What changes were proposed in this pull request?
   
   Optimize the `percentileBits` method in `SerializationUtils`.
   
   
   ### Why are the changes needed?
   To make the method faster and simpler.
   
   
   ### How was this patch tested?
   Existing unit tests; there are no functional changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref commented on a change in pull request #734: ORC-829: Optimize Serialization percentileBits

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #734:
URL: https://github.com/apache/orc/pull/734#discussion_r665224823



##########
File path: java/core/src/java/org/apache/orc/impl/SerializationUtils.java
##########
@@ -287,27 +293,24 @@ public long zigzagDecode(long val) {
    * @param p - percentile value (&gt;=0.0 to &lt;=1.0)
    * @return pth percentile bits
    */
-  public int percentileBits(long[] data, int offset, int length,
-                            double p) {
+  public int percentileBits(long[] data, int offset, int length, double p) {
     if ((p > 1.0) || (p <= 0.0)) {
       return -1;
     }
 
-    // histogram that store the encoded bit requirement for each values.
-    // maximum number of bits that can encoded is 32 (refer FixedBitSizes)
-    int[] hist = new int[32];
+    Arrays.fill(this.histBuffer, 0);

Review comment:
       Any reason we removed the comment here?
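For context, the histogram idea described by the removed comment, combined with the reusable buffer the diff introduces, can be sketched as follows. This is a minimal, hypothetical sketch and not ORC's implementation: the class `PercentileBitsSketch` is illustrative, and for simplicity its histogram is indexed directly by raw bit width (0 to 64) instead of by ORC's 32 encoded `FixedBitSizes` widths. It shows the pattern the diff adopts: one buffer allocated per instance and cleared with `Arrays.fill` on each call, rather than a new array per call.

```java
import java.util.Arrays;

public class PercentileBitsSketch {
  // Hypothetical reusable histogram: bucket w counts values that need exactly
  // w significant bits (0..64). ORC's real code uses 32 buckets keyed by
  // encoded FixedBitSizes widths instead.
  private final int[] histBuffer = new int[65];

  // Number of significant bits in value, treating it as unsigned.
  static int bitWidth(long value) {
    return 64 - Long.numberOfLeadingZeros(value);
  }

  /**
   * Returns the smallest bit width that holds at least a fraction p of the
   * values in data[offset, offset + length), or -1 if p is outside (0.0, 1.0].
   */
  public int percentileBits(long[] data, int offset, int length, double p) {
    if (p > 1.0 || p <= 0.0) {
      return -1;
    }
    // Reuse the buffer instead of allocating a new array on every call.
    Arrays.fill(histBuffer, 0);
    for (int i = offset; i < offset + length; i++) {
      histBuffer[bitWidth(data[i])]++;
    }
    // Number of values allowed to need more bits than the chosen width.
    int perLen = (int) (length * (1.0 - p));
    // Walk from the widest bucket down until more than perLen values are seen.
    for (int w = histBuffer.length - 1; w >= 0; w--) {
      perLen -= histBuffer[w];
      if (perLen < 0) {
        return w;
      }
    }
    return 0;
  }
}
```

Because the buffer is an instance field, a sketch like this is not safe to share across threads without external synchronization; that trade-off is worth keeping in mind whenever a per-call allocation is hoisted into a field.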







[GitHub] [orc] belugabehr commented on pull request #734: ORC-829: Optimize Serialization percentileBits

belugabehr commented on pull request #734:
URL: https://github.com/apache/orc/pull/734#issuecomment-875591895


   > Hey @belugabehr, changes LGTM. Just wondering if we can/should incorporate the perf tests you ran when you discovered this?
   
   @pgaref Hey, thanks for taking a look.
   
   Nothing special here. I am just slowly working through the `core-benchmark` module. I've examined the write side a bit (see #734, #735, #736) and hope to find time next week to look at the read side.
   
   To be clear, I think the biggest saving in this PR comes from using `Long.numberOfLeadingZeros` instead of a loop that can iterate as many as 64 times per value written. I made the rest of the changes based on manual inspection of the code. I did not benchmark each change individually, but, as I mentioned, after these changes this method no longer even registered in the performance capture.
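The difference described above can be illustrated with a small comparison. This is a hypothetical sketch: `BitCountSketch`, `bitsByLoop`, and `bitsByNlz` are illustrative names, and the loop is modeled on the kind of shift loop described in the comment rather than copied from ORC. Both methods return the number of significant bits of a value treated as unsigned, so they should agree for every input, including zero and negative numbers.

```java
public class BitCountSketch {
  // Loop-based count: shifts the value right until it reaches zero, so it
  // iterates once per significant bit, up to 64 times for a single value
  // (for example, any negative long).
  public static int bitsByLoop(long value) {
    int count = 0;
    while (value != 0) {
      count++;
      value >>>= 1; // unsigned shift so negative values also terminate
    }
    return count;
  }

  // Constant-time count: HotSpot typically treats Long.numberOfLeadingZeros
  // as an intrinsic, so no per-bit loop runs at all.
  public static int bitsByNlz(long value) {
    return 64 - Long.numberOfLeadingZeros(value);
  }
}
```

The loop's cost grows with the magnitude of each value, which is why it can dominate a write path that calls it once per value; the `numberOfLeadingZeros` form does the same work in a fixed number of steps.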
   





[GitHub] [orc] pgaref merged pull request #734: ORC-829: Optimize Serialization percentileBits

pgaref merged pull request #734:
URL: https://github.com/apache/orc/pull/734


   

