You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/06/08 14:43:26 UTC

[GitHub] [orc] belugabehr opened a new pull request #714: ORC-812: Simplify getClosestBufferSize in Writer

belugabehr opened a new pull request #714:
URL: https://github.com/apache/orc/pull/714


   Simplify getClosestBufferSize by introducing rounding up to nearest power of 2.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref edited a comment on pull request #714: ORC-812: Simplify getClosestBufferSize in Writer

Posted by GitBox <gi...@apache.org>.
pgaref edited a comment on pull request #714:
URL: https://github.com/apache/orc/pull/714#issuecomment-857748790


   > 3
   
   I probably did not explain this properly, my question was if we actually need the -1 deduction since are already rounding up on the next power or 2.  Taking another look this takes care of the equality case ( e.g., 32KB does not need any rounding as it will be twice as large).
   
   +1 on the existing changes
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] belugabehr commented on pull request #714: ORC-812: Simplify getClosestBufferSize in Writer

Posted by GitBox <gi...@apache.org>.
belugabehr commented on pull request #714:
URL: https://github.com/apache/orc/pull/714#issuecomment-857690375


   @pgaref Also, for a sanity check, I ran this locally:
   
   ```java
     public static int bs(int bs)
     {
       final int kb4 = 4 * 1024;
       final int kb256 = 256 * 1024;
   
       final int pow2 = bs == 1 ? 1 : Integer.highestOneBit(bs - 1) * 2;
       return Math.min(kb256, Math.max(kb4, pow2));
     }
   
     private static int getClosestBufferSize(int estBufferSize)
     {
       final int kb4 = 4 * 1024;
       final int kb8 = 8 * 1024;
       final int kb16 = 16 * 1024;
       final int kb32 = 32 * 1024;
       final int kb64 = 64 * 1024;
       final int kb128 = 128 * 1024;
       final int kb256 = 256 * 1024;
       if (estBufferSize <= kb4)
       {
         return kb4;
       } else if (estBufferSize <= kb8)
       {
         return kb8;
       } else if (estBufferSize <= kb16)
       {
         return kb16;
       } else if (estBufferSize <= kb32)
       {
         return kb32;
       } else if (estBufferSize <= kb64)
       {
         return kb64;
       } else if (estBufferSize <= kb128)
       {
         return kb128;
       } else
       {
         return kb256;
       }
     }
   
     public static void main(String[] args) throws IOException, URISyntaxException
     {
       final int kb256 = 256 * 1024;
       
       for (int i = 0; i <= kb256 * 16; i++) {
         int r1 = bs(i);
         int r2 = getClosestBufferSize(i);
         if (r1 != r2) {
           System.out.print(i);
         }
       }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] belugabehr edited a comment on pull request #714: ORC-812: Simplify getClosestBufferSize in Writer

Posted by GitBox <gi...@apache.org>.
belugabehr edited a comment on pull request #714:
URL: https://github.com/apache/orc/pull/714#issuecomment-857703731


   Hey @pgaref You'll have to pull out pen-and-paper and play with it :)
   
   `<< 1` simply doubles the value.  We're not interested in doubling, we're interesting in finding the next power of 2. 
   
   `3  << 1 = 6 (not power of 2)`
   `3 - 1 = 2 = 0x02 = 2 is highest one bit  ===>  2 * 2 = 4` ✔️ 
   
   
   Consider a value of `2`:
   `Integer.highestOneBit(2) = 0x02 ====> 2 * 2 = 4` ❌ (The nearest power of 2 is 2)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #714: ORC-812: Simplify getClosestBufferSize in Writer

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #714:
URL: https://github.com/apache/orc/pull/714#issuecomment-860128007


   Gentle ping, @belugabehr . Could you address @pgaref 's comment?
   > Hey @belugabehr lets update the PR description above so we can merge this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] belugabehr edited a comment on pull request #714: ORC-812: Simplify getClosestBufferSize in Writer

Posted by GitBox <gi...@apache.org>.
belugabehr edited a comment on pull request #714:
URL: https://github.com/apache/orc/pull/714#issuecomment-857703731


   Hey @pgaref You'll have to pull out pen-and-paper and play with it :)
   
   `<< 1` simply doubles the value.  We're not interested in doubling, we're interesting in finding the next power of 2. 
   
   `3  << 1 = 6 (not power of 2)`
   `3 - 1 = 2 = 0x02 = 2 is highest one bit  ===>  2 * 2 = 4` ✔️ 
   
   
   Consider a value of `2`:
   `Integer.highestOneBit(size) = 0x02 ====> 2 * 2 = 4` ❌ (The nearest power of 2 is 2)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref commented on pull request #714: ORC-812: Simplify getClosestBufferSize in Writer

Posted by GitBox <gi...@apache.org>.
pgaref commented on pull request #714:
URL: https://github.com/apache/orc/pull/714#issuecomment-858447656


   Hey @belugabehr lets update the PR description above so we can merge this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref commented on pull request #714: ORC-812: Simplify getClosestBufferSize in Writer

Posted by GitBox <gi...@apache.org>.
pgaref commented on pull request #714:
URL: https://github.com/apache/orc/pull/714#issuecomment-857693888


   > @pgaref Also, for a sanity check, I ran this locally:
   > 
   > ```java
   >   public static int bs(int bs)
   >   {
   >     final int kb4 = 4 * 1024;
   >     final int kb256 = 256 * 1024;
   > 
   >     final int pow2 = bs == 1 ? 1 : Integer.highestOneBit(bs - 1) * 2;
   >     return Math.min(kb256, Math.max(kb4, pow2));
   >   }
   > 
   >   private static int getClosestBufferSize(int estBufferSize)
   >   {
   >     final int kb4 = 4 * 1024;
   >     final int kb8 = 8 * 1024;
   >     final int kb16 = 16 * 1024;
   >     final int kb32 = 32 * 1024;
   >     final int kb64 = 64 * 1024;
   >     final int kb128 = 128 * 1024;
   >     final int kb256 = 256 * 1024;
   >     if (estBufferSize <= kb4)
   >     {
   >       return kb4;
   >     } else if (estBufferSize <= kb8)
   >     {
   >       return kb8;
   >     } else if (estBufferSize <= kb16)
   >     {
   >       return kb16;
   >     } else if (estBufferSize <= kb32)
   >     {
   >       return kb32;
   >     } else if (estBufferSize <= kb64)
   >     {
   >       return kb64;
   >     } else if (estBufferSize <= kb128)
   >     {
   >       return kb128;
   >     } else
   >     {
   >       return kb256;
   >     }
   >   }
   > 
   >   public static void main(String[] args) throws IOException, URISyntaxException
   >   {
   >     final int kb256 = 256 * 1024;
   >     
   >     for (int i = 0; i <= kb256 * 16; i++) {
   >       int r1 = bs(i);
   >       int r2 = getClosestBufferSize(i);
   >       if (r1 != r2) {
   >         System.out.print(i);
   >       }
   >     }
   > ```
   
   Got it thanks!
   
   One more question: what is the point of 
   > Integer.highestOneBit(size - 1) * 2 ?
   
   looks to me like this can be simplified to Integer.highestOneBit(size) as * 2 is << 1 right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] belugabehr commented on pull request #714: ORC-812: Simplify getClosestBufferSize in Writer

Posted by GitBox <gi...@apache.org>.
belugabehr commented on pull request #714:
URL: https://github.com/apache/orc/pull/714#issuecomment-857135876


   @pgaref This is already covered by `TestOrcWideTable`. :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref commented on pull request #714: ORC-812: Simplify getClosestBufferSize in Writer

Posted by GitBox <gi...@apache.org>.
pgaref commented on pull request #714:
URL: https://github.com/apache/orc/pull/714#issuecomment-857748790


   > 3
   
   Maybe I did not make myself clear, my point was if we actually need the -1 part since are already rounding up on the next power or 2.  Taking another look this takes care of the equality case ( e.g., 32KB does not need any rounding as it will be twice as large).
   
   +1 on the existing changes
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] belugabehr commented on pull request #714: ORC-812: Simplify getClosestBufferSize in Writer

Posted by GitBox <gi...@apache.org>.
belugabehr commented on pull request #714:
URL: https://github.com/apache/orc/pull/714#issuecomment-857703731


   Hey @pgaref You'll have to pull out pen-and-paper and play with it :)
   
   `<< 1` simply doubles the value.  We're not interested in doubling, we're interesting in finding the next power of 2. 
   
   `3  << 1 = 6 (not power of 2)`
   `3 - 1 = 2 = 0x02 = 2 is highest one bit  ===>  2 * 2 = 4` ✔️ 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun merged pull request #714: ORC-812: Simplify getClosestBufferSize in Writer

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun merged pull request #714:
URL: https://github.com/apache/orc/pull/714


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org