You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/03 06:14:56 UTC

[GitHub] [arrow] emkornfield commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

emkornfield commented on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-637978902


   @pitrou thank you for the review.  Comments inline.
   
   > Ok, I find the implementation quite convoluted. I would like to see the following changes:
   > 
   > * Make all critical functions inline
   
   If this is AdvanceTillWord, this hurts performance.  LoadInitialWord in the constructor can be done but I don't think it will have a large impact on performance.
   
   > * Load a single byte at a time (it probably doesn't bring much to process an entire 64-bit word at a time, and it adds complication)
   
   I'm not sure I understand what direction you would like me to take here.  I would expect word level handling to be much faster for cases this is intended for (when there are actual runs).  In practice, once integrated into a system you are likely correct though.  But I'm not sure exactly what algorithm you are thinking of for byte level handling.  If you provide more description I can try to implement it and compare.
   
   > * Try to simplify mask handling
   Is this mostly InvertRemainingBits? could you provide some more feedback on the algorithm structure you were thinking of?
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org