Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/06/23 06:43:24 UTC

[GitHub] [druid] clintropolis commented on issue #4080: Better long and float column compression

clintropolis commented on issue #4080:
URL: https://github.com/apache/druid/issues/4080#issuecomment-647942876


   Hi @CmdrKeen,
   Apologies in advance for the wall of text. 
   
   #6016 captures many of my experiments and findings so far. For integer compression of dictionary-encoded string columns, FastPFOR is a very solid improvement in many cases, though it was not better across the board for every value distribution. However, the strategy taken in #6016, which uses FastPFOR as part of an approach that attempts to find the "best" encoding for a given set of values at ingestion time, has I feel at least been proven viable in that PR.
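   To illustrate the "pick the best encoding at ingestion time" idea, here is a minimal sketch; the `Encoder` interface and the class names are hypothetical for this example, not the actual API from #6016:

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: trial-encode a chunk of values with each candidate
// encoding and keep whichever produces the smallest result. Names are
// illustrative, not Druid's actual API.
public class EncodingSelector {
    public interface Encoder {
        String name();
        // Size in bytes this chunk would occupy if written with this encoding.
        int encodedSize(int[] values);
    }

    public static Encoder selectBest(int[] chunk, List<Encoder> candidates) {
        return candidates.stream()
            .min(Comparator.comparingInt(e -> e.encodedSize(chunk)))
            .orElseThrow(() -> new IllegalArgumentException("no candidates"));
    }
}
```

   The point of doing the selection per chunk at ingestion time is that no single encoding wins for all value distributions, so the writer pays a one-time cost to measure each candidate.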
   
   The main stalling point I ran into was after I wired up the C++ implementation of FastPFOR to Druid through JNI and discovered that the C++ and Java implementations are not actually compatible (and don't claim to be; I was just experimenting to see if I could make it go faster). The performance increase of the C++ version makes it worth using, but it is also necessary to have a pure-Java fallback for platforms where the compiled native version is not available. I also think the 'native' version of my branch in that PR is a cleaner implementation, since it deals in ByteBuffer/memory locations rather than on-heap arrays like the branch associated with #6016 itself (see https://github.com/clintropolis/druid/compare/shapeshift...clintropolis:shapeshift-native for the JNI branch differences). 
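   For a rough picture of what "deals in ByteBuffer locations rather than on-heap arrays" means, here is an illustrative sketch (not the actual interface from the shapeshift-native branch) of a decoder contract that either a JNI implementation or a pure-Java one could sit behind:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative sketch of a buffer-oriented decoder contract. Because it works
// against buffer positions instead of int[] arrays, the same call path can be
// backed by native code (operating on the buffer's memory address) or pure Java.
public class BufferIntCodec {
    public interface IntDecoder {
        // Reads numValues encoded ints starting at input's position, writes
        // the decoded values to output, and returns the bytes consumed.
        int decode(ByteBuffer input, ByteBuffer output, int numValues);
    }

    // Trivial pure-Java stand-in that copies little-endian ints verbatim, just
    // to show the buffer contract; a real implementation would run FastPFOR here.
    public static class CopyingDecoder implements IntDecoder {
        @Override
        public int decode(ByteBuffer input, ByteBuffer output, int numValues) {
            input.order(ByteOrder.LITTLE_ENDIAN);
            output.order(ByteOrder.LITTLE_ENDIAN);
            for (int i = 0; i < numValues; i++) {
                output.putInt(input.getInt());
            }
            return numValues * Integer.BYTES;
        }
    }
}
```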
   
   I've been intending to pick this back up for like... over a year now, but I just haven't been able to prioritize it to bring it to the finish line. I did "dust off" the branches and start testing stuff a few months ago, but haven't pushed my changes to fix conflicts and to add vectorization support after #6794 or really had the chance to make any material progress on finishing it.
   
   I think the next step is creating a version of FastPFOR in pure Java that is compatible with the C++ version, and that preferably reads from and writes to little-endian `ByteBuffer` views of the mapped segments, so that the simpler interfaces defined in the 'native' version of the branch can be used. Besides letting us use the faster JNI/native version when available (which is how lz4-java currently behaves), this approach also avoids the additional heap memory footprint that the version in the PR has.
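   The lz4-java-style selection amounts to something like the following sketch; the factory, the `Decoder` interface, and the library name "fastpfor-jni" are all made up for illustration:

```java
// Hypothetical factory showing the "prefer native, fall back to pure Java"
// pattern: if the native library loads, hand out the JNI-backed implementation,
// otherwise fall back to pure Java. The library name is invented for the sketch.
public class FastPforDecoders {
    public interface Decoder {
        String implementation();
    }

    public static Decoder create() {
        try {
            // A real build would also bind a JNI-backed class after loading;
            // here we only demonstrate the selection shape.
            System.loadLibrary("fastpfor-jni");
            return () -> "native";
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return () -> "pure-java";
        }
    }
}
```

   On a platform without the native library, `create()` quietly returns the pure-Java implementation, so callers never need to care which one they got.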
   
   In https://github.com/lemire/FastPFor/issues/42, @lemire has suggested that making a compatible version is non-trivial but might not take more than a few days or a week or so. My problem has been finding a continuous chunk of time to devote to fully understanding the C++ implementation and then of course doing the compatible Java implementation.
   
   Much of the benchmarking will need to be repeated against the latest version of Druid, with a heavier focus on vectorized reads, though I was already benchmarking against #6794 while it was in progress and my branch was competitive at the time, so many of the measurements are probably still close to accurate.
   
   The other part that has given me some pause more recently is thinking about how more of Druid's low-level column reads could potentially be pushed into native code, beyond just decompressing chunks of values, what that might look like, and how it might change the implementation of what has been done in the prototype so far (if at all). I expect the strategy and physical column layout itself wouldn't really change; rather, a lot of the code currently implemented in Java would move to native code, so this might not be worth worrying too much about.
   
   Lastly, for some further thinking with regard to native processing, there are probably some unresolved questions about whether the new column format suggested in my PR (or any larger segment format change) would be worth modifying further so that data in the segment is well aligned for vectorized CPU instructions. This might not be an issue worth considering at all; I mostly want to be sure, because segment format changes are among the riskiest changes to make. Once released, they must remain supported for backwards compatibility over very long cycles, so it feels very important to get any new format right.
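   As a tiny illustration of the alignment question, the format could pad each value section so it starts on an aligned boundary. The 64-byte target here is an assumption for the example (a common cache-line/SIMD alignment), not anything the Druid segment format specifies:

```java
// Rounds a section offset up to the next multiple of `alignment`, so a value
// section can start on (for example) a 64-byte boundary friendly to vectorized
// reads. The specific alignment value is an assumption for illustration.
public class SegmentAlignment {
    public static int alignUp(int offset, int alignment) {
        // alignment must be a power of two for this bit trick to work
        return (offset + alignment - 1) & -alignment;
    }
}
```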
   
   Anyway, thanks for bringing this issue back to my attention and starting the discussion. Thinking about it in order to respond to your question has given me some motivation to... well, at minimum, keep thinking about it a bit more, but maybe we can actually get this moving again :sweat_smile:.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


