Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2018/08/02 08:37:21 UTC

[GitHub] clintropolis commented on issue #6016: Druid 'Shapeshifting' Columns

URL: https://github.com/apache/incubator-druid/pull/6016#issuecomment-409850996
 
 
   @leventov @himanshug I think I've found another viable, maybe even _better_, variant of this general idea that I can build with relatively minor changes. It could eliminate the need for primitive arrays entirely, move everything back to off-heap direct buffers, and simplify the code quite a bit.
   
   My weekend hack project (which has also spilled into every night this week) was to write a JNI wrapper around [the native version of FastPfor](https://github.com/lemire/FastPFor), then plug it and all of its algorithms in as another encoder/decoder option to experiment with. I've wanted to scratch this itch since I started working on this, because I was curious how pure Java compares to calling native code from Java. I have a lot more testing and benchmarking to do, and the SIMD versions of the codecs seem to be finicky about memory alignment, but based on my limited observations so far it seems possible to get even better performance by going native. This uses the same direct buffers from the compression pool as lz4 bytepacking, so if we go this way the memory footprint should be very similar to what it is now (plus whatever the native code allocates).
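To make the direct-buffer and alignment points concrete, here is a minimal sketch of what the Java side of such a JNI wrapper might look like. Everything in it is hypothetical: the class name, the `encodeNative` signature (which a hand-written C++ shim would have to export; it is not part of FastPFor), and the 32-byte alignment, which is an assumption since the real requirement depends on which SIMD codec is used. The checks reflect two real constraints: JNI's `GetDirectBufferAddress` returns `NULL` for heap buffers, and `ByteBuffer.alignedSlice`/`alignmentOffset` (Java 9+) can be used to hand native code an aligned region.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical Java side of a JNI binding to native FastPFor. */
public final class NativeFastPforCodec {
  // Assumed alignment: 32 bytes covers AVX2 loads; SSE-only codecs need 16.
  public static final int SIMD_ALIGNMENT = 32;

  private NativeFastPforCodec() {}

  // Would be implemented by a hypothetical C++ JNI shim that calls
  // GetDirectBufferAddress (which returns NULL for non-direct buffers).
  private static native int encodeNative(ByteBuffer in, int inOffset, int numInts,
                                         ByteBuffer out, int outOffset);

  /** Allocates a direct, native-order buffer whose first byte is SIMD-aligned. */
  public static ByteBuffer allocateAligned(int capacity) {
    // Round up to a multiple of the alignment so the aligned slice is
    // guaranteed to hold at least `capacity` bytes.
    int padded = (capacity + SIMD_ALIGNMENT - 1) & -SIMD_ALIGNMENT;
    ByteBuffer raw = ByteBuffer.allocateDirect(padded + SIMD_ALIGNMENT - 1);
    ByteBuffer aligned = raw.alignedSlice(SIMD_ALIGNMENT); // Java 9+
    aligned.limit(capacity);
    // Slices default to big-endian; native code reading ints expects native order.
    return aligned.order(ByteOrder.nativeOrder());
  }

  /** Validates what the native side relies on, then crosses into native code. */
  public static int encode(ByteBuffer in, int numInts, ByteBuffer out) {
    checkBuffer(in, "in");
    checkBuffer(out, "out");
    return encodeNative(in, in.position(), numInts, out, out.position());
  }

  static void checkBuffer(ByteBuffer buf, String name) {
    if (!buf.isDirect()) {
      throw new IllegalArgumentException(name + " must be a direct buffer");
    }
    // Safe to query with a 32-byte unit only because the buffer is direct.
    if (buf.alignmentOffset(buf.position(), SIMD_ALIGNMENT) != 0) {
      throw new IllegalArgumentException(name + " is not " + SIMD_ALIGNMENT + "-byte aligned");
    }
  }
}
```

Failing fast in Java keeps a misaligned or heap buffer from reaching native code, where it would otherwise show up as a null pointer or a crash inside a SIMD load.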
   
   The major downside is that the FastPFOR implementations do not seem to be compatible with each other, so switching later could be painful. At least the SIMD version and the Java version don't appear to be interchangeable; I haven't tried the non-SIMD native version against the Java version yet, so maybe there is still hope. It is also possible that this is a bug in one of the libraries.
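One way to pin down the compatibility question is a cross-implementation round-trip check: encode with one implementation, decode with the other, and compare against the input, in both directions. The sketch below assumes a hypothetical `IntCodec` interface (not an interface from FastPFor or Druid) that the SIMD, non-SIMD native, and Java codecs would each be wrapped in.

```java
import java.util.Arrays;

public final class CodecCompatibility {
  /** Hypothetical minimal codec abstraction for comparing implementations. */
  public interface IntCodec {
    int[] encode(int[] values);

    int[] decode(int[] encoded, int numValues);
  }

  private CodecCompatibility() {}

  /** True if data written by `writer` decodes back to `values` with `reader`. */
  public static boolean interchangeable(IntCodec writer, IntCodec reader, int[] values) {
    int[] encoded = writer.encode(values);
    int[] decoded = reader.decode(encoded, values.length);
    return Arrays.equals(values, decoded);
  }

  /** Stored columns must be readable whichever implementation wrote them. */
  public static boolean interchangeableBothWays(IntCodec a, IntCodec b, int[] values) {
    return interchangeable(a, b, values) && interchangeable(b, a, values);
  }
}
```

Running this over a range of inputs (sorted, random, values at block-size boundaries) would help separate a genuine format difference from a bug that only shows up on particular data.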
   
   We would also need to consider how to maintain this native mapping. I'm currently building all the native parts by hand and bundling them as resources in a standalone package, which I install locally with Maven to test, but I'm fuzzy on where to go from there and don't really know the proper way to do this (I was modeling it on the lz4 native library).
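For reference, the general pattern that libraries such as lz4-java use to ship native code inside a jar is to bundle one pre-built binary per OS/architecture as a resource, copy the matching one to a temporary file at startup, and load it with `System.load`. Below is a minimal sketch of that pattern; the `/native/<os>/<arch>/` resource layout, the OS directory names, and the library name used in the usage note are all hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

public final class NativeLibraryLoader {
  private NativeLibraryLoader() {}

  /** Maps os.name/os.arch to a hypothetical layout like /native/linux/amd64/libfoo.so. */
  static String resourcePath(String libName, String osName, String osArch) {
    String os = osName.toLowerCase(Locale.ROOT);
    String dir;
    String file;
    if (os.contains("linux")) {
      dir = "linux";
      file = "lib" + libName + ".so";
    } else if (os.contains("mac") || os.contains("darwin")) {
      dir = "darwin";
      file = "lib" + libName + ".dylib";
    } else if (os.contains("windows")) {
      dir = "win32";
      file = libName + ".dll";
    } else {
      throw new UnsupportedOperationException("Unsupported OS: " + osName);
    }
    return "/native/" + dir + "/" + osArch.toLowerCase(Locale.ROOT) + "/" + file;
  }

  /** Copies the bundled library for this platform out of the jar and loads it. */
  public static void load(String libName) throws IOException {
    String path = resourcePath(libName, System.getProperty("os.name"), System.getProperty("os.arch"));
    try (InputStream in = NativeLibraryLoader.class.getResourceAsStream(path)) {
      if (in == null) {
        throw new IOException("No bundled native library at " + path);
      }
      // Keep the real file name as the suffix so the extension is preserved.
      String fileName = path.substring(path.lastIndexOf('/') + 1);
      Path tmp = Files.createTempFile(libName, "-" + fileName);
      tmp.toFile().deleteOnExit(); // best effort; a loaded DLL may stay locked on Windows
      Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
      System.load(tmp.toAbsolutePath().toString()); // System.load requires an absolute path
    }
  }
}
```

A class such as the wrapper would then call something like `NativeLibraryLoader.load("fastpfor_jni")` once in a static initializer, where `fastpfor_jni` is a made-up name for the JNI shim.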
   
   I might be getting ahead of myself, but if we were to pursue this approach, I assume we would want to maintain this as a package in Druid, maybe something like `druid-native-processing`? I think we want a package _somewhere_ that could hold the Java sources declaring the native methods, the JNI headers and sources, maybe git submodules of third-party native libraries, and pre-built versions of those libraries in the package's resources. Setting up cross-compilation in CI to build the native libs that get packaged as resources would probably be a pain, but I think it would be useful to at least be able to build them manually from within the package. There are some Maven plugins for building native code that I need to look into further if we get serious about this.
   
   I'm going to keep playing with this to see if I can get it working smoothly. If further testing is promising, I'll make a branch to sketch out what it might look like; the refactor should be relatively quick and painless.
   
