Posted to issues@carbondata.apache.org by GitBox <gi...@apache.org> on 2020/03/01 02:31:49 UTC

[GitHub] [carbondata] Zhangshunyu edited a comment on issue #3637: [CARBONDATA-3721][CARBONDATA-3590] Optimize Bucket Table

URL: https://github.com/apache/carbondata/pull/3637#issuecomment-593037009
 
 
   > @Zhangshunyu The other way is to let spark do the bucketing, like how the partitioner is implemented. In fact, we can add the bucketing directly into the partition flow. Not many changes are needed in that case.
   
   @ravipesala is guava's murmur hash the same as the one spark uses?
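   A minimal sketch for checking this, assuming both guava and spark-unsafe are on the classpath: it hashes the same int and the same UTF-8 bytes with Guava's murmur3_32 and with Spark's Murmur3_x86_32 so the results can be compared side by side (the seed of 42 mirrors the default seed of Spark's Murmur3Hash expression):

```java
import com.google.common.hash.Hashing;
import org.apache.spark.unsafe.Platform;
import org.apache.spark.unsafe.hash.Murmur3_x86_32;

import java.nio.charset.StandardCharsets;

public class MurmurCompare {
    public static void main(String[] args) {
        // Spark's Murmur3Hash expression uses 42 as its default seed.
        int seed = 42;

        // Hash the same int with both libraries.
        int value = 12345;
        int guavaInt = Hashing.murmur3_32(seed).hashInt(value).asInt();
        int sparkInt = Murmur3_x86_32.hashInt(value, seed);

        // Hash the same UTF-8 bytes with both libraries.
        byte[] bytes = "carbondata".getBytes(StandardCharsets.UTF_8);
        int guavaBytes = Hashing.murmur3_32(seed).hashBytes(bytes).asInt();
        int sparkBytes = Murmur3_x86_32.hashUnsafeBytes(
                bytes, Platform.BYTE_ARRAY_OFFSET, bytes.length, seed);

        System.out.println("int:   guava=" + guavaInt + "  spark=" + sparkInt);
        System.out.println("bytes: guava=" + guavaBytes + "  spark=" + sparkBytes);
    }
}
```

   If the two disagree on the byte input, that would confirm that spark's variant is not interchangeable with guava's and is worth verifying before relying on either.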
   
   > @Zhangshunyu It was a supported feature earlier, but unfortunately that code got removed some time back. Anyway, spark changed the hashing technique it uses when creating buckets, so we cannot rely on our own hashing anymore.
   > I see a lot of code got copied from spark just to get the hashing. That is not recommended, because if they change it in the future it will break again. They follow the industry-standard murmur hash anyway, so please use the guava library and do the murmur hashing there. Please don't copy code unnecessarily from spark.
   
   @ravipesala spark uses a guava-style murmur hash, but its implementation is not exactly the same as guava's. As for future changes in spark: if we want to keep the same hash codes as spark, maybe we can depend on the spark-unsafe jar directly, chosen per spark version, just like carbon already depends on different spark versions.
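   A rough sketch of what depending on spark-unsafe directly could look like, under the assumption that Spark's Murmur3Hash expression uses seed 42 and hashes string columns via hashUnsafeBytes over the UTF-8 bytes; the helper names (hashStringColumn, bucketIdFor) are illustrative only, not existing CarbonData APIs, and multi-column or non-string bucket keys are ignored:

```java
import org.apache.spark.unsafe.Platform;
import org.apache.spark.unsafe.hash.Murmur3_x86_32;

import java.nio.charset.StandardCharsets;

public class SparkAlignedBucketing {

    // Assumption: Spark's Murmur3Hash expression defaults to seed 42; verify per spark version.
    private static final int SPARK_HASH_SEED = 42;

    /** Hash a string bucket column the way spark hashes UTF8String values (UTF-8 bytes). */
    static int hashStringColumn(String value, int seed) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        return Murmur3_x86_32.hashUnsafeBytes(bytes, Platform.BYTE_ARRAY_OFFSET, bytes.length, seed);
    }

    /** Map the hash to a bucket id with a non-negative modulo, mirroring spark's pmod. */
    static int bucketIdFor(String value, int numBuckets) {
        int hash = hashStringColumn(value, SPARK_HASH_SEED);
        return ((hash % numBuckets) + numBuckets) % numBuckets;
    }

    public static void main(String[] args) {
        System.out.println(bucketIdFor("carbondata", 8));
    }
}
```

   Whether this stays stable across spark releases is exactly the concern raised above, so the spark-unsafe dependency would have to follow the same per-version build profiles carbon already uses for spark.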
