Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/03/25 04:31:43 UTC

[GitHub] [pinot] siddharthteotia commented on pull request #8398: Allow disabling dict generation for High cardinality columns

siddharthteotia commented on pull request #8398:
URL: https://github.com/apache/pinot/pull/8398#issuecomment-1078644639


   So we had implemented a config recommendation rule in the RecommendationEngine (used by LinkedIn):
   
   https://github.com/apache/pinot/blob/master/pinot-controller/src/main/java/org/apache/pinot/controller/recommender/rules/impl/NoDictionaryOnHeapDictionaryJointRule.java
   
   It needs to be improved based on some things we have observed after using it for quite some time:
   
   - First, pure aggregation-only queries slow down significantly (3ms with a dictionary vs. 300ms without) if no dictionary is created on the column, because MIN and MAX aggregations can be answered from the dictionary instead of scanning the table.
   
   - Columns in the SELECT list benefit from having no dictionary, because during projection noDictionary avoids the extra hop from the forward index to the dictionary. In some cases, we saw a 20% performance improvement in such scenarios by not having a dictionary.
   
   - Lastly, as also mentioned in this PR, the storage savings from a dictionary can be significant for low-cardinality columns. But regardless of cardinality, and especially for STRING columns, predicate evaluation is faster when it can use native integer arithmetic on dictionary codes instead of varchar/string comparison.
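   The three observations above can be illustrated with a minimal Java sketch of one STRING column stored in two ways. The first is dictionary-encoded: a sorted array of distinct values plus a forward index of dictionary IDs. The second is raw, as with noDictionary. `DictColumn` and `RawColumn` are hypothetical names and do not mirror Pinot's actual dictionary or forward index reader APIs. The sketch assumes a sorted dictionary and a non-empty column without nulls.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical dictionary-encoded STRING column (not Pinot's real API).
// Assumes a non-empty column with non-null values.
final class DictColumn {
  final String[] dictionary; // sorted distinct values; a dictId is an index into this array
  final int[] dictIds;       // forward index: one dictId per row

  DictColumn(String[] rows) {
    // TreeSet sorts and de-duplicates, giving a sorted dictionary.
    dictionary = new TreeSet<>(Arrays.asList(rows)).toArray(new String[0]);
    dictIds = new int[rows.length];
    for (int i = 0; i < rows.length; i++) {
      dictIds[i] = Arrays.binarySearch(dictionary, rows[i]);
    }
  }

  // MIN/MAX read the ends of the sorted dictionary: O(1), no row scan.
  String min() { return dictionary[0]; }
  String max() { return dictionary[dictionary.length - 1]; }

  // Projection pays an extra hop: forward index -> dictId -> dictionary value.
  String valueAt(int row) { return dictionary[dictIds[row]]; }

  // Equality filter: resolve the literal to a dictId once, then compare ints per row.
  int countEquals(String literal) {
    int target = Arrays.binarySearch(dictionary, literal);
    if (target < 0) {
      return 0; // literal absent from the dictionary, so no row can match
    }
    int count = 0;
    for (int id : dictIds) {
      if (id == target) count++;
    }
    return count;
  }
}

// Hypothetical raw (noDictionary) STRING column: values stored directly.
final class RawColumn {
  final String[] values;

  RawColumn(String[] rows) { values = rows.clone(); }

  // MIN/MAX must scan every row.
  String min() {
    String best = values[0];
    for (String v : values) if (v.compareTo(best) < 0) best = v;
    return best;
  }

  String max() {
    String best = values[0];
    for (String v : values) if (v.compareTo(best) > 0) best = v;
    return best;
  }

  // Projection is a single read with no dictionary lookup.
  String valueAt(int row) { return values[row]; }

  // Equality filter compares strings on every row.
  int countEquals(String literal) {
    int count = 0;
    for (String v : values) if (v.equals(literal)) count++;
    return count;
  }
}
```

   For rows `{"us", "in", "us", "de", "us"}`, both classes return the same results:

   - `"de"` for `min()`
   - `"us"` for `max()`
   - `3` for `countEquals("us")`

   They differ only in cost:

   - MIN/MAX: O(1) on the dictionary versus a full scan on the raw column.
   - Projection: one extra array read per row on the dictionary column.
   - Filtering: an integer comparison per row on the dictionary column versus a string comparison per row on the raw column.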
   
   We find ourselves recommending noDictionary too aggressively, and we need to balance the above requirements in the recommendation engine's rule.
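   One way to balance these requirements is sketched below as a hypothetical Java heuristic. `ColumnUsage`, its fields, and the 0.5 cardinality threshold are assumptions made for this sketch. It is not the logic of `NoDictionaryOnHeapDictionaryJointRule`.

```java
// Hypothetical workload statistics for one column; names and fields are illustrative.
final class ColumnUsage {
  final String dataType;          // e.g. "STRING", "LONG"
  final double cardinalityRatio;  // distinct values / total rows, in [0, 1]
  final double selectFraction;    // fraction of queries projecting the column
  final double filterFraction;    // fraction of queries filtering on the column
  final double minMaxFraction;    // fraction of queries running MIN/MAX on the column

  ColumnUsage(String dataType, double cardinalityRatio, double selectFraction,
      double filterFraction, double minMaxFraction) {
    this.dataType = dataType;
    this.cardinalityRatio = cardinalityRatio;
    this.selectFraction = selectFraction;
    this.filterFraction = filterFraction;
    this.minMaxFraction = minMaxFraction;
  }
}

// Hypothetical heuristic; the threshold below is an illustrative assumption.
final class NoDictionaryHeuristic {
  static final double HIGH_CARDINALITY_RATIO = 0.5;

  static boolean recommendNoDictionary(ColumnUsage u) {
    // Low cardinality: the dictionary saves significant storage, so keep it.
    if (u.cardinalityRatio < HIGH_CARDINALITY_RATIO) {
      return false;
    }
    // MIN/MAX can be answered from a sorted dictionary instead of a table scan.
    if (u.minMaxFraction > 0) {
      return false;
    }
    // STRING filters are cheaper on integer dictIds than on string comparison.
    if ("STRING".equals(u.dataType) && u.filterFraction >= u.selectFraction) {
      return false;
    }
    // Otherwise, recommend noDictionary only if the column is actually projected.
    return u.selectFraction > 0;
  }
}
```

   The order of checks follows the comment. The dictionary is kept whenever storage savings, MIN/MAX aggregations, or STRING filtering favor it. noDictionary is recommended only for projected high-cardinality columns where none of those cases apply.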


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

For additional commands, e-mail: commits-help@pinot.apache.org