Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/06/02 21:32:29 UTC

[GitHub] [pinot] richardstartin commented on issue #8800: Add a "Distinct" implementation that leverages index for low cardinality columns

richardstartin commented on issue #8800:
URL: https://github.com/apache/pinot/issues/8800#issuecomment-1145362843

   The inverted-index group-by algorithm is very close to how high-dimensional cubes could work:
   
   1. Pre-aggregate the measure data to reduce cardinality.
   2. Split the dimensions into groups of 3 and, within each group, build bitmap indexes over the aggregates for all 1-, 2- and 3-tuples of dimension values.
   3. To group by a dimension:
        1. When there is no filter, iterate over the dimension's bitmaps and, for each one, evaluate the count or apply an aggregation function to the selected aggregates.
        2. When there is an equality filter on value x of another dimension (or on values x and y of two other dimensions) in the same group as the grouping dimension, select all tuples (*, x) (or (*, x, y)) from the group and apply the reduction to each tuple's bitmap.
        3. When the filter is on a dimension in another group, iterate over the grouping dimension's bitmaps, intersect each with the filter bitmap on the fly, and apply the reduction to each nonempty result bitmap.
   4. To group by multiple dimensions in the same group, iterate over the bitmaps for the indexed tuples (*, *), apply filters as above, and apply the reduction to each nonempty bitmap.
   5. Grouping by multiple dimensions in different groups requires intersecting, on the fly, the bitmaps for every combination in the cross product of the dimensions’ members, then applying the reduction to each nonempty combination bitmap. The groups can be tuned to the query patterns to avoid this case.
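The steps above can be sketched in Python as follows. This is a minimal illustration under assumptions that are not part of the comment: Python integers stand in for compressed bitmaps (a real implementation would more likely use something like RoaringBitmap), the only reduction is a row count plus the sum of one measure, filters are equality-only, and every function name is hypothetical. Each dimension group touched by the query contributes the bitmaps of its indexed tuple (narrowed to the tuples matching any same-group filter), and bitmaps from different groups are intersected on the fly over their cross product. That single loop covers the no-filter, same-group-filter and cross-group cases of step 3 as well as steps 4 and 5.

```python
from collections import defaultdict
from itertools import combinations, product


def pre_aggregate(rows, dims, measure):
    """Step 1: collapse rows that share all dimension values into one
    aggregate holding (row count, sum of `measure`)."""
    agg = defaultdict(lambda: [0, 0])
    for row in rows:
        entry = agg[tuple(row[d] for d in dims)]
        entry[0] += 1
        entry[1] += row[measure]
    keys = list(agg)  # aggregate i is represented by bit i in every bitmap
    return keys, [tuple(agg[k]) for k in keys]


def build_index(keys, dims, group_size=3):
    """Step 2: split the dimensions into groups and, within each group, build
    one bitmap per value tuple of every 1-, 2- and 3-subset of dimensions.
    Bitmaps are Python ints: bit i set means aggregate i has those values."""
    groups = [tuple(dims[i:i + group_size]) for i in range(0, len(dims), group_size)]
    index = defaultdict(lambda: defaultdict(int))  # subset -> values -> bitmap
    for bit, key in enumerate(keys):
        values = dict(zip(dims, key))
        for group in groups:
            for r in range(1, len(group) + 1):
                for subset in combinations(group, r):
                    index[subset][tuple(values[d] for d in subset)] |= 1 << bit
    return groups, index


def reduce_bitmap(bitmap, aggregates):
    """The reduction: total row count and measure sum of the selected aggregates."""
    count = total = 0
    while bitmap:
        low = bitmap & -bitmap  # isolate the lowest set bit
        c, s = aggregates[low.bit_length() - 1]
        count, total = count + c, total + s
        bitmap ^= low
    return count, total


def group_by(groups, index, aggregates, group_dims, filters=None):
    """Steps 3-5: group by `group_dims` under equality `filters` ({dim: value})."""
    filters = filters or {}
    per_group = []  # for each touched group: [(grouping values, bitmap), ...]
    for group in groups:
        # The indexed tuple of this group's dimensions that the query uses.
        subset = tuple(d for d in group if d in group_dims or d in filters)
        if not subset:
            continue
        selected = []
        for values, bitmap in index[subset].items():
            row = dict(zip(subset, values))
            # Same-group filter: keep only the tuples (*, x) that match it.
            if all(row[d] == v for d, v in filters.items() if d in row):
                part = {d: row[d] for d in subset if d in group_dims}
                selected.append((part, bitmap))
        per_group.append(selected)
    full = (1 << len(aggregates)) - 1
    result = {}
    # Cross-group case: intersect one bitmap per touched group on the fly.
    for combo in product(*per_group):
        bitmap, key = full, {}
        for part, group_bitmap in combo:
            bitmap &= group_bitmap
            key.update(part)
        if bitmap:  # apply the reduction only to nonempty combinations
            result[tuple(key[d] for d in group_dims)] = reduce_bitmap(bitmap, aggregates)
    return result
```

For example, grouping by a `country` dimension with the filter `{"browser": "chrome"}`, where `browser` sits in a different group, takes the country bitmaps from the first group and intersects each with the single `browser = chrome` bitmap from the second group, which is the cross-group case of step 3.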
   
   This can all be done without pre-aggregation, but the approach doesn’t work well with high cardinalities or high row counts (especially when applying aggregation functions other than count to nonempty groups).
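A minimal sketch of the unaggregated variant shows where the cost comes from. It assumes row ids are used directly as bit positions, Python integers stand in for bitmaps, and all names are hypothetical. A count per group only needs the bitmap's cardinality, which compressed bitmap libraries typically make cheap to obtain, while a sum has to decode every set bit and visit the corresponding row, so its cost grows with the row count. A high-cardinality dimension additionally means one bitmap per distinct value to iterate.

```python
from collections import defaultdict


def build_inverted_index(rows, dim):
    """Map each value of `dim` to a bitmap (Python int) of the row ids holding it."""
    index = defaultdict(int)
    for row_id, row in enumerate(rows):
        index[row[dim]] |= 1 << row_id
    return dict(index)


def group_count(index):
    """COUNT per group: just the cardinality of each bitmap; no row is visited."""
    return {value: bin(bitmap).count("1") for value, bitmap in index.items()}


def group_sum(index, rows, measure):
    """SUM per group: every set bit is decoded and its row visited, so the
    cost grows with the number of rows rather than the number of groups."""
    result = {}
    for value, bitmap in index.items():
        total = 0
        while bitmap:
            low = bitmap & -bitmap  # isolate the lowest set bit (one row id)
            total += rows[low.bit_length() - 1][measure]
            bitmap ^= low
        result[value] = total
    return result
```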
   
        


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
