Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/06/01 17:22:37 UTC

[GitHub] [pinot] Jackie-Jiang commented on issue #8800: Add a "Distinct" implementation that leverages index for low cardinality columns

Jackie-Jiang commented on issue #8800:
URL: https://github.com/apache/pinot/issues/8800#issuecomment-1143908580

   I think I get the general idea of using the inverted index to solve distinct and group-by queries such as:
   - `SELECT DISTINCT colA FROM myTable WHERE ...`
   - `SELECT COUNT(*) FROM myTable WHERE ... GROUP BY colA`
   
   When `colA` has an inverted index, we can scan its bitmaps (one per distinct value) and intersect each with the filter result to solve the query, instead of scanning the matching docs.
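
To make the two strategies concrete, here is a minimal sketch in Python. It is not Pinot code: plain `set` objects stand in for the bitmaps, and `forward_index`, `inverted_index`, and the function names are hypothetical names chosen for illustration.

```python
from collections import Counter

# Hypothetical toy segment: position = doc id, value = dictionary id of colA.
forward_index = [0, 1, 0, 2, 1, 0, 2, 2]

# Inverted index: dictionary id -> set of doc ids (a stand-in for a bitmap).
inverted_index = {}
for doc_id, dict_id in enumerate(forward_index):
    inverted_index.setdefault(dict_id, set()).add(doc_id)


def group_by_count_scan(matching_docs):
    """Doc-scan approach: read the dictionary id of every matching doc."""
    return dict(Counter(forward_index[doc_id] for doc_id in matching_docs))


def group_by_count_bitmaps(matching_docs):
    """Index-based approach: intersect each value's bitmap with the filter result."""
    counts = {}
    for dict_id, bitmap in inverted_index.items():
        matched = len(bitmap & matching_docs)
        if matched:
            counts[dict_id] = matched
    return counts


def distinct_bitmaps(matching_docs):
    """DISTINCT only needs to know whether each bitmap overlaps the filter result."""
    return {
        dict_id
        for dict_id, bitmap in inverted_index.items()
        if not bitmap.isdisjoint(matching_docs)
    }


# Docs assumed to pass the WHERE clause.
matching_docs = {1, 2, 4, 5}
scan_result = group_by_count_scan(matching_docs)
bitmap_result = group_by_count_bitmaps(matching_docs)
distinct_result = distinct_bitmaps(matching_docs)
```

Both group-by functions return the same counts. The difference is the loop: the first iterates over matching docs, the second over distinct values.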
   
   This index-based approach might accelerate the query when both of the following conditions are met:
   - `colA` has low cardinality (we don't want to scan too many bitmaps)
   - The filter has low selectivity (it matches many records, so scanning the matching docs is relatively expensive)
   
   Note that when `colA` has low cardinality, the current approach is not very costly either: it only maintains a small set/map keyed by dictionary id, with at most cardinality entries. Scanning the bitmaps is also not strictly O(cardinality), because processing each bitmap can cost up to O(number of rows). We should benchmark both approaches and find the break-even point at which this optimization outperforms the current solution.
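
As a rough way to reason about where that break-even point might sit, the sketch below compares operation counts under a deliberately simple cost model. Everything in it is an assumption for illustration: it treats each bitmap as an uncompressed bitset processed 64 docs per word and charges one unit of work per matching doc for the scan. Compressed bitmaps would behave differently, so an actual benchmark is still needed.

```python
import math

WORD_BITS = 64  # assumed word size of an uncompressed bitset


def doc_scan_cost(num_matching_docs):
    """Assumed cost of the doc scan: one lookup and set/map update per matching doc."""
    return num_matching_docs


def bitmap_scan_cost(cardinality, num_docs):
    """Assumed worst-case cost of the bitmap scan: one full-length AND per distinct value."""
    return cardinality * math.ceil(num_docs / WORD_BITS)


def bitmaps_win(cardinality, num_docs, num_matching_docs):
    """True when the bitmap scan is cheaper under this toy model."""
    return bitmap_scan_cost(cardinality, num_docs) < doc_scan_cost(num_matching_docs)
```

Under these assumptions, `bitmaps_win(4, 1_000_000, 900_000)` is true (few values, most rows match), while `bitmaps_win(4, 1_000_000, 10_000)` and `bitmaps_win(1000, 1_000_000, 900_000)` are both false (few rows match, or too many bitmaps to scan).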


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

